Atopic dermatitis (AD; eczema) is characterized by a widespread abnormality in cutaneous barrier function and propensity to inflammation. Filaggrin is a multifunctional protein and plays a key role in skin barrier formation. Loss-of-function mutations in the gene encoding filaggrin (FLG) are a highly significant risk factor for atopic disease, but the molecular mechanisms leading to dermatitis remain unclear.
We sought to interrogate tissue-specific variations in the expressed genome in the skin of children with AD and to investigate underlying pathomechanisms in atopic skin.
We applied single-molecule direct RNA sequencing to analyze the whole transcriptome using minimal tissue samples. Uninvolved skin biopsy specimens from 26 pediatric patients with AD were compared with site-matched samples from 10 nonatopic teenage control subjects. Cases and control subjects were screened for FLG genotype to stratify the data set.
Two thousand four hundred thirty differentially expressed genes (false discovery rate, P < .05) were identified, of which 211 were significantly upregulated and 490 downregulated by greater than 2-fold. Gene ontology terms for “extracellular space” and “defense response” were enriched, whereas “lipid metabolic processes” were downregulated. The subset of FLG wild-type cases showed dysregulation of genes involved with lipid metabolism, whereas filaggrin haploinsufficiency affected global gene expression and was characterized by a type 1 interferon–mediated stress response.
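The gene counts above rest on two filters: a false discovery rate threshold and a 2-fold change cutoff. As a minimal sketch of how such a filter can be applied, the following assumes a Benjamini-Hochberg adjustment (the abstract does not name the FDR procedure used) and log2 fold changes; the function names and the toy gene values are hypothetical and are not data from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify_genes(genes, fdr_cutoff=0.05, min_abs_log2fc=1.0):
    """Label each gene; `genes` is a list of (name, log2FC, raw p) tuples.

    A log2 fold change above 1 (or below -1) corresponds to a change of
    more than 2-fold up (or down).
    """
    adjusted = benjamini_hochberg([p for _, _, p in genes])
    calls = {}
    for (name, log2fc, _), q in zip(genes, adjusted):
        if q >= fdr_cutoff:
            calls[name] = "not significant"
        elif log2fc > min_abs_log2fc:
            calls[name] = "up >2-fold"
        elif log2fc < -min_abs_log2fc:
            calls[name] = "down >2-fold"
        else:
            calls[name] = "significant, <=2-fold"
    return calls


if __name__ == "__main__":
    # Toy values for illustration only.
    toy = [
        ("geneA", 2.3, 0.0004),
        ("geneB", -1.7, 0.003),
        ("geneC", 0.4, 0.01),
        ("geneD", 0.1, 0.6),
    ]
    for name, call in classify_genes(toy).items():
        print(name, call)
```

Separating the significance test from the fold-change filter mirrors the way the abstract reports the full set of differentially expressed genes first and the greater than 2-fold subsets second.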
These analyses demonstrate the importance of extracellular space and lipid metabolism in atopic skin pathology independent of FLG genotype, whereas an aberrant defense response is seen in subjects with FLG mutations. Genotype stratification of the large data set has facilitated functional interpretation and might guide future therapy development.
Atopic dermatitis; direct RNA sequencing; eczema; filaggrin; gene expression; single molecule; skin; tissue; transcriptome; AD, Atopic dermatitis; CILP, Cartilage intermediate layer protein gene; DRS, Direct RNA sequencing; eQTL, Expression quantitative trait loci; FDR, False discovery rate; FLG, Filaggrin gene; GO, Gene ontology; STAT, Signal transducer and activator of transcription
It has recently been shown that RNA 3′ end formation plays a more widespread role in controlling gene expression than previously thought. In order to examine the impact of regulated 3′ end formation genome-wide we applied direct RNA sequencing to A. thaliana. Here we show the authentic transcriptome in unprecedented detail and how 3′ end formation impacts genome organization. We reveal extreme heterogeneity in RNA 3′ ends, discover previously unrecognized non-coding RNAs and propose widespread re-annotation of the genome. We explain the origin of most poly(A)+ antisense RNAs and identify cis-elements that control 3′ end formation in different registers. These findings are essential to understand what the genome actually encodes, how it is organized and the impact of regulated 3′ end formation on these processes.
Here, we exploit the spatial separation of temporal events of neural differentiation in the elongating chick body axis to provide the first analysis of transcriptome change in progressively more differentiated neural cell populations in vivo. Microarray data, validated against direct RNA sequencing, identified: (1) a gene cohort characteristic of the multi-potent stem zone epiblast, which contains neuro-mesodermal progenitors that progressively generate the spinal cord; (2) a major transcriptome re-organisation as cells then adopt a neural fate; and (3) increasing diversity as neural patterning and neuron production begin. Focussing on the transition from multi-potent to neural state cells, we capture changes in major signalling pathways, uncover novel Wnt and Notch signalling dynamics, and implicate new pathways (mevalonate pathway/steroid biogenesis and TGFβ). This analysis further predicts changes in cellular processes, cell cycle, RNA-processing and protein turnover as cells acquire neural fate. We show that these changes are conserved across species and provide biological evidence for reduced proteasome efficiency and a novel lengthening of S phase. This latter step may provide time for epigenetic events to mediate large-scale transcriptome re-organisation; consistent with this, we uncover simultaneous downregulation of major chromatin modifiers as the neural programme is established. We further demonstrate that transcription of one such gene, HDAC1, is dependent on FGF signalling, making a novel link between signals that control neural differentiation and transcription of a core regulator of chromatin organisation. Our work implicates new signalling pathways and dynamics, cellular processes and epigenetic modifiers in neural differentiation in vivo, identifying multiple new potential cellular and molecular mechanisms that direct differentiation.
Neural differentiation; Transcriptome; Cell cycle; FGF signalling; Chromatin; Chick embryo
The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct and complete annotation in addition to the underlying genomic sequence is particularly important when interpreting the results of RNA-seq experiments where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3′ untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in human, chicken and A. thaliana by the novel single-molecule, strand-specific, Direct RNA Sequencing technology from Helicos Biosciences, which locates 3′ polyadenylation sites to within ±2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3′ UTR re-annotation (including extension of one 3′ UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and those seeking to interpret existing publicly available annotations in the context of their own experimental data.
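Because read 3′ ends locate a polyadenylation site only to within about 2 nt, nearby read ends are usually grouped before a site is annotated. The following is a minimal sketch of one way to do that, not the procedure used in the paper: the function names, the chaining rule and the toy reads are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def _summarise(chrom, strand, cluster, counts):
    # Peak = position with the most reads; ties go to the smallest coordinate.
    peak = max(cluster, key=lambda p: (counts[p], -p))
    total = sum(counts[p] for p in cluster)
    return (chrom, strand, peak, total)


def cluster_polya_sites(read_ends, tolerance=2):
    """Group read 3' end positions into candidate poly(A) sites.

    read_ends: iterable of (chrom, strand, position) tuples, one per read.
    Positions on the same chrom/strand are chained into one cluster while
    each distinct position lies within `tolerance` nt of the previous one,
    so a cluster can span more than `tolerance` nt overall.
    Returns a list of (chrom, strand, peak_position, read_count).
    """
    by_locus = defaultdict(Counter)
    for chrom, strand, pos in read_ends:
        by_locus[(chrom, strand)][pos] += 1

    sites = []
    for chrom, strand in sorted(by_locus):
        counts = by_locus[(chrom, strand)]
        cluster = []
        for pos in sorted(counts):
            if cluster and pos - cluster[-1] > tolerance:
                sites.append(_summarise(chrom, strand, cluster, counts))
                cluster = []
            cluster.append(pos)
        if cluster:
            sites.append(_summarise(chrom, strand, cluster, counts))
    return sites


if __name__ == "__main__":
    # Toy read ends for illustration only.
    reads = [
        ("chr1", "+", 100), ("chr1", "+", 101), ("chr1", "+", 101),
        ("chr1", "+", 103), ("chr1", "+", 150), ("chr1", "-", 100),
    ]
    for site in cluster_polya_sites(reads):
        print(site)
```

Keeping strands separate matters here because the technology is strand-specific, so reads ending at the same coordinate on opposite strands support different sites.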
Palmoplantar keratodermas (PPKs) are a group of disorders that are diagnostically and therapeutically problematic in dermatogenetics [1-3]. Punctate PPKs are characterized by circumscribed hyperkeratotic lesions on palms and soles with considerable heterogeneity. In 18 families with autosomal dominant punctate PPK (OMIM #148600), we report heterozygous loss-of-function mutations in AAGAB, encoding alpha- and gamma-adaptin binding protein p34, at a previously linked locus on 15q22. p34, a cytosolic protein with a Rab-like GTPase domain, was shown to bind both clathrin adaptor protein complexes, indicative of a role in membrane traffic. Ultrastructurally, lesional epidermis showed abnormalities in intracellular vesicle biology. Immunohistochemistry showed hyperproliferation within the punctate lesions. Knockdown of p34 in keratinocytes led to increased cell division, which was linked to greatly increased epidermal growth factor receptor (EGFR) protein expression and tyrosine phosphorylation. We hypothesize that p34 deficiency may impair endocytic recycling of growth factor receptors such as EGFR, leading to increased signaling and proliferation.
Atopic dermatitis (AD) is a major inflammatory condition of the skin caused by inherited skin barrier deficiency, with mutations in the filaggrin gene predisposing to development of AD. Support for barrier deficiency initiating AD came from flaky tail mice, which have a frameshift mutation in Flg and also carry an unknown gene, matted, causing a matted hair phenotype.
We sought to identify the matted mutant gene in mice and further define whether mutations in the human gene were associated with AD.
A mouse genetics approach was used to separate the matted and Flg mutations to produce congenic single-mutant strains for genetic and immunologic analysis. Next-generation sequencing was used to identify the matted gene. Five independently recruited AD case collections were analyzed to define associations between single nucleotide polymorphisms (SNPs) in the human gene and AD.
The matted phenotype in flaky tail mice is due to a mutation in the Tmem79/Matt gene, with no expression of the encoded protein mattrin in the skin of mutant mice. Mattft mice spontaneously have dermatitis and atopy caused by a defective skin barrier, with mutant mice having systemic sensitization after cutaneous challenge with house dust mite allergens. Meta-analysis of 4,245 AD cases and 10,558 population-matched control subjects showed that a missense SNP, rs6694514, in the human MATT gene has a small but significant association with AD.
In mice, mutations in Matt cause a defective skin barrier and spontaneous dermatitis and atopy. A common SNP in MATT has an association with AD in human subjects.
Allergy; association; atopic dermatitis; atopy; eczema; filaggrin; flaky tail; Matt; mattrin; mouse; mutation; Tmem79; AD, Atopic dermatitis; DM, Double mutant; FLG, Filaggrin; HDM, House dust mite; hpf, High-power field; MAPEG, Membrane-associated proteins in eicosanoid and glutathione metabolism; OR, Odds ratio; SNP, Single nucleotide polymorphism; TEWL, Transepidermal water loss; WT, Wild-type
Alternative cleavage and polyadenylation influence the coding and regulatory potential of mRNAs and where transcription termination occurs. Although widespread, few regulators of this process are known. The Arabidopsis thaliana protein FPA is a rare example of a trans-acting regulator of poly(A) site choice. Analysing fpa mutants therefore provides an opportunity to reveal generic consequences of disrupting this process. We used direct RNA sequencing to quantify shifts in RNA 3′ formation in fpa mutants. Here we show that specific chimeric RNAs formed between the exons of otherwise separate genes are a striking consequence of loss of FPA function. We define intergenic read-through transcripts resulting from defective RNA 3′ end formation in fpa mutants and detail cryptic splicing and antisense transcription associated with these read-through RNAs. We identify alternative polyadenylation within introns that is sensitive to FPA and show FPA-dependent shifts in IBM1 poly(A) site selection that differ from those recently defined in mutants defective in intragenic heterochromatin and DNA methylation. Finally, we show that defective termination at specific loci in fpa mutants is shared with dicer-like 1 (dcl1) or dcl4 mutants, leading us to develop alternative explanations for some silencing roles of these proteins. We relate our findings to the impact that altered patterns of 3′ end formation can have on gene and genome organisation.
The ends of almost all eukaryotic protein-coding genes are defined by a poly(A) signal. When genes are transcribed into mRNA by RNA polymerase II, the poly(A) signal guides cleavage of the precursor mRNA at a particular site; this is accompanied by the addition of a poly(A) tail to the mRNA and termination of transcription. Many genes have more than one poly(A) signal and the regulated choice of which to select can effectively determine what the gene will code for, how the gene can be regulated and where transcription termination occurs. We discovered a rare example of a regulator of poly(A) site choice, called FPA, while studying flower development in the model plant Arabidopsis thaliana. Studying FPA therefore provides an opportunity to understand not only its roles in plant biology but also the generic consequences of disrupting alternative polyadenylation. In this study, we use a technique called direct RNA sequencing to quantify genome-wide shifts in poly(A) site selection in plants that lack FPA function. One of our most striking findings is that in the absence of FPA we detect chimeric RNAs formed between two otherwise separate and well-characterised genes.
RNA-binding proteins (RBPs) play an important role in plant host-microbe interactions. In this study, we show that the plant RBP known as FPA, which regulates 3′-end mRNA polyadenylation, negatively regulates basal resistance to the bacterial pathogen Pseudomonas syringae in Arabidopsis. A custom microarray analysis reveals that flg22, a peptide derived from bacterial flagellins, induces expression of alternatively polyadenylated isoforms of mRNA encoding the defence-related transcriptional repressor ETHYLENE RESPONSE FACTOR 4 (ERF4), which is regulated by FPA. Flg22 induces expression of a novel isoform of ERF4 that lacks the ERF-associated amphiphilic repression (EAR) motif, while FPA inhibits this induction. The EAR-lacking isoform of ERF4 acts as a transcriptional activator in vivo and suppresses the flg22-dependent reactive oxygen species burst. We propose that FPA controls use of proximal polyadenylation sites of ERF4, which quantitatively limit the defence response output.
CudA, a nuclear protein required for Dictyostelium prespore-specific gene expression, binds in vivo to the promoter of the cotC prespore gene. A 14 nucleotide region of the cotC promoter binds CudA in vitro and ECudA, an Entamoeba CudA homologue, also binds to this site. The CudA and ECudA DNA-binding sites contain a dyad and, consistent with a symmetrical binding site, CudA forms a homodimer in the yeast two-hybrid system. Mutation of CudA binding sites within the cotC promoter reduces expression from cotC in prespore cells. The CudA and ECudA proteins share a 120 amino acid core of homology, and clustered point mutations introduced into two highly conserved motifs within the ECudA core region decrease its specific DNA binding in vitro. This region, the presumptive DNA-binding domain, is similar in sequence to domains in two Arabidopsis proteins and one Oryza protein. Significantly, these are the only proteins in the two plant species that contain an SH2 domain. Such a structure, with a DNA-binding domain located upstream of an SH2 domain, suggests that the plant proteins are orthologous to metazoan STATs. Consistent with this notion, the DNA sequence of the CudA half site, GAA, is identical to metazoan STAT half sites, although the relative positions of the two halves of the dyad are reversed. These results define a hitherto unrecognised class of transcription factors and suggest a model for the evolution of STATs and their DNA-binding sites.
Dictyostelium; CudA; Amoebozoa; Plant STATs; SH2 domains
Small nucleolar RNAs (snoRNAs) function mainly as guides for the post-transcriptional modification of ribosomal RNAs (rRNAs). In recent years, several studies have identified a wealth of small fragments (<35 nt) derived from snoRNAs (termed sdRNAs) that stably accumulate in the cell, some of which may regulate splicing or translation. A comparison of human small RNA deep sequencing data sets reveals that box C/D sdRNA accumulation patterns are conserved across multiple cell types although the ratio of the abundance of different sdRNAs from a given snoRNA varies. sdRNA profiles of many snoRNAs are specific and resemble the cleavage profiles of miRNAs. Many do not show characteristics of general RNA degradation, as seen for the accumulation of small fragments derived from snRNA or rRNA. While 53% of the sdRNAs contain an snoRNA box C motif and boxes D and D′ are also common in sdRNAs (54%), relatively few (12%) contain a full snoRNA guide region. One box C/D snoRNA, HBII-180C, was analysed in greater detail, revealing the presence of C′ box-containing sdRNAs complementary to several pre-messenger RNAs (pre-mRNAs) including FGFR3. Functional analyses demonstrated that this region of HBII-180C can influence the alternative splicing of FGFR3 pre-mRNA, supporting a role for some snoRNAs in the regulation of splicing.
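The motif percentages reported above are fractions of sdRNA sequences that contain a given box element. As a rough sketch of how such fractions can be tallied, the following assumes the commonly cited consensus sequences for box C (RUGAUGA) and box D (CUGA) with exact matching; the study's own matching criteria are not stated in the abstract, and the function name and toy fragments are hypothetical.

```python
import re

# Assumed consensus patterns in the RNA alphabet (R = A or G). The exact
# rules used in the study (e.g. tolerated mismatches) are not given here.
MOTIFS = {
    "box C": re.compile(r"[AG]UGAUGA"),
    "box D": re.compile(r"CUGA"),
}


def motif_fractions(fragments):
    """Return the fraction of fragments that contain each motif."""
    # Normalise case and accept DNA-alphabet reads by converting T to U.
    seqs = [s.upper().replace("T", "U") for s in fragments]
    if not seqs:
        raise ValueError("no fragments supplied")
    return {
        name: sum(1 for s in seqs if pattern.search(s)) / len(seqs)
        for name, pattern in MOTIFS.items()
    }


if __name__ == "__main__":
    # Toy fragments for illustration only.
    toy = ["GGAUGAUGACUU", "ACCUGAGG", "uuuuccccaaaa", "ATGATGACTGA"]
    print(motif_fractions(toy))
```

Exact matching is the simplest choice; a real analysis would likely need to allow for degenerate boxes, which would change the reported fractions.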
► Identifies key considerations in target selection and optimisation. ► Approaches to assign useful protein features and structure/function relationships. ► Comparison of latest crystallisation propensity predictors on nonredundant data. ► Discusses single point of reference target selection/optimisation resources. ► Guidance on using the SSPF Target Optimisation Utility (TarO).
Selection of protein targets for study is central to structural biology and may be influenced by numerous factors. A key aim is to maximise returns for effort invested by identifying proteins with the balance of biophysical properties that are conducive to success at all stages (e.g. solubility, crystallisation) in the route towards a high resolution structural model. Selected targets can be optimised through construct design (e.g. to minimise protein disorder), switching to a homologous protein, and selection of experimental methodology (e.g. choice of expression system) to prime for efficient progress through the structural proteomics pipeline.
Here we discuss computational techniques in target selection and optimisation, with more detailed focus on tools developed within the Scottish Structural Proteomics Facility (SSPF); namely XANNpred, ParCrys, OB-Score (target selection) and TarO (target optimisation). TarO runs a large number of algorithms, searching for homologues and annotating the pool of possible alternative targets. This pool of putative homologues is presented in a ranked, tabulated format and results are also visualised as an automatically generated and annotated multiple sequence alignment. The target selection algorithms each predict the propensity of a selected protein target to progress through the experimental stages leading to diffracting crystals. This single predictor approach has advantages for target selection, when compared with an approach using two or more predictors that each predict for success at a single experimental stage. The tools described here helped SSPF achieve a high (21%) success rate in progressing cloned targets to diffraction-quality crystals.
MSA, Multiple Sequence Alignment; PTM, Post Translational Modification; SSPF, Scottish Structural Proteomics Facility; MCC, Matthews correlation coefficient; AROC, Area Under the Receiver Operator Characteristic curve; Target selection; Crystallisation; Structural genomics; Structural biology; Bioinformatics; Construct design
Nucleolar localization sequences (NoLSs) are short targeting sequences responsible for the localization of proteins to the nucleolus. Given the large number of proteins experimentally detected in the nucleolus and the central role of this subnuclear compartment in the cell, NoLSs are likely to be important regulatory elements controlling cellular traffic. Although many proteins have been reported to contain NoLSs, the systematic characterization of this group of targeting motifs has only recently been carried out.
Here, we describe NoD, a web server and a command line program that predicts the presence of NoLSs in proteins. Using the web server, users can submit protein sequences through the NoD input form and are provided with a graphical output of the NoLS score as a function of protein position. While the web server is most convenient for making predictions for just a few proteins, the command line version of NoD can return predictions for complete proteomes. NoD is based on our recently described human-trained artificial neural network predictor. Through stringent independent testing of the predictor using available experimentally validated NoLS-containing eukaryotic and viral proteins, the NoD sensitivity and positive predictive value were estimated to be 71% and 79%, respectively.
NoD is the first tool to provide predictions of nucleolar localization sequences in diverse eukaryotes and viruses. NoD can be run interactively online at http://www.compbio.dundee.ac.uk/nod or downloaded to use locally.
nucleolus; protein targeting signal; protein localization; NoD web server
Summary: JABAWS is a web services framework that simplifies the deployment of web services for bioinformatics. JABAWS:MSA provides services for five multiple sequence alignment (MSA) methods (Probcons, T-coffee, Muscle, Mafft and ClustalW), and is the system employed by the Jalview multiple sequence analysis workbench since version 2.6. A fully functional, easy to set up server is provided as a Virtual Appliance (VA), which can be run on most operating systems that support a virtualization environment such as VMware or Oracle VirtualBox. JABAWS is also distributed as a Web Application aRchive (WAR) and can be configured to run on a single computer and/or a cluster managed by Grid Engine, LSF or other queuing systems that support DRMAA. JABAWS:MSA provides clients full access to each application's parameters, and allows administrators to specify named parameter preset combinations and execution limits for each application through simple configuration files. The JABAWS command-line client allows integration of JABAWS services into conventional scripts.
Availability and Implementation: JABAWS is made freely available under the Apache 2 license and can be obtained from: http://www.compbio.dundee.ac.uk/jabaws.
Staphylococcus aureus is a major human pathogen and strains resistant to existing treatments continue to emerge. Development of novel treatments is therefore important. Antimicrobial peptides represent a source of potential novel antibiotics to combat resistant bacteria such as Methicillin-Resistant Staphylococcus aureus (MRSA). A promising antimicrobial peptide is ranalexin, which has potent activity against Gram-positive bacteria, and particularly S. aureus. Understanding mode of action is a key component of drug discovery and network biology approaches enable a global, integrated view of microbial physiology, including mechanisms of antibiotic killing. We developed a systems-wide functional association network approach to integrate proteome and transcriptome profiles, enabling study of drug resistance and mode of action.
The functional association network was constructed by Bayesian logistic regression, providing a framework for identification of antimicrobial peptide (ranalexin) response modules from S. aureus MRSA-252 transcriptome and proteome profiling. These signatures of ranalexin treatment revealed multiple killing mechanisms, including cell wall activity. Cell wall effects were supported by gene disruption and osmotic fragility experiments. Furthermore, twenty-two novel virulence factors were inferred, while the VraRS two-component system and PhoU-mediated persister formation were implicated in MRSA tolerance to cationic antimicrobial peptides.
This work demonstrates a powerful integrative approach to study drug resistance and mode of action. Our findings are informative to the development of novel therapeutic strategies against Staphylococcus aureus and particularly MRSA.
The SWI/SNF complex acts to constrain distribution of the centromeric histone variant Cse4
The SWI/SNF complex has an important role in regulating chromatin structure during transcriptional activation and DNA repair. Here, we show that the SWI/SNF complex is also involved in the organisation of centromeric chromatin and prevention of the ectopic deposition of centromeric histone variants.
In order to gain insight into the function of the Saccharomyces cerevisiae SWI/SNF complex, we have identified DNA sequences to which it is bound genome-wide. One surprising observation is that the complex is enriched at the centromeres of each chromosome. Deletion of the gene encoding the Snf2 subunit of the complex was found to cause partial redistribution of the centromeric histone variant Cse4 to sites on chromosome arms. Cultures of snf2Δ yeast were found to progress through mitosis slowly. This was dependent on the mitotic checkpoint protein Mad2. In the absence of Mad2, defects in chromosome segregation were observed. In the absence of Snf2, chromatin organisation at centromeres is less distinct. In particular, hypersensitive sites flanking the Cse4-containing nucleosomes are less pronounced. Furthermore, the SWI/SNF complex was found to be especially effective in the dissociation of Cse4-containing chromatin in vitro. This suggests a role for Snf2 in the maintenance of point centromeres involving the removal of Cse4 from ectopic sites.
centromere; chromatin; Cse4; nucleosome; SWI/SNF
Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data.
The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10,000 samples per month. The current version of the PIMS sequencing extension works with the Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots.
PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.
Although primarily known as the site of ribosome subunit production, the nucleolus is involved in numerous and diverse cellular processes. Recent large-scale proteomics projects have identified thousands of human proteins that associate with the nucleolus. However, in most cases, we know neither the fraction of each protein pool that is nucleolus-associated nor whether their association is permanent or conditional.
To describe the dynamic localisation of proteins in the nucleolus, we investigated the extent of nucleolar association of proteins by first collating an extensively curated literature-derived dataset. This dataset then served to train a probabilistic predictor which integrates gene and protein characteristics. Unlike most previous experimental and computational studies of the nucleolar proteome that produce large static lists of nucleolar proteins regardless of their extent of nucleolar association, our predictor models the fluidity of the nucleolus by considering different classes of nucleolar-associated proteins. The new method predicts all human proteins as either nucleolar-enriched, nucleolar-nucleoplasmic, nucleolar-cytoplasmic or non-nucleolar. Leave-one-out cross validation tests reveal sensitivity values for these four classes ranging from 0.72 to 0.90 and positive predictive values ranging from 0.63 to 0.94. The overall accuracy of the classifier was measured to be 0.85 on an independent literature-based test set and 0.74 using a large independent quantitative proteomics dataset. While the three nucleolar-association groups display vastly different Gene Ontology biological process signatures and evolutionary characteristics, they collectively represent the most well characterised nucleolar functions.
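The sensitivity, positive predictive value and overall accuracy quoted above are all derived from a multi-class confusion matrix. The following is a minimal sketch of that arithmetic; the function name is hypothetical, and the matrix in the usage example is invented for illustration and does not reproduce the study's results.

```python
def per_class_metrics(confusion, labels):
    """Compute per-class sensitivity and PPV plus overall accuracy.

    confusion[i][j] is the number of proteins whose true class is
    labels[i] and whose predicted class is labels[j].
    """
    n = len(labels)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    metrics = {}
    for i, label in enumerate(labels):
        true_pos = confusion[i][i]
        actual = sum(confusion[i])                          # row sum
        predicted = sum(confusion[r][i] for r in range(n))  # column sum
        metrics[label] = {
            "sensitivity": true_pos / actual if actual else float("nan"),
            "ppv": true_pos / predicted if predicted else float("nan"),
        }
    accuracy = correct / total if total else float("nan")
    return metrics, accuracy


if __name__ == "__main__":
    labels = [
        "nucleolar-enriched",
        "nucleolar-nucleoplasmic",
        "nucleolar-cytoplasmic",
        "non-nucleolar",
    ]
    # Invented counts for illustration only.
    toy = [
        [18, 2, 0, 0],
        [3, 14, 1, 2],
        [0, 1, 8, 1],
        [1, 2, 1, 46],
    ]
    metrics, accuracy = per_class_metrics(toy, labels)
    for label in labels:
        print(label, metrics[label])
    print("accuracy", accuracy)
```

Reporting sensitivity and PPV per class, rather than accuracy alone, is what lets a four-class predictor be judged on its smaller classes and not only on the large non-nucleolar group.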
Our proteome-wide classification of nucleolar association provides a novel representation of the dynamic content of the nucleolus. This model of nucleolar localisation thus increases the coverage while providing accurate and specific annotations of the nucleolar proteome. It will be instrumental in better understanding the central role of the nucleolus in the cell and its interaction with other subcellular compartments.
There are two main classes of small nucleolar RNAs (snoRNAs): the box C/D snoRNAs and the box H/ACA snoRNAs that function as guide RNAs to direct sequence-specific modification of rRNA precursors and other nucleolar RNA targets. A previous computational and biochemical analysis revealed a possible evolutionary relationship between miRNA precursors and some box H/ACA snoRNAs. Here, we investigate a similar evolutionary relationship between a subset of miRNA precursors and box C/D snoRNAs. Computational analyses identified 84 intronic miRNAs that are encoded within either box C/D snoRNAs, or in precursors showing similarity to box C/D snoRNAs. Predictions of the folded structures of these box C/D snoRNA-like miRNA precursors resemble the structures of known box C/D snoRNAs, with the boxes C and D often in close proximity in the folded molecule. All five box C/D snoRNA-like miRNA precursors tested (miR-27b, miR-16-1, mir-28, miR-31 and let-7g) bind to fibrillarin, a specific protein component of functional box C/D snoRNP complexes. The data suggest that a subset of small regulatory RNAs may have evolved from box C/D snoRNAs.
Although the nucleolar localization of proteins is often believed to be mediated primarily by non-specific retention to core nucleolar components, many examples of short nucleolar targeting sequences have been reported in recent years. In this article, 46 human nucleolar localization sequences (NoLSs) were collated from the literature and subjected to statistical analysis. Of the residues in these NoLSs, 48% are basic, whereas 99% of the residues are predicted to be solvent-accessible, with 42% in α-helix and 57% in coil. The sequence and predicted protein secondary structure of the 46 NoLSs were used to train an artificial neural network to identify NoLSs. At a true positive rate of 54%, the predictor’s overall false positive rate (FPR) is estimated to be 1.52%, which can be broken down to FPRs of 0.26% for randomly chosen cytoplasmic sequences, 0.80% for randomly chosen nucleoplasmic sequences and 12% for nuclear localization signals. The predictor was used to predict NoLSs in the complete human proteome and 10 of the highest scoring previously unknown NoLSs were experimentally confirmed. NoLSs are a prevalent type of targeting motif that is distinct from nuclear localization signals and that can be computationally predicted.
In this manuscript we describe the characterisation of human snoRNAs that co-purify with nucleoli and develop a new vector based system for targeted gene knock down. We demonstrate that this novel vector system (snoMEN) can deliver effective, sequence-specific knock down of endogenous cellular genes as well as GFP and GFP-fusion proteins.
Human small nucleolar RNAs (snoRNAs) that copurify with nucleoli isolated from HeLa cells have been characterized. Novel fibrillarin-associated snoRNAs were detected that allowed the creation of a new vector system for the targeted knockdown of one or more genes in mammalian cells. The snoMEN (snoRNA modulator of gene expressioN) vector technology is based on snoRNA HBII-180C, which contains an internal sequence that can be manipulated to make it complementary to RNA targets. Gene-specific knockdowns are demonstrated for endogenous cellular proteins and for G/YFP-fusion proteins. Multiplex snoMEN vectors coexpress multiple snoRNAs in one transcript, targeted either to different genes or to different sites in the same gene. Protein replacement snoMEN vectors can express a single transcript combining cDNA for a tagged protein with introns containing cognate snoRNAs targeted to knockdown the endogenous cellular protein. We foresee applications for snoMEN vectors in basic gene expression research, target validation, and gene therapy.
The Scottish Structural Proteomics Facility was funded to develop a laboratory-scale approach to high-throughput structure determination. The effort was successful in that over 40 structures were determined. These structures and the methods harnessed to obtain them are reported here. This report reflects on the value of automation but also on the continued requirement for a high degree of scientific and technical expertise. The efficiency of the process poses challenges to the current paradigm of structural analysis and publication. In the 5-year period we published ten peer-reviewed papers reporting structural data arising from the pipeline. Nevertheless, the number of structures solved exceeded our ability to analyse and publish each new finding. By reporting the experimental details and depositing the structures we hope to maximize the impact of the project by allowing others to follow up the relevant biology.
Electronic supplementary material
The online version of this article (doi:10.1007/s10969-010-9090-y) contains supplementary material, which is available to authorized users.
High-throughput; Protein crystallography; Structural proteomics; SSPF
MicroRNAs (miRNAs) and small nucleolar RNAs (snoRNAs) are two classes of small non-coding regulatory RNAs, which have been much investigated in recent years. While their respective functions in the cell are distinct, they share interesting genomic similarities, and recent sequencing projects have identified processed forms of snoRNAs that resemble miRNAs. Here, we investigate a possible evolutionary relationship between miRNAs and box H/ACA snoRNAs. A comparison of the genomic locations of reported miRNAs and snoRNAs reveals an overlap of specific members of these classes. To test the hypothesis that some miRNAs might have evolved from snoRNA-encoding genomic regions, reported miRNA-encoding regions were scanned for the presence of box H/ACA snoRNA features. Twenty miRNA precursors show significant similarity to H/ACA snoRNAs as predicted by snoGPS. These include molecules predicted to target known ribosomal RNA pseudouridylation sites in vivo for which no guide snoRNA has yet been reported. The predicted folded structures of these twenty H/ACA snoRNA-like miRNA precursors reveal molecules which resemble the structures of known box H/ACA snoRNAs. The genomic regions surrounding these predicted snoRNA-like miRNAs are often similar to regions around snoRNA retroposons, including the presence of transposable elements, target site duplications and poly(A) tails. We further show that the precursors of five H/ACA snoRNA-like miRNAs (miR-151, miR-605, miR-664, miR-215 and miR-140) bind to dyskerin, a specific protein component of functional box H/ACA small nucleolar ribonucleoprotein complexes, suggesting that these molecules have retained some H/ACA snoRNA functionality. The detection of small RNA molecules that share features of miRNAs and snoRNAs suggests that these classes of RNA may have an evolutionary relationship.
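The comparison of genomic locations described above amounts to an interval-overlap test between two sets of annotated loci. The following is a minimal sketch of that idea only; it is not the procedure used in the study, it ignores strand, and the `Locus` type, the function name and all coordinates in the usage example are hypothetical.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlapping_pairs(mirnas, snornas):
    """Return (miRNA name, snoRNA name) pairs whose intervals overlap."""
    # Group snoRNAs by chromosome so each miRNA is compared only with
    # candidates on the same chromosome.
    by_chrom = {}
    for sno in snornas:
        by_chrom.setdefault(sno.chrom, []).append(sno)

    pairs = []
    for mir in mirnas:
        for sno in by_chrom.get(mir.chrom, []):
            # Two closed intervals overlap when each starts before the other ends.
            if mir.start <= sno.end and sno.start <= mir.end:
                pairs.append((mir.name, sno.name))
    return pairs


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only.
    mirnas = [Locus("miR-x", "chr1", 100, 180)]
    snornas = [
        Locus("sno-a", "chr1", 150, 300),
        Locus("sno-b", "chr2", 150, 300),
    ]
    print(overlapping_pairs(mirnas, snornas))
```

For genome-scale annotation sets, a sorted sweep or an interval tree would scale better than this pairwise comparison, which is kept simple for readability.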
RNA was long believed to function mainly either as messenger RNA, an intermediate between genes and proteins, or as ribosomal and transfer RNAs, which carry out translation. In recent years, however, newly discovered classes of small RNAs have been shown to play important cellular roles. These include microRNAs (miRNAs), which can regulate the production of specific proteins, and small nucleolar RNAs (snoRNAs), which recognise and chemically modify specific sequences in ribosomal RNA. Although miRNAs and snoRNAs are currently believed to be generated by different cellular pathways and to function in different cellular compartments, members of these two types of small RNAs display numerous genomic similarities, and a small number of snoRNAs have been shown to encode miRNAs in several organisms. Here we systematically investigate a possible evolutionary relationship between snoRNAs and miRNAs. Using computational analysis, we identify twenty genomic regions encoding miRNAs with highly significant similarity to snoRNAs, at the level of both their surrounding genomic context and their predicted folded structure. A subset of these miRNAs displays functional snoRNA characteristics, strengthening the possibility that these miRNA molecules might have evolved from snoRNAs.
Asparagine-linked glycosylation is catalysed by oligosaccharyltransferase (OTase). In Trypanosoma brucei OTase activity is catalysed by single-subunit enzymes encoded by three paralogous genes of which TbSTT3B and TbSTT3C can complement a yeast Δstt3 mutant. The two enzymes have overlapping but distinct peptide acceptor specificities, with TbSTT3C displaying an enhanced ability to glycosylate sites flanked by acidic residues. TbSTT3A and TbSTT3B, but not TbSTT3C, are transcribed in the bloodstream and procyclic life cycle stages of T. brucei. Selective knockdown and analysis of parasite protein N-glycosylation showed that TbSTT3A selectively transfers biantennary Man5GlcNAc2 to specific glycosylation sites whereas TbSTT3B selectively transfers triantennary Man9GlcNAc2 to others. Analysis of T. brucei glycosylation site occupancy showed that TbSTT3A and TbSTT3B glycosylate sites in acidic to neutral and neutral to basic regions of polypeptide, respectively. This embodiment of distinct specificities in single-subunit OTases may have implications for recombinant glycoprotein engineering. TbSTT3A and TbSTT3B could be knocked down individually, but not collectively, in tissue culture. However, both were independently essential for parasite growth in mice, suggesting that inhibiting protein N-glycosylation could have therapeutic potential against trypanosomiasis.
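The link between glycosylation-site usage and the acidity or basicity of the surrounding polypeptide can be explored with a small sketch that locates N-glycosylation sequons (N-X-S/T, where X is any residue except proline) and scores the net charge of the flanking residues. The five-residue window, the simple +1/-1 charge scheme and the function name are assumptions made for illustration; the abstract does not specify how the flanking regions were characterised.

```python
import re

# N-X-S/T sequon with X != P; the lookahead lets overlapping sequons be found.
SEQUON = re.compile(r"(?=N[^P][ST])")

# Assumption: crude charge model, K/R = +1 and D/E = -1, all others neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def sequons_with_flank_charge(protein: str, flank: int = 5):
    """Return (1-based position of N, sequon, net flanking charge) tuples."""
    seq = protein.upper()
    results = []
    for match in SEQUON.finditer(seq):
        i = match.start()
        left = seq[max(0, i - flank):i]       # residues before the sequon
        right = seq[i + 3:i + 3 + flank]      # residues after the sequon
        charge = sum(CHARGE.get(aa, 0) for aa in left + right)
        results.append((i + 1, seq[i:i + 3], charge))
    return results


if __name__ == "__main__":
    # Made-up sequence for illustration only.
    for pos, sequon, charge in sequons_with_flank_charge("DDEEANGSAAAAKRKANVTKK"):
        print(pos, sequon, charge)
```

Under this toy scoring, a negative flanking charge would correspond to a site in an acidic region and a positive one to a site in a basic region; a pI-based measure could be substituted without changing the structure of the code.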
glycosylation; oligosaccharyltransferase; STT3
Sar2676, a pantothenate synthetase with a molecular weight of 31 419 Da from methicillin-resistant Staphylococcus aureus, has been expressed, purified and crystallized at 293 K. The protein crystallizes in a primitive triclinic lattice, with unit-cell parameters a = 45.3, b = 60.5, c = 117.6 Å, α = 87.2, β = 81.2, γ = 68.4°. A complete data set has been collected to 2.3 Å resolution at the ESRF. Consideration of the likely solvent content suggested the asymmetric unit to contain four molecules. This has been confirmed by molecular-replacement phasing calculations, which give a solution with four monomers using a monomer of pantothenate synthetase from Escherichia coli (PDB code 1iho), which is 41% identical to Sar2676, as a search model.
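The solvent-content argument for four molecules in the asymmetric unit can be reproduced approximately with the Matthews coefficient. The sketch below uses the unit-cell parameters and molecular weight quoted above and assumes space group P1, so that the asymmetric unit is the whole cell; the abstract states only "primitive triclinic". It also uses the conventional approximation that the solvent fraction is 1 - 1.23/V_M. Function names are hypothetical.

```python
import math


def triclinic_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume in cubic angstroms; lengths in angstroms, angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def matthews(volume, mw_da, molecules_per_cell):
    """Return (V_M in cubic angstroms per dalton, estimated solvent fraction)."""
    vm = volume / (molecules_per_cell * mw_da)
    solvent = 1 - 1.23 / vm  # standard approximation for proteins
    return vm, solvent


if __name__ == "__main__":
    volume = triclinic_volume(45.3, 60.5, 117.6, 87.2, 81.2, 68.4)
    # Assumption: P1, so molecules per cell equals molecules per asymmetric unit.
    for n in (2, 3, 4, 5):
        vm, solvent = matthews(volume, 31419, n)
        print(f"n={n}: V_M={vm:.2f} A^3/Da, solvent={solvent:.0%}")
```

With four molecules the calculation gives a solvent fraction just under one half, a typical value for protein crystals, which is consistent with the reported molecular-replacement solution.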
Sar2676; pantothenate synthetase; methicillin-resistant Staphylococcus aureus
As part of work on S. aureus, the crystallization of Sar2028, a protein that is upregulated in MRSA, is reported.
Sar2028, an aspartate/tyrosine/phenylalanine pyridoxal-5′-phosphate-dependent aminotransferase with a molecular weight of 48 168 Da, was overexpressed in methicillin-resistant Staphylococcus aureus compared with a methicillin-sensitive strain. The protein was expressed in Escherichia coli, purified and crystallized. The protein crystallized in a primitive orthorhombic Laue group with unit-cell parameters a = 83.6, b = 91.3, c = 106.0 Å, α = β = γ = 90°. Analysis of the systematic absences along the three principal axes indicated the space group to be P2₁2₁2₁. A complete data set was collected to 2.5 Å resolution.
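The space-group assignment from systematic absences rests on a simple rule: a twofold screw axis along a principal axis extinguishes the odd-order reflections on that axis (h00 with h odd, 0k0 with k odd, 00l with l odd). A minimal sketch of that rule for three mutually perpendicular screw axes follows; the function name is hypothetical, and no measured intensities from the study are used.

```python
def absent_under_three_screw_axes(h: int, k: int, l: int) -> bool:
    """True if (h, k, l) is an axial reflection with an odd index.

    Such reflections are systematically absent when a twofold screw axis
    runs along each of the three principal axes of an orthorhombic cell.
    """
    if k == 0 and l == 0:
        return h % 2 != 0  # h00: absent when h is odd
    if h == 0 and l == 0:
        return k % 2 != 0  # 0k0: absent when k is odd
    if h == 0 and k == 0:
        return l % 2 != 0  # 00l: absent when l is odd
    return False           # non-axial reflections are not affected


if __name__ == "__main__":
    for hkl in [(3, 0, 0), (4, 0, 0), (0, 5, 0), (0, 0, 6), (1, 1, 0)]:
        print(hkl, absent_under_three_screw_axes(*hkl))
```

In practice the test runs in the other direction: if the odd axial reflections are consistently unobserved along all three axes while the even ones are present, the data support screw axes along each axis.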
Sar2028; Staphylococcus aureus; aminotransferases