Results 1-22 (22)

1.  Identification of large intergenic non-coding RNAs in bovine muscle using next-generation transcriptomic sequencing 
BMC Genomics  2014;15(1):499.
The advent of large-scale gene expression technologies has revealed the existence of thousands of non-coding transcripts in eukaryotic cells, most of whose functions and significance remain poorly understood. Among these non-coding transcripts, long non-coding RNAs (lncRNAs) are the least well-studied but are emerging as key regulators of diverse cellular processes. In the present study, we surveyed lincRNAs (long intergenic non-coding RNAs that do not overlap protein-coding transcripts) in bovine Longissimus thoraci muscle. To our knowledge, this is the first such study in bovine muscle.
To identify lincRNAs, we used paired-end RNA sequencing (RNA-Seq) to explore the transcriptomes of Longissimus thoraci from nine Limousin bull calves. Approximately 14–45 million paired-end reads were obtained per library. A total of 30,548 different transcripts were identified. Using a computational pipeline, we defined a stringent set of 584 different lincRNAs, 418 of which were found in all nine muscle samples. Bovine lincRNAs share characteristics seen in their mammalian counterparts: relatively short transcript and gene lengths, low exon number and significantly lower expression compared to protein-encoding genes. Because this is the first study to identify lincRNAs from nine different samples of the same tissue, it allowed us to analyse inter-individual variability in the expression levels of the identified lincRNAs. Interestingly, the expression variation of the 418 lincRNAs differed significantly from that of the 10,775 selected known protein-encoding genes found in all muscle samples. In addition, we found 2,083 lincRNA/protein-encoding gene pairs with highly significant correlated expression. Fourteen lincRNAs were selected for RT-PCR, and 13 were validated. Some of the lincRNAs expressed in muscle are located within quantitative trait loci for meat quality traits.
Our study provides a glimpse into the lincRNA content of bovine muscle and will facilitate future experimental studies to unravel the function of these molecules. It may also help to elucidate their role in the mechanisms underlying the genetic variability of meat quality traits. This catalog will complement the list of lincRNAs already discovered in cattle and will therefore help to improve the annotation of the bovine genome.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-499) contains supplementary material, which is available to authorized users.
PMCID: PMC4073507  PMID: 24948191
Cattle; Muscle; RNA-Seq; Beef; Long non-coding RNA
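The abstract does not list the filtering criteria of the pipeline. As a minimal sketch of the intergenic step only, the Python below keeps transcripts of at least 200 nt (the conventional lncRNA length cutoff) that do not overlap any protein-coding locus. The `Transcript` record, the coordinates and the `min_length` default are illustrative assumptions; a real pipeline would also apply coding-potential and expression filters.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    name: str
    chrom: str
    start: int   # genomic start, 0-based inclusive
    end: int     # genomic end, 0-based exclusive
    length: int  # spliced transcript length in nt


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def lincrna_candidates(transcripts, coding_loci, min_length=200):
    """Return names of transcripts that pass the length cutoff and lie outside
    every protein-coding locus (strand is ignored here, an assumption)."""
    kept = []
    for t in transcripts:
        if t.length < min_length:
            continue  # too short to count as a long non-coding RNA
        if any(chrom == t.chrom and overlaps(t.start, t.end, s, e)
               for chrom, s, e in coding_loci):
            continue  # overlaps a protein-coding gene, so not intergenic
        kept.append(t.name)
    return kept


coding = [("chr1", 1000, 5000)]  # hypothetical protein-coding locus
txs = [
    Transcript("tx1", "chr1", 4000, 6000, 800),    # overlaps the coding locus
    Transcript("tx2", "chr1", 8000, 12000, 650),   # intergenic and long enough
    Transcript("tx3", "chr1", 20000, 20500, 150),  # intergenic but too short
    Transcript("tx4", "chr2", 1000, 5000, 900),    # other chromosome
]
print(lincrna_candidates(txs, coding))  # ['tx2', 'tx4']
```

A linear scan over coding loci keeps the sketch short; for genome-scale data an interval tree or sorted sweep would be the usual choice.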
2.  RNA at 92°C 
RNA Biology  2013;10(7):1211-1220.
The non-coding transcriptome of the hyperthermophilic archaeon Pyrococcus abyssi is investigated using RNA-seq. A dedicated computational pipeline analyzes RNA-seq reads and prior genome annotation to identify small RNAs, untranslated regions of mRNAs, and cis-encoded antisense transcripts. Unlike other archaea, such as Sulfolobus and Halobacteriales, P. abyssi produces few leaderless mRNA transcripts. Antisense transcription is widespread (215 transcripts) and targets protein-coding genes that appear to evolve more rapidly than average genes. We identify at least three novel H/ACA-like guide RNAs among the newly characterized non-coding RNAs. Long 5′ UTRs in mRNAs of ribosomal proteins and amino-acid biosynthesis genes strongly suggest the presence of cis-regulatory leaders in these mRNAs. We selected a high-interest subset of non-coding RNAs based on their strong promoters, high GC content, phylogenetic conservation, or abundance. Some of the novel small RNAs and long 5′ UTRs display high GC content, suggesting unknown structural RNA functions. However, we were surprised to observe that most of the high-interest RNAs are AU-rich, which suggests an absence of stable secondary structure in the high-temperature environment of P. abyssi. Yet these transcripts display other hallmarks of functionality, such as high expression or high conservation, which leads us to consider possible RNA functions that do not require extensive secondary structure.
PMCID: PMC3849170  PMID: 23884177
transcriptome; hyperthermophile; archaea; non-coding RNA
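The selection of high-interest RNAs relied partly on GC content. A minimal Python sketch of that measure follows, assuming a hypothetical genome-wide background GC fraction supplied by the user; it does not reproduce the paper's selection thresholds.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among A/C/G/U(T) nucleotides; other symbols are ignored."""
    bases = [b for b in seq.upper().replace("T", "U") if b in "ACGU"]
    if not bases:
        raise ValueError("sequence contains no nucleotides")
    return sum(b in "GC" for b in bases) / len(bases)


def gc_label(seq: str, background_gc: float) -> str:
    """Label a transcript relative to a (hypothetical) genomic background GC fraction."""
    return "GC-rich" if gc_content(seq) > background_gc else "AU-rich"


print(round(gc_content("GGCCAU"), 3))  # 0.667
print(gc_label("AAUUAAGC", 0.45))      # AU-rich
```

DNA input is accepted by converting T to U, so the same function works on genomic or transcript sequences.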
3.  Characterization of novel genomic alterations and therapeutic approaches using acute megakaryoblastic leukemia xenograft models 
The Journal of Experimental Medicine  2012;209(11):2017-2031.
A CBFA2T3-GLIS2 fusion gene was identified in 31% of non–Down syndrome AMKL.
Acute megakaryoblastic leukemia (AMKL) is a heterogeneous disease generally associated with poor prognosis. Gene expression profiles indicate the existence of distinct molecular subgroups, and several genetic alterations have been characterized in recent years, including the t(1;22)(p13;q13) translocation and trisomy 21 associated with GATA1 mutations. However, the majority of patients do not present with known mutations, and the limited access to primary patient leukemic cells impedes the efficient development of novel therapeutic strategies. In this study, using a xenotransplantation approach, we have modeled human pediatric AMKL in immunodeficient mice. Analysis of high-throughput RNA sequencing data identified recurrent fusion genes defining new molecular subgroups. One subgroup of patients presented with MLL or NUP98 fusion genes leading to up-regulation of the HOX A cluster genes. A novel CBFA2T3-GLIS2 fusion gene resulting from a cryptic inversion of chromosome 16 was identified in another subgroup, comprising 31% of non–Down syndrome AMKL, and was strongly associated with a gene expression signature of Hedgehog pathway activation. These molecular data provide useful markers for the diagnosis and follow-up of patients. Finally, we show that AMKL xenograft models constitute a relevant in vivo preclinical screening platform to validate the efficacy of novel therapies such as Aurora A kinase inhibitors.
PMCID: PMC3478932  PMID: 23045605
4.  A universal RNA structural motif docking the elbow of tRNA in the ribosome, RNAse P and T-box leaders 
Nucleic Acids Research  2013;41(10):5494-5502.
The structure and function of conserved motifs constituting the apex of Stem I in T-box mRNA leaders are investigated. We point out that this apex shares striking similarities with the L1 stalk (helices 76–78) of the ribosome. A sequence and structure analysis of both elements shows that, similarly to the head of the L1 stalk, the function of the apex of Stem I lies in the docking of tRNA through a stacking interaction with the conserved G19:C56 base pair platform. The inferred structure in the apex of Stem I consists of a module of two T-loops bound together head to tail, a module that is also present in the head of the L1 stalk but had gone unnoticed. Supporting the analysis, we show that a highly conserved structure in RNAse P formerly described as the J11/12–J12/11 module, which is known to bind the elbow of tRNA, constitutes a third instance of this T-loop module. A structural analysis explains why six nucleotides constituting the core of this module are highly invariant among all three types of RNA. Our finding that major RNA partners of tRNA bind the elbow with the same RNA structure suggests an explanation for the origin of the tRNA L-shape.
PMCID: PMC3664808  PMID: 23580544
5.  BRASERO: A Resource for Benchmarking RNA Secondary Structure Comparison Algorithms 
Advances in Bioinformatics  2012;2012:893048.
The pairwise comparison of RNA secondary structures is a fundamental problem, with direct application in mining databases for annotating putative noncoding RNA candidates in newly sequenced genomes. An increasing number of software tools are available for comparing RNA secondary structures, based on different models (such as ordered trees or forests, arc annotated sequences, and multilevel trees) and computational principles (edit distance, alignment). We describe here the website BRASERO that offers tools for evaluating such software tools on real and synthetic datasets.
PMCID: PMC3366197  PMID: 22675348
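The comparison models named in this abstract (ordered trees or forests, arc-annotated sequences, multilevel trees) are more elaborate than a short example allows. As a hedged illustration of what a pairwise secondary-structure comparison computes, the Python below implements the simpler base-pair distance between two dot-bracket structures of equal length: the number of base pairs present in one structure but not the other. It is a swapped-in, simpler metric, not a method described in the paper.

```python
def base_pairs(structure: str) -> set:
    """Parse a dot-bracket string into a set of (i, j) base pairs (0-based, i < j)."""
    stack, pairs = [], set()
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def bp_distance(s1: str, s2: str) -> int:
    """Number of base pairs found in exactly one of the two structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    return len(base_pairs(s1) ^ base_pairs(s2))


print(bp_distance("((..))", "(....)"))      # 1
print(bp_distance("((....))", "..(..).."))  # 3
```

Unlike tree edit distances, this metric ignores how pairs are nested, which is one reason benchmark resources compare several models.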
6.  NAPP: the Nucleic Acid Phylogenetic Profile Database 
Nucleic Acids Research  2011;40(Database issue):D205-D209.
Nucleic acid phylogenetic profiling (NAPP) classifies coding and non-coding sequences in a genome according to their pattern of conservation across other genomes. This procedure efficiently distinguishes clusters of functional non-coding elements in bacteria, particularly small RNAs and cis-regulatory RNAs, from other conserved sequences. In contrast to other non-coding RNA detection pipelines, NAPP does not require the presence of conserved RNA secondary structure and therefore is likely to identify previously undetected RNA genes or elements. Furthermore, as NAPP clusters contain both coding and non-coding sequences with similar occurrence profiles, they can be analyzed under a functional perspective. We recently improved the NAPP pipeline and applied it to a collection of 949 bacterial and 68 archaeal species. The database and web interface available at enable detailed analysis of NAPP clusters enriched in non-coding RNAs, graphical display of phylogenetic profiles, visualization of predicted RNAs in their genome context and extraction of predicted RNAs for use with genome browsers or other software.
PMCID: PMC3245103  PMID: 21984475
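The abstract does not state how NAPP groups profiles. Purely as an illustration, the sketch below represents each sequence's profile as the set of genomes in which it is conserved and links sequences whose Jaccard similarity reaches a threshold (single linkage via union-find). The similarity measure, the threshold and the example profiles are all assumptions.

```python
def jaccard(a: set, b: set) -> float:
    """Shared genomes divided by all genomes in which either sequence is conserved."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_profiles(profiles: dict, threshold: float) -> list:
    """Single-linkage clustering: link any two sequences whose profile
    similarity reaches the threshold, then return the connected groups."""
    names = list(profiles)
    parent = {n: n for n in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=sorted)


# Hypothetical profiles: the genomes in which each sequence is conserved.
profiles = {
    "geneA": {"g1", "g2", "g3"},
    "sRNA1": {"g1", "g2", "g3"},
    "geneB": {"g1", "g4", "g5"},
    "geneC": {"g4", "g5"},
}
print(cluster_profiles(profiles, threshold=0.6))
```

Here the hypothetical non-coding "sRNA1" clusters with "geneA" because they share an occurrence profile, which mirrors how mixed coding/non-coding clusters can suggest function.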
7.  CsfG, a sporulation-specific, small non-coding RNA highly conserved in endospore formers 
RNA Biology  2011;8(3):358-364.
Endospore formation is a characteristic shared by some Bacilli and Clostridia that involves the creation of two cell types, the forespore and the mother cell. Hundreds of protein-encoding genes have been shown to be transcribed in a cell-specific fashion during this developmental process in Bacillus subtilis. We have used a phylogenetic profiling procedure to identify clusters of B. subtilis coding and non-coding sequences that co-occur in other endospore formers. One such cluster shows a strong bias for sporulation-related genes (42% of 156 genes) and is enriched in potential non-coding RNAs. We have studied one RNA candidate, encoded in the ylbG-ylbH interval. In vivo analysis using a transcriptional fusion to the Escherichia coli lacZ gene demonstrates that this region of the chromosome contains a gene, csfG, encoding a 147-nucleotide RNA that is transcribed only during sporulation, specifically in the forespore. csfG is present in many endospore formers, mostly Bacilli and some Clostridia, whereas it is absent from bacteria that do not produce endospores. All CsfG RNAs contain a strongly conserved, pyrimidine-rich, central motif that overlaps a potential stem-loop structure. The remarkable conservation of this sequence in widely divergent bacteria suggests that it plays a conserved physiological role, presumably by interacting with an unidentified target in the forespore, where it contributes to the acquisition of spore properties.
PMCID: PMC3218505  PMID: 21532344
small RNA; sporulation; germination; forespore; Bacilli; Clostridia
8.  Premature terminator analysis sheds light on a hidden world of bacterial transcriptional attenuation 
Genome Biology  2010;11(9):R97.
Bacterial transcription attenuation occurs through a variety of cis-regulatory elements that control gene expression in response to a wide range of signals. The signal-sensing structures in attenuators are so diverse and rapidly evolving that only a small fraction have been properly annotated and characterized to date. Here we apply a broad-spectrum detection tool in order to achieve a more complete view of the transcriptional attenuation complement of key bacterial species.
Our protocol seeks gene families with an unusual frequency of 5' terminators found across multiple species. Many of the detected attenuators are part of annotated elements, such as riboswitches or T-boxes, which often operate through transcriptional attenuation. However, a significant fraction of candidates were not previously characterized in spite of their unmistakable footprint. We further characterized some of these new elements using sequence and secondary structure analysis. We also present elements that may control the expression of several non-homologous genes, suggesting co-transcription and response to common signals. An important class of such elements, which we called mobile attenuators, is provided by 3' terminators of insertion sequences or prophages that may be exapted as 5' regulators when inserted directly upstream of a cellular gene.
We show here that attenuators involve a complex landscape of signal-detection structures spanning the entire bacterial domain. We discuss possible scenarios through which these diverse 5' regulatory structures may arise or evolve.
PMCID: PMC2965389  PMID: 20920266
9.  Experimental discovery of small RNAs in Staphylococcus aureus reveals a riboregulator of central metabolism 
Nucleic Acids Research  2010;38(19):6620-6636.
Using an experimental approach, we investigated the RNome of the pathogen Staphylococcus aureus and identified 30 small RNAs (sRNAs), including 14 that are newly confirmed. Among the latter, 10 are encoded in intergenic regions, three are generated by premature transcription termination associated with riboswitch activities, and one is expressed from the complementary strand of a transposase gene. The expression of four sRNAs increases during the transition from exponential to stationary phase. We focused our study on RsaE, an sRNA that is highly conserved in the order Bacillales and is deleterious when over-expressed. We show that RsaE interacts in vitro with the 5′ region of opp3A mRNA, encoding an ABC transporter component, to prevent formation of the ribosomal initiation complex. A previous report showed that RsaE targets opp3B, which is co-transcribed with opp3A. Thus, our results identify an unusual case of riboregulation where the same sRNA controls an operon mRNA by targeting two of its cistrons. A combination of biocomputational and transcriptional analyses revealed a remarkably coordinated RsaE-dependent downregulation of numerous metabolic enzymes involved in the citrate cycle and the folate-dependent one-carbon metabolism. As we observed that RsaE accumulates transiently in late exponential growth, we propose that RsaE functions to ensure a coordinated downregulation of central metabolism when carbon sources become scarce.
PMCID: PMC2965222  PMID: 20511587
11.  Metagenome Annotation Using a Distributed Grid of Undergraduate Students 
PLoS Biology  2008;6(11):e296.
The Annotathon is a novel bioinformatics teaching environment, where undergraduate students join in a community annotation effort. Besides being a rewarding educational tool, it holds the added promise of potentially useful scientific findings.
PMCID: PMC2586363  PMID: 19067492
12.  The genome sequence of the model ascomycete fungus Podospora anserina 
Genome Biology  2008;9(5):R77.
A 10X draft sequence of the Podospora anserina genome shows highly dynamic evolution since its divergence from Neurospora crassa.
The dung-inhabiting ascomycete fungus Podospora anserina is a model used to study various aspects of eukaryotic and fungal biology, such as ageing, prions and sexual development.
We present a 10X draft sequence of the P. anserina genome, linked to the sequences of a large expressed sequence tag collection. Similar to higher eukaryotes, the P. anserina transcription/splicing machinery generates numerous non-conventional transcripts. Comparison of the P. anserina genome and orthologous gene set with those of its close relative Neurospora crassa shows that synteny is poorly conserved, the main result of evolution being gene shuffling within the same chromosome. The P. anserina genome contains fewer repeated sequences and has evolved new genes by duplication since its separation from N. crassa, despite the presence of the repeat-induced point mutation mechanism that mutates duplicated sequences. We also provide evidence that frequent gene loss took place in the lineages leading to P. anserina and N. crassa. P. anserina contains a large and highly specialized set of genes involved in the utilization of natural carbon sources commonly found in its natural biotope. It includes genes potentially involved in lignin degradation and efficient cellulose breakdown.
The features of the P. anserina genome indicate a highly dynamic evolution since the divergence of P. anserina and N. crassa, leading to the ability of the former to use specific complex carbon sources that match its needs in its natural biotope.
PMCID: PMC2441463  PMID: 18460219
13.  Entropy Measures Quantify Global Splicing Disorders in Cancer 
PLoS Computational Biology  2008;4(3):e1000011.
Most mammalian genes can express several splice variants, a phenomenon known as alternative splicing. Serious alterations of alternative splicing occur in cancer tissues, leading to expression of multiple aberrant splice forms. Most studies of alternative splicing defects have focused on the identification of cancer-specific splice variants as potential therapeutic targets. Here, we examine instead the bulk of non-specific transcript isoforms and analyze their level of disorder using a measure of uncertainty called Shannon's entropy. We compare isoform expression entropy in normal and cancer tissues from the same anatomical site for different classes of transcript variations: alternative splicing, polyadenylation, and transcription initiation. Whereas alternative initiation and polyadenylation show no significant gain or loss of entropy between normal and cancer tissues, alternative splicing shows highly significant entropy gains for 13 of the 27 cancers studied. This entropy gain is characterized by a flattening in the expression profile of normal isoforms and is correlated with the level of estimated cellular proliferation in the cancer tissue. Interestingly, the genes that present the highest entropy gain are enriched in splicing factors. We provide here the first quantitative estimate of splicing disruption in cancer. The expression of normal splice variants is widely and significantly disrupted in at least half of the cancers studied. We postulate that such splicing disorders may develop in part from splicing alteration in key splice factors, which in turn significantly impact multiple target genes.
Author Summary
RNA splicing is the process by which exons are pieced together to form a mature messenger RNA (mRNA). In normal cells, RNA splicing is a tightly controlled process that leads to production of a well-defined set of mRNAs. Cancer cells, however, often produce aberrant, mis-spliced mRNAs. Such disorders have not been quantified to date; to quantify them, we use a well-known measure of disorder called Shannon's entropy. We show that overall splicing disorders are highly significant in many cancers, and that the extent of disorder may be correlated with the level of cell proliferation in each tumor. Surprisingly, genes that control the splicing mechanism are unusually frequent among genes affected by splicing disorders. This suggests that cancer cells may withstand harmful chain reactions in which splicing defects in key regulatory genes would in turn cause extensive splicing damage. As mis-spliced mRNAs are widely studied for cancer diagnosis, awareness of these global disorders is important to distinguish reliable cancer markers from background noise.
PMCID: PMC2268240  PMID: 18369415
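The entropy used in this entry is Shannon's H = -sum_i p_i log2(p_i), taken over the relative expression p_i of each isoform of a gene. A minimal Python sketch, assuming isoform counts (for example EST counts per variant) and a base-2 logarithm, shows how flattening the isoform profile raises entropy. The paper's normalization and significance testing are not reproduced, and the counts are hypothetical.

```python
import math


def isoform_entropy(counts: list) -> float:
    """Shannon entropy (bits) of a gene's isoform expression profile."""
    total = sum(counts)
    if total <= 0 or any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative with a positive total")
    h = 0.0
    for c in counts:
        if c > 0:  # zero-count isoforms contribute nothing (0 * log 0 = 0)
            p = c / total
            h -= p * math.log2(p)
    return h


# Hypothetical EST counts for three isoforms of one gene.
normal = [90, 5, 5]    # one dominant isoform
cancer = [50, 25, 25]  # flatter profile
print(isoform_entropy(cancer))                            # 1.5
print(isoform_entropy(cancer) > isoform_entropy(normal))  # True
```

Entropy is 0 when a single isoform accounts for all expression and reaches log2(n) when n isoforms are equally expressed, so a gain between normal and tumor tissue reflects the flattening described above.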
14.  Beyond the 3′ end: experimental validation of extended transcript isoforms 
Nucleic Acids Research  2007;35(6):1947-1957.
High-throughput EST and full-length cDNA sequencing have revealed extensive variations at the 3′ ends of mammalian transcripts. Whether all of these changes are biologically meaningful has been the subject of controversy, as such results may partly reflect transcriptional or polyadenylation leakage. We selected here a set of tandem poly(A) sites predicted from EST/cDNA sequence analysis that (i) are conserved between human and mouse, (ii) produce alternative 3′ isoforms with unusual size features and (iii) are not documented in current genome databases, and we submitted these sites to experimental validation in mouse tissues. Out of 86 tested poly(A) sites from 44 genes, 84 were individually confirmed using a specially devised RT-PCR strategy. We then focused on validating the exon structure between distant tandem poly(A) sites separated by over 3 kb, and between stop codons and alternative poly(A) sites located at least 4.5 kb away, using a long-distance RT-PCR strategy. In most cases, long transcripts spanning the whole poly(A)–poly(A) or stop-poly(A) distance were detected, confirming that tandem sites were part of the same transcription unit. Given the apparent conservation of these long alternative 3′ ends, different regulatory functions can be foreseen, depending on the location where transcription starts.
PMCID: PMC1874610  PMID: 17339231
15.  Conservation of alternative polyadenylation patterns in mammalian genes 
BMC Genomics  2006;7:189.
Alternative polyadenylation is a widespread mechanism contributing to transcript diversity in eukaryotes. Over half of mammalian genes are alternatively polyadenylated. Our understanding of poly(A) site evolution is limited by the lack of a reliable identification of conserved, equivalent poly(A) sites among species. We introduce here a working definition of conserved poly(A) sites as sites that are both (i) properly aligned in human and mouse orthologous 3' untranslated regions (UTRs) and (ii) supported by EST or cDNA data in both species.
We identified about 4,800 such conserved poly(A) sites, covering one third of the orthologous gene set studied. Characteristics of conserved poly(A) sites such as processing efficiency and tissue specificity were analyzed. Conserved sites show a higher processing efficiency but no difference in tissue distribution when compared to non-conserved sites. In general, alternative poly(A) sites are species-specific and involve minor, non-conserved sites that are unlikely to play essential roles. However, about 500 genes have conserved tandem poly(A) sites. A significant fraction of these conserved tandems display a conserved arrangement of major/minor sites in their 3' UTR, suggesting that these alternative 3' ends may be under selection.
This analysis allows us to identify potential functional alternative poly(A) sites and provides clues on the selective mechanisms at play in the appearance of multiple poly(A) sites and their maintenance in the 3' UTRs of genes.
PMCID: PMC1550727  PMID: 16872498
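The first criterion of the working definition, a human site that aligns properly to the orthologous mouse 3' UTR, requires mapping a position through a pairwise alignment. The Python below is a minimal sketch under assumed inputs: a gapped human/mouse UTR alignment, a human site given as a 0-based ungapped position, and a list of EST-supported mouse sites. The `tolerance` window is a hypothetical parameter, not the paper's.

```python
def map_position(aln_a: str, aln_b: str, pos_a: int):
    """Map a 0-based ungapped position in sequence A to sequence B through a
    pairwise alignment ('-' = gap). Returns None if A's residue faces a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    ia = ib = 0  # ungapped positions reached so far in A and B
    for ca, cb in zip(aln_a, aln_b):
        if ca != "-":
            if ia == pos_a:
                return ib if cb != "-" else None
            ia += 1
        if cb != "-":
            ib += 1
    raise IndexError("position beyond the end of sequence A")


def is_conserved_site(aln_h, aln_m, human_site, mouse_est_sites, tolerance=10):
    """Hypothetical check: the human site maps to mouse and lies within
    `tolerance` nt of an EST/cDNA-supported mouse site."""
    m = map_position(aln_h, aln_m, human_site)
    return m is not None and any(abs(m - s) <= tolerance for s in mouse_est_sites)


aln_h = "ACGT-ACGT"
aln_m = "AC-TTACGT"
print(map_position(aln_h, aln_m, 4))                         # 4
print(is_conserved_site(aln_h, aln_m, 2, [2], tolerance=0))  # False (G faces a gap)
```

The human site is assumed to be EST-supported already, so the check covers the alignment criterion plus mouse-side support.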
16.  Differential Repression of Alternative Transcripts: A Screen for miRNA Targets 
PLoS Computational Biology  2006;2(5):e43.
Alternative polyadenylation sites produce transcript isoforms with 3′ untranslated regions (UTRs) of different lengths. If a microRNA (miRNA) target is present in the UTR, then only those target-containing isoforms should be sensitive to control by a cognate miRNA. We carried out a systematic examination of 3′ UTRs containing multiple poly(A) sites and putative miRNA targets. Based on expressed sequence tag (EST) counts and EST library information, we observed that levels of isoforms containing targets for miR-1 or miR-124, two miRNAs causing downregulation of transcript levels, were reduced in tissues expressing the corresponding miRNA. This analysis was repeated for all conserved 7-mers in 3′ UTRs, resulting in a selection of 312 motifs. We show that this set is significantly enriched in known miRNA targets and mRNA-destabilizing elements, which validates our initial hypothesis. We scanned the human genome for possible cognate miRNAs and identified phylogenetically conserved precursors matching our motifs. This analysis can help identify target-miRNA couples that went undetected in previous screens, but it may also reveal targets for other types of regulatory factors.
MicroRNAs (miRNAs) are short RNA molecules that recognize specific target sequences in the 3′ region of mRNAs. These miRNAs can then specifically keep the mRNAs from being expressed, or translated into proteins. In this article, the authors ask what happens when a targeted mRNA has several forms differing in their 3′ regions. Such 3′ variations are very common: if a gene has two or more alternative polyadenylation sites, the result is two or more mRNAs with 3′ ends of different lengths. If an miRNA target is located between two of these sites, the shorter transcript should be target-free and should escape miRNA-mediated inhibition, while longer transcripts should be inhibited. To test this hypothesis, the authors looked at mRNAs that had these variable 3′ ends. Variants containing targets for certain miRNAs appeared to be specifically underrepresented in tissues where these particular miRNAs are found. This principle was used to find other sequence patterns in 3′ regions that had a similar effect, and a list of 312 significant patterns was obtained. The authors then scanned genome sequences and identified possible cognate miRNAs for these patterns. These findings should help further our understanding of how genes are controlled.
PMCID: PMC1458965  PMID: 16699595
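The reasoning in this entry reduces to a containment test: an isoform ending at a given poly(A) site carries a target only if the whole target lies upstream of that site. The Python below is a minimal sketch of that test; the UTR, the cleavage positions and the 7-mer are hypothetical, and real target prediction would also involve seed-pairing rules and conservation.

```python
def isoforms_with_site(utr: str, polya_ends: list, motif: str) -> dict:
    """For each isoform (identified by the 0-based, exclusive end of its 3' UTR),
    report whether at least one full copy of `motif` lies upstream of that end."""
    k = len(motif)
    starts = [i for i in range(len(utr) - k + 1) if utr.startswith(motif, i)]
    return {end: any(i + k <= end for i in starts) for end in sorted(polya_ends)}


# Hypothetical UTR with one copy of a hypothetical 7-mer at positions 10-16.
motif = "ACGTACG"
utr = "T" * 10 + motif + "T" * 10
print(isoforms_with_site(utr, [8, 27], motif))  # {8: False, 27: True}
```

In this toy case the short isoform (ending at 8) is target-free and would be expected to escape repression, while the long isoform carries the site.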
17.  AltTrans: Transcript pattern variants annotated for both alternative splicing and alternative polyadenylation 
BMC Bioinformatics  2006;7:169.
The three major mechanisms that regulate transcript formation involve the selection of alternative sites for transcription start (TS), splicing, and polyadenylation. Current efforts collect data and annotation separately for each of these variant types. It is important to take an integrated view of these data sets and to derive a data set of alternate transcripts along with consolidated annotation. We previously developed computational pipelines that generate value-added, genome-scale data on individual variant types; these include AltSplice on splicing and AltPAS on polyadenylation. We now extend these pipelines and integrate the resulting data sets to provide an integrated view of the contributions of splicing and polyadenylation to the formation of transcript variants.
The AltSplice pipeline examines gene-transcript alignments and delineates alternative splice events and splice patterns; this pipeline is extended as AltTrans to delineate isoform transcript patterns, for each of which both the introns/exons and the 'terminating' polyA site are delineated. EST/mRNA sequences that qualify a transcript pattern confirm both its underlying splicing and polyadenylation. The AltPAS pipeline examines gene-transcript alignments and delineates all potential polyA sites irrespective of underlying splicing patterns. The polyA sites resulting from both AltTrans and AltPAS are merged. The generated database reports data on alternative splicing, alternative polyadenylation and the resulting alternate transcript patterns; the basal data are annotated for various biological features. The resulting data (named integrated AltTrans data) for human and mouse are available through the Alternate Transcript Diversity web site at .
The reported data set presents alternate transcript patterns that are annotated for both alternative splicing and alternative polyadenylation. Results based on current transcriptome data indicate that the contribution of alternative splicing is larger than that of alternative polyadenylation.
PMCID: PMC1435940  PMID: 16556303
18.  Computing expectation values for RNA motifs using discrete convolutions 
BMC Bioinformatics  2005;6:118.
Computational biologists use Expectation values (E-values) to estimate the number of solutions that can be expected by chance during a database scan. Here we focus on computing Expectation values for RNA motifs defined by single-strand and helix lod-score profiles with variable helix spans. Such E-values cannot be computed assuming a normal score distribution and their estimation previously required lengthy simulations.
We introduce discrete convolutions as an accurate and fast means of estimating score distributions of lod-score profiles. This method provides excellent score estimations for all single-strand or helical elements tested and also applies to the combination of elements into larger, complex motifs. Further, the estimated distributions remain accurate even when pseudocounts are introduced into the lod-score profiles. Estimated score distributions are then easily converted into E-values.
A good agreement was observed between computed E-values and simulations for a number of complete RNA motifs. This method is now implemented in the ERPIN software, but it can also be applied to any search procedure based on ungapped profiles with statistically independent columns.
PMCID: PMC1168889  PMID: 15892887
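Under the stated assumption of ungapped profiles with statistically independent columns, the score distribution of a whole motif is the convolution of the per-column score distributions. The Python below sketches this for single-strand columns only, assuming lod scores are rounded to integers after scaling (the `scale` parameter is hypothetical) and approximating the E-value as the number of scanned positions times the tail probability P(S >= threshold). It does not model the variable helix spans handled by ERPIN.

```python
from collections import defaultdict


def column_distribution(lod: dict, background: dict, scale: float = 1.0) -> dict:
    """Score distribution of one single-strand column under the background model:
    each base contributes its rounded, scaled lod score with its background probability."""
    dist = defaultdict(float)
    for base, p in background.items():
        dist[round(lod[base] * scale)] += p
    return dict(dist)


def convolve(d1: dict, d2: dict) -> dict:
    """Distribution of the sum of two independent integer-valued scores."""
    out = defaultdict(float)
    for s1, p1 in d1.items():
        for s2, p2 in d2.items():
            out[s1 + s2] += p1 * p2
    return dict(out)


def motif_distribution(columns: list) -> dict:
    """Fold the convolution over all columns (columns assumed independent)."""
    dist = {0: 1.0}
    for col in columns:
        dist = convolve(dist, col)
    return dist


def e_value(dist: dict, threshold: int, n_positions: int) -> float:
    """Expected number of chance hits scoring >= threshold over n_positions."""
    return n_positions * sum(p for s, p in dist.items() if s >= threshold)


# Hypothetical two-column profile with a uniform background.
bg = {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}
lod = {"A": 1.0, "C": -1.0, "G": -1.0, "U": 1.0}
col = column_distribution(lod, bg)
dist = motif_distribution([col, col])
print(dict(sorted(dist.items())))  # {-2: 0.25, 0: 0.5, 2: 0.25}
print(e_value(dist, 2, 1000))      # 250.0
```

The cost grows with the number of columns times the width of the integer score range, which is why discretizing scores keeps the computation fast compared with simulation.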
19.  The ERPIN server: an interface to profile-based RNA motif identification 
Nucleic Acids Research  2004;32(Web Server issue):W160-W165.
ERPIN is an RNA motif identification program that takes an RNA sequence alignment as input and identifies related sequences using a profile-based dynamic programming algorithm. ERPIN differs from other RNA motif search programs in its ability to capture subtle biases in the training set and produce highly specific and sensitive searches, while keeping CPU requirements at a practical level. In its latest version, ERPIN also computes E-values, which tell biologists how likely they are to encounter a specific sequence match by chance, a useful indication of biological significance. We present here the ERPIN online search interface ( This web server automatically performs ERPIN searches for different RNA genes or motifs, using predefined training sets and search parameters. With a couple of clicks, users can analyze an entire bacterial genome or a genomic segment of up to 5 Mb for the presence of tRNAs, 5S rRNAs, SRP RNA, C/D box snoRNAs, hammerhead motifs, miRNAs and other motifs. Search results are displayed with sequence, score, position, E-value and secondary structure graphics. An example of a complete genome scan is provided, as well as an evaluation of run times and specificity/sensitivity information for all available motifs.
PMCID: PMC441556  PMID: 15215371
20.  Sequence determinants in human polyadenylation site selection 
BMC Genomics  2003;4:7.
Differential polyadenylation is a widespread mechanism in higher eukaryotes producing mRNAs with different 3' ends in different contexts. This involves several alternative polyadenylation sites in the 3' UTR, each with its specific strength. Here, we analyze the vicinity of human polyadenylation signals in search of patterns that would help discriminate strong and weak polyadenylation sites, or true sites from randomly occurring signals.
We used human genomic sequences to retrieve the region downstream of polyadenylation signals, usually absent from cDNA or mRNA databases. Analyzing 4956 EST-validated polyadenylation sites and their -300/+300 nt flanking regions, we clearly visualized the upstream (USE) and downstream (DSE) sequence elements, both characterized by U-rich (not GU-rich) segments. The presence of a USE and a DSE is the main feature distinguishing true polyadenylation sites from randomly occurring A(A/U)UAAA hexamers. While USEs are indifferently associated with strong and weak poly(A) sites, DSEs are more conspicuous near strong poly(A) sites. We then used the region encompassing the hexamer and DSE as a training set for poly(A) site identification by the ERPIN program and achieved a prediction specificity of 69 to 85% for a sensitivity of 56%.
The availability of complete genomes and large EST sequence databases now permits large-scale observation of polyadenylation sites. The U-rich sequences flanking both sides of poly(A) signals contribute to the definition of "true" sites. However, the downstream U-rich sequences may also play an enhancing role. Based on this information, poly(A) site prediction accuracy was moderately but consistently improved compared to the best previously available algorithm.
PMCID: PMC151664  PMID: 12600277
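The signal and U-rich downstream element described in this entry can be screened for with a simple heuristic. The sketch below finds every A(A/U)UAAA hexamer, including overlapping ones, and reports the U fraction of a downstream window. The window offset, window length and the 0.5 U-richness cutoff are hypothetical parameters, and this is not the ERPIN-based predictor used in the paper.

```python
import re

# Lookahead so that overlapping hexamers are all reported; DNA alphabet (T for U).
SIGNAL = re.compile(r"(?=(A[AT]TAAA))")


def scan_poly_a_signals(seq: str, dse_offset: int = 20, dse_len: int = 20,
                        min_u: float = 0.5) -> list:
    """Return (position, hexamer, downstream U fraction, U-rich flag) for each
    A(A/U)UAAA hexamer; the downstream window starts dse_offset nt after the hexamer."""
    if dse_len <= 0:
        raise ValueError("dse_len must be positive")
    seq = seq.upper().replace("U", "T")
    hits = []
    for m in SIGNAL.finditer(seq):
        pos = m.start()
        start = pos + 6 + dse_offset
        window = seq[start:start + dse_len]
        if len(window) < dse_len:
            continue  # not enough downstream sequence to evaluate
        u_frac = window.count("T") / dse_len
        hits.append((pos, m.group(1), u_frac, u_frac >= min_u))
    return hits


# Hypothetical sequence: signal at position 2, U-rich stretch 20 nt downstream.
seq = "CC" + "AATAAA" + "G" * 20 + "T" * 20 + "GG"
print(scan_poly_a_signals(seq))  # [(2, 'AATAAA', 1.0, True)]
```

Working on genomic sequence, as the abstract stresses, is what makes the downstream window available at all, since it is usually absent from cDNA or mRNA records.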
21.  RNAMotif, an RNA secondary structure definition and search algorithm 
Nucleic Acids Research  2001;29(22):4724-4735.
RNA molecules fold into characteristic secondary and tertiary structures that account for their diverse functional activities. Many of these RNA structures are assembled from a collection of RNA structural motifs. These basic building blocks are used repeatedly, and in various combinations, to form different RNA types and define their unique structural and functional properties. Identification of recurring RNA structural motifs will therefore enhance our understanding of RNA structure and help associate elements of RNA structure with functional and regulatory elements. Our goal was to develop a computer program that can describe an RNA structural element of any complexity and then search any nucleotide sequence database, including complete prokaryotic and eukaryotic genomes, for these structural elements. Here we describe in detail a new computational motif search algorithm, RNAMotif, and demonstrate its utility with some motif search examples. RNAMotif differs from other motif search tools in two important respects: first, the structure definition language is more flexible and can specify any type of base–base interaction; second, RNAMotif provides a user-controlled scoring section that can be used to add capabilities that patterns alone cannot provide.
PMCID: PMC92549  PMID: 11713323
22.  Quantitative Analysis of the T Cell Repertoire Selected by a Single Peptide–Major Histocompatibility Complex  
The Journal of Experimental Medicine  1998;187(11):1871-1883.
The positive selection of CD4+ T cells requires the expression of major histocompatibility complex (MHC) class II molecules in the thymus, but the role of self-peptides complexed to class II molecules is still a matter of debate. Recently, it was observed that transgenic mice expressing a single peptide–MHC class II complex positively select significant numbers of diverse CD4+ T cells in the thymus. However, the number of selected T cell specificities has not been evaluated so far. Here, we have sequenced 700 junctional complementarity determining regions 3 (CDR3) from T cell receptors (TCRs) carrying Vβ11-Jβ1.1 or Vβ12-Jβ1.1 rearrangements. We found that a single peptide–MHC class II complex positively selects at least 10^5 different Vβ rearrangements. Our data yield a first evaluation of the size of the T cell repertoire. In addition, they provide evidence that the single Eα52-68–I-Ab complex skews the amino acid frequency in the TCR CDR3 loop of positively selected T cells. A detailed analysis of CDR3 sequences indicates that a fraction of the β chain repertoire bears the imprint of the selecting self-peptide.
PMCID: PMC2212317  PMID: 9607927
thymus; major histocompatibility complex; T cell receptors; repertoire development; transgenic/knockout
