
Results 1-20 (20)

1.  The translational landscape of the splicing factor SRSF1 and its role in mitosis 
eLife  2014;3:e02028.
The shuttling serine/arginine rich (SR) protein SRSF1 (previously known as SF2/ASF) is a splicing regulator that also activates translation in the cytoplasm. In order to dissect the gene network that is translationally regulated by SRSF1, we performed a high-throughput deep sequencing analysis of polysomal fractions in cells overexpressing SRSF1. We identified approximately 1500 mRNAs that are translational targets of SRSF1. These include mRNAs encoding proteins involved in cell cycle regulation, such as spindle, kinetochore, and M phase proteins, which are essential for accurate chromosome segregation. Indeed, we show that translational activity of SRSF1 is required for normal mitotic progression. Furthermore, we found that mRNAs that display alternative splicing changes upon SRSF1 overexpression are also its translational targets, strongly suggesting that SRSF1 couples pre-mRNA splicing and translation. These data provide insights into the complex role of SRSF1 in the control of gene expression at multiple levels and its implications in cancer.
eLife digest
Genes contain the instructions to make proteins. These instructions are first transcribed to produce an intermediate molecule called a messenger RNA (mRNA), which is then translated to produce the protein. However, gene sequences are often interrupted by ‘introns’, sections of DNA that do not code for protein, and these introns must be removed from the mRNA molecules via a process called ‘splicing’ before the protein is produced.
Splicing can also be used to ‘mix and match’ sections of gene sequences to produce slightly different versions of the same protein in a process called ‘alternative splicing’. SRSF1 is one of a family of proteins that control both types of gene splicing but also promotes the translation of specific mRNAs. To date only a few of the genes whose translation is regulated by SRSF1 have been identified.
Here, Maslon, Heras et al. have used human cells that artificially produce more SRSF1 protein than normal to identify those genes whose translation is regulated by SRSF1. Over 1500 ‘target genes’ were found; many of which encoded proteins that are involved in cell division—and cells with less SRSF1 than normal failed to divide properly. Maslon, Heras et al. also found a link between alternative splicing and protein translation: many of the mRNAs that were spliced differently in cells that over-produced SRSF1 were also genes whose translation was affected by SRSF1.
Since uncontrolled cell division, or defects in mRNA splicing or protein synthesis are all often linked to cancer, these discoveries might provide new insights into the mechanisms underlying this disease.
PMCID: PMC4027812  PMID: 24842991
translation; splicing; SR proteins; human
2.  The Microprocessor controls the activity of mammalian retrotransposons 
Nature structural & molecular biology  2013;20(10):10.1038/nsmb.2658.
More than half of the human genome is made of Transposable Elements. Their ongoing mobilization is a driving force in genetic diversity; however, little is known about how the host regulates their activity. Here, we show that the Microprocessor (Drosha-DGCR8), which is required for microRNA biogenesis, also recognizes and binds RNAs derived from human LINE-1 (Long INterspersed Element 1), Alu and SVA retrotransposons. Expression analyses demonstrate that cells lacking a functional Microprocessor accumulate LINE-1 mRNA and encoded proteins. Furthermore, we show that structured regions of the LINE-1 mRNA can be cleaved in vitro by Drosha. Additionally, we used a cell culture-based assay to show that the Microprocessor negatively regulates LINE-1 and Alu retrotransposition in vivo. Altogether, these data reveal a new role for the Microprocessor as a post-transcriptional repressor of mammalian retrotransposons acting as a defender of human genome integrity.
PMCID: PMC3836241  PMID: 23995758
3.  Drosha Regulates Gene Expression Independently of RNA Cleavage Function 
Cell Reports  2013;5(6):1499-1510.
Drosha is the main RNase III-like enzyme involved in the process of microRNA (miRNA) biogenesis in the nucleus. Using whole-genome ChIP-on-chip analysis, we demonstrate that, in addition to miRNA sequences, Drosha specifically binds promoter-proximal regions of many human genes in a transcription-dependent manner. This binding is not associated with miRNA production or RNA cleavage. Drosha knockdown in HeLa cells downregulated nascent gene transcription, resulting in a reduction of polyadenylated mRNA produced from these gene regions. Furthermore, we show that this function of Drosha is dependent on its N-terminal protein-interaction domain, which associates with the RNA-binding protein CBP80 and RNA Polymerase II. Consequently, we uncover a previously unsuspected RNA cleavage-independent function of Drosha in the regulation of human gene expression.
Graphical Abstract
• Drosha binds promoter-proximal regions of transcribed human genes
• Drosha binding is not associated with RNA cleavage or miRNA processing
• Drosha regulates nascent gene transcription
• Drosha interacts with CBP80 and RNA Pol II through its N-terminal domain
In higher eukaryotes, the Microprocessor complex (Drosha endonuclease and DGCR8 RNA-binding protein) recognizes and excises RNA hairpins from larger nuclear transcripts. This ultimately leads to cytoplasmic microRNA production or, in some cases, direct downregulation of gene expression through RNA cleavage. In this study, Gromak, Proudfoot, and colleagues show that Drosha-DGCR8 binds numerous gene promoters not to cleave RNA but, rather, to form a molecular interaction surface. This helps recruit additional factors to gene promoters with a consequent increase in gene activity.
PMCID: PMC3898267  PMID: 24360955
4.  The 5′ untranslated region of the serotonin receptor 2C pre-mRNA generates miRNAs and is expressed in non-neuronal cells 
The serotonin receptor 2C (HTR2C) gene encodes a G protein-coupled receptor that is exclusively expressed in neurons. Here, we report that the 5′ untranslated region of the receptor pre-mRNA as well as its hosted miRNAs is widely expressed in non-neuronal cell lines. Alternative splicing of HTR2C is regulated by MBII-52. MBII-52 and the neighboring MBII-85 cluster are absent in people with Prader–Willi syndrome, which likely causes the disease. We show that MBII-52 and MBII-85 increase expression of the HTR2C 5′ UTR and influence expression of the hosted miRNAs. The data indicate that the transcriptional unit expressing HTR2C is more complex than previously recognized and likely deregulated in Prader–Willi syndrome.
Electronic supplementary material
The online version of this article (doi:10.1007/s00221-013-3458-8) contains supplementary material, which is available to authorized users.
PMCID: PMC3787788  PMID: 23625045
miRNA; Alternative splicing; snoRNA; Serotonin receptor
5.  DGCR8 HITS-CLIP reveals novel functions for the Microprocessor 
The Drosha-DGCR8 complex (Microprocessor) is required for microRNA (miRNA) biogenesis. DGCR8 recognizes the RNA substrate, whereas Drosha functions as the endonuclease. High-throughput sequencing and crosslinking immunoprecipitation (HITS-CLIP) was used to identify RNA targets of DGCR8 in human cells. Unexpectedly, miRNAs were not the most abundant targets. DGCR8-bound RNAs also comprised several hundred mRNAs as well as snoRNAs and long non-coding RNAs. We found that the Microprocessor controls the abundance of several mRNAs as well as of MALAT-1. By contrast, DGCR8-mediated cleavage of snoRNAs is independent of Drosha, suggesting the involvement of DGCR8 in cellular complexes with other endonucleases. Interestingly, binding of DGCR8 to cassette exons acts as a novel mechanism to regulate the relative abundance of alternatively spliced isoforms. Collectively, these data provide new insights into the complex role of DGCR8 in controlling the fate of several classes of RNAs.
PMCID: PMC3442229  PMID: 22796965
6.  Hog1 bypasses stress-mediated down-regulation of transcription by RNA polymerase II redistribution and chromatin remodeling 
Genome Biology  2012;13(11):R106.
Cells are subjected to dramatic changes of gene expression upon environmental changes. Stress causes a general down-regulation of gene expression together with the induction of a set of stress-responsive genes. The p38-related stress-activated protein kinase Hog1 is an important regulator of transcription upon osmostress in yeast.
Genome-wide localization studies of RNA polymerase II (RNA Pol II) and Hog1 showed that stress induced major changes in RNA Pol II localization, with a shift toward stress-responsive genes relative to housekeeping genes. RNA Pol II relocalization required Hog1, which was also localized to stress-responsive loci. In addition to RNA Pol II-bound genes, Hog1 also localized to RNA polymerase III-bound genes, pointing to a wider role for Hog1 in transcriptional control than initially expected. Interestingly, an increasing association of Hog1 with stress-responsive genes was strongly correlated with chromatin remodeling and increased gene expression. Remarkably, MNase-Seq analysis showed that although chromatin structure was not significantly altered at a genome-wide level in response to stress, there was pronounced chromatin remodeling for those genes that displayed Hog1 association.
Hog1 serves to bypass the general down-regulation of gene expression that occurs in response to osmostress, and does so both by targeting RNA Pol II machinery and by inducing chromatin remodeling at stress-responsive loci.
PMCID: PMC3580498  PMID: 23158682
7.  Predictive Models of Gene Regulation from High-Throughput Epigenomics Data 
The epigenetic regulation of gene expression involves multiple factors. The synergistic or antagonistic action of these factors has suggested the existence of an epigenetic code for gene regulation. High-throughput sequencing (HTS) provides an opportunity to explore this code and to build quantitative models of gene regulation based on epigenetic differences between specific cellular conditions. We describe a new computational framework that facilitates the systematic integration of HTS epigenetic data. Our method relates epigenetic signals to expression by comparing two conditions. We show its effectiveness by building a model that predicts with high accuracy significant expression differences between two cell lines, using epigenetic data from the ENCODE project. Our analyses provide evidence for a degenerate epigenetic code, which involves multiple genic regions. In particular, signal changes at the 1st exon, 1st intron, and downstream of the polyadenylation site are found to associate strongly with expression regulation. Our analyses also show a different epigenetic code for intron-less and intron-containing genes. Our work provides a general methodology to do integrative analysis of epigenetic differences between cellular conditions that can be applied to other studies, like cell differentiation or carcinogenesis.
PMCID: PMC3424690  PMID: 22924024
8.  Use of ChIP-Seq data for the design of a multiple promoter-alignment method 
Nucleic Acids Research  2012;40(7):e52.
We address the challenge of regulatory sequence alignment with a new method, Pro-Coffee, a multiple aligner specifically designed for homologous promoter regions. Pro-Coffee uses a dinucleotide substitution matrix estimated on alignments of functional binding sites from TRANSFAC. We designed a validation framework using several thousand families of orthologous promoters. This dataset was used to evaluate the accuracy for predicting true human orthologs among their paralogs. We found that whereas other methods achieve on average 73.5% accuracy, and 77.6% when trained on that same dataset, the figure goes up to 80.4% for Pro-Coffee. We then applied a novel validation procedure based on multi-species ChIP-seq data. Trained and untrained methods were tested for their capacity to correctly align experimentally detected binding sites. Whereas the average number of correctly aligned sites for two transcription factors is 284 for default methods and 316 for trained methods, Pro-Coffee achieves 331, 16.5% above the default average. We find a high correlation between a method's performance when classifying orthologs and its ability to correctly align proven binding sites. Not only has this interesting biological consequences, it also allows us to conclude that any method that is trained on the ortholog data set will result in functionally more informative alignments.
PMCID: PMC3326335  PMID: 22230796
9.  Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data 
Bioinformatics  2011;27(24):3333-3340.
Motivation: High-throughput sequencing (HTS) has revolutionized gene regulation studies and is now fundamental for the detection of protein–DNA and protein–RNA binding, as well as for measuring RNA expression. With increasing variety and sequencing depth of HTS datasets, the need for more flexible and memory-efficient tools to analyse them is growing.
Results: We describe Pyicos, a powerful toolkit for the analysis of mapped reads from diverse HTS experiments: ChIP-Seq, either punctuated or broad signals, CLIP-Seq and RNA-Seq. We prove the effectiveness of Pyicos to select for significant signals and show that its accuracy is comparable and sometimes superior to that of methods specifically designed for each particular type of experiment. Pyicos facilitates the analysis of a variety of HTS datatypes through its flexibility and memory efficiency, providing a useful framework for data integration into models of regulatory genomics.
Availability: Open-source software, with tutorials and protocol files, is available at or as a Galaxy server at
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3232367  PMID: 21994224
10.  Direct cloning of double-stranded RNAs from RNase protection analysis reveals processing patterns of C/D box snoRNAs and provides evidence for widespread antisense transcript expression 
Nucleic Acids Research  2011;39(22):9720-9730.
We describe a new method that allows cloning of double-stranded RNAs (dsRNAs) that are generated in RNase protection experiments. We demonstrate that the mouse C/D box snoRNA MBII-85 (SNORD116) is processed into at least five shorter RNAs using processing sites near known functional elements of C/D box snoRNAs. Surprisingly, the majority of cloned RNAs from RNase protection experiments were derived from endogenous cellular RNA, indicating widespread antisense expression. The cloned dsRNAs could be mapped to genome areas that show RNA expression on both DNA strands and partially overlapped with experimentally determined argonaute-binding sites. The data suggest a conserved processing pattern for some C/D box snoRNAs and abundant expression of longer, non-coding RNAs in the cell that can potentially form dsRNAs.
PMCID: PMC3239178  PMID: 21880592
11.  Structural basis for the biological relevance of the invariant apical stem in IRES-mediated translation 
Nucleic Acids Research  2011;39(19):8572-8585.
RNA structure plays a fundamental role in internal initiation of translation. Picornavirus internal ribosome entry sites (IRESs) are long, efficient cis-acting elements that recruit the ribosome to internal mRNA sites. However, little is known about long-range constraints determining the IRES RNA structure. Here, we sought to investigate the functional and structural relevance of the invariant apical stem of a picornavirus IRES. Mutation of this apical stem revealed better performance of G:C compared with C:G base pairs, demonstrating that secondary structure alone is not sufficient for IRES function. In turn, mutations designed to disrupt the stem abolished IRES activity. Lack of tolerance to accept genetic variability in the apical stem was supported by the presence of coupled covariations within the adjacent stem–loops. SHAPE structural analysis, gel mobility-shift and microarray-based RNA accessibility revealed that the apical stem contributes to maintaining IRES RNA structure through the generation of distant interactions between two adjacent stem–loops. Our results demonstrate that a highly interactive structure constrained by distant interactions involving invariant G:C base pairs plays a key role in maintaining the RNA conformation necessary for IRES-mediated translation.
PMCID: PMC3201876  PMID: 21742761
12.  Databases and resources for human small non-coding RNAs 
Human Genomics  2011;5(3):192-199.
Recent advances in high-throughput sequencing have facilitated the genome-wide studies of small non-coding RNAs (sRNAs). Numerous studies have highlighted the role of various classes of sRNAs at different levels of gene regulation and disease. The fast growth of sequence data and the diversity of sRNA species have prompted the need to organise them in annotation databases. There are currently several databases that collect sRNA data. Various tools are provided for access, with special emphasis on the well-characterised family of micro-RNAs. The striking heterogeneity of the new classes of sRNAs and the lack of sufficient functional annotation, however, make integration of these datasets a difficult task. This review describes the currently available databases for human sRNAs that are accessible via the internet, and some of the large datasets for human sRNAs from high-throughput sequencing experiments that are so far only available as supplementary data in publications. Some of the main issues related to the integration and annotation of sRNA datasets are also discussed.
PMCID: PMC3500172  PMID: 21504869
miRNAs; small RNAs; non-coding RNAs; high-throughput sequencing; databases
13.  Genome-Wide Association between Branch Point Properties and Alternative Splicing 
PLoS Computational Biology  2010;6(11):e1001016.
The branch point (BP) is one of the three obligatory signals required for pre-mRNA splicing. In mammals, the degeneracy of the motif combined with the lack of a large set of experimentally verified BPs complicates the task of modeling it in silico, and therefore of predicting the location of natural BPs. Consequently, BPs have been disregarded in a considerable fraction of the genome-wide studies on the regulation of splicing in mammals. We present a new computational approach for mammalian BP prediction. Using sequence conservation and positional bias we obtained a set of motifs with good agreement with U2 snRNA binding stability. Using a Support Vector Machine algorithm, we created a model complemented with polypyrimidine tract features, which considerably improves the prediction accuracy over previously published methods. Applying our algorithm to human introns, we show that BP position is highly dependent on the presence of AG dinucleotides in the 3′ end of introns, with distance to the 3′ splice site and BP strength strongly correlating with alternative splicing. Furthermore, experimental BP mapping for five exons preceded by long AG-dinucleotide exclusion zones revealed that, for a given intron, more than one BP can be chosen throughout the course of splicing. Finally, the comparison between exons of different evolutionary ages and pseudo exons suggests a key role of the BP in the pathway of exon creation in human. Our computational and experimental analyses suggest that BP recognition is more flexible than previously assumed, and it appears highly dependent on the presence of downstream polypyrimidine tracts. The reported association between BP features and the splicing outcome suggests that this, so far disregarded but yet crucial, element buries information that can complement current acceptor site models.
Author Summary
From transcription to translation, the events underlying protein production from DNA sequence are paramount to all aspects of cellular function. Pre-mRNAs in eukaryotes undergo several processing steps prior to their export to the cytoplasm. Among these, splicing – the process of intron removal and exon ligation – has been shown to play a central role in the regulation of gene expression. It has been estimated that more than half of the disease-causing mutations in humans do so by interfering with splicing. The difficulty in describing these disease mechanisms often lies in the low accuracy of the methods for prediction of functional splicing signals in the pre-mRNA. This is especially the case of the branch point, mainly due to its high sequence variability. We have developed a methodology for mammalian branch point prediction based on a machine-learning algorithm, which shows improved accuracy over previous published methods. Moreover, using a combination of experimental and bioinformatics approaches, we uncovered important positional properties of the branch point and shed new light on how some of its features may contribute to the final splicing outcome. These findings might prove useful for a better understanding of how splicing-associated mutations can lead to disease.
PMCID: PMC2991248  PMID: 21124863
14.  The Pivotal Roles of TIA Proteins in 5′ Splice-Site Selection of Alu Exons and Across Evolution 
PLoS Genetics  2009;5(11):e1000717.
More than 5% of alternatively spliced internal exons in the human genome are derived from Alu elements in a process termed exonization. Alus are comprised of two homologous arms separated by an internal polypyrimidine tract (PPT). In most exonizations, splice sites are selected from within the same arm. We hypothesized that the internal PPT may prevent selection of a splice site further downstream. Here, we demonstrate that this PPT enhanced the selection of an upstream 5′ splice site (5′ss), even in the presence of a stronger 5′ss downstream. Deletion of this PPT shifted selection to the stronger downstream 5′ss. This enhancing effect depended on the strength of the downstream 5′ss, on the efficiency of base-pairing to U1 snRNA, and on the length of the PPT. This effect of the PPT was mediated by the binding of TIA proteins and was dependent on the distance between the PPT and the upstream 5′ss. A wide-scale evolutionary analysis of introns across 22 eukaryotes revealed an enrichment in PPTs within ∼20 nt downstream of the 5′ss. For most metazoans, the strength of the 5′ss inversely correlated with the presence of a downstream PPT, indicative of the functional role of the PPT. Finally, we found that the proteins that mediate this effect, TIA and U1C, and in particular their functional domains, are highly conserved across evolution. Overall, these findings expand our understanding of the role of TIA1/TIAR proteins in enhancing recognition of exons, in general, and Alu exons, in particular.
Author Summary
Human genes are composed of functional regions, termed exons, separated by non-functional regions, termed introns. Intronic sequences may gradually accumulate mutations and subsequently become recognized by the splicing machinery as exons, a process termed exonization. Alu elements are prone to undergo exonization: more than 5% of alternatively spliced internal exons in the human genome originate from Alu elements. A typical Alu element is ∼300 nucleotides long, consisting of two arms separated by a polypyrimidine tract (PPT). Interestingly, in most cases, exonization occurs almost exclusively within either the right arm or the left, not both. Here we found that the PPT between the two arms serves as a binding site for TIA proteins and prevents the exon selection process from expanding into downstream regions. To obtain a wider overview of TIA function, we performed a cross-evolutionary analysis within 22 eukaryotes of this protein and of U1C, a protein known to interact with it, and found that functional regions of both these proteins were highly conserved. These findings highlight the pivotal role of TIA proteins in 5′ splice-site selection of Alu exons and exon recognition in general.
PMCID: PMC2766253  PMID: 19911040
15.  Exon creation and establishment in human genes 
Genome Biology  2008;9(9):R141.
A comparative genomics study of alternatively spliced exons showing that the relative local abundance of splicing regulatory motifs influences splicing decisions in humans.
A large proportion of species-specific exons are alternatively spliced. In primates, Alu elements play a crucial role in the process of exon creation but many new exons have appeared through other mechanisms. Despite many recent studies, it is still unclear what the splicing regulatory requirements for de novo exonization are and how splicing regulation changes throughout an exon's lifespan.
Using comparative genomics, we have defined sets of exons with different evolutionary ages. Younger exons have weaker splice-sites and lower absolute values for the relative abundance of putative splicing regulators between exonic and adjacent intronic regions, indicating a less consolidated splicing regulation. This relative abundance is shown to increase with exon age, leading to higher exon inclusion. We show that this local difference in the density of regulators might be of biological significance, as it outperforms other measures in real exon versus pseudo-exon classification. We apply this new measure to the specific case of the exonization of anti-sense Alu elements and show that they are characterized by a general lack of exonic splicing silencers.
Our results suggest that specific sequence environments are required for exonization and that these can change with time. We propose a model of exon creation and establishment in human genes, in which splicing decisions depend on the relative local abundance of regulatory motifs. Using this model, we provide further explanation as to why Alu elements serve as a major substrate for exon creation in primates. Finally, we discuss the benefits of integrating such information in gene prediction.
PMCID: PMC2592719  PMID: 18811936
16.  EGASP: the human ENCODE Genome Annotation Assessment Project 
Genome Biology  2006;7(Suppl 1):S2.
We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment.
The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified.
This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence.
PMCID: PMC1810551  PMID: 16925836
17.  Differentiated evolutionary rates in alternative exons and the implications for splicing regulation 
Alternatively spliced exons play an important role in the diversification of gene function in most metazoans and are highly regulated by conserved motifs in exons and introns. Two contradictory properties have been associated with evolutionarily conserved alternative exons: higher sequence conservation and higher rate of non-synonymous substitutions, relative to constitutive exons. In order to clarify this issue, we have performed an analysis of the evolution of alternative and constitutive exons, using a large set of protein coding exons conserved between human and mouse and taking into account the conservation of the transcript exonic structure. Further, we have also defined a measure of the variation of the arrangement of exonic splicing enhancers (ESE-conservation score) to study the evolution of splicing regulatory sequences. We have used this measure to correlate the changes in the arrangement of ESEs with the divergence of exon and intron sequences.
We find evidence for a relation between the lack of conservation of the exonic structure and the weakening of the sequence evolutionary constraints in alternative and constitutive exons. Exons in transcripts with non-conserved exonic structures have higher synonymous (dS) and non-synonymous (dN) substitution rates than exons in conserved structures. Moreover, alternative exons in transcripts with non-conserved exonic structure are the least constrained in sequence evolution, and at high EST-inclusion levels they are found to be very similar to constitutive exons, whereas alternative exons in transcripts with conserved exonic structure have a dS significantly lower than average at all EST-inclusion levels. We also find higher conservation in the arrangement of ESEs in constitutive exons compared to alternative ones. Additionally, the sequence conservation at flanking introns remains constant for constitutive exons at all ESE-conservation values, but increases for alternative exons at high ESE-conservation values.
We conclude that most of the differences in dN observed between alternative and constitutive exons can be explained by the conservation of the transcript exonic structure. Low dS values are more characteristic of alternative exons with conserved exonic structure, but not of those with non-conserved exonic structure. Additionally, constitutive exons are characterized by a higher conservation in the arrangement of ESEs, and alternative exons with an ESE-conservation similar to that of constitutive exons are characterized by a conservation of the flanking intron sequences higher than average, indicating the presence of more intronic regulatory signals.
PMCID: PMC1543662  PMID: 16792801
18.  Gene finding in the chicken genome 
BMC Bioinformatics  2005;6:131.
Despite the continuous production of genome sequence for a number of organisms, reliable, comprehensive, and cost effective gene prediction remains problematic. This is particularly true for genomes for which there is not a large collection of known gene sequences, such as the recently published chicken genome. We used the chicken sequence to test comparative and homology-based gene-finding methods followed by experimental validation as an effective genome annotation method.
We performed experimental evaluation by RT-PCR of three different computational gene finders, Ensembl, SGP2 and TWINSCAN, applied to the chicken genome. A Venn diagram was computed and each component of it was evaluated. The results showed that de novo comparative methods can identify up to about 700 chicken genes with no previous evidence of expression, and can correctly extend about 40% of homology-based predictions at the 5' end.
De novo comparative gene prediction followed by experimental verification is effective at enhancing the annotation of the newly sequenced genomes provided by standard homology-based methods.
PMCID: PMC1174864  PMID: 15924626
19.  Comparative gene finding in chicken indicates that we are closing in on the set of multi-exonic widely expressed human genes 
Nucleic Acids Research  2005;33(6):1935-1939.
The recent availability of the chicken genome sequence poses the question of whether there are human protein-coding genes conserved in chicken that are currently not included in the human gene catalog. Here, we show, using comparative gene finding followed by experimental verification of exon pairs by RT–PCR, that the addition to the multi-exonic subset of this catalog could be as little as 0.2%, suggesting that we may be closing in on the human gene set. Our protocol, however, has two shortcomings: (i) the bioinformatic screening of the predicted genes, applied to filter out false positives, cannot handle intronless genes; and (ii) the experimental verification could fail to identify expression at a specific developmental time. This highlights the importance of developing methods that could provide a reliable estimate of the number of these two types of genes.
PMCID: PMC1074396  PMID: 15809229
20.  Evaluation of the chicken transcriptome by SAGE of B cells and the DT40 cell line 
BMC Genomics  2004;5:98.
The understanding of whole genome sequences in higher eukaryotes depends to a large degree on the reliable definition of transcription units including exon/intron structures, translated open reading frames (ORFs) and flanking untranslated regions. The best currently available chicken transcript catalog is the Ensembl build based on the mappings of a relatively small number of full length cDNAs and ESTs to the genome as well as genome sequence derived in silico gene predictions.
We use Long Serial Analysis of Gene Expression (LongSAGE) in bursal lymphocytes and the DT40 cell line to verify the quality and completeness of the annotated transcripts. 53.6% of the more than 38,000 unique SAGE tags (unitags) match to full length bursal cDNAs, the Ensembl transcript build or the genome sequence. The majority of all matching unitags show single matches to the genome, but no matches to the genome derived Ensembl transcript build. Nevertheless, most of these tags map close to the 3' boundaries of annotated Ensembl transcripts.
These results suggest that rather few genes are missing in the current Ensembl chicken transcript build, but that the 3' ends of many transcripts may not have been accurately predicted. The tags with no match in the transcript sequences can now be used to improve gene predictions, pinpoint the genomic location of entirely missed transcripts and optimize the accuracy of gene finder software.
PMCID: PMC543457  PMID: 15610564
