Results 1-19 (19)

1.  Fur Is the Master Regulator of the Extraintestinal Pathogenic Escherichia coli Response to Serum 
mBio  2014;5(4):e01460-14.
Drug-resistant extraintestinal pathogenic Escherichia coli (ExPEC) strains are the major cause of colisepticemia (colibacillosis), a condition that has become an increasing public health problem in recent years. ExPEC strains are characterized by high resistance to serum, which is otherwise highly toxic to most bacteria. To understand how these bacteria survive and grow in serum, we performed system-wide analyses of their response to serum, making a clear distinction between the responses to nutritional immunity and innate immunity. Thus, mild heat inactivation of serum destroys the immune complement and abolishes the bactericidal effect of serum (inactive serum), making it possible to examine nutritional immunity. We used a combination of deep RNA sequencing and proteomics in order to characterize ExPEC genes whose expression is affected by the nutritional stress of serum and by the immune complement. The major change in gene expression induced by serum—active and inactive—involved metabolic genes. In particular, the serum metabolic response is coordinated by three transcriptional regulators, Fur, BasR, and CysB. Fur alone was responsible for more than 80% of the serum-induced transcriptional response. Consistent with its role as a major serum response regulator, deletion of Fur renders the bacteria completely serum sensitive. These results highlight the role of metabolic adaptation in colisepticemia and virulence.
Drug-resistant extraintestinal pathogenic Escherichia coli (ExPEC) strains have emerged as major pathogens, especially in community- and hospital-acquired infections. These bacteria cause a large spectrum of syndromes, the most serious of which is septicemia, a condition with a high mortality rate. These bacterial strains are characterized by high resistance to serum, otherwise highly toxic to most bacteria. To understand the basis of this resistance, we carried out system-wide analyses of the response of ExPEC strains to serum by using proteomics and deep RNA sequencing. The major changes in gene expression induced by exposure to serum involved metabolic genes, not necessarily implicated in relation to virulence. One metabolic regulator—Fur—involved in iron metabolism was responsible for more than 80% of the serum-induced response, and its deletion renders the bacteria completely serum sensitive. These results highlight the role of metabolic adaptation in virulence.
PMCID: PMC4145685  PMID: 25118243
2.  Holding a grudge 
RNA Biology  2013;10(5):900-906.
The CRISPR (clustered regularly interspaced short palindromic repeats)/Cas (CRISPR-associated) system of bacteria and archaea constitutes a mechanism of acquired adaptive immunity against phages, which is based on genome-encoded markers of previously infecting phage sequences (“spacers”). As a repository of phage sequences, these spacers make the system particularly suitable for elucidating phage-bacteria interactions in metagenomic studies. Recent metagenomic analyses of CRISPRs associated with the human microbiome intriguingly revealed conserved “memory spacers” shared by bacteria in multiple unrelated, geographically separated individuals. Here, we discuss possible avenues for explaining this phenomenon by integrating insights from CRISPR biology and phage-bacteria ecology, with a special focus on the human gut. We further explore the growing body of evidence for the role of CRISPR/Cas in regulating the interplay between bacteria and lysogenic phages, which may be intimately related to the presence of memory spacers and sheds new light on the multifaceted biological and ecological modes of action of CRISPR/Cas.
PMCID: PMC3737347  PMID: 23439321
CRISPR; human gut; human microbiome; phages; lysogeny; prophages
3.  Discovery of functional toxin/antitoxin systems in bacteria by shotgun cloning 
Molecular Cell  2013;50(1):136-148.
Toxin-antitoxin (TA) modules, composed of a toxic protein and a counteracting antitoxin, play important roles in bacterial physiology. We examined the experimental insertion of 1.5 million genes from 388 microbial genomes into an Escherichia coli host using over 8.5 million random clones. This revealed hundreds of genes (toxins) that could only be cloned when the neighboring gene (antitoxin) was present on the same clone. Clustering of these genes revealed novel TA families widespread in bacterial genomes, some of which deviate from the classical characteristics previously described for such modules. Introduction of these genes into E. coli validated that the toxin toxicity is mitigated by the antitoxin. Infection experiments with T7 phage showed that two of the new modules can provide resistance against phage. Moreover, our experiments revealed an 'anti-defense' protein in phage T7 that neutralizes phage resistance. Our results expose active fronts in the arms race between bacteria and phage.
PMCID: PMC3644417  PMID: 23478446
4.  Correlated Occurrence and Bypass of Frame-Shifting Insertion-Deletions (InDels) to Give Functional Proteins 
PLoS Genetics  2013;9(10):e1003882.
Short insertions and deletions (InDels) comprise an important part of the natural mutational repertoire. InDels are, however, highly deleterious, primarily because two-thirds result in frame-shifts. Bypass through slippage over homonucleotide repeats by transcriptional and/or translational infidelity is known to occur sporadically. However, the overall frequency of bypass and its relation to sequence composition remain unclear. Intriguingly, the occurrence of InDels and the bypass of frame-shifts are mechanistically related, occurring through slippage over repeats by DNA or RNA polymerases, or by the ribosome, respectively. Here, we show that the frequency of frame-shifting InDels, and the frequency by which they are bypassed to give full-length, functional proteins, are indeed highly correlated. Using laboratory genetic drift, we exhaustively mapped all InDels that occurred within a single gene. We thus compared the naive InDel repertoire that results from DNA polymerase slippage to the frame-shifting InDels tolerated following selection to maintain protein function. We found that InDels repeatedly occurred, and were bypassed, within homonucleotide repeats of 3–8 bases. The longer the repeat, the higher the frequency of InDel formation and the more frequent their bypass. Besides an expected 8A repeat, other types of repeats, including short ones, and G and C repeats, were bypassed. Although obtained in vitro, our results indicate a direct link between the genetic occurrence of InDels and their phenotypic rescue, thus suggesting a potential role for frame-shifting InDels as bridging evolutionary intermediates.
Author Summary
Homonucleotide repeats are exceptionally prone to insertions and/or deletions of bases (InDels). However, unless they occur in multiples of 3 bases, InDels disrupt the reading frame and are thus expected to be purged from coding regions. Homonucleotide repeats, however, are also vulnerable to slippage by RNA polymerases and the ribosome. Using laboratory evolution techniques, we systematically mapped the occurrence of InDels within a given gene, before and after selection. Our data indicate that frame-shifting InDels were bypassed to give functional proteins at surprisingly high frequencies. Further, we found a strict correlation between the repeat length, the frequency of occurrence of InDels at the DNA level, and the likelihood of bypass by transcriptional/translational slippage. Our results suggest that frame-shifting InDels might comprise functional evolutionary intermediates, and an effective means of sequence divergence (e.g. when an adjacent InDel restores the frame, resulting in altered sequence and, potentially, in an altered protein structure).
PMCID: PMC3812077  PMID: 24204297
5.  PanDaTox 
Bioengineered  2012;3(4):218-221.
Metabolic engineering is often facilitated by cloning of genes encoding enzymes from various heterologous organisms into E. coli. Such engineering efforts are frequently hampered by foreign genes that are toxic to the E. coli host. We have developed PanDaTox, a web-based resource that provides experimental toxicity information for more than 1.5 million genes from hundreds of different microbial genomes. The toxicity predictions, which were extensively experimentally verified, are based on serial cloning of genes into E. coli as part of the Sanger whole genome shotgun sequencing process. PanDaTox can accelerate metabolic engineering projects by allowing researchers to exclude toxic genes from the engineering plan and verify the clonability of selected genes before the actual metabolic engineering experiments are conducted.
PMCID: PMC3476872  PMID: 22705841
gene cloning; metabolic engineering; pandatox; synthetic biology; toxic genes
6.  Transcriptome-Wide Mapping of 5-methylcytidine RNA Modifications in Bacteria, Archaea, and Yeast Reveals m5C within Archaeal mRNAs 
PLoS Genetics  2013;9(6):e1003602.
The presence of 5-methylcytidine (m5C) in tRNA and rRNA molecules of a wide variety of organisms was first observed more than 40 years ago. However, detection of this modification was limited to specific, abundant, RNA species, due to the usage of low-throughput methods. To obtain a high resolution, systematic, and comprehensive transcriptome-wide overview of m5C across the three domains of life, we used bisulfite treatment on total RNA from both gram positive (B. subtilis) and gram negative (E. coli) bacteria, an archaeon (S. solfataricus) and a eukaryote (S. cerevisiae), followed by massively parallel sequencing. We were able to recover most previously documented m5C sites on rRNA in the four organisms, and identified several novel sites in yeast and archaeal rRNAs. Our analyses also allowed quantification of methylated m5C positions in 64 tRNAs in yeast and archaea, revealing stoichiometric differences between the methylation patterns of these organisms. Molecules of tRNAs in which m5C was absent were also discovered. Intriguingly, we detected m5C sites within archaeal mRNAs, and identified a consensus motif of AUCGANGU that directs methylation in S. solfataricus. Our results, which were validated using m5C-specific RNA immunoprecipitation, provide the first evidence for mRNA modifications in archaea, suggesting that this mode of post-transcriptional regulation extends beyond the eukaryotic domain.
Author Summary
Ribonucleic acids are universally used to express genetic information in the form of gene transcripts. Although we envision RNA as a mere copy of the DNA four-base code, modification of specific RNA bases can expand the information code. Such modifications are abundant in transfer RNA (tRNA) and ribosomal RNA (rRNA), where they contribute to translation fidelity and ribosome assembly. Recent studies in eukaryotes have shown that mRNA modifications such as RNA-editing (conversion of an adenosine base to inosine), N6-adenine methylation (m6A), and 5-methylcytidine (m5C) can change the coding sequence, alter splicing patterns, or change RNA stability. However, no mRNA modifications in bacteria or archaea have been documented to date. We have used an approach that enables mapping of the m5C modifications across all expressed genes in a given organism. Applying this approach to model bacterial, archaeal, and fungal microorganisms enabled us to reveal the modified RNA bases in these organisms, and to provide an accurate and sensitive map of these modifications. In archaea, we documented multiple genes whose mRNAs are subject to RNA modification, suggesting that, similar to eukaryotes, these organisms may utilize mRNA modifications as a mechanism for gene regulation.
PMCID: PMC3694839  PMID: 23825970
7.  A Global Transcriptional Switch between the Attack and Growth Forms of Bdellovibrio bacteriovorus 
PLoS ONE  2013;8(4):e61850.
Bdellovibrio bacteriovorus is an obligate predator of bacteria ubiquitously found in the environment. Its life cycle is composed of two essential phases: a free-living, non-replicative, fast swimming attack phase (AP) wherein the predator searches for prey; and a non-motile, actively dividing growth phase (GP) in which it consumes the prey. The molecular regulatory mechanisms governing the switch between AP and GP are largely unknown. We used RNA-seq to generate a single-base-resolution map of the Bdellovibrio transcriptome in AP and GP, revealing a specific "AP" transcriptional program, which is largely mutually exclusive of the GP program. Based on the expression map, most genes in the Bdellovibrio genome are classified as "AP only" or "GP only". We experimentally generated a genome-wide map of 140 AP promoters, controlling the majority of AP-specific genes. This revealed a common sigma-like DNA binding site highly similar to the E. coli flagellar genes regulator sigma28 (FliA). Further analyses suggest that FliA has evolved to become a global AP regulator in Bdellovibrio. Our results also reveal a non-coding RNA that is massively expressed in AP. This ncRNA contains a c-di-GMP riboswitch. We suggest it functions as an intracellular reservoir for c-di-GMP, playing a role in the rapid switch from AP to GP.
PMCID: PMC3627812  PMID: 23613952
8.  The Single-Nucleotide Resolution Transcriptome of Pseudomonas aeruginosa Grown in Body Temperature 
PLoS Pathogens  2012;8(9):e1002945.
One of the hallmarks of opportunistic pathogens is their ability to adjust and respond to a wide range of environmental and host-associated conditions. The human pathogen Pseudomonas aeruginosa has an ability to thrive in a variety of hosts and cause a range of acute and chronic infections in individuals with impaired host defenses or cystic fibrosis. Here we report an in-depth transcriptional profiling of this organism when grown at host-related temperatures. Using RNA-seq of samples from P. aeruginosa grown at 28°C and 37°C we detected genes preferentially expressed at the body temperature of mammalian hosts, suggesting that they play a role during infection. These temperature-induced genes included the type III secretion system (T3SS) genes and effectors, as well as the genes responsible for phenazine biosynthesis. Using genome-wide transcription start site (TSS) mapping by RNA-seq we were able to accurately define the promoters and cis-acting RNA elements of many genes, and uncovered new genes and previously unrecognized non-coding RNAs directly controlled by the LasR quorum sensing regulator. Overall we identified 165 small RNAs and over 380 cis-antisense RNAs, some of which are predicted to perform regulatory functions, and found that non-coding RNAs are preferentially localized in pathogenicity islands and horizontally transferred regions. Our work identifies regulatory features of P. aeruginosa genes whose products play a role in environmental adaptation during infection and provides a reference transcriptional landscape for this pathogen.
Author Summary
Identifying coordinately regulated genes and their control by environmentally-initiated signal transduction pathways is important for understanding bacterial virulence mechanisms. The work reported here provides a comprehensive, high resolution, transcriptome map of the opportunistic pathogen Pseudomonas aeruginosa using RNA-seq. The results suggest that P. aeruginosa senses the temperature during the transition from its natural environment to a mammalian host, and this plays a key role in regulating the coordinated expression of several virulence factors. A large number of antisense transcripts and non-coding RNAs were identified, with preferential clustering in the regions acquired through horizontal gene transfer, suggesting that a part of the non-coding genome has a distinct evolutionary origin. We created an online data viewer, the Pseudomonas transcriptome browser, to facilitate access to the transcriptome data from this study as well as the subsequent results of work deposited by other investigators. The resources generated through our analyses provide a valuable tool to the P. aeruginosa research community and set the foundation for a systems biology approach towards understanding the complexity of the regulatory networks controlling the multiple lifestyles of this highly versatile organism.
PMCID: PMC3460626  PMID: 23028334
9.  Comparative transcriptomics of pathogenic and non-pathogenic Listeria species 
Comparative RNA-seq analysis of two related pathogenic and non-pathogenic bacterial strains reveals a hidden layer of divergence in the non-coding genome as well as conserved, widespread regulatory structures called ‘Excludons’, which mediate regulation through long non-coding antisense RNAs.
Comparative transcriptome sequencing of two closely related bacterial strains reveals a hidden layer of divergence in the non-coding genome. Pathogen-specific non-coding RNAs, which might contribute to virulence, are revealed. The Listeria genome contains a class of unusually long antisense RNAs (lasRNAs) which span divergent genes and repress expression of the genes located opposite to them while activating the other. The genetic organization of these lasRNAs and operons was named an excludon. The exhaustive transcriptome information from this publication is provided as an open resource with a web-accessible transcriptome browser.
Listeria monocytogenes is a human, food-borne pathogen. Genomic comparisons between L. monocytogenes and Listeria innocua, a closely related non-pathogenic species, were pivotal in the identification of protein-coding genes essential for virulence. However, no comprehensive comparison has focused on the non-coding genome. We used strand-specific cDNA sequencing to produce genome-wide transcription start site maps for both organisms, and developed a publicly available integrative browser to visualize and analyze both transcriptomes in different growth conditions and genetic backgrounds. Our data revealed conservation across most transcripts, but significant divergence between the species in a subset of non-coding RNAs. In L. monocytogenes, we identified 113 small RNAs (33 novel) and 70 antisense RNAs (53 novel), significantly increasing the repertoire of ncRNAs in this species. Remarkably, we identified a class of long antisense transcripts (lasRNAs) that overlap one gene while also serving as the 5′ UTR of the adjacent divergent gene. Experimental evidence suggests that lasRNA transcription inhibits expression of one operon while activating the expression of another. Such a lasRNA/operon structure, which we named ‘excludon’, might represent a novel form of regulation in bacteria.
PMCID: PMC3377988  PMID: 22617957
comparative genomics; Listeria monocytogenes; RNA-seq; transcriptome; TSS map
10.  The phage-host arms-race: Shaping the evolution of microbes 
Bioessays  2011;33(1):43-51.
Bacteria, the most abundant organisms on the planet, are outnumbered by a factor of 10 to 1 by phages that infect them. Faced with the rapid evolution and turnover of phage particles, bacteria have evolved various mechanisms to evade phage infection and killing, leading to an evolutionary arms-race. The extensive co-evolution of both phage and host has resulted in considerable diversity on the part of both bacterial and phage defensive and offensive strategies. Here, we discuss the unique and common features of phage resistance mechanisms and their role in global biodiversity. The commonalities between defense mechanisms suggest avenues for the discovery of novel such mechanisms based on their evolutionary traits.
PMCID: PMC3274958  PMID: 20979102
arms-race; phage; bacteria; evolution; resistance
11.  Transcriptome-wide discovery of circular RNAs in Archaea 
Nucleic Acids Research  2011;40(7):3131-3142.
Circular RNA forms have been described in all domains of life. Such RNAs were shown to have diverse biological functions, including roles in the life cycle of viral and viroid genomes, and in maturation of permuted tRNA genes. Despite their potentially important biological roles, discovery of circular RNAs has so far been mostly serendipitous. We have developed circRNA-seq, a combined experimental/computational approach that enriches for circular RNAs and allows profiling their prevalence in a whole-genome, unbiased manner. Application of this approach to the archaeon Sulfolobus solfataricus P2 revealed multiple circular transcripts, a subset of which was further validated independently. The identified circular RNAs included expected forms, such as excised tRNA introns and rRNA processing intermediates, but were also enriched with non-coding RNAs, including C/D box RNAs and RNase P, as well as circular RNAs of unknown function. Many of the identified circles were conserved in Sulfolobus acidocaldarius, further supporting their functional significance. Our results suggest that circular RNAs, and particularly circular non-coding RNAs, are more prevalent in archaea than previously recognized, and might have yet unidentified biological roles. Our study establishes a specific and sensitive approach for identification of circular RNAs using RNA-seq, and can readily be applied to other organisms.
PMCID: PMC3326292  PMID: 22140119
12.  RNA-seq analysis of small RNPs in Trypanosoma brucei reveals a rich repertoire of non-coding RNAs 
Nucleic Acids Research  2011;40(3):1282-1298.
The discovery of a plethora of small non-coding RNAs (ncRNAs) has fundamentally changed our understanding of how genes are regulated. In this study, we employed the power of deep sequencing of RNA (RNA-seq) to examine the repertoire of ncRNAs present in small ribonucleoprotein particles (RNPs) of Trypanosoma brucei, an important protozoan parasite. We identified new C/D and H/ACA small nucleolar RNAs (snoRNAs), as well as tens of putative novel non-coding RNAs; several of these are processed from trans-spliced and polyadenylated transcripts. The RNA-seq analysis provided information on the relative abundance of the RNAs, and their 5′- and 3′-termini. The study demonstrated that three highly abundant snoRNAs are involved in rRNA processing and highlights the unique trypanosome-specific repertoire of these RNAs. Novel RNAs were studied using in situ hybridization, association in RNP complexes, and ‘RNA walk’ to detect interaction with their target RNAs. Finally, we showed that the abundance of certain ncRNAs varies between the two stages of the parasite, suggesting that ncRNAs may contribute to gene regulation during the parasite’s complex life cycle. This is the first study to provide a whole-genome analysis of the large repertoire of small RNPs in trypanosomes.
PMCID: PMC3273796  PMID: 21976736
13.  Self-targeting by CRISPR: gene regulation or autoimmunity? 
Trends in genetics : TIG  2010;26(8):335-340.
CRISPR/Cas is a recently discovered prokaryotic immune system, which is based on small RNAs (“spacers”) that restrict phage and plasmid infection. It has been hypothesized that CRISPRs can also regulate self gene expression by utilizing spacers that target self genes. By analyzing CRISPRs from 330 organisms we found that one in every 250 spacers is self targeting, and that such self-targeting occurs in 18% of all CRISPR-bearing organisms. However, complete lack of conservation across species, combined with abundance of degraded repeats near self-targeting spacers, suggests that self-targeting is a consequence of autoimmunity rather than gene regulation. We propose that accidental incorporation of self nucleic-acids by CRISPR can incur an autoimmune fitness cost, which may explain the abundance of degraded CRISPR systems across prokaryotes.
PMCID: PMC2910793  PMID: 20598393
14.  Mutation Detection with Next-Generation Resequencing through a Mediator Genome 
PLoS ONE  2010;5(12):e15628.
The affordability of next generation sequencing (NGS) is transforming the field of mutation analysis in bacteria. The genetic basis for phenotype alteration can be identified directly by sequencing the entire genome of the mutant and comparing it to the wild-type (WT) genome, thus identifying acquired mutations. A major limitation of this approach is the need for an a priori sequenced reference genome for the WT organism, as the short reads of most current NGS approaches usually prohibit de novo genome assembly. To overcome this limitation we propose a general framework that utilizes the genomes of related organisms as mediators for comparing WT and mutant bacteria. Under this framework, both mutant and WT genomes are sequenced with NGS, and the short sequencing reads are mapped to the mediator genome. Variations between the mutant and the mediator that recur in the WT are ignored, thus pinpointing the differences between the mutant and the WT. To validate this approach we sequenced the genome of Bdellovibrio bacteriovorus 109J, an obligatory bacterial predator, and its prey-independent mutant, and compared both to the mediator strain Bdellovibrio bacteriovorus HD100. Although the mutant and the mediator sequences differed in more than 28,000 nucleotide positions, our approach enabled pinpointing the single causative mutation. Experimental validation in 53 additional mutants further established the implicated gene. Our approach extends the applicability of NGS-based mutant analyses beyond the domain of available reference genomes.
PMCID: PMC3013116  PMID: 21209874
15.  Evolutionary conservation of sequence and secondary structures in CRISPR repeats 
Genome Biology  2007;8(4):R61.
The categorisation and structural analysis of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) sequences from 195 microbial genomes show that repeats from diverse organisms can be grouped based on sequence similarity, and that some groups have pronounced secondary structures with compensatory base changes.
Clustered regularly interspaced short palindromic repeats (CRISPRs) are a novel class of direct repeats, separated by unique spacer sequences of similar length, that are present in approximately 40% of bacterial and most archaeal genomes analyzed to date. More than 40 gene families, called CRISPR-associated sequences (CASs), appear in conjunction with these repeats and are thought to be involved in the propagation and functioning of CRISPRs. It has been recently shown that CRISPR provides acquired resistance against viruses in prokaryotes.
Here we analyze CRISPR repeats identified in 195 microbial genomes and show that they can be organized into multiple clusters based on sequence similarity. Some of the clusters present stable, highly conserved RNA secondary structures, while others lack detectable structures. Stable secondary structures exhibit multiple compensatory base changes in the stem region, indicating evolutionary and functional conservation.
We show that the repeat-based classification corresponds to, and expands upon, a previously reported CAS gene-based classification, including specific relationships between CRISPR and CAS subtypes.
PMCID: PMC1896005  PMID: 17442114
16.  RNA-editing-mediated exon evolution 
Genome Biology  2007;8(2):R29.
A primate-specific exon is found to be dependent on RNA editing for its exonization.
Alu retroelements are specific to primates and abundant in the human genome. Through mutations that create functional splice sites within intronic Alus, these elements can become new exons in a process denoted exonization. It was recently shown that Alu elements are also heavily changed by RNA editing in the human genome.
Here we show that the human nuclear prelamin A recognition factor contains a primate-specific Alu-exon that exclusively depends on RNA editing for its exonization. We demonstrate that RNA editing regulates the exonization in a tissue-dependent manner, through both the creation of a functional AG 3' splice site, and alteration of functional exonic splicing enhancers within the exon. Furthermore, a premature stop codon within the Alu-exon is eliminated by an exceptionally efficient RNA editing event. The sequence surrounding this editing site is important not only for editing of that site but also for editing in other neighboring sites as well.
Our results show that the abundant RNA editing of Alu sequences can be recruited as a mechanism supporting the birth of new exons in the human genome.
PMCID: PMC1852406  PMID: 17326827
17.  Assessing the number of ancestral alternatively spliced exons in the human genome 
BMC Genomics  2006;7:273.
It is estimated that between 35% and 74% of all human genes undergo alternative splicing. However, as a gene that undergoes alternative splicing can have between one and dozens of alternative exons, the number of alternatively spliced genes by itself is not informative enough. An additional parameter, which has not been addressed so far, is therefore the number of human exons that undergo alternative splicing. We have previously described an accurate machine-learning method allowing the detection of conserved alternatively spliced exons without using ESTs, which relies on specific features of the exon and its genomic vicinity that distinguish alternatively spliced exons from constitutive ones.
In this study we use the above-described approach to calculate that 7.2% (± 1.1%) of all human exons that are conserved in mouse are alternatively spliced in both species.
This number is the first estimation for the extent of ancestral alternatively spliced exons in the human genome.
PMCID: PMC1635713  PMID: 17062157
18.  AluGene: a database of Alu elements incorporated within protein-coding genes 
Nucleic Acids Research  2004;32(Database issue):D489-D492.
Alu elements are short interspersed elements (SINEs) ∼300 nucleotides in length. More than 1 million Alus are found in the human genome. Despite their being genetically functionless, recent findings suggest that Alu elements may have a broad evolutionary impact by affecting gene structures, protein sequences, splicing motifs and expression patterns. Because of these effects, compiling a genomic database of Alu sequences that reside within protein-coding genes seemed a useful enterprise. Presently, such data are limited since the structural and positional information on genes and Alu sequences are scattered throughout incompatible and unconnected databases. AluGene provides easy access to a complete Alu map of the human genome, as well as Alu-associated information. The Alu elements are annotated with respect to coding region and exon/intron location. This design facilitates queries on Alu sequences, locations, as well as motifs and compositional properties via a one-stop search page.
PMCID: PMC308866  PMID: 14681464
19.  A novel algorithm for computational identification of contaminated EST libraries 
Nucleic Acids Research  2003;31(3):1067-1074.
A key goal of the Human Genome Project was to understand the complete set of human proteins, the proteome. Since the genome sequence by itself is not sufficient for predicting new genes and alternative splicing events that lead to new proteins, expressed sequence tags (ESTs) are used as the primary tool for these purposes. The high prevalence of artifacts in dbEST, however, often leads to invalid predictions. Here we describe a novel method for recognizing genomic DNA contamination and other artifacts that cannot be identified using current EST cleaning techniques. Our method uses the alignment of the entire set of ESTs to the human genome to identify highly contaminated EST libraries. We discovered 53 highly contaminated libraries and a subset of 24 766 ESTs from these libraries that probably represent contamination with genomic DNA, pre-mRNA, and ESTs that span non-canonical introns. Although this is only a small fraction of the entire EST dataset, each contaminating sequence could create a spurious transcript prediction. Indeed, in the clustering and assembly tool that we used, these sequences would have caused incorrect inference of 9575 new splice variants and 6370 new genes. Conclusions based on EST analysis, including prediction of alternative splicing, should be re-evaluated in light of these results. Our method, along with the identified set of contaminated sequences, will be essential for applications that depend on large EST datasets.
PMCID: PMC149192  PMID: 12560505
