Results 1-25 (111)
 

1.  DIAMUND: Direct Comparison of Genomes to Detect Mutations 
Human Mutation  2013;35(3):283-288.
DNA sequencing has become a powerful method to discover the genetic basis of disease. Standard, widely used protocols for analysis usually begin by comparing each individual to the human reference genome. When applied to a set of related individuals, this approach reveals millions of differences, most of which are shared among the individuals and unrelated to the disease being investigated. We have developed a novel algorithm for variant detection, one that compares DNA sequences directly to one another, without aligning them to the reference genome. When used to find de novo mutations in exome sequences from family trios, or to compare normal and diseased samples from the same individual, the new method, direct alignment for mutation discovery (DIAMUND), produces a dramatically smaller list of candidate mutations than previous methods, without losing sensitivity to detect the true cause of a genetic disease. We demonstrate our results on several example cases, including two family trios in which it correctly found the disease-causing variant while excluding thousands of harmless variants that standard methods had identified.
doi:10.1002/humu.22503
PMCID: PMC4031744  PMID: 24375697
variant detection; computational biology; bioinformatics; exome sequencing; sequence alignment
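The direct-comparison idea in the DIAMUND abstract can be illustrated with a small sketch. The abstract does not say how the sequences are compared, so the k-mer set difference used here is an assumption made for illustration, and the function names and the tiny value of k are hypothetical. The sketch also ignores reverse complements and sequencing errors, which a real tool would have to handle.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_set(reads, k):
    """Union of the k-mers found in a collection of reads."""
    out = set()
    for read in reads:
        out |= kmers(read, k)
    return out


def candidate_kmers(child_reads, parent_reads_list, k=5):
    """K-mers seen in the child but in none of the parents.

    Assumption for illustration: a de novo mutation shows up as k-mers
    that are unique to the child. No reference genome is consulted.
    """
    child = kmer_set(child_reads, k)
    for parent_reads in parent_reads_list:
        child -= kmer_set(parent_reads, k)
    return child


# Toy trio: the child differs from both parents at a single base.
mother = ["ACGTACGTAC"]
father = ["ACGTACGTAC"]
child = ["ACGTTCGTAC"]
novel = candidate_kmers(child, [mother, father], k=5)
print(sorted(novel))
```

Every k-mer reported for the toy trio overlaps the changed base, while the k-mers shared with the parents drop out, which is the sense in which a direct comparison shortens the candidate list.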
2.  Genomic Features of a Bumble Bee Symbiont Reflect Its Host Environment 
Applied and Environmental Microbiology  2014;80(13):3793-3803.
Here, we report the genome of one gammaproteobacterial member of the gut microbiota, for which we propose the name “Candidatus Schmidhempelia bombi,” that was inadvertently sequenced alongside the genome of its host, the bumble bee, Bombus impatiens. This symbiont is a member of the recently described bacterial order Orbales, which has been collected from the guts of diverse insect species; however, “Ca. Schmidhempelia” has been identified exclusively with bumble bees. Metabolic reconstruction reveals that “Ca. Schmidhempelia” lacks many genes for a functioning NADH dehydrogenase I, all genes for the high-oxygen cytochrome o, and most genes in the tricarboxylic acid (TCA) cycle. “Ca. Schmidhempelia” has retained NADH dehydrogenase II, the low-oxygen specific cytochrome bd, anaerobic nitrate respiration, mixed-acid fermentation pathways, and citrate fermentation, which may be important for survival in low-oxygen or anaerobic environments found in the bee hindgut. Additionally, a type 6 secretion system, a Flp pilus, and many antibiotic/multidrug transporters suggest complex interactions with its host and other gut commensals or pathogens. This genome has signatures of reduction (2.0 megabase pairs) and rearrangement, as previously observed for genomes of host-associated bacteria. A survey of wild and laboratory B. impatiens revealed that “Ca. Schmidhempelia” is present in 90% of individuals and, therefore, may provide benefits to its host.
doi:10.1128/AEM.00322-14
PMCID: PMC4054214  PMID: 24747890
3.  Unexpected cross-species contamination in genome sequencing projects 
PeerJ  2014;2:e675.
The raw data from a genome sequencing project sometimes contains DNA from contaminating organisms, which may be introduced during sample collection or sequence preparation. In some instances, these contaminants remain in the sequence even after assembly and deposition of the genome into public databases. As a result, searches of these databases may yield erroneous and confusing results. We used efficient microbiome analysis software to scan the draft assembly of domestic cow, Bos taurus, and identify 173 small contigs that appeared to derive from microbial contaminants. In the course of verifying these findings, we discovered that one genome, Neisseria gonorrhoeae TCDC-NG08107, although putatively a complete genome, contained multiple sequences that actually derived from the cow and sheep genomes. Our findings illustrate the need to carefully validate findings of anomalous DNA that rely on comparisons to either draft or finished genomes.
doi:10.7717/peerj.675
PMCID: PMC4243333  PMID: 25426337
Genomics; Bioinformatics; Genome assembly; Microbiome; Sequence analysis; DNA sequencing
4.  The MaSuRCA genome assembler 
Bioinformatics  2013;29(21):2669-2677.
Motivation: Second-generation sequencing technologies produce high coverage of the genome by short reads at a low cost, which has prompted development of new assembly methods. In particular, multiple algorithms based on de Bruijn graphs have been shown to be effective for the assembly problem. In this article, we describe a new hybrid approach that has the computational efficiency of de Bruijn graph methods and the flexibility of overlap-based assembly strategies, and which allows variable read lengths while tolerating a significant level of sequencing error. Our method transforms large numbers of paired-end reads into a much smaller number of longer ‘super-reads’. The use of super-reads allows us to assemble combinations of Illumina reads of differing lengths together with longer reads from 454 and Sanger sequencing technologies, making it one of the few assemblers capable of handling such mixtures. We call our system the Maryland Super-Read Celera Assembler (abbreviated MaSuRCA and pronounced ‘mazurka’).
Results: We evaluate the performance of MaSuRCA against two of the most widely used assemblers for Illumina data, Allpaths-LG and SOAPdenovo2, on two datasets from organisms for which high-quality assemblies are available: the bacterium Rhodobacter sphaeroides and chromosome 16 of the mouse genome. We show that MaSuRCA performs on par or better than Allpaths-LG and significantly better than SOAPdenovo on these data, when evaluated against the finished sequence. We then show that MaSuRCA can significantly improve its assemblies when the original data are augmented with long reads.
Availability: MaSuRCA is available as open-source code at ftp://ftp.genome.umd.edu/pub/MaSuRCA/. Previous (pre-publication) releases have been publicly available for over a year.
Contact: alekseyz@ipst.umd.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt476
PMCID: PMC3799473  PMID: 23990416
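As a rough illustration of how short reads might be turned into longer "super-reads", the sketch below extends a read to the right for as long as the k-mer table built from all reads offers exactly one possible next base. The abstract does not spell out the construction, so the unique-extension rule, the function names, and the parameters are assumptions for illustration only; this is not the MaSuRCA implementation, and it extends in one direction only.

```python
def kmer_table(reads, k):
    """Collect every k-mer that occurs in any read."""
    table = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            table.add(read[i:i + k])
    return table


def extend_right(read, table, k, max_len=10_000):
    """Extend read base by base while the next k-mer is unambiguous (k >= 2)."""
    seq = read
    seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    while len(seq) < max_len:
        suffix = seq[-(k - 1):]
        # Bases whose k-mer (suffix + base) exists somewhere in the reads.
        nexts = [b for b in "ACGT" if suffix + b in table]
        if len(nexts) != 1:
            break  # dead end or branch: stop extending
        nxt = suffix + nexts[0]
        if nxt in seen:
            break  # guard against walking around a cycle
        seen.add(nxt)
        seq += nexts[0]
    return seq


reads = ["ACGGTCA", "GTCATTG", "ATTGCA"]
table = kmer_table(reads, 4)
super_read = extend_right(reads[0], table, 4)
print(super_read)
```

In the toy input, the three overlapping reads collapse into a single longer sequence, which is the effect the abstract describes: many reads replaced by fewer, longer ones.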
5.  A new rhesus macaque assembly and annotation for next-generation sequencing analyses 
Biology Direct  2014;9:20.
Background
The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2 with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High quality assembly and annotation files are required for a wide range of studies including expression, genetic and evolutionary analyses.
Results
We report a new de novo assembly of the rhesus macaque genome (MacaM) that incorporates both the original Sanger sequences used to assemble rheMac2 and new Illumina sequences from the same animal. MacaM has a weighted average (N50) contig size of 64 kilobases, more than twice the size of the rheMac2 assembly and almost five times the size of the CR_1.0 assembly. The MacaM chromosome assembly incorporates information from previously unutilized mapping data and preliminary annotation of scaffolds. Independent assessment of the assemblies using Ion Torrent read alignments indicates that MacaM is more complete and accurate than rheMac2 and CR_1.0. We assembled messenger RNA sequences from several rhesus tissues into transcripts which allowed us to identify a total of 11,712 complete proteins representing 9,524 distinct genes. Using a combination of our assembled rhesus macaque transcripts and human transcripts, we annotated 18,757 transcripts and 16,050 genes with complete coding sequences in the MacaM assembly. Further, we demonstrate that the new annotations provide greatly improved accuracy as compared to the current annotations of rheMac2. Finally, we show that the MacaM genome provides an accurate resource for alignment of reads produced by RNA sequence expression studies.
Conclusions
The MacaM assembly and annotation files provide a substantially more complete and accurate representation of the rhesus macaque genome than rheMac2 or CR_1.0 and will serve as an important resource for investigators conducting next-generation sequencing studies with nonhuman primates.
Reviewers
This article was reviewed by Dr. Lutz Walter, Dr. Soojin Yi and Dr. Kateryna Makova.
doi:10.1186/1745-6150-9-20
PMCID: PMC4214606  PMID: 25319552
Macaca mulatta; Rhesus macaque; Genome; Assembly; Annotation; Transcriptome; Next-generation sequencing
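The N50 statistic quoted in this abstract is a length-weighted summary of contig sizes. A minimal sketch of the usual definition follows: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The function name and the toy lengths are illustrative, not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a list of contig (or scaffold) lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that pushes the running sum to half the total.
        if running * 2 >= total:
            return length
    return 0  # empty input


toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

For the toy list the total is 290, and the two longest contigs (80 and 70) are the first to reach half of it, so the N50 is 70.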
6.  Genome-Guided Transcriptome Assembly in the Age of Next-Generation Sequencing 
Next-generation sequencing technologies provide unprecedented power to explore the repertoire of genes and their alternative splice variants, collectively defining the transcriptome of a species in great detail. However, assembling the short reads into full-length gene and transcript models presents significant computational challenges. We review current algorithms for assembling transcripts and genes from next-generation sequencing reads aligned to a reference genome, and lay out areas for future improvements.
PMCID: PMC4086730  PMID: 24524156
Algorithms; biology and genetics; computer applications; medicine and science
7.  Open access to tree genomes: the path to a better forest 
Genome Biology  2013;14(6):120.
An open-access culture and a well-developed comparative-genomics infrastructure must be developed in forest trees to derive the full potential of genome sequencing in this diverse group of plants that are the dominant species in much of the earth's terrestrial ecosystems.
doi:10.1186/gb-2013-14-6-120
PMCID: PMC3706761  PMID: 23796049
Forest tree genome; Open access; Sequencing; Genomics; Database
8.  Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies 
Genome Biology  2014;15(3):R59.
Background
The size and complexity of conifer genomes has, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination.
Results
We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly was used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp in length. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome.
Conclusions
In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrates a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied.
doi:10.1186/gb-2014-15-3-r59
PMCID: PMC4053751  PMID: 24647006
9.  Kraken: ultrafast metagenomic sequence classification using exact alignments 
Genome Biology  2014;15(3):R46.
Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Previous programs designed for this task have been relatively slow and computationally expensive, forcing researchers to use faster abundance estimation programs, which only classify small subsets of metagenomic data. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program. In its fastest mode, Kraken classifies 100 base pair reads at a rate of over 4.1 million reads per minute, 909 times faster than Megablast and 11 times faster than the abundance estimation program MetaPhlAn. Kraken is available at http://ccb.jhu.edu/software/kraken/.
doi:10.1186/gb-2014-15-3-r46
PMCID: PMC4053813  PMID: 24580807
metagenomics; sequence classification; sequence alignment; next-generation sequencing; microbiome
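The exact k-mer matching mentioned in the Kraken abstract can be sketched as a dictionary lookup. The version below is a simplification under stated assumptions: each k-mer maps to a single taxon (k-mers shared between taxa are marked ambiguous and ignored), and a read is assigned to the taxon with the most k-mer hits. The abstract does not describe how hits are combined, so this majority vote, like the function names and toy sequences, is hypothetical and not Kraken's actual scheme.

```python
from collections import Counter


def build_db(genomes, k):
    """Map each k-mer to the one taxon that contains it, or None if shared."""
    db = {}
    for taxon, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer not in db:
                db[kmer] = taxon
            elif db[kmer] != taxon:
                db[kmer] = None  # seen in more than one taxon: ambiguous
    return db


def classify(read, db, k):
    """Assign a read to the taxon with the most exact k-mer hits."""
    votes = Counter()
    for i in range(len(read) - k + 1):
        taxon = db.get(read[i:i + k])
        if taxon is not None:
            votes[taxon] += 1
    if not votes:
        return None  # unclassified
    return votes.most_common(1)[0][0]


genomes = {"taxon_A": "ACGTACGGTCA", "taxon_B": "TTGACCATGGA"}
db = build_db(genomes, 4)
print(classify("CGGTC", db, 4), classify("ACCATG", db, 4), classify("AAAAAA", db, 4))
```

Because every lookup is an exact hash match, no alignment is computed per read, which is the property the abstract credits for the speed.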
10.  Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies 
Briefings in Bioinformatics  2011;14(2):213-224.
Since its launch in 2004, the open-source AMOS project has released several innovative DNA sequence analysis applications including: Hawkeye, a visual analytics tool for inspecting the structure of genome assemblies; the Assembly Forensics and FRCurve pipelines for systematically evaluating the quality of a genome assembly; and AMOScmp, the first comparative genome assembler. These applications have been used to assemble and analyze dozens of genomes ranging in complexity from simple microbial species through mammalian genomes. Recent efforts have been focused on enhancing support for new data characteristics brought on by second- and now third-generation sequencing. This review describes the major components of AMOS in light of these challenges, with an emphasis on methods for assessing assembly quality and the visual analytics capabilities of Hawkeye. These interactive graphical aspects are essential for navigating and understanding the complexities of a genome assembly, from the overall genome structure down to individual bases. Hawkeye and AMOS are available open source at http://amos.sourceforge.net.
doi:10.1093/bib/bbr074
PMCID: PMC3603210  PMID: 22199379
DNA Sequencing; genome assembly; assembly forensics; visual analytics
11.  Genome sequence of the human malaria parasite Plasmodium falciparum 
Nature  2002;419(6906):10.1038/nature01097.
The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host–parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.
doi:10.1038/nature01097
PMCID: PMC3836256  PMID: 12368864
12.  Thousands of exon skipping events differentiate among splicing patterns in sixteen human tissues 
F1000Research  2013;2:188.
Alternative splicing is widely recognized for its roles in regulating genes and creating gene diversity. However, despite many efforts, the repertoire of gene splicing variation is still incompletely characterized, even in humans. Here we describe a new computational system, ASprofile, and its application to RNA-seq data from Illumina’s Human Body Map project (>2.5 billion reads).  Using the system, we identified putative alternative splicing events in 16 different human tissues, which provide a dynamic picture of splicing variation across the tissues. We detected 26,989 potential exon skipping events representing differences in splicing patterns among the tissues. A large proportion of the events (>60%) were novel, involving new exons (~3000), new introns (~16000), or both. When tracing these events across the sixteen tissues, only a small number (4-7%) appeared to be differentially expressed (‘switched’) between two tissues, while 30-45% showed little variation, and the remaining 50-65% were not present in one or both tissues compared.  Novel exon skipping events appeared to be slightly less variable than known events, but were more tissue-specific. Our study represents the first effort to build a comprehensive catalog of alternative splicing in normal human tissues from RNA-seq data, while providing insights into the role of alternative splicing in shaping tissue transcriptome differences. The catalog of events and the ASprofile software are freely available from the Zenodo repository
(http://zenodo.org/record/7068; doi: 10.5281/zenodo.7068) and from our web site http://ccb.jhu.edu/software/ASprofile.
doi:10.12688/f1000research.2-188.v2
PMCID: PMC3892928  PMID: 24555089
14.  Insights into the Loblolly Pine Genome: Characterization of BAC and Fosmid Sequences 
PLoS ONE  2013;8(9):e72439.
Despite their prevalence and importance, the genome sequences of loblolly pine, Norway spruce, and white spruce, three ecologically and economically important conifer species, are just becoming available to the research community. Following the completion of these large assemblies, annotation efforts will be undertaken to characterize the reference sequences. Accurate annotation of these ancient genomes would be aided by a comprehensive repeat library; however, few studies have generated enough sequence to fully evaluate and catalog their non-genic content. In this paper, two sets of loblolly pine genomic sequence, 103 previously assembled BACs and 90,954 newly sequenced and assembled fosmid scaffolds, were analyzed. Together, this sequence represents 280 Mbp (roughly 1% of the loblolly pine genome) and one of the most comprehensive studies of repetitive elements and genes in a gymnosperm species. A combination of homology and de novo methodologies were applied to identify both conserved and novel repeats. Similarity analysis estimated a repetitive content of 27% that included both full and partial elements. When combined with the de novo investigation, the estimate increased to almost 86%. Over 60% of the repetitive sequence consists of full or partial LTR (long terminal repeat) retrotransposons. Through de novo approaches, 6,270 novel, full-length transposable element families and 9,415 sub-families were identified. Among those 6,270 families, 82% were annotated as single-copy. Several of the novel, high-copy families are described here, with the largest, PtPiedmont, comprising 133 full-length copies. In addition to repeats, analysis of the coding region reported 23 full-length eukaryotic orthologous proteins (KOGS) and another 29 novel or orthologous genes. These discoveries, along with other genomic resources, will be used to annotate conifer genomes and address long-standing questions about gymnosperm evolution.
doi:10.1371/journal.pone.0072439
PMCID: PMC3762812  PMID: 24023741
15.  The COMBREX Project: Design, Methodology, and Initial Results 
Anton, Brian P. | Chang, Yi-Chien | Brown, Peter | Choi, Han-Pil | Faller, Lina L. | Guleria, Jyotsna | Hu, Zhenjun | Klitgord, Niels | Levy-Moonshine, Ami | Maksad, Almaz | Mazumdar, Varun | McGettrick, Mark | Osmani, Lais | Pokrzywa, Revonda | Rachlin, John | Swaminathan, Rajeswari | Allen, Benjamin | Housman, Genevieve | Monahan, Caitlin | Rochussen, Krista | Tao, Kevin | Bhagwat, Ashok S. | Brenner, Steven E. | Columbus, Linda | de Crécy-Lagard, Valérie | Ferguson, Donald | Fomenkov, Alexey | Gadda, Giovanni | Morgan, Richard D. | Osterman, Andrei L. | Rodionov, Dmitry A. | Rodionova, Irina A. | Rudd, Kenneth E. | Söll, Dieter | Spain, James | Xu, Shuang-yong | Bateman, Alex | Blumenthal, Robert M. | Bollinger, J. Martin | Chang, Woo-Suk | Ferrer, Manuel | Friedberg, Iddo | Galperin, Michael Y. | Gobeill, Julien | Haft, Daniel | Hunt, John | Karp, Peter | Klimke, William | Krebs, Carsten | Macelis, Dana | Madupu, Ramana | Martin, Maria J. | Miller, Jeffrey H. | O'Donovan, Claire | Palsson, Bernhard | Ruch, Patrick | Setterdahl, Aaron | Sutton, Granger | Tate, John | Yakunin, Alexander | Tchigvintsev, Dmitri | Plata, Germán | Hu, Jie | Greiner, Russell | Horn, David | Sjölander, Kimmen | Salzberg, Steven L. | Vitkup, Dennis | Letovsky, Stanley | Segrè, Daniel | DeLisi, Charles | Roberts, Richard J. | Steffen, Martin | Kasif, Simon
PLoS Biology  2013;11(8):e1001638.
Experimental data exists for only a vanishingly small fraction of sequenced microbial genes. This community page discusses the progress made by the COMBREX project to address this important issue using both computational and experimental resources.
doi:10.1371/journal.pbio.1001638
PMCID: PMC3754883  PMID: 24013487
18.  Genome Analysis Linking Recent European and African Influenza (H5N1) Viruses 
Emerging Infectious Diseases  2007;13(5):713-718.
Although linked, these viruses are distinct from earlier outbreak strains.
To better understand the ecology and epidemiology of the highly pathogenic avian influenza virus in its transcontinental spread, we sequenced and analyzed the complete genomes of 36 recent influenza A (H5N1) viruses collected from birds in Europe, northern Africa, and southeastern Asia. These sequences, among the first complete genomes of influenza (H5N1) viruses outside Asia, clearly depict the lineages now infecting wild and domestic birds in Europe and Africa and show the relationships among these isolates and other strains affecting both birds and humans. The isolates fall into 3 distinct lineages, 1 of which contains all known non-Asian isolates. This new Euro-African lineage, which was the cause of several recent (2006) fatal human infections in Egypt and Iraq, has been introduced at least 3 times into the European-African region and has split into 3 distinct, independently evolving sublineages. One isolate provides evidence that 2 of these sublineages have recently reassorted.
doi:10.3201/eid1305.070013
PMCID: PMC2432181  PMID: 17553249
Influenza A virus; genomics; sequence analysis; DNA; evolution; molecular; research
19.  GAGE-B: an evaluation of genome assemblers for bacterial organisms 
Bioinformatics  2013;29(14):1718-1725.
Motivation: A large and rapidly growing number of bacterial organisms have been sequenced by the newest sequencing technologies. Cheaper and faster sequencing technologies make it easy to generate very high coverage of bacterial genomes, but these advances mean that DNA preparation costs can exceed the cost of sequencing for small genomes. The need to contain costs often results in the creation of only a single sequencing library, which in turn introduces new challenges for genome assembly methods.
Results: We evaluated the ability of multiple genome assembly programs to assemble bacterial genomes from a single, deep-coverage library. For our comparison, we chose bacterial species spanning a wide range of GC content and measured the contiguity and accuracy of the resulting assemblies. We compared the assemblies produced by this very high-coverage, one-library strategy to the best assemblies created by two-library sequencing, and we found that remarkably good bacterial assemblies are possible with just one library. We also measured the effect of read length and depth of coverage on assembly quality and determined the values that provide the best results with current algorithms.
Contact: salzberg@jhu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt273
PMCID: PMC3702249  PMID: 23665771
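The "depth of coverage" varied in this evaluation is commonly estimated as the total number of sequenced bases divided by the genome size. The short sketch below shows that arithmetic; the function name and the example numbers are illustrative and are not values from the paper.

```python
def depth_of_coverage(num_reads, read_length, genome_size):
    """Average coverage: total sequenced bases divided by genome length."""
    return num_reads * read_length / genome_size


# Hypothetical library: one million 100 bp reads on a 5 Mbp bacterial genome.
coverage = depth_of_coverage(1_000_000, 100, 5_000_000)
print(coverage)
```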
20.  TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions 
Genome Biology  2013;14(4):R36.
TopHat is a popular spliced aligner for RNA-sequence (RNA-seq) experiments. In this paper, we describe TopHat2, which incorporates many significant enhancements to TopHat. TopHat2 can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. In addition to de novo spliced alignment, TopHat2 can align reads across fusion breaks, which can occur after genomic translocations. TopHat2 combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes. TopHat2 is available at http://ccb.jhu.edu/software/tophat.
doi:10.1186/gb-2013-14-4-r36
PMCID: PMC4053844  PMID: 23618408
21.  Fast gapped-read alignment with Bowtie 2 
Nature Methods  2012;9(4):357-359.
As the rate of sequencing increases, greater throughput is demanded from read aligners. The full-text minute index is often used to make alignment very fast and memory-efficient, but the approach is ill-suited to finding longer, gapped alignments. Bowtie 2 combines the strengths of the full-text minute index with the flexibility and speed of hardware-accelerated dynamic programming algorithms to achieve a combination of high speed, sensitivity and accuracy.
doi:10.1038/nmeth.1923
PMCID: PMC3322381  PMID: 22388286
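To show why a full-text minute (FM) index makes exact matching fast, the sketch below builds a toy index from the Burrows-Wheeler transform of a text and counts exact pattern occurrences by backward search. It covers only the index half of the combination described in the abstract; the gapped dynamic-programming stage is not shown. The naive suffix sorting and the function names are assumptions chosen for brevity and are not how Bowtie 2 is implemented.

```python
def build_fm_index(text):
    """Build a toy FM index: first-column offsets C and occurrence table occ."""
    text += "$"  # sentinel, sorts before A, C, G, T
    # Naive suffix array; fine for a toy text, too slow for real genomes.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of characters in the text that sort before c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ, len(bwt)


def count_matches(pattern, index):
    """Count exact occurrences of pattern using backward search."""
    C, occ, n = index
    lo, hi = 0, n  # half-open range of matching suffix-array rows
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


index = build_fm_index("ACGTACGTGA")
print(count_matches("ACG", index), count_matches("GA", index), count_matches("TT", index))
```

Each pattern character costs a constant number of table lookups, so the search time depends on the pattern length rather than the text length. Such a search only follows exact character matches, which is consistent with the abstract's remark that the index alone is ill-suited to longer, gapped alignments.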
22.  Sequestration: inadvertently killing biomedical research to score political points 
Genome Biology  2013;14(3):109.
doi:10.1186/gb-2013-14-3-109
PMCID: PMC3663088  PMID: 23651812
23.  EDGE-pro: Estimated Degree of Gene Expression in Prokaryotic Genomes 
Background
The expression levels of bacterial genes can be measured directly using next-generation sequencing (NGS) methods, offering much greater sensitivity and accuracy than earlier, microarray-based methods. Most bioinformatics software for estimating levels of gene expression from NGS data has been designed for eukaryotic genomes, with algorithms focusing particularly on detection of splicing patterns. These methods do not perform well on bacterial genomes.
Results
Here we describe the first software system designed explicitly for quantifying the degree of gene expression in bacteria and other prokaryotes. EDGE-pro (Estimated Degree of Gene Expression in PROkaryotes) processes the raw data from an RNA-seq experiment on a bacterial or archaeal species and produces estimates of the expression levels for each gene in these gene-dense genomes.
Software
The EDGE-pro tool is implemented as a pipeline of C++ and Perl programs and is freely available as open-source code at http://www.genomics.jhu.edu/software/EDGE/index.shtml.
doi:10.4137/EBO.S11250
PMCID: PMC3603529  PMID: 23531787
RNA-seq; bacteria; prokaryotes; gene expression
24.  Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks 
Nature Protocols  2012;7(3):562-578.
Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ~1 h of hands-on time.
doi:10.1038/nprot.2012.016
PMCID: PMC3334321  PMID: 22383036
25.  Butterfly genome reveals promiscuous exchange of mimicry adaptations among species 
Dasmahapatra, Kanchon K. | Walters, James R. | Briscoe, Adriana D. | Davey, John W. | Whibley, Annabel | Nadeau, Nicola J. | Zimin, Aleksey V. | Hughes, Daniel S. T. | Ferguson, Laura C. | Martin, Simon H. | Salazar, Camilo | Lewis, James J. | Adler, Sebastian | Ahn, Seung-Joon | Baker, Dean A. | Baxter, Simon W. | Chamberlain, Nicola L. | Chauhan, Ritika | Counterman, Brian A. | Dalmay, Tamas | Gilbert, Lawrence E. | Gordon, Karl | Heckel, David G. | Hines, Heather M. | Hoff, Katharina J. | Holland, Peter W.H. | Jacquin-Joly, Emmanuelle | Jiggins, Francis M. | Jones, Robert T. | Kapan, Durrell D. | Kersey, Paul | Lamas, Gerardo | Lawson, Daniel | Mapleson, Daniel | Maroja, Luana S. | Martin, Arnaud | Moxon, Simon | Palmer, William J. | Papa, Riccardo | Papanicolaou, Alexie | Pauchet, Yannick | Ray, David A. | Rosser, Neil | Salzberg, Steven L. | Supple, Megan A. | Surridge, Alison | Tenger-Trolander, Ayse | Vogel, Heiko | Wilkinson, Paul A. | Wilson, Derek | Yorke, James A. | Yuan, Furong | Balmuth, Alexi L. | Eland, Cathlene | Gharbi, Karim | Thomson, Marian | Gibbs, Richard A. | Han, Yi | Jayaseelan, Joy C. | Kovar, Christie | Mathew, Tittu | Muzny, Donna M. | Ongeri, Fiona | Pu, Ling-Ling | Qu, Jiaxin | Thornton, Rebecca L. | Worley, Kim C. | Wu, Yuan-Qing | Linares, Mauricio | Blaxter, Mark L. | ffrench-Constant, Richard H. | Joron, Mathieu | Kronforst, Marcus R. | Mullen, Sean P. | Reed, Robert D. | Scherer, Steven E. | Richards, Stephen | Mallet, James | McMillan, W. Owen | Jiggins, Chris D.
Nature  2012;487(7405):94-98.
The evolutionary importance of hybridization and introgression has long been debated [1]. We used genomic tools to investigate introgression in Heliconius, a rapidly radiating genus of neotropical butterflies widely used in studies of ecology, behaviour, mimicry and speciation [2-5]. We sequenced the genome of Heliconius melpomene and compared it with other taxa to investigate chromosomal evolution in Lepidoptera and gene flow among multiple Heliconius species and races. Among 12,657 predicted genes for Heliconius, biologically important expansions of families of chemosensory and Hox genes are particularly noteworthy. Chromosomal organisation has remained broadly conserved since the Cretaceous, when butterflies split from the silkmoth lineage. Using genomic resequencing, we show hybrid exchange of genes between three co-mimics, H. melpomene, H. timareta, and H. elevatus, especially at two genomic regions that control mimicry pattern. Closely related Heliconius species clearly exchange protective colour pattern genes promiscuously, implying a major role for hybridization in adaptive radiation.
doi:10.1038/nature11041
PMCID: PMC3398145  PMID: 22722851
