PMCC PMCC

Search tips
Search criteria

Advanced
Results 1-11 (11)
 

Clipboard (0)
None

Select a Filter Below

Journals
Year of Publication
Document Types
1.  Assessing De Novo transcriptome assembly metrics for consistency and utility 
BMC Genomics  2013;14:465.
Background
Transcriptome sequencing and assembly represent a great resource for the study of non-model species, and many metrics have been used to evaluate and compare these assemblies. Unfortunately, it is still unclear which of these metrics accurately reflect assembly quality.
Results
We simulated sequencing transcripts of Drosophila melanogaster. By assembling these simulated reads using both a “perfect” and a modern transcriptome assembler while varying read length and sequencing depth, we evaluated quality metrics to determine whether they 1) revealed perfect assemblies to be of higher quality, and 2) revealed perfect assemblies to be more complete as data quantity increased.
Several commonly used metrics were not consistent with these expectations, including average contig coverage and length, though they became consistent when singletons were included in the analysis. We found several annotation-based metrics to be consistent and informative, including contig reciprocal best hit count and contig unique annotation count. Finally, we evaluated a number of novel metrics such as reverse annotation count, contig collapse factor, and the ortholog hit ratio, discovering that each assess assembly quality in unique ways.
Conclusions
Although much attention has been given to transcriptome assembly, little research has focused on determining how best to evaluate assemblies, particularly in light of the variety of options available for read length and sequencing depth. Our results provide an important review of these metrics and give researchers tools to produce the highest quality transcriptome assemblies.
doi:10.1186/1471-2164-14-465
PMCID: PMC3733778  PMID: 23837739
2.  The Evolution of the Anopheles 16 Genomes Project 
G3: Genes|Genomes|Genetics  2013;3(7):1191-1194.
We report the imminent completion of a set of reference genome assemblies for 16 species of Anopheles mosquitoes. In addition to providing a generally useful resource for comparative genomic analyses, these genome sequences will greatly facilitate exploration of the capacity exhibited by some Anopheline mosquito species to serve as vectors for malaria parasites. A community analysis project will commence soon to perform a thorough comparative genomic investigation of these newly sequenced genomes. Completion of this project via the use of short next-generation sequence reads required innovation in both the bioinformatic and laboratory realms, and the resulting knowledge gained could prove useful for genome sequencing projects targeting other unconventional genomes.
doi:10.1534/g3.113.006247
PMCID: PMC3704246  PMID: 23708298
comparative; assembly; vector; malaria; collaboration
3.  Haplotype and minimum-chimerism consensus determination using short sequence data 
BMC Genomics  2012;13(Suppl 2):S4.
Background
Assembling haplotypes given sequence data derived from a single individual is a well studied problem, but only recently has haplotype assembly been considered for population-sampled data. We discuss a software tool called Hapler, which is designed specifically for low-diversity, low-coverage data such as ecological samples derived from natural populations. Because such data may contain error as well as ambiguous haplotype information, we developed methods that increase confidence in these assemblies. Hapler also reconstructs full consensus sequences while minimizing and identifying possible chimeric points.
Results
Experiments on simulated data indicate that Hapler is effective at assembling haplotypes from gene-sized alignments of short reads. Further, in our tests Hapler-generated consensus sequences are less chimeric than the alternative consensus approaches of majority vote and viral quasispecies estimation regardless of error rate, read length, or population haplotype bias.
Conclusions
The analysis of genetically diverse sequence data is increasingly common, particularly in the field of ecoinformatics where transcriptome sequencing of natural populations is a cost effective alternative to genome sequencing. For such studies, it is important to consider and identify haplotype diversity. Hapler provides robust haplotype information and identifies possible phasing errors in consensus sequences, providing valuable information for population studies and downstream usage of resulting assemblies.
doi:10.1186/1471-2164-13-S2-S4
PMCID: PMC3394418  PMID: 22537299
4.  VectorBase: improvements to a bioinformatics resource for invertebrate vector genomics 
Nucleic Acids Research  2011;40(D1):D729-D734.
VectorBase (http://www.vectorbase.org) is a NIAID-supported bioinformatics resource for invertebrate vectors of human pathogens. It hosts data for nine genomes: mosquitoes (three Anopheles gambiae genomes, Aedes aegypti and Culex quinquefasciatus), tick (Ixodes scapularis), body louse (Pediculus humanus), kissing bug (Rhodnius prolixus) and tsetse fly (Glossina morsitans). Hosted data range from genomic features and expression data to population genetics and ontologies. We describe improvements and integration of new data that expand our taxonomic coverage. Releases are bi-monthly and include the delivery of preliminary data for emerging genomes. Frequent updates of the genome browser provide VectorBase users with increasing options for visualizing their own high-throughput data. One major development is a new population biology resource for storing genomic variations, insecticide resistance data and their associated metadata. It takes advantage of improved ontologies and controlled vocabularies. Combined, these new features ensure timely release of multiple types of data in the public domain while helping overcome the bottlenecks of bioinformatics and annotation by engaging with our user community.
doi:10.1093/nar/gkr1089
PMCID: PMC3245112  PMID: 22135296
5.  High-throughput 454 resequencing for allele discovery and recombination mapping in Plasmodium falciparum 
BMC Genomics  2011;12:116.
Background
Knowledge of the origins, distribution, and inheritance of variation in the malaria parasite (Plasmodium falciparum) genome is crucial for understanding its evolution; however the 81% (A+T) genome poses challenges to high-throughput sequencing technologies. We explore the viability of the Roche 454 Genome Sequencer FLX (GS FLX) high throughput sequencing technology for both whole genome sequencing and fine-resolution characterization of genetic exchange in malaria parasites.
Results
We present a scheme to survey recombination in the haploid stage genomes of two sibling parasite clones, using whole genome pyrosequencing that includes a sliding window approach to predict recombination breakpoints. Whole genome shotgun (WGS) sequencing generated approximately 2 million reads, with an average read length of approximately 300 bp. De novo assembly using a combination of WGS and 3 kb paired end libraries resulted in contigs ≤ 34 kb. More than 8,000 of the 24,599 SNP markers identified between parents were genotyped in the progeny, resulting in a marker density of approximately 1 marker/3.3 kb and allowing for the detection of previously unrecognized crossovers (COs) and many non crossover (NCO) gene conversions throughout the genome.
Conclusions
By sequencing the 23 Mb genomes of two haploid progeny clones derived from a genetic cross at more than 30× coverage, we captured high resolution information on COs, NCOs and genetic variation within the progeny genomes. This study is the first to resequence progeny clones to examine fine structure of COs and NCOs in malaria parasites.
doi:10.1186/1471-2164-12-116
PMCID: PMC3055840  PMID: 21324207
6.  Breakpoint structure of the Anopheles gambiae 2Rb chromosomal inversion 
Malaria Journal  2010;9:293.
Background
Alternative arrangements of chromosome 2 inversions in Anopheles gambiae are important sources of population structure, and are associated with adaptation to environmental heterogeneity. The forces responsible for their origin and maintenance are incompletely understood. Molecular characterization of inversion breakpoints provides insight into how they arose, and provides the basis for development of molecular karyotyping methods useful in future studies.
Methods
Sequence comparison of regions near the cytological breakpoints of 2Rb allowed the molecular delineation of breakpoint boundaries. Comparisons were made between the standard 2R+b arrangement in the An. gambiae PEST reference genome and the inverted 2Rb arrangements in the An. gambiae M and S genome assemblies. Sequence differences between alternative 2Rb arrangements were exploited in the design of a PCR diagnostic assay, which was evaluated against the known chromosomal banding pattern of laboratory colonies and field-collected samples from Mali and Cameroon.
Results
The breakpoints of the 7.55 Mb 2Rb inversion are flanked by extensive runs of the same short (72 bp) tandemly organized sequence, which was likely responsible for chromosomal breakage and rearrangement. Application of the molecular diagnostic assay suggested that 2Rb has a single common origin in An. gambiae and its sibling species, Anopheles arabiensis, and also that the standard arrangement (2R+b) may have arisen twice through breakpoint reuse. The molecular diagnostic was reliable when applied to laboratory colonies, but its accuracy was lower in natural populations.
Conclusions
The complex repetitive sequence flanking the 2Rb breakpoint region may be prone to structural and sequence-level instability. The 2Rb molecular diagnostic has immediate application in studies based on laboratory colonies, but its usefulness in natural populations awaits development of complementary molecular tools.
doi:10.1186/1475-2875-9-293
PMCID: PMC2988034  PMID: 20974007
7.  A statistical approach to finding overlooked genetic associations 
BMC Bioinformatics  2010;11:526.
Background
Complexity and noise in expression quantitative trait loci (eQTL) studies make it difficult to distinguish potential regulatory relationships among the many interactions. The predominant method of identifying eQTLs finds associations that are significant at a genome-wide level. The vast number of statistical tests carried out on these data make false negatives very likely. Corrections for multiple testing error render genome-wide eQTL techniques unable to detect modest regulatory effects.
We propose an alternative method to identify eQTLs that builds on traditional approaches. In contrast to genome-wide techniques, our method determines the significance of an association between an expression trait and a locus with respect to the set of all associations to the expression trait. The use of this specific information facilitates identification of expression traits that have an expression profile that is characterized by a single exceptional association to a locus.
Our approach identifies expression traits that have exceptional associations regardless of the genome-wide significance of those associations. This property facilitates the identification of possible false negatives for genome-wide significance. Further, our approach has the property of prioritizing expression traits that are affected by few strong associations. Expression traits identified by this method may warrant additional study because their expression level may be affected by targeting genes near a single locus.
Results
We demonstrate our method by identifying eQTL hotspots in Plasmodium falciparum (malaria) and Saccharomyces cerevisiae (yeast). We demonstrate the prioritization of traits with few strong genetic effects through Gene Ontology (GO) analysis of Yeast. Our results are strongly consistent with results gathered using genome-wide methods and identify additional hotspots and eQTLs.
Conclusions
New eQTLs and hotspots found with this method may represent regions of the genome or biological processes that are controlled through few relatively strong genetic interactions. These points of interest warrant experimental investigation.
doi:10.1186/1471-2105-11-526
PMCID: PMC2974753  PMID: 20964847
8.  Population-level transcriptome sequencing of nonmodel organisms Erynnis propertius and Papilio zelicaon 
BMC Genomics  2010;11:310.
Background
Several recent studies have demonstrated the use of Roche 454 sequencing technology for de novo transcriptome analysis. Low error rates and high coverage also allow for effective SNP discovery and genetic diversity estimates. However, genetically diverse datasets, such as those sourced from natural populations, pose challenges for assembly programs and subsequent analysis. Further, estimating the effectiveness of transcript discovery using Roche 454 transcriptome data is still a difficult task.
Results
Using the Roche 454 FLX Titanium platform, we sequenced and assembled larval transcriptomes for two butterfly species: the Propertius duskywing, Erynnis propertius (Lepidoptera: Hesperiidae) and the Anise swallowtail, Papilio zelicaon (Lepidoptera: Papilionidae). The Expressed Sequence Tags (ESTs) generated represent a diverse sample drawn from multiple populations, developmental stages, and stress treatments.
Despite this diversity, > 95% of the ESTs assembled into long (> 714 bp on average) and highly covered (> 9.6× on average) contigs. To estimate the effectiveness of transcript discovery, we compared the number of bases in the hit region of unigenes (contigs and singletons) to the length of the best match silkworm (Bombyx mori) protein--this "ortholog hit ratio" gives a close estimate on the amount of the transcript discovered relative to a model lepidopteran genome. For each species, we tested two assembly programs and two parameter sets; although CAP3 is commonly used for such data, the assemblies produced by Celera Assembler with modified parameters were chosen over those produced by CAP3 based on contig and singleton counts as well as ortholog hit ratio analysis. In the final assemblies, 1,413 E. propertius and 1,940 P. zelicaon unigenes had a ratio > 0.8; 2,866 E. propertius and 4,015 P. zelicaon unigenes had a ratio > 0.5.
Conclusions
Ultimately, these assemblies and SNP data will be used to generate microarrays for ecoinformatics examining climate change tolerance of different natural populations. These studies will benefit from high quality assemblies with few singletons (less than 26% of bases for each assembled transcriptome are present in unassembled singleton ESTs) and effective transcript discovery (over 6,500 of our putative orthologs cover at least 50% of the corresponding model silkworm gene).
doi:10.1186/1471-2164-11-310
PMCID: PMC2887415  PMID: 20478048
9.  SNP discovery via 454 transcriptome sequencing 
The Plant Journal   2007;51(5):910-918.
A massively parallel pyro-sequencing technology commercialized by 454 Life Sciences Corporation was used to sequence the transcriptomes of shoot apical meristems isolated from two inbred lines of maize using laser capture microdissection (LCM). A computational pipeline that uses the POLYBAYES polymorphism detection system was adapted for 454 ESTs and used to detect SNPs (single nucleotide polymorphisms) between the two inbred lines. Putative SNPs were computationally identified using 260 000 and 280 000 454 ESTs from the B73 and Mo17 inbred lines, respectively. Over 36 000 putative SNPs were detected within 9980 unique B73 genomic anchor sequences (MAGIs). Stringent post-processing reduced this number to > 7000 putative SNPs. Over 85% (94/110) of a sample of these putative SNPs were successfully validated by Sanger sequencing. Based on this validation rate, this pilot experiment conservatively identified > 4900 valid SNPs within > 2400 maize genes. These results demonstrate that 454-based transcriptome sequencing is an excellent method for the high-throughput acquisition of gene-associated SNPs.
doi:10.1111/j.1365-313X.2007.03193.x
PMCID: PMC2169515  PMID: 17662031
SNPs; ESTs; maize; 454 sequencing; markers
10.  Global gene expression analysis of the shoot apical meristem of maize (Zea mays L.) 
The Plant Journal   2007;52(3):391-404.
All above-ground plant organs are derived from shoot apical meristems (SAMs). Global analyses of gene expression were conducted on maize (Zea mays L.) SAMs to identify genes preferentially expressed in the SAM. The SAMs were collected from 14-day-old B73 seedlings via laser capture microdissection (LCM). The RNA samples extracted from LCM-collected SAMs and from seedlings were hybridized to microarrays spotted with 37 660 maize cDNAs. Approximately 30% (10 816) of these cDNAs were prepared as part of this study from manually dissected B73 maize apices. Over 5000 expressed sequence tags (ESTs) (about 13% of the total) were differentially expressed (P<0.0001) between SAMs and seedlings. Of these, 2783 and 2248 ESTs were up- and down-regulated in the SAM, respectively. The expression in the SAM of several of the differentially expressed ESTs was validated via quantitative RT-PCR and/or in situ hybridization. The up-regulated ESTs included many regulatory genes including transcription factors, chromatin remodeling factors and components of the gene-silencing machinery, as well as about 900 genes with unknown functions. Surprisingly, transcripts that hybridized to 62 retrotransposon-related cDNAs were also substantially up-regulated in the SAM. Complementary DNAs derived from the LCM-collected SAMs were sequenced to identify additional genes that are expressed in the SAM. This generated around 550 000 ESTs (454-SAM ESTs) from two genotypes. Consistent with the microarray results, approximately 14% of the 454-SAM ESTs from B73 were retrotransposon-related. Possible roles of genes that are preferentially expressed in the SAM are discussed.
doi:10.1111/j.1365-313X.2007.03244.x
PMCID: PMC2156186  PMID: 17764504
shoot apical meristem; global gene expression; laser capture microdissection; 454 sequencing; development; retrotransposon expression
11.  PROBEmer: a web-based software tool for selecting optimal DNA oligos 
Nucleic Acids Research  2003;31(13):3746-3750.
PROBEmer (http://probemer.cs.loyola.edu) is a web-based software tool that enables a researcher to select optimal oligos for PCR applications and multiplex detection platforms including oligonucleotide microarrays and bead-based arrays. Given two groups of nucleic-acid sequences, a target group and a non-target group, the software identifies oligo sequences that occur in members of the target group, but not in the non-target group. To help predict potential cross hybridization, PROBEmer computes all near neighbors in the non-target group and displays their alignments. The software has been used to obtain genus-specific prokaryotic probes based on the 16S rRNA gene, gene-specific probes for expression analyses and PCR primers. In this paper, we describe how to use PROBEmer, the computational methods it employs, and experimental results for oligos identified by this software tool.
PMCID: PMC168975  PMID: 12824409

Results 1-11 (11)