
Results 1-10 (10)

1.  Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline 
BMC Bioinformatics  2014;15:30.
Massively parallel DNA sequencing generates staggering amounts of data. Decreasing cost, increasing throughput, and improved annotation have expanded the diversity of genomics applications in research and clinical practice. This expanding scale creates analytical challenges: accommodating peak compute demand, coordinating secure access for multiple analysts, and sharing validated tools and results.
To address these challenges, we have developed the Mercury analysis pipeline and deployed it on local hardware and in the Amazon Web Services cloud via the DNAnexus platform. Mercury is an automated, flexible, and extensible analysis workflow that provides accurate and reproducible genomic results at scales ranging from individuals to large cohorts.
By taking advantage of cloud computing and with Mercury implemented on the DNAnexus platform, we have demonstrated a powerful combination of a robust and fully validated software pipeline and a scalable computational resource that, to date, we have applied to more than 10,000 whole genome and whole exome samples.
PMCID: PMC3922167  PMID: 24475911
NGS data; Variant calling; Annotation; Clinical sequencing; Cloud computing
2.  Exome sequencing resolves apparent incidental findings and reveals further complexity of SH3TC2 variant alleles causing Charcot-Marie-Tooth neuropathy 
Genome Medicine  2013;5(6):57.
The debate regarding the relative merits of whole genome sequencing (WGS) versus exome sequencing (ES) centers around comparative cost, average depth of coverage for each interrogated base, and their relative efficiency in the identification of medically actionable variants from the myriad of variants identified by each approach. Nevertheless, few genomes have been subjected to both WGS and ES, using multiple next generation sequencing platforms. In addition, no personal genome has been so extensively analyzed using DNA derived from peripheral blood as opposed to DNA from transformed cell lines that may either accumulate mutations during propagation or clonally expand mosaic variants during cell transformation and propagation.
We investigated a genome that was studied previously by SOLiD chemistry using both ES and WGS, and now perform six independent ES assays (Illumina GAII (x2), Illumina HiSeq (x2), Life Technologies' Personal Genome Machine (PGM) and Proton), and one additional WGS (Illumina HiSeq).
We compared the variants identified by the different methods and provide insights into the differences among variants identified between ES runs in the same technology platform and among different sequencing technologies. We resolved the true genotypes of medically actionable variants identified in the proband through orthogonal experimental approaches. Furthermore, ES identified an additional SH3TC2 variant (p.M1?) that likely contributes to the phenotype in the proband.
ES identified additional medically actionable variant calls and helped resolve ambiguous single nucleotide variants (SNVs), documenting the power of increased depth of coverage of the captured targeted regions. Comparative analyses of WGS and ES reveal that pseudogenes and segmental duplications may explain some instances of apparent disease mutations in unaffected individuals.
PMCID: PMC3706849  PMID: 23806086
Exome sequencing; Whole-genome sequencing; Incidental findings; SH3TC2; Personal genomes; Precision medicine
3.  Exome Sequencing: Applications From the Lab Bench to the Clinic 
Since the initial report of targeted enrichment (Albert et al., 2007), we have been evolving the design and utility of capture reagents and methods, while taking advantage of the parallel advances in sequencing platforms. New exome designs target a comprehensive set of coding exons from 6 different gene databases, as well as computationally predicted coding and non-coding elements: regulatory regions and conserved UTRs. Library automation, reduction of DNA input samples, capture hybridization multiplexing, and application of faster read mapping tools such as BWA together allow a rate of >4,300 libraries/captures per month, with >40,000 exome and regional capture libraries completed to date. In addition, a fully integrated informatics and analysis pipeline (Mercury) supports all aspects of data flow and analysis, from the initial data production on the sequencing instrument to annotated variant calls (SNPs and small indels). These laboratory methods and analysis pipelines have been production hardened at the Human Genome Sequencing Center (HGSC) and have now been applied toward clinical exome sequencing. Through a joint collaboration between the Human Genome Sequencing Center and the Medical Genetics Laboratories (MGL) of the Department of Molecular and Human Genetics, clinical exome sequencing and interpretation are now provided through the CAP/CLIA certified Whole Genome Laboratory (WGL). To date, the WGL has completed exome sequencing of 650 patient samples and final interpretation for over 450 patients, with causative deleterious mutations identified in 25% of cases. Performance has been maintained to a high standard of 95% of the exome target bases represented at 20X coverage. Overall exome performance metrics, LIMS support, variant analysis, and validation of the clinical pipeline for a CAP/CLIA environment will be presented.
PMCID: PMC3635387
4.  Strict evolutionary conservation followed rapid gene loss on human and rhesus Y chromosomes 
Nature  2012;483(7387):82-86.
The human X and Y chromosomes evolved from an ordinary pair of autosomes during the past 200–300 million years [1–3]. Due to genetic decay, the human MSY (male-specific region of Y chromosome) retains only three percent of the ancestral autosomes’ genes [4,5]. This evolutionary decay was driven by a series of five “stratification” events. Each event suppressed X-Y crossing over within a chromosome segment or “stratum”, incorporated that segment into the MSY, and subjected its genes to the erosive forces that attend the absence of crossing over [2,6]. The last of these events occurred 30 million years ago (mya), or 5 million years before the human and Old World monkey (OWM) lineages diverged. Although speculation abounds regarding ongoing decay and looming extinction of the human Y chromosome [7–10], remarkably little is known about how many MSY genes were lost in the human lineage in the 25 million years that have followed its separation from the OWM lineage. To explore this question, we sequenced the MSY of the rhesus macaque, an OWM, and compared it to the human MSY. We discovered that, during the last 25 million years, MSY gene loss in the human lineage was limited to the youngest stratum (stratum 5), which comprises three percent of the human MSY. Within the older strata, which collectively comprise the bulk of the human MSY, gene loss evidently ceased more than 25 mya. Likewise, the rhesus MSY has not lost any older genes (from strata 1–4) during the past 25 million years, despite major structural differences from the human MSY. The rhesus MSY is simpler, with few amplified gene families or palindromes that might enable intrachromosomal recombination and repair. We present an empirical reconstruction of human MSY evolution in which each stratum transitioned from rapid, exponential loss of ancestral genes to strict conservation through purifying selection.
PMCID: PMC3292678  PMID: 22367542
5.  Complete genome sequence of Enterococcus faecium strain TX16 and comparative genomic analysis of Enterococcus faecium genomes 
BMC Microbiology  2012;12:135.
Enterococci are among the leading causes of hospital-acquired infections in the United States and Europe, with Enterococcus faecalis and Enterococcus faecium being the two most common species isolated from enterococcal infections. In the last decade, the proportion of enterococcal infections caused by E. faecium has steadily increased compared to other Enterococcus species. Although the underlying mechanism for the gradual replacement of E. faecalis by E. faecium in the hospital environment is not yet understood, many studies using genotyping and phylogenetic analysis have shown the emergence of a globally dispersed polyclonal subcluster of E. faecium strains in clinical environments. Systematic study of the molecular epidemiology and pathogenesis of E. faecium has been hindered by the lack of closed, complete E. faecium genomes that can be used as references.
In this study, we report the complete genome sequence of the E. faecium strain TX16, also known as DO, which belongs to multilocus sequence type (ST) 18 and was the first E. faecium strain ever sequenced. Whole genome comparison of the TX16 genome with 21 E. faecium draft genomes confirmed that most clinical, outbreak, and hospital-associated (HA) strains (including STs 16, 17, 18, and 78), in addition to strains of non-hospital origin, group in the same clade (referred to as the HA clade). By phylogenetic and gene content similarity analyses, these strains are evolutionarily considerably more closely related to each other than to isolates in the community-associated (CA) clade, with approximately a 3–4% average nucleotide sequence difference between the two clades at the core genome level. Our study also revealed that many genomic loci in the TX16 genome are unique to the HA clade. 380 ORFs in TX16 are HA-clade specific and antibiotic resistance genes are enriched in HA-clade strains. Mobile elements such as IS16 and transposons were also found almost exclusively in HA strains, as previously reported.
Our findings, along with other studies, show that HA clonal lineages harbor specific genetic elements as well as sequence differences in the core genome which may confer selection advantages over the more heterogeneous CA E. faecium isolates. Which of these differences are important for the success of specific E. faecium lineages in the hospital environment remains to be determined.
PMCID: PMC3433357  PMID: 22769602
6.  Complete Genome Sequence of Treponema paraluiscuniculi, Strain Cuniculi A: The Loss of Infectivity to Humans Is Associated with Genome Decay 
PLoS ONE  2011;6(5):e20415.
Treponema paraluiscuniculi is the causative agent of rabbit venereal spirochetosis. It is not infectious to humans, although its genome structure is very closely related to other pathogenic Treponema species including Treponema pallidum subspecies pallidum, the etiological agent of syphilis. In this study, the genome sequence of Treponema paraluiscuniculi, strain Cuniculi A, was determined by a combination of several high-throughput sequencing strategies. Whereas the overall size (1,133,390 bp), arrangement, and gene content of the Cuniculi A genome closely resembled those of the T. pallidum genome, the T. paraluiscuniculi genome contained a markedly higher number of pseudogenes and gene fragments (51). In addition to pseudogenes, 33 divergent genes were also found in the T. paraluiscuniculi genome. A set of 32 (out of 84) affected genes encoded proteins of known or predicted function in the Nichols genome. These proteins included virulence factors, gene regulators and components of DNA repair and recombination. The majority (52 or 61.9%) of the Cuniculi A pseudogenes and divergent genes were of unknown function. Our results indicate that T. paraluiscuniculi has evolved from a T. pallidum-like ancestor and adapted to a specialized host-associated niche (rabbits) during loss of infectivity to humans. The genes that are inactivated or altered in T. paraluiscuniculi are candidates for virulence factors important in the infectivity and pathogenesis of T. pallidum subspecies.
PMCID: PMC3105029  PMID: 21655244
7.  Comparative Genomics of Gardnerella vaginalis Strains Reveals Substantial Differences in Metabolic and Virulence Potential 
PLoS ONE  2010;5(8):e12411.
Gardnerella vaginalis is described as a common vaginal bacterial species whose presence correlates strongly with bacterial vaginosis (BV). Here we report the genome sequencing and comparative analyses of three strains of G. vaginalis. Strains 317 (ATCC 14019) and 594 (ATCC 14018) were isolated from the vaginal tracts of women with symptomatic BV, while Strain 409-05 was isolated from a healthy, asymptomatic individual with a Nugent score of 9.
Principal Findings
Substantial genomic rearrangement and heterogeneity were observed that appeared to have resulted from both mobile elements and substantial lateral gene transfer. These genomic differences translated to differences in metabolic potential. All strains are equipped with significant virulence potential, including genes encoding the previously described vaginolysin, pili for cytoadhesion, EPS biosynthetic genes for biofilm formation, and antimicrobial resistance systems. We also observed systems promoting multi-drug and lantibiotic extrusion. All G. vaginalis strains possess a large number of genes that may enhance their ability to compete with and exclude other vaginal colonists. These include up to six toxin-antitoxin systems and up to nine additional antitoxins lacking cognate toxins, several of which are clustered within each genome. All strains encode bactericidal toxins, including two lysozyme-like toxins produced uniquely by strain 409-05. Interestingly, the BV isolates encode numerous proteins not found in strain 409-05 that likely increase their pathogenic potential. These include enzymes enabling mucin degradation, a trait previously described to strongly correlate with BV, although commonly attributed to non-G. vaginalis species.
Collectively, our results indicate that all three strains are able to thrive in vaginal environments, and that the BV isolates are capable of occupying a niche distinct from that of 409-05. Each strain has significant virulence potential, although genomic and metabolic differences, such as the ability to degrade mucin, indicate that the detection of G. vaginalis in the vaginal tract provides only partial information on the physiological potential of the organism.
PMCID: PMC2928729  PMID: 20865041
8.  Large scale variation in Enterococcus faecalis illustrated by the genome analysis of strain OG1RF 
Genome Biology  2008;9(7):R110.
A comparison of two strains of the hospital pathogen Enterococcus faecalis suggests that mediators of virulence differ between strains and that virulence does not depend on mobile genetic elements.
Enterococcus faecalis has emerged as a major hospital pathogen. To explore its diversity, we sequenced E. faecalis strain OG1RF, which is commonly used for molecular manipulation and virulence studies.
The 2,739,625 base pair chromosome of OG1RF was found to contain approximately 232 kilobases unique to this strain compared to V583, the only publicly available sequenced strain. Almost no mobile genetic elements were found in OG1RF. The 64 areas of divergence were classified into three categories. First, OG1RF carries 39 unique regions, including 2 CRISPR loci and a new WxL locus. Second, we found nine replacements where a sequence specific to V583 was substituted by a sequence specific to OG1RF. For example, the iol operon of OG1RF replaces a possible prophage and the vanB transposon in V583. Finally, we found 16 regions that were present in V583 but missing from OG1RF, including the proposed pathogenicity island, several probable prophages, and the cpsCDEFGHIJK capsular polysaccharide operon. OG1RF was more rapidly but less frequently lethal than V583 in the mouse peritonitis model and considerably outcompeted V583 in a murine model of urinary tract infections.
E. faecalis OG1RF carries a number of unique loci compared to V583, but the almost complete lack of mobile genetic elements demonstrates that this is not a defining feature of the species. Additionally, OG1RF's effects in experimental models suggest that mediators of virulence may be diverse between different E. faecalis strains and that virulence is not dependent on the presence of mobile genetic elements.
PMCID: PMC2530867  PMID: 18611278
9.  Subtle genetic changes enhance virulence of methicillin resistant and sensitive Staphylococcus aureus 
BMC Microbiology  2007;7:99.
Community acquired (CA) methicillin-resistant Staphylococcus aureus (MRSA) increasingly causes disease worldwide. USA300 has emerged as the predominant clone causing superficial and invasive infections in children and adults in the USA. Epidemiological studies suggest that USA300 is more virulent than other CA-MRSA. The genetic determinants that render virulence and dominance to USA300 remain unclear.
We sequenced the genomes of two pediatric USA300 isolates: one CA-MRSA and one CA-methicillin susceptible (MSSA), isolated at Texas Children's Hospital in Houston. DNA sequencing was performed by Sanger dideoxy whole genome shotgun (WGS) and 454 Life Sciences pyrosequencing strategies. The sequence of the USA300 MRSA strain was rigorously annotated. In USA300-MRSA, 2658 chromosomal open reading frames were predicted, and 3.1 and 27 kilobase (kb) plasmids were identified. USA300-MSSA contained a 20 kb plasmid with some homology to the 27 kb plasmid found in USA300-MRSA. Two regions found in USA300-MRSA were absent in USA300-MSSA. One of these carried the arginine deiminase operon that appears to have been acquired from S. epidermidis. The USA300 sequence was aligned with other sequenced S. aureus genomes and regions unique to USA300 MRSA were identified.
USA300-MRSA is highly similar to other MRSA strains based on whole genome alignments and gene content, indicating that the differences in pathogenesis are due to subtle changes rather than to large-scale acquisition of virulence factor genes. The USA300 Houston isolate differs from another sequenced USA300 strain isolate, derived from a patient in San Francisco, in plasmid content and a number of sequence polymorphisms. Such differences will provide new insights into the evolution of pathogens.
PMCID: PMC2222628  PMID: 17986343
10.  Paradoxical DNA Repair and Peroxide Resistance Gene Conservation in Bacillus pumilus SAFR-032 
PLoS ONE  2007;2(9):e928.
Bacillus spores are notoriously resistant to unfavorable conditions such as UV radiation, γ-radiation, H2O2, desiccation, chemical disinfection, or starvation. Bacillus pumilus SAFR-032 survives standard decontamination procedures of the Jet Propulsion Lab spacecraft assembly facility, and both spores and vegetative cells of this strain exhibit elevated resistance to UV radiation and H2O2 compared to other Bacillus species.
Principal Findings
The genome of B. pumilus SAFR-032 was sequenced and annotated. Lists of genes relevant to DNA repair and the oxidative stress response were generated and compared to B. subtilis and B. licheniformis. Differences in conservation of genes, gene order, and protein sequences are highlighted because they potentially explain the extreme resistance phenotype of B. pumilus. The B. pumilus genome includes genes not found in B. subtilis or B. licheniformis and conserved genes with sequence divergence, but paradoxically lacks several genes that function in UV or H2O2 resistance in other Bacillus species.
This study identifies several candidate genes for further research into UV and H2O2 resistance. These findings will help explain the resistance of B. pumilus and are applicable to understanding sterilization survival strategies of microbes.
PMCID: PMC1976550  PMID: 17895969
