1.  An Emerging Mycoplasma Associated with Trichomoniasis, Vaginal Infection and Disease 
PLoS ONE  2014;9(10):e110943.
Humans are colonized by thousands of bacterial species, but it is difficult to assess the metabolic and pathogenic potential of the majority of these because they have yet to be cultured. Here, we characterize an uncultivated vaginal mycoplasma tightly associated with trichomoniasis that was previously known by its 16S rRNA sequence as “Mnola.” In this study, the mycoplasma was found almost exclusively in women infected with the sexually transmitted pathogen Trichomonas vaginalis, but rarely observed in women with no diagnosed disease. The genomes of four strains of this species were reconstructed using metagenome sequencing and assembly of DNA from four discrete mid-vaginal samples, one of which was obtained from a pregnant woman with trichomoniasis who delivered prematurely. These bacteria harbor several putative virulence factors and display unique metabolic strategies. Genes encoding proteins with high similarity to potential virulence factors include two collagenases, a hemolysin, an O-sialoglycoprotein endopeptidase and a feoB-type ferrous iron transport system. We propose the name “Candidatus Mycoplasma girerdii” for this potential new pathogen.
PMCID: PMC4206474  PMID: 25337710
2.  Proteomics-based metabolic modeling and characterization of the cellulolytic bacterium Thermobifida fusca 
BMC Systems Biology  2014;8:86.
Thermobifida fusca is a cellulolytic bacterium with potential to be used as a platform organism for sustainable industrial production of biofuels, pharmaceutical ingredients and other bioprocesses due to its ability to convert plant biomass to value-added chemicals. To best develop T. fusca as a bioprocess organism, it is important to understand its native cellular processes. In the current study, we characterize the metabolic network of T. fusca through reconstruction of a genome-scale metabolic model and proteomics data. The overall goal of this study was to use multiple metabolic models generated by different methods and comparison to experimental data to gain a high-confidence understanding of the T. fusca metabolic network.
We report the generation of three versions of a metabolic model of Thermobifida fusca sp. XY developed using three different approaches (automated, semi-automated, and proteomics-derived). The model closest to in vivo growth was the proteomics-derived model, which consists of 975 reactions involving 1382 metabolites and accounts for 316 EC numbers (296 genes). The model was optimized for biomass production with the optimal flux of 0.48 doublings per hour when grown on cellobiose with a substrate uptake rate of 0.25 mmole/h. In vivo activity of the DXP pathway for terpenoid biosynthesis was also confirmed using real-time PCR.
iTfu296 provides a platform to understand and explore the metabolic capabilities of the actinomycete T. fusca for potential use in bioprocess industries for the production of biofuel and pharmaceutical ingredients. A comparison of model reconstruction methods showed that using high-throughput proteomics data as a starting point gave the model closest to in vivo growth.
PMCID: PMC4236713  PMID: 25115351
Metabolic Modeling; Flux Balance Analysis; Constraint Based Modeling; Actinomycete; Thermobifida fusca; Proteomics Profiling; Terpenoids Biosynthesis Pathway; DXP Pathway; Mevalonate Pathway; Biofuel
3.  Gap Detection for Genome-Scale Constraint-Based Models 
Advances in Bioinformatics  2012;2012:323472.
Constraint-based metabolic models are currently the most comprehensive system-wide models of cellular metabolism. Several challenges arise when building an in silico constraint-based model of an organism that need to be addressed before flux balance analysis (FBA) can be applied for simulations. An algorithm called FBA-Gap is presented here that aids the construction of a working model based on plausible modifications to a given list of reactions that are known to occur in the organism. When applied to a working model, the algorithm gives a hypothesis concerning a minimal medium for sustaining the cell in culture. The utility of the algorithm is demonstrated in creating a new model organism and is applied to four existing working models for generating hypotheses about culture media. In modifying a partial metabolic reconstruction so that biomass may be produced using FBA, the proposed method is more efficient than a previously proposed method in that fewer new reactions are added to complete the model. The proposed method is also more accurate than other approaches in that only biologically plausible reactions and exchange reactions are used.
PMCID: PMC3444828  PMID: 22997515
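The gap-detection idea in the abstract above can be illustrated with a much simpler stand-in. The sketch below is not FBA-Gap and does not use flux balance analysis; it only checks, by iterative reachability, which biomass components a reaction list can produce from a given medium. The reaction names, metabolites, and the helper functions `producible` and `find_gaps` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set, Tuple

# A reaction is modeled as (substrates, products); stoichiometry is ignored,
# which is a deliberate simplification compared with a constraint-based model.
Reaction = Tuple[Set[str], Set[str]]


def producible(reactions: Dict[str, Reaction], medium: Set[str]) -> Set[str]:
    """Return every metabolite reachable from the medium via the reactions."""
    available = set(medium)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            # A reaction can fire once all of its substrates are available.
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def find_gaps(reactions: Dict[str, Reaction], medium: Set[str],
              biomass: Set[str]) -> Set[str]:
    """Return the biomass components that cannot be produced from the medium."""
    return set(biomass) - producible(reactions, medium)


# Hypothetical toy network: alanine needs an ammonium source in the medium.
toy_reactions: Dict[str, Reaction] = {
    "R1": ({"glucose"}, {"g6p"}),
    "R2": ({"g6p"}, {"pyruvate"}),
    "R3": ({"pyruvate", "nh4"}, {"alanine"}),
}
toy_biomass = {"pyruvate", "alanine"}

gaps_without_nh4 = find_gaps(toy_reactions, {"glucose"}, toy_biomass)
gaps_with_nh4 = find_gaps(toy_reactions, {"glucose", "nh4"}, toy_biomass)
```

In this toy case the gap closes once `nh4` is added to the medium, which loosely mirrors how a gap analysis can suggest exchange reactions or medium components. A real constraint-based analysis would also account for stoichiometry and flux bounds.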
4.  Hypothesis Testing and Power Calculations for Taxonomic-Based Human Microbiome Data 
PLoS ONE  2012;7(12):e52078.
This paper presents new biostatistical methods for the analysis of microbiome data based on a fully parametric approach using all the data. The Dirichlet-multinomial distribution allows the analyst to calculate power and sample sizes for experimental design, perform tests of hypotheses (e.g., compare microbiomes across groups), and estimate parameters describing microbiome properties. The use of a fully parametric model for these data has the benefit over alternative non-parametric approaches such as bootstrapping and permutation testing, in that this model is able to retain more information contained in the data. This paper details the statistical approaches for several tests of hypotheses and power/sample size calculations, and applies them for illustration to taxonomic abundance distribution and rank abundance distribution data using HMP Jumpstart data on 24 subjects for saliva, subgingival, and supragingival samples. Software for running these analyses is available.
PMCID: PMC3527355  PMID: 23284876
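As a rough illustration of the distribution named in the abstract above, the following sketch evaluates the Dirichlet-multinomial log-probability of a vector of taxon counts. It assumes the standard concentration-parameter form (a vector `alpha`), which may differ from the parameterization used in the paper, and it is not the authors' software. The function name and the example counts are hypothetical.

```python
import math
from typing import Sequence


def dirichlet_multinomial_logpmf(counts: Sequence[int],
                                 alpha: Sequence[float]) -> float:
    """Log-probability of taxon counts under a Dirichlet-multinomial model.

    counts: non-negative read counts per taxon for one sample.
    alpha:  positive concentration parameters, one per taxon (assumed form).
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)
    a = sum(alpha)
    # log[ n! * Gamma(a) / Gamma(n + a) ]
    log_p = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    # plus the sum over taxa of log[ Gamma(x + alpha) / (x! * Gamma(alpha)) ]
    for x, al in zip(counts, alpha):
        log_p += math.lgamma(x + al) - math.lgamma(x + 1) - math.lgamma(al)
    return log_p


# Hypothetical example: three reads spread over two taxa, flat concentration.
example_logp = dirichlet_multinomial_logpmf([1, 2], [1.0, 1.0])
```

With two taxa and `alpha = (1, 1)` the model reduces to a uniform distribution over the possible splits of the reads, which gives a quick sanity check on the implementation.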
5.  Species-level classification of the vaginal microbiome 
BMC Genomics  2012;13(Suppl 8):S17.
The application of next-generation sequencing to the study of the vaginal microbiome is revealing the spectrum of microbial communities that inhabit the human vagina. High-resolution identification of bacterial taxa, minimally to the species level, is necessary to fully understand the association of the vaginal microbiome with bacterial vaginosis, sexually transmitted infections, pregnancy complications, menopause, and other physiological and infectious conditions. However, most current taxonomic assignment strategies based on metagenomic 16S rDNA sequence analysis provide at best a genus-level resolution. While surveys of 16S rRNA gene sequences are common in microbiome studies, few well-curated, body-site-specific reference databases of 16S rRNA gene sequences are available, and no such resource is available for vaginal microbiome studies.
We constructed the Vaginal 16S rDNA Reference Database, a comprehensive and non-redundant database of 16S rDNA reference sequences for bacterial taxa likely to be associated with vaginal health, and we developed STIRRUPS, a new method that employs the USEARCH algorithm with a curated reference database for rapid species-level classification of 16S rDNA partial sequences. The method was applied to two datasets of V1-V3 16S rDNA reads: one generated from a mock community containing DNA from six bacterial strains associated with vaginal health, and a second generated from over 1,000 mid-vaginal samples collected as part of the Vaginal Human Microbiome Project at Virginia Commonwealth University. In both datasets, STIRRUPS, used in conjunction with the Vaginal 16S rDNA Reference Database, classified more than 95% of processed reads to a species-level taxon using a 97% global identity threshold for assignment.
This database and method provide accurate species-level classifications of metagenomic 16S rDNA sequence reads that will be useful for analysis and comparison of microbiome profiles from vaginal samples. STIRRUPS can be used to classify 16S rDNA sequence reads from other ecological niches if an appropriate reference database of 16S rDNA sequences is available.
PMCID: PMC3535711  PMID: 23282177
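The threshold-based assignment described in the abstract above can be sketched as follows. This is a minimal illustration of keeping the best reference hit per read and accepting it only at or above a 97% identity threshold; it does not call USEARCH and is not the STIRRUPS implementation. The function name, the input tuple layout, and the identity values in the example are assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# Each hit is (read_id, reference_taxon, fractional_identity); this layout is
# an assumption for the sketch, not the output format of any specific aligner.
Hit = Tuple[str, str, float]


def classify_reads(hits: Iterable[Hit],
                   threshold: float = 0.97) -> Dict[str, Optional[str]]:
    """Assign each read to its best-matching taxon, or None if below threshold."""
    best: Dict[str, Tuple[str, float]] = {}
    for read_id, taxon, identity in hits:
        # Keep only the highest-identity hit seen so far for each read.
        if read_id not in best or identity > best[read_id][1]:
            best[read_id] = (taxon, identity)
    return {
        read_id: (taxon if identity >= threshold else None)
        for read_id, (taxon, identity) in best.items()
    }


# Hypothetical hits with made-up identity values.
example_hits = [
    ("read1", "Lactobacillus crispatus", 0.99),
    ("read1", "Lactobacillus iners", 0.95),
    ("read2", "Gardnerella vaginalis", 0.96),
]
assignments = classify_reads(example_hits)
```

Reads whose best hit falls below the threshold are left unassigned (`None`) rather than forced into a taxon, which keeps species-level calls conservative.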
6.  Genome-scale metabolic analysis of Clostridium thermocellum for bioethanol production 
BMC Systems Biology  2010;4:31.
Microorganisms possess diverse metabolic capabilities that can potentially be leveraged for efficient production of biofuels. Clostridium thermocellum (ATCC 27405) is a thermophilic anaerobe that is both cellulolytic and ethanologenic, meaning that it can directly use the plant sugar, cellulose, and biochemically convert it to ethanol. A major challenge in using microorganisms for chemical production is the need to modify the organism to increase production efficiency. The process of properly engineering an organism is typically arduous.
Here we present a genome-scale model of C. thermocellum metabolism, iSR432, for the purpose of establishing a computational tool to study the metabolic network of C. thermocellum and facilitate efforts to engineer C. thermocellum for biofuel production. The model consists of 577 reactions involving 525 intracellular metabolites, 432 genes, and a proteomic-based representation of a cellulosome. The process of constructing this metabolic model led to suggested annotation refinements for 27 genes and identification of areas of metabolism requiring further study. The accuracy of the iSR432 model was tested using experimental growth and by-product secretion data for growth on cellobiose and fructose. Analysis using this model captures the relationship between the reduction-oxidation state of the cell and ethanol secretion and allowed for prediction of gene deletions and environmental conditions that would increase ethanol production.
By incorporating genomic sequence data, network topology, and experimental measurements of enzyme activities and metabolite fluxes, we have generated a model that is reasonably accurate at predicting the cellular phenotype of C. thermocellum and establishes a strong foundation for rational strain design. In addition, we are able to draw some important conclusions regarding the underlying metabolic mechanisms for observed behaviors of C. thermocellum and highlight remaining gaps in the existing genome annotations.
PMCID: PMC2852388  PMID: 20307315

