1.  Complete Genome Sequence of the Extreme Thermophile Dictyoglomus thermophilum H-6-12 
Genome Announcements  2014;2(1):e00109-14.
Here, we present the complete genome of the extreme thermophile, Dictyoglomus thermophilum H-6-12 (phylum Dictyoglomi), which consists of 1,959,987 bp.
PMCID: PMC3931368  PMID: 24558247
2.  Draft Genome Sequence of Extended-Spectrum β-Lactamase-Producing Klebsiella pneumoniae Isolated from a Patient in Lebanon 
Genome Announcements  2014;2(1):e00121-14.
We present the draft genome sequence of extended-spectrum β-lactamase (ESBL)-producing Klebsiella pneumoniae isolated from a stool sample collected from a patient admitted for a gastrointestinal procedure. The draft genome sequence consists of 86 contigs, including a combined 5,632,663 bases with 57% G+C content.
PMCID: PMC3931372  PMID: 24558251
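The summary figures quoted in a draft-genome announcement like the one above (number of contigs, combined bases, G+C content) can be recomputed from the assembly itself. The following is a minimal sketch, assuming the contigs are available as FASTA-formatted text; the function name `assembly_stats` and the toy input are illustrative and do not come from the announcement.

```python
def assembly_stats(fasta_text):
    """Summarise a FASTA-formatted assembly: contig count, total bases, G+C %."""
    contigs = []
    current = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line closes the previous contig, if it had any sequence.
            if current:
                contigs.append("".join(current))
            current = []
        else:
            current.append(line.upper())
    if current:
        contigs.append("".join(current))

    total = sum(len(c) for c in contigs)
    gc = sum(c.count("G") + c.count("C") for c in contigs)
    return {
        "contigs": len(contigs),
        "total_bases": total,
        "gc_percent": 100.0 * gc / total if total else 0.0,
    }


# Toy input (hypothetical sequences, not real data).
example = ">contig_1\nGGCC\nAT\n>contig_2\nATAT\n"
stats = assembly_stats(example)
```

Ambiguous bases such as N count towards the total but not towards G+C in this sketch; a real pipeline might choose to exclude them from the denominator.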
3.  Draft Genome Sequences of Extended-Spectrum β-Lactamase-Producing Escherichia coli Strains Isolated from Patients in Lebanon 
Genome Announcements  2014;2(1):e00123-14.
We present the draft genome sequences of nine extended-spectrum β-lactamase (ESBL)-producing Escherichia coli strains isolated from stool samples collected from patients admitted for gastrointestinal and urological procedures/surgeries. An average of 3,889,300 paired-end reads per sample were generated, which assembled into 77 to 157 contigs.
PMCID: PMC3931373  PMID: 24558252
4.  Troubleshooting Public Data Archiving: Suggestions to Increase Participation 
PLoS Biology  2014;12(1):e1001779.
Public data archiving has many benefits for society, but some scientists are reluctant to share their data. This Perspective offers some practical solutions to reduce costs and increase benefits for individual researchers.
An increasing number of publishers and funding agencies require public data archiving (PDA) in open-access databases. PDA has obvious group benefits for the scientific community, but many researchers are reluctant to share their data publicly because of real or perceived individual costs. Improving participation in PDA will require lowering costs and/or increasing benefits for primary data collectors. Small, simple changes can enhance existing measures to ensure that more scientific data are properly archived and made publicly available: (1) facilitate more flexible embargoes on archived data, (2) encourage communication between data generators and re-users, (3) disclose data re-use ethics, and (4) encourage increased recognition of publicly archived data.
PMCID: PMC3904821  PMID: 24492920
5.  PhyloSift: phylogenetic analysis of genomes and metagenomes 
PeerJ  2014;2:e243.
Like all organisms on the planet, environmental microbes are subject to the forces of molecular evolution. Metagenomic sequencing provides a means to access the DNA sequence of uncultured microbes. By combining DNA sequencing of microbial communities with evolutionary modeling and phylogenetic analysis we might obtain new insights into microbiology and also provide a basis for practical tools such as forensic pathogen detection.
In this work we present an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct several types of analysis. First, we present a method to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample. Second, we present a means to compare community structure across a collection of many samples and develop direct associations between the abundance of certain organisms and sample metadata. Third, we apply new tools to analyze the phylogenetic diversity of microbial communities and again demonstrate how this can be associated to sample metadata.
These analyses are implemented in an open source software pipeline called PhyloSift. As a pipeline, PhyloSift incorporates several other programs including LAST, HMMER, and pplacer to automate phylogenetic analysis of protein coding and RNA sequences in metagenomic datasets generated by modern sequencing platforms (e.g., Illumina, 454).
PMCID: PMC3897386  PMID: 24482762
Metagenomics; Phylogenetics; Forensics; Bayes factor; Microbial diversity; Community structure; Microbial ecology; Edge PCA; Phylogenetic diversity; Microbial evolution
6.  Open Science and Reporting Animal Studies: Who's Accountable? 
PLoS Biology  2014;12(1):e1001757.
If being open means maximizing the number of people a paper can reach and minimizing the difficulties of re-using the information within it, then the release of all information associated with a paper is critical. For ethical reasons, high standards of reporting are extra critical in regards to animal research.
PMCID: PMC3883631  PMID: 24409097
7.  A Field Guide to Genomics Research 
PLoS Biology  2014;12(1):e1001744.
Portraying high-throughput genomics research as a wild frontier, Andrea Bild and colleagues use caricatures to highlight common pitfalls in genomic research and provide recommendations for navigating this terrain.
PMCID: PMC3883637  PMID: 24409093
8.  Two Years Later: Journals Are Not Yet Enforcing the ARRIVE Guidelines on Reporting Standards for Pre-Clinical Animal Studies 
PLoS Biology  2014;12(1):e1001756.
A study by David Baker and colleagues reveals poor quality of reporting in pre-clinical animal research and a failure of journals to implement the ARRIVE guidelines.
There is growing concern that poor experimental design and lack of transparent reporting contribute to the frequent failure of pre-clinical animal studies to translate into treatments for human disease. In 2010, the Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines were introduced to help improve reporting standards. They were published in PLOS Biology and endorsed by funding agencies and publishers and their journals, including PLOS, Nature research journals, and other top-tier journals. Yet our analysis of papers published in PLOS and Nature journals indicates that there has been very little improvement in reporting standards since then. This suggests that authors, referees, and editors generally are ignoring guidelines, and the editorial endorsement is yet to be effectively implemented.
PMCID: PMC3883646  PMID: 24409096
9.  Best Practices for Scientific Computing 
PLoS Biology  2014;12(1):e1001745.
We describe a set of best practices for scientific software development, based on research and experience, that will improve scientists' productivity and the reliability of their software.
PMCID: PMC3886731  PMID: 24415924
11.  Genome sequence of the human malaria parasite Plasmodium falciparum 
Nature  2002;419(6906):10.1038/nature01097.
The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host–parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.
PMCID: PMC3836256  PMID: 12368864
12.  Draft Genome Sequence of the Arsenate-Respiring Bacterium Chrysiogenes arsenatis Strain DSM 11915 
Genome Announcements  2013;1(6):e00953-13.
Here we present the draft genome sequence of Chrysiogenes arsenatis strain DSM 11915, only the second genome sequence from the phylum Chrysiogenetes. This strictly anaerobic organism was isolated from arsenic-contaminated gold mine wastewater and respires arsenate or nitrate instead of oxygen. The assembly contains 2,824,977 bp in 22 scaffolds.
PMCID: PMC3828317  PMID: 24233593
13.  PhyBin: binning trees by topology 
PeerJ  2013;1:e187.
A major goal of many evolutionary analyses is to determine the true evolutionary history of an organism. Molecular methods that rely on the phylogenetic signal generated by a few to a handful of loci can be used to approximate the evolution of the entire organism but fall short of providing a global, genome-wide, perspective on evolutionary processes. Indeed, individual genes in a genome may have different evolutionary histories. Therefore, it is informative to analyze the number and kind of phylogenetic topologies found within an orthologous set of genes across a genome. Here we present PhyBin: a flexible program for clustering gene trees based on topological structure. PhyBin can generate bins of topologies corresponding to exactly identical trees or can utilize Robinson-Foulds distance matrices to generate clusters of similar trees, using a user-defined threshold. Additionally, PhyBin allows the user to adjust for potential noise in the dataset (as may be produced when comparing very closely related organisms) by pre-processing trees to collapse very short branches or those nodes not meeting a defined bootstrap threshold. As a test case, we generated individual trees based on an orthologous gene set from 10 Wolbachia species across four different supergroups (A–D) and utilized PhyBin to categorize the complete set of topologies produced from this dataset. Using this approach, we were able to show that although a single topology generally dominated the analysis, confirming the separation of the supergroups, many genes supported alternative evolutionary histories. Because PhyBin’s output provides the user with lists of gene trees in each topological cluster, it can be used to explore potential reasons for discrepancies between phylogenies including homoplasies, long-branch attraction, or horizontal gene transfer events.
PMCID: PMC3807594  PMID: 24167782
Robinson-Foulds; Phylogenetics; Evolutionary history; Wolbachia; Horizontal gene transfer
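To illustrate the two ideas in the PhyBin abstract above (binning identical topologies and measuring Robinson-Foulds distance), here is a minimal sketch. It is not PhyBin's code: it assumes trees are given as nested Python tuples of leaf names over the same leaf set, treats them as unrooted, and ignores branch lengths and bootstrap support. The names `splits`, `rf_distance` and `bin_by_topology` are hypothetical.

```python
from collections import defaultdict


def leaf_set(tree):
    """Return the frozenset of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions (splits) of an unrooted tree.

    Each split is stored as the side that does NOT contain the
    alphabetically smallest leaf, so equal splits compare equal.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    visit(tree)
    return frozenset(found)


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: number of splits found in only one tree."""
    return len(splits(tree_a) ^ splits(tree_b))


def bin_by_topology(named_trees):
    """Group tree names whose split sets are exactly identical."""
    bins = defaultdict(list)
    for name, tree in named_trees.items():
        bins[splits(tree)].append(name)
    return list(bins.values())


# Toy gene trees (hypothetical): gene1 and gene3 share a topology.
trees = {
    "gene1": ((("A", "B"), "C"), ("D", "E")),
    "gene2": ((("A", "C"), "B"), ("D", "E")),
    "gene3": (("D", "E"), ("C", ("B", "A"))),
}
topology_bins = bin_by_topology(trees)
```

Clustering of similar but non-identical trees, as the abstract describes, would start from the pairwise `rf_distance` matrix and apply a user-chosen threshold; that step is left out here.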
14.  Systematic Identification of Gene Families for Use as “Markers” for Phylogenetic and Phylogeny-Driven Ecological Studies of Bacteria and Archaea and Their Major Subgroups 
PLoS ONE  2013;8(10):e77033.
With the astonishing rate that genomic and metagenomic sequence data sets are accumulating, there are many reasons to constrain the data analyses. One approach to such constrained analyses is to focus on select subsets of gene families that are particularly well suited for the tasks at hand. Such gene families have generally been referred to as “marker” genes. We are particularly interested in identifying and using such marker genes for phylogenetic and phylogeny-driven ecological studies of microbes and their communities (e.g., construction of species trees, phylogenetic based assignment of metagenomic sequence reads to taxonomic groups, phylogeny-based assessment of alpha- and beta-diversity of microbial communities from metagenomic data). We therefore refer to these as PhyEco (for phylogenetic and phylogenetic ecology) markers. The dual use of these PhyEco markers means that we needed to develop and apply a set of somewhat novel criteria for identification of the best candidates for such markers. The criteria we focused on included universality across the taxa of interest, ability to be used to produce robust phylogenetic trees that reflect as much as possible the evolution of the species from which the genes come, and low variation in copy number across taxa.
We describe here an automated protocol for identifying potential PhyEco markers from a set of complete genome sequences. The protocol combines rapid searching, clustering and phylogenetic tree building algorithms to generate protein families that meet the criteria listed above. We report here the identification of PhyEco markers for different taxonomic levels including 40 for “all bacteria and archaea”, 114 for “all bacteria” (greatly expanding on the ∼30 commonly used), and 100s to 1000s for some of the individual phyla of bacteria. This new list of PhyEco markers should allow much more detailed automated phylogenetic and phylogenetic ecology analyses of these groups than possible previously.
PMCID: PMC3798382  PMID: 24146954
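Two of the marker criteria named in the abstract above, universality across the taxa of interest and low variation in copy number, can be expressed as a simple filter over per-genome copy counts. The sketch below is only an illustration of those two criteria, not the published protocol (which also involves searching, clustering and tree building); the function name, the thresholds and the family labels are all assumptions.

```python
def candidate_markers(copy_counts, genomes, min_universality=0.9, max_multicopy=0.1):
    """Select gene families that are near-universal and mostly single-copy.

    copy_counts maps family name -> {genome name: number of copies}.
    Both thresholds are illustrative defaults, not values from the paper.
    """
    n = len(genomes)
    selected = []
    for family, counts in sorted(copy_counts.items()):
        present = sum(1 for g in genomes if counts.get(g, 0) >= 1)
        multicopy = sum(1 for g in genomes if counts.get(g, 0) > 1)
        if present / n >= min_universality and multicopy / n <= max_multicopy:
            selected.append(family)
    return selected


# Toy input (hypothetical families and genomes).
genomes = ["g1", "g2", "g3", "g4"]
copy_counts = {
    "fam_A": {"g1": 1, "g2": 1, "g3": 1, "g4": 1},  # universal, single-copy
    "fam_B": {"g1": 5, "g2": 0, "g3": 2, "g4": 1},  # patchy and multi-copy
    "fam_C": {"g1": 1, "g2": 1, "g3": 1, "g4": 2},  # universal, one duplicate
}
strict = candidate_markers(copy_counts, genomes)
```

Relaxing the thresholds admits families such as `fam_C` that are duplicated in a minority of genomes, which is the kind of trade-off a marker-selection protocol has to make explicit.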
15.  Expert Failure: Re-evaluating Research Assessment 
PLoS Biology  2013;11(10):e1001677.
It is unlikely that there is any single objective measure of merit, so research assessment therefore requires new multivariate metrics that reflect the context of research, regardless of discipline.
PMCID: PMC3792859  PMID: 24115910
16.  The Assessment of Science: The Relative Merits of Post-Publication Review, the Impact Factor, and the Number of Citations 
PLoS Biology  2013;11(10):e1001675.
Because both subjective post-publication review and the number of citations are highly error prone and biased measures of merit of scientific papers, journal-based metrics may be a better surrogate.
The assessment of scientific publications is an integral part of the scientific process. Here we investigate three methods of assessing the merit of a scientific paper: subjective post-publication peer review, the number of citations gained by a paper, and the impact factor of the journal in which the article was published. We investigate these methods using two datasets in which subjective post-publication assessments of scientific publications have been made by experts. We find that there are moderate, but statistically significant, correlations between assessor scores, when two assessors have rated the same paper, and between assessor score and the number of citations a paper accrues. However, we show that assessor score depends strongly on the journal in which the paper is published, and that assessors tend to over-rate papers published in journals with high impact factors. If we control for this bias, we find that the correlation between assessor scores and between assessor score and the number of citations is weak, suggesting that scientists have little ability to judge either the intrinsic merit of a paper or its likely impact. We also show that the number of citations a paper receives is an extremely error-prone measure of scientific merit. Finally, we argue that the impact factor is likely to be a poor measure of merit, since it depends on subjective assessment. We conclude that the three measures of scientific merit considered here are poor; in particular subjective assessments are an error-prone, biased, and expensive method by which to assess merit. We argue that the impact factor may be the most satisfactory of the methods we have considered, since it is a form of pre-publication review. However, we emphasise that it is likely to be a very error-prone measure of merit that is qualitative, not quantitative.
PMCID: PMC3792863  PMID: 24115908
17.  The Impact of Helicobacter pylori Infection on the Gastric Microbiota of the Rhesus Macaque 
PLoS ONE  2013;8(10):e76375.
Helicobacter pylori colonization is highly prevalent among humans and causes significant gastric disease in a subset of those infected. When present, this bacterium dominates the gastric microbiota of humans and induces antimicrobial responses in the host. Since the microbial context of H. pylori colonization influences the disease outcome in a mouse model, we sought to assess the impact of H. pylori challenge upon the pre-existing gastric microbial community members in the rhesus macaque model. Deep sequencing of the bacterial 16S rRNA gene identified a community profile of 221 phylotypes that was distinct from that of the rhesus macaque distal gut and mouth, although there were taxa in common. High proportions of both H. pylori and H. suis were observed in the post-challenge libraries, but at a given time, only one Helicobacter species was dominant. However, the relative abundance of non-Helicobacter taxa was not significantly different before and after challenge with H. pylori. These results suggest that while different gastric species may show competitive exclusion in the gastric niche, the rhesus gastric microbial community is largely stable despite immune and physiological changes due to H. pylori infection.
PMCID: PMC3792980  PMID: 24116104
18.  Genome sequence of Frateuria aurantia type strain (Kondô 67T), a xanthomonade isolated from Lilium auratium Lindl. 
Standards in Genomic Sciences  2013;9(1):83-92.
Frateuria aurantia (ex Kondô and Ameyama 1958) Swings et al. 1980 is a member of the bispecific genus Frateuria in the family Xanthomonadaceae, which is already heavily targeted for non-type strain genome sequencing. Strain Kondô 67T was initially (1958) identified as a member of ‘Acetobacter aurantius’, a name that was not considered for the approved list. Kondô 67T was therefore later designated as the type strain of the newly proposed acetogenic species Frateuria aurantia. The strain is of interest because of its triterpenoids (hopane family). F. aurantia Kondô 67T is the first member of the genus Frateuria whose genome sequence has been deciphered, and here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,603,458-bp long chromosome with its 3,200 protein-coding and 88 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.
PMCID: PMC3910546  PMID: 24501647
strictly aerobic; motile; rod-shaped; acetogenic; mesophilic; ‘Acetobacter aurantius’; Xanthomonadaceae; GEBA
19.  Genome sequence of the moderately thermophilic sulfur-reducing bacterium Thermanaerovibrio velox type strain (Z-9701T) and emended description of the genus Thermanaerovibrio 
Standards in Genomic Sciences  2013;9(1):57-70.
Thermanaerovibrio velox Zavarzina et al. 2000 is a member of the Synergistaceae, a family in the phylum Synergistetes that is already well-characterized at the genome level. Members of this phylum were described as Gram-negative staining anaerobic bacteria with a rod/vibrioid cell shape and possessing an atypical outer cell envelope. They inhabit a large variety of anaerobic environments including soil, oil wells, wastewater treatment plants and animal gastrointestinal tracts. They are also found to be linked to sites of human diseases such as cysts, abscesses, and areas of periodontal disease. The moderately thermophilic and organotrophic T. velox shares most of its morphologic and physiologic features with the closely related species, T. acidaminovorans. In addition to Su883T, the type strain of T. acidaminovorans, strain Z-9701T is the second type strain in the genus Thermanaerovibrio to have its genome sequence published. Here we describe the features of this organism, together with the non-contiguous genome sequence and annotation. The 1,880,838 bp long chromosome (non-contiguous finished sequence) with its 1,751 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.
PMCID: PMC3910556  PMID: 24501645
obligate anaerobic; motile; curved rods; organotrophic; S0-reduction; cyanobacterial mat; Synergistaceae; Synergistetes; GEBA
20.  Effects of Diet on Resource Utilization by a Model Human Gut Microbiota Containing Bacteroides cellulosilyticus WH2, a Symbiont with an Extensive Glycobiome 
PLoS Biology  2013;11(8):e1001637.
Artificial human gut microbial communities implanted into germ-free mice provide insights into how species-level responses to changes in diet give rise to community-level structural and functional reconfiguration and how types of bacteria prioritize use of available nutrients in vivo.
The human gut microbiota is an important metabolic organ, yet little is known about how its individual species interact, establish dominant positions, and respond to changes in environmental factors such as diet. In this study, gnotobiotic mice were colonized with an artificial microbiota comprising 12 sequenced human gut bacterial species and fed oscillating diets of disparate composition. Rapid, reproducible, and reversible changes in the structure of this assemblage were observed. Time-series microbial RNA-Seq analyses revealed staggered functional responses to diet shifts throughout the assemblage that were heavily focused on carbohydrate and amino acid metabolism. High-resolution shotgun metaproteomics confirmed many of these responses at a protein level. One member, Bacteroides cellulosilyticus WH2, proved exceptionally fit regardless of diet. Its genome encoded more carbohydrate active enzymes than any previously sequenced member of the Bacteroidetes. Transcriptional profiling indicated that B. cellulosilyticus WH2 is an adaptive forager that tailors its versatile carbohydrate utilization strategy to available dietary polysaccharides, with a strong emphasis on plant-derived xylans abundant in dietary staples like cereal grains. Two highly expressed, diet-specific polysaccharide utilization loci (PULs) in B. cellulosilyticus WH2 were identified, one with characteristics of xylan utilization systems. Introduction of a B. cellulosilyticus WH2 library comprising >90,000 isogenic transposon mutants into gnotobiotic mice, along with the other artificial community members, confirmed that these loci represent critical diet-specific fitness determinants. Carbohydrates that trigger dramatic increases in expression of these two loci and many of the organism's 111 other predicted PULs were identified by RNA-Seq during in vitro growth on 31 distinct carbohydrate substrates, allowing us to better interpret in vivo RNA-Seq and proteomics data. 
These results offer insight into how gut microbes adapt to dietary perturbations at both a community level and from the perspective of a well-adapted symbiont with exceptional saccharolytic capabilities, and illustrate the value of artificial communities.
Author Summary
Our intestines are populated by an almost unimaginably large number of microbial cells, most of which are bacteria. This species assemblage operates as a microbial metabolic organ, performing myriad tasks that contribute to our well-being, including processing components of our diet. The way this incredible machine assembles itself and operates remains mysterious. One approach to understanding its properties is to create artificial communities composed of a limited number of sequenced human gut bacterial species and to install them in the guts of germ-free mice that are then fed different diets. In this report, we adopt this approach. We describe the genome sequence of a new gut bacterial isolate, Bacteroides cellulosilyticus WH2, which is equipped with an unprecedented number of carbohydrate active enzymes. Deploying four different “omics” technologies, we characterize the response to diet, the relative stability, and the temporal dynamics of a 12-species artificial bacterial assemblage (including B. cellulosilyticus WH2) implanted in germ-free mouse guts. We also combine high-throughput substrate utilization screens and RNA-Seq to generate reference data analogous to a “Rosetta stone” in order to decipher what types of carbohydrates B. cellulosilyticus encounters and uses within the gut, and how it interacts with other organisms that have similar and/or distinct “professions.” This work sets the stage for future ecological and metabolic studies of more complex assemblages that more fully emulate the properties of our native gut communities.
PMCID: PMC3747994  PMID: 23976882
21.  Prokaryotic Super Program Advisory Committee DOE Joint Genome Institute, Walnut Creek, CA, March 27, 2013 
Standards in Genomic Sciences  2013;8(3):561-570.
The Prokaryotic Super Program Advisory Committee met on March 27, 2013 for their annual review of the Prokaryotic Super Program at the DOE Joint Genome Institute. As is the case with any site visit or program review, the objective is to evaluate progress in meeting organizational objectives, provide feedback from the user community and to assist the JGI in formulating plans for the coming year. The advisors want to commend the JGI for its central role in developing new technologies and capabilities, and for catalyzing the formation of new collaborative user communities. Highlights of the post-meeting exchanges among the advisors focused on the importance of programmatic initiatives including:
• GEBA, which serves as a phylogenetic “base-map” on which our knowledge of functional diversity can be layered.
• FEBA, which promises to provide new insights into the physiological capabilities of prokaryotes under highly standardized conditions.
• Single-cell genomics technology, which is seen to significantly enhance our ability to interpret genomic and metagenomic data and broaden the scope of the GEBA program to encompass at least a part of the microbial “dark-matter”.
• IMG, which is seen to play a central role in JGI programs and is viewed as a strategically important asset in the JGI portfolio.
On this latter point, the committee encourages the formation of a strategic relationship between IMG and the Kbase to ensure that the intelligence, deep knowledge and experience captured in the former is not lost. The committee strongly urges the DOE to continue its support for maintaining this critical resource.
PMCID: PMC3910701  PMID: 24501639
22.  Distributive Conjugal Transfer in Mycobacteria Generates Progeny with Meiotic-Like Genome-Wide Mosaicism, Allowing Mapping of a Mating Identity Locus 
PLoS Biology  2013;11(7):e1001602.
We find that genome-wide DNA transfer by conjugation in mycobacteria affords bacteria that reproduce by binary fission the same advantages of sexual reproduction, and may explain the genomic evolution of Mycobacterium tuberculosis.
Horizontal gene transfer (HGT) in bacteria generates variation and drives evolution, and conjugation is considered a major contributor as it can mediate transfer of large segments of DNA between strains and species. We previously described a novel form of chromosomal conjugation in mycobacteria that does not conform to classic oriT-based conjugation models, and whose potential evolutionary significance has not been evaluated. Here, we determined the genome sequences of 22 F1-generation transconjugants, providing the first genome-wide view of conjugal HGT in bacteria at the nucleotide level. Remarkably, mycobacterial recipients acquired multiple, large, unlinked segments of donor DNA, far exceeding expectations for any bacterial HGT event. Consequently, conjugal DNA transfer created extensive genome-wide mosaicism within individual transconjugants, which generated large-scale sibling diversity approaching that seen in meiotic recombination. We exploited these attributes to perform genome-wide mapping and introgression analyses to map a locus that determines conjugal mating identity in M. smegmatis. Distributive conjugal transfer offers a plausible mechanism for the predicted HGT events that created the genome mosaicism observed among extant Mycobacterium tuberculosis and Mycobacterium canettii species. Mycobacterial distributive conjugal transfer permits innovative genetic approaches to map phenotypic traits and confers the evolutionary benefits of sexual reproduction in an asexual organism.
Author Summary
Bacteria reproduce by binary fission, generating two clones of the original; this restricts the genomic diversity of the population, which brings with it inherent evolutionary drawbacks. This problem can be eased by conjugation, which transfers DNA from a donor to a recipient bacterium. Understanding the potential of conjugal DNA transfer for generating genetic diversity is necessary for estimating gene flow through populations and for predicting rates of bacterial evolution. The influence of chromosomal conjugal DNA transfer on mycobacterial diversity has not been previously addressed. Here, we determine and compare the complete genome sequences of independent progeny from bacterial matings between defined donor and recipient strains of Mycobacterium smegmatis. We find the resulting hybrid bacteria to be extremely diverse blends of the parental strains, reminiscent of the genetic mixing that occurs through meiotic recombination in sexual organisms. This novel mechanism of conjugation can create genome-wide mosaicism in a single event, generating segments of donor DNA that range from small (∼0.05 kb) to large (∼250 kb), widely distributed around the recipient chromosome. We exploit this mixing by using genetic tools originally developed for finding mammalian disease genes to locate the genes that confer a donor phenotype in M. smegmatis. We speculate that similar genomic mosaicism observed in pathogenic mycobacteria arose from conjugation between ancestral progenitor strains.
PMCID: PMC3706393  PMID: 23874149
23.  Gene Conservation among Endospore-Forming Bacteria Reveals Additional Sporulation Genes in Bacillus subtilis 
Journal of Bacteriology  2013;195(2):253-260.
The capacity to form endospores is unique to certain members of the low-G+C group of Gram-positive bacteria (Firmicutes) and requires signature sporulation genes that are highly conserved across members of distantly related genera, such as Clostridium and Bacillus. Using gene conservation among endospore-forming bacteria, we identified eight previously uncharacterized genes that are enriched among endospore-forming species. The expression of five of these genes was dependent on sporulation-specific transcription factors. Mutants of none of the genes exhibited a conspicuous defect in sporulation, but mutants of two, ylxY and ylyA, were outcompeted by a wild-type strain under sporulation-inducing conditions, but not during growth. In contrast, a ylmC mutant displayed a slight competitive advantage over the wild type specific to sporulation-inducing conditions. The phenotype of a ylyA mutant was ascribed to a defect in spore germination efficiency. This work demonstrates the power of combining phylogenetic profiling with reverse genetics and gene-regulatory studies to identify unrecognized genes that contribute to a conserved developmental process.
PMCID: PMC3553846  PMID: 23123912
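The gene-conservation approach in the abstract above looks for genes that are enriched among endospore-forming species. One simple way to express that idea is to compare how often a gene occurs in endospore formers versus other species. The sketch below is an assumption about how such a score might be computed, not the authors' method; the function name, the scoring rule and the toy species and gene labels are hypothetical.

```python
def enrichment_scores(presence, spore_formers, non_formers):
    """Score each gene by (fraction of spore formers carrying it)
    minus (fraction of non-spore formers carrying it).

    presence maps gene name -> set of species that carry the gene.
    Scores near 1.0 mark genes found almost only in spore formers.
    """
    scores = {}
    for gene, carriers in presence.items():
        in_frac = len(carriers & spore_formers) / len(spore_formers)
        out_frac = len(carriers & non_formers) / len(non_formers)
        scores[gene] = in_frac - out_frac
    return scores


# Toy input (hypothetical species and genes).
spore_formers = {"s1", "s2", "s3", "s4"}
non_formers = {"n1", "n2"}
presence = {
    "gene_x": {"s1", "s2", "s3", "s4"},              # spore formers only
    "gene_y": {"s1", "s2", "s3", "s4", "n1", "n2"},  # present everywhere
    "gene_z": {"s1", "n1"},                          # no clear association
}
scores = enrichment_scores(presence, spore_formers, non_formers)
```

A difference of fractions ignores sample size and phylogenetic relatedness; a real analysis would likely add a statistical test and account for how closely related the genomes are.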
24.  Bacteria-Human Somatic Cell Lateral Gene Transfer Is Enriched in Cancer Samples 
PLoS Computational Biology  2013;9(6):e1003107.
There are 10× more bacterial cells in our bodies from the microbiome than human cells. Viral DNA is known to integrate in the human genome, but the integration of bacterial DNA has not been described. Using publicly available sequence data from the human genome project, the 1000 Genomes Project, and The Cancer Genome Atlas (TCGA), we examined bacterial DNA integration into the human somatic genome. Here we present evidence that bacterial DNA integrates into the human somatic genome through an RNA intermediate, and that such integrations are detected more frequently in (a) tumors than normal samples, (b) RNA than DNA samples, and (c) the mitochondrial genome than the nuclear genome. Hundreds of thousands of paired reads support random integration of Acinetobacter-like DNA in the human mitochondrial genome in acute myeloid leukemia samples. Numerous read pairs across multiple stomach adenocarcinoma samples support specific integration of Pseudomonas-like DNA in the 5′-UTR and 3′-UTR of four proto-oncogenes that are up-regulated in their transcription, consistent with conversion to an oncogene. These data support our hypothesis that bacterial integrations occur in the human somatic genome and may play a role in carcinogenesis. We anticipate that the application of our approach to additional cancer genome projects will lead to the more frequent detection of bacterial DNA integrations in tumors that are in close proximity to the human microbiome.
Author Summary
There are 10× more bacterial cells in the human body, as part of the human microbiome, than there are human cells. Many of those bacteria are in constant, intimate contact with human cells. We sought to establish if bacterial cells insert their own DNA into the human genome. Such random mutations could cause disease in the same manner that mutagens like UV rays from the sun or chemicals in cigarettes induce mutations. We detected the integration of bacterial DNA in the human genome more readily in tumors than normal samples. In particular, extensive amounts of DNA with similarity to Acinetobacter DNA were fused to human mitochondrial DNA in acute myeloid leukemia samples. We also identified specific integrations of DNA with similarity to Pseudomonas DNA near the untranslated regulatory regions of four proto-oncogenes. This supports our hypothesis that bacterial integrations occur in the human somatic genome and may play a role in carcinogenesis. Further study in this area may provide new avenues for cancer prevention.
PMCID: PMC3688693  PMID: 23840181
25.  Draft Genome Sequence of Leucobacter sp. Strain UCD-THU (Phylum Actinobacteria) 
Genome Announcements  2013;1(3):e00325-13.
Here we present the draft genome of Leucobacter sp. strain UCD-THU. The genome contains 3,317,267 bp in 11 scaffolds. This strain was isolated from a residential toilet as part of an undergraduate project to sequence reference genomes of microbes from the built environment.
PMCID: PMC3675516  PMID: 23792744

