The current pathogen-typing methods have suboptimal sensitivities and specificities. DNA sequencing offers an opportunity to type pathogens with greater degrees of discrimination using single nucleotide polymorphisms (SNPs) than with pulsed-field gel electrophoresis (PFGE) and other methodologies. In a recent cluster of Escherichia coli O157:H7 infections attributed to salad bar exposures and romaine lettuce, a subset of cases denied exposure to either source, although PFGE and multiple-locus variable-number tandem-repeat analysis (MLVA) suggested that all isolates had the same recent progenitor. Interrogation of a preselected set of 3,442,673 nucleotides in backbone open reading frames (ORFs) identified only 1 or 2 single nucleotide differences in 3 of 12 isolates from the cases who denied exposure. The backbone DNAs of 9 of 9 and 3 of 3 cases who reported or were unsure about exposure, respectively, were isogenic. Backbone ORF SNP set sequencing offers pathogen differentiation capabilities that exceed those of PFGE and MLVA.
Determining bacterial abundance variation is the first step in understanding bacterial similarity between individuals. Categorization of bacterial communities into groups or community classes is the subsequent step in describing microbial distribution based on abundance patterns. Here, we present an analysis of the groupings of bacterial communities in stool, nasal, skin, vaginal and oral habitats in a healthy cohort of 236 subjects from the Human Microbiome Project.
We identify distinct community group patterns in the anterior nares, four skin sites, and vagina at the genus level. We also confirm three enterotypes previously identified in stools. We identify two clusters with low silhouette values in most oral sites, in which bacterial communities are more homogeneous. Subjects sharing a community class in one habitat do not necessarily share a community class in another, except in the three vaginal sites and the symmetric habitats of the left and right retroauricular creases. Demographic factors, including gender, age, and ethnicity, significantly influence community composition in several habitats. Community classes in the vagina, retroauricular crease and stool are stable over approximately 200 days.
The community composition, association of demographic factors with community classes, and demonstration of community stability deepen our understanding of the variability and dynamics of human microbiomes. This also has significant implications for experimental designs that seek microbial correlations with clinical phenotypes.
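The silhouette values mentioned above measure how well each sample fits its assigned community class. The following is a minimal sketch of the standard silhouette width, assuming a precomputed distance matrix and given cluster labels. Neither the distance measure nor the clustering method is specified here, so the toy data and the labels `A` and `B` are purely hypothetical.

```python
def silhouette_widths(dist, labels):
    """Return the silhouette width of each sample.

    dist   -- symmetric matrix (list of lists) of pairwise distances
    labels -- cluster label of each sample (hypothetical community classes)
    """
    n = len(labels)
    widths = []
    for i in range(n):
        # Accumulate the distance from sample i to the members of each cluster.
        sums, counts = {}, {}
        for j in range(n):
            if j == i:
                continue
            sums[labels[j]] = sums.get(labels[j], 0.0) + dist[i][j]
            counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        others = [sums[c] / counts[c] for c in sums if c != own]
        if own not in counts or not others:
            # Singleton cluster, or only one cluster overall: width is 0.
            widths.append(0.0)
            continue
        a = sums[own] / counts[own]   # mean distance within own cluster
        b = min(others)               # mean distance to nearest other cluster
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


# Toy example (made-up data): four samples on a line, two separated groups.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
widths = silhouette_widths(dist, ["A", "A", "B", "B"])
mean_width = sum(widths) / len(widths)
```

Widths near 1 indicate well-separated groups, whereas widths near 0 indicate overlapping groups, consistent with the more homogeneous communities reported for most oral sites.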
Background. Diabetic foot infections are a leading cause of lower extremity amputations. Our study examines the microbiota of diabetic skin prior to ulcer development or infection.
Methods. In a case-control study, outpatient males were recruited at a veterans hospital. Subjects were swabbed at 4 cutaneous sites, 1 on the forearm and 3 on the foot. Quantitative polymerase chain reaction (qPCR) with primers and probes specific for bacteria, Staphylococcus species, Staphylococcus aureus, and fungi was performed on all samples. High-throughput 16S ribosomal RNA (rRNA) sequencing was performed on samples from the forearm and the plantar aspect of the foot.
Results. qPCR analysis of swab specimens from 30 diabetic subjects and 30 control subjects showed no differences in total numbers of bacteria or fungi at any sampled site. Increased log10 concentrations of Staphylococcus aureus, quantified by the number of nuc gene copies, were present in diabetic men on the plantar aspect of the foot. High-throughput 16S rRNA sequencing found that, on the foot, the microbiota in controls (n = 24) was dominated by Staphylococcus species, whereas the microbiota in diabetics (n = 23) was more diverse at the genus level. The forearm microbiota had similar diversity in diabetic and control groups.
Conclusions. The feet of diabetic men had decreased populations of Staphylococcus species, increased populations of S. aureus, and increased bacterial diversity, compared with the feet of controls. These ecologic changes may affect the risk for wound infections.
microbiota; microbiome; diabetic foot; cutaneous; Staphylococcus; Staphylococcus aureus
The human skin microbiome plays important roles in skin health and
disease. However, bacterial population structure and diversity at the strain
level are poorly understood. We compared the skin microbiome at the strain level
and genome level of Propionibacterium acnes, a dominant skin
commensal, between 49 acne patients and 52 healthy individuals by sampling the
pilosebaceous units on their noses. Metagenomic analysis demonstrated that while
the relative abundances of P. acnes were similar, the strain
population structures were significantly different in the two cohorts. Certain
strains were highly associated with acne and other strains were enriched in
healthy skin. By sequencing 66 previously unreported P. acnes
strains and comparing 71 P. acnes genomes, we identified
potential genetic determinants of various P. acnes strains in
association with acne or health. Our analysis suggests that acquired DNA
sequences and bacterial immune elements may play roles in determining virulence
properties of P. acnes strains and some could be future targets
for therapeutic interventions. This study demonstrates a previously unreported
paradigm of commensal strain populations that could explain the pathogenesis of
human diseases. It underscores the importance of strain level analysis of the
human microbiome to define the role of commensals in health and disease.
The dynamics of adaptation determine which mutations fix in a population, and hence how reproducible evolution will be. This is central to understanding the spectra of mutations recovered in evolution of antibiotic resistance [1], the response of pathogens to immune selection [2,3], and the dynamics of cancer progression [4,5]. In laboratory evolution experiments, demonstrably beneficial mutations are found repeatedly [6–8], but are often accompanied by other mutations with no obvious benefit. Here we use whole-genome whole-population sequencing to examine the dynamics of genome sequence evolution at high temporal resolution in 40 replicate Saccharomyces cerevisiae populations growing in rich medium for 1,000 generations. We find pervasive genetic hitchhiking: multiple mutations arise and move synchronously through the population as mutational “cohorts.” Multiple clonal cohorts are often present simultaneously, competing with each other in the same population. Our results show that patterns of sequence evolution are driven by a balance between these chance effects of hitchhiking and interference, which increase stochastic variation in evolutionary outcomes, and the deterministic action of selection on individual mutations, which favors parallel evolutionary solutions in replicate populations.
Treponema pallidum ssp. pallidum (TPA), the causative agent of syphilis, is a highly clonal bacterium showing minimal genetic variability in the genome sequence of individual strains. Nevertheless, genetically characterized syphilis strains can be clearly divided into two groups, Nichols-like strains and SS14-like strains. TPA Nichols and SS14 strains were completely sequenced in 1998 and 2008, respectively. Since publication of their complete genome sequences, a number of sequencing errors in each genome have been reported. Therefore, we have resequenced TPA Nichols and SS14 strains using next-generation sequencing techniques.
The genomes of TPA strains Nichols and SS14 were resequenced using the 454 and Illumina sequencing methods, with a combined average coverage higher than 90x. In the TPA strain Nichols genome, 134 errors were identified (25 substitutions and 109 indels), and 102 of them affected protein sequences. In the TPA SS14 genome, a total of 191 errors were identified (85 substitutions and 106 indels), and 136 of them affected protein sequences. A set of new intrastrain heterogenic regions in the TPA SS14 genome was identified, including the tprD gene, where both tprD and tprD2 alleles were found. The resequenced genomes of both TPA Nichols and SS14 strains clustered more closely with related strains (i.e., strains belonging to the same syphilis treponeme subcluster). At the same time, groups of Nichols-like and SS14-like strains were found to be more distantly related.
We identified errors in 11.5% of all annotated genes and, after correction, we found a significant impact on the predicted proteomes of both Nichols and SS14 strains. Corrections of these errors resulted in protein elongations, truncations, fusions and indels in more than 11% of all annotated proteins. Moreover, it became more evident that syphilis is caused by treponemes belonging to two separate genetic subclusters.
The DNA sequences of chromosomes I and II of Rhodobacter sphaeroides strain 2.4.1 have been revised, and the annotation of the entire genomic sequence, including both chromosomes and the five plasmids, has been updated. Errors in the originally published sequence have been corrected, and ∼11% of the coding regions in the original sequence have been affected by the revised annotation.
Propionibacterium acnes constitutes a major part of the skin microbiome and contributes to human health. However, it has also been implicated as a pathogenic factor in several diseases, including acne, one of the most common skin diseases. Its pathogenic role, however, remains elusive. To better understand the genetic landscape and diversity of the organism and its role in human health and disease, we performed a comparative genome analysis of 82 P. acnes strains, 69 of which were sequenced by our group. This collection covers all known P. acnes lineages, including types IA, IB, II, and III. Our analysis demonstrated that although the P. acnes pan-genome is open, it is relatively small and expands slowly. The core regions, shared by all the sequenced genomes, accounted for 88% of the average genome. Comparative genome analysis showed that within each lineage, the strains isolated from the same individuals were more closely related than the ones isolated from different individuals, suggesting that clonal expansions occurred within each individual microbiome. We also identified the genetic elements specific to each lineage. Differences in harboring these elements may explain the phenotypic and functional differences of P. acnes in functioning as a commensal in healthy skin and as a pathogen in diseases. Our findings of the differences among P. acnes strains at the genome level underscore the importance of identifying the human microbiome variations at the strain level in understanding its association with diseases and provide insight into novel and personalized therapeutic approaches for P. acnes-related diseases.
Propionibacterium acnes is a major human skin bacterium. It plays an important role in maintaining skin health. However, it has also been hypothesized to be a pathogenic factor in several diseases, including acne, a common skin disease affecting 85% of teenagers. To understand whether different strains have different virulent properties and thus play different roles in health and diseases, we compared the genomes of 82 P. acnes strains, most of which were isolated from acne or healthy skin. We identified lineage-specific genetic elements that may explain the phenotypic and functional differences of P. acnes as a commensal in health and as a pathogen in diseases. By analyzing a large number of sequenced strains, we provided an improved understanding of the genetic landscape and diversity of the organism at the strain level and at the molecular level that can be further applied in the development of new and personalized therapies.
Unclassified simian strain Treponema Fribourg-Blanc was isolated in 1966 from baboons (Papio cynocephalus) in West Africa. This strain was morphologically indistinguishable from T. pallidum ssp. pallidum or ssp. pertenue strains, and it was shown to cause human infections.
To precisely define genetic differences between Treponema Fribourg-Blanc (unclassified simian isolate, FB) and T. pallidum ssp. pertenue strains (TPE), a high quality sequence of the whole Fribourg-Blanc genome was determined with 454-pyrosequencing and Illumina sequencing platforms. Combined average coverage of both methods was greater than 500×. Restriction target sites (n = 1,773) of selected restriction enzymes, identified in silico within the Fribourg-Blanc genome, were verified experimentally and no discrepancies were found. When compared to the other three sequenced TPE genomes (Samoa D, CDC-2, Gauthier), no major genome rearrangements were found. The Fribourg-Blanc genome clustered with other TPE strains (especially with the TPE CDC-2 strain), while T. pallidum ssp. pallidum strains clustered separately, as did the genome of T. paraluiscuniculi strain Cuniculi A. Within coding regions, 6 deletions, 5 insertions and 117 substitutions differentiated Fribourg-Blanc from other TPE genomes.
The Fribourg-Blanc genome showed genetic characteristics similar to those of other TPE strains. Therefore, we propose to rename the unclassified simian isolate to Treponema pallidum ssp. pertenue strain Fribourg-Blanc. Since the Fribourg-Blanc strain was shown to cause experimental infection in human hosts, non-human primates could serve as possible reservoirs of TPE strains. This could considerably complicate recent efforts to eradicate yaws. Genetic differences specific for Fribourg-Blanc could then contribute to the identification of cases of animal-derived yaws infections.
A bacterial strain isolated in 1966 from baboons (Papio cynocephalus) in West Africa was preliminarily characterized as unclassified simian strain Treponema Fribourg-Blanc (FB). This strain was morphologically identical to T. pallidum ssp. pallidum (TPA, agent of syphilis) or ssp. pertenue (TPE, agent of yaws). In this study, we completed a high quality whole genome sequence of simian isolate Treponema Fribourg-Blanc and compared it to known genome sequences of Treponema pallidum strains. No major differences in the gene order of the FB genome were found when compared to all known genomes of Treponema pallidum subspecies. Moreover, the FB genome clustered with other TPE strains, while T. pallidum ssp. pallidum strains clustered separately. In general, the FB genome showed similar genetic characteristics to other TPE strains. Therefore, we proposed that the simian isolate Fribourg-Blanc be classified as a bacterial strain belonging to Treponema pallidum ssp. pertenue. It appears that, in addition to humans, the reservoir of yaws-causing treponemes may also include free-living primates, especially in Africa.
Characterizing the biogeography of the microbiome of healthy humans is essential for understanding microbial associated diseases. Previous studies mainly focused on a single body habitat from a limited set of subjects. Here, we analyzed one of the largest microbiome datasets to date and generated a biogeographical map that annotates the biodiversity, spatial relationships, and temporal stability of 22 habitats from 279 healthy humans.
We identified 929 genera from more than 24 million 16S rRNA gene sequences of 22 habitats, and we provide a baseline of inter-subject variation for healthy adults. The oral habitat has the most stable microbiota with the highest alpha diversity, while the skin and vaginal microbiota are less stable and show lower alpha diversity. The level of biodiversity in one habitat is independent of the biodiversity of other habitats in the same individual. The abundances of a given genus at a body site in which it dominates do not correlate with the abundances at body sites where it is not dominant. Additionally, we observed that the human microbiota exhibit both cosmopolitan and endemic features. Finally, comparing datasets of different projects revealed a project-based clustering pattern, emphasizing the significance of standardization of metagenomic studies.
The data presented here extend the definition of the human microbiome by providing a more complete and accurate picture of human microbiome biogeography, addressing questions best answered by a large dataset of subjects and body sites that are deeply sampled by sequencing.
Biogeography; Human microbiome; Biodiversity; Temporal stability
This paper presents new biostatistical methods for the analysis of microbiome data based on a fully parametric approach using all the data. The Dirichlet-multinomial distribution allows the analyst to calculate power and sample sizes for experimental design, perform tests of hypotheses (e.g., compare microbiomes across groups), and estimate parameters describing microbiome properties. A fully parametric model for these data has an advantage over alternative non-parametric approaches, such as bootstrapping and permutation testing, in that it retains more of the information contained in the data. This paper details the statistical approaches for several tests of hypothesis and power/sample size calculations, and applies them for illustration to taxonomic abundance distribution and rank abundance distribution data using HMP Jumpstart data on 24 subjects for saliva, subgingival, and supragingival samples. Software for running these analyses is available.
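To make the Dirichlet-multinomial model concrete, the sketch below evaluates its log-probability for one sample of taxon counts using the textbook probability mass function. It is an illustration only and is not the software referred to above. The function name, the counts, and the parameter values are hypothetical.

```python
from math import lgamma, exp


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial.

    counts -- observed reads per taxon in one sample
    alpha  -- positive Dirichlet parameters, one per taxon
    """
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient plus the ratio of Dirichlet normalizers.
    logp = lgamma(n + 1) + lgamma(a0) - lgamma(n + a0)
    for x, a in zip(counts, alpha):
        logp += lgamma(x + a) - lgamma(a) - lgamma(x + 1)
    return logp


# Hypothetical sample with three taxa and made-up parameters.
taxa_counts = [12, 5, 3]
alpha = [2.0, 1.0, 0.5]
log_likelihood = dirichlet_multinomial_logpmf(taxa_counts, alpha)
```

Summing such log-probabilities over samples gives a log-likelihood, which is the quantity that parametric hypothesis tests and parameter estimates of this kind are typically built on.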
A variety of microbial communities and their genes (microbiome) exist throughout the human body, playing fundamental roles in human health and disease. The NIH-funded Human Microbiome Project (HMP) Consortium has established a population-scale framework that catalyzed significant development of metagenomic protocols, resulting in a broad range of quality-controlled resources and data, including standardized methods for creating, processing and interpreting distinct types of high-throughput metagenomic data, available to the scientific community. Here we present resources from a population of 242 healthy adults sampled at 15 to 18 body sites up to three times, which, to date, have generated 5,177 microbial taxonomic profiles from 16S rRNA genes and over 3.5 Tb of metagenomic sequence. In parallel, approximately 800 human-associated reference genomes have been sequenced. Collectively, these data represent the largest resource to date describing the abundance and variety of the human microbiome, while providing a platform for current and future studies.
The Anelloviridae family consists of non-enveloped, circular, single-stranded DNA viruses. Three genera of anellovirus are known to infect humans, named TTV, TTMDV, and TTMV. Although anelloviruses were initially thought to cause non-A-G viral hepatitis, continued research has shown no definitive associations between anellovirus and human disease to date. Using high-throughput sequencing, we investigated the association between anelloviruses and fever in pediatric patients 2–36 months of age. We determined that although anelloviruses were present in a large number of specimens from both febrile and afebrile patients, they were more prevalent in the plasma and nasopharyngeal (NP) specimens of febrile patients compared to afebrile controls. Using PCR to detect each of the three species of anellovirus that infect humans, we found that anellovirus species TTV and TTMDV were more prevalent in the plasma and NP specimens of febrile patients compared to afebrile controls. This was not the case for species TTMV which was found in similar percentages of febrile and afebrile patient specimens. Analysis of patient age showed that the percentage of plasma and NP specimens containing anellovirus increased with age until patients were 19–24 months of age, after which the percentage of anellovirus positive patient specimens dropped. This trend was striking for TTV and TTMDV and very modest for TTMV in both plasma and NP specimens. Finally, as the temperature of febrile patients increased, so too did the frequency of TTV and TTMDV detection. Again, TTMV was equally present in both febrile and afebrile patient specimens. Taken together these data indicate that the human anellovirus species TTV and TTMDV are associated with fever in children, while the highly related human anellovirus TTMV has no association with fever.
Human microbiome research characterizes the microbial content of samples from human habitats to learn how interactions between bacteria and their host might impact human health. In this work, a novel parametric statistical inference method based on object-oriented data analysis (OODA) for analyzing HMP data is proposed. OODA is an emerging area of statistical inference where the goal is to apply statistical methods to objects such as functions, images, and graphs or trees. The data objects that pertain to this work are taxonomic trees of bacteria built from analysis of 16S rRNA gene sequences (e.g., using RDP); there is one such object for each biological sample analyzed. Our goal is to model and formally compare a set of trees. The contribution of our work is threefold: first, a weighted tree structure to analyze RDP data is introduced; second, using a probability measure to model a set of taxonomic trees, we introduce an approximate MLE procedure for estimating model parameters and we derive LRT statistics for comparing the distributions of two metagenomic populations; and third, the Jumpstart HMP data are analyzed using the proposed model, providing novel insights and future directions of analysis.
Linkage testing using Affymetrix 6.0 SNP Arrays mapped the disease locus in TCD-G, an Irish family with autosomal dominant retinitis pigmentosa (adRP), to an 8.8 Mb region on 1p31. Of 50 known genes in the region, 11 candidates, including RPE65 and PDE4B, were sequenced using di-deoxy capillary electrophoresis. Simultaneously, a subset of family members was analyzed using Agilent SureSelect All Exome capture, followed by sequencing on an Illumina GAIIx platform. Candidate gene and exome sequencing resulted in the identification of an Asp477Gly mutation in exon 13 of the RPE65 gene tracking with the disease in TCD-G. All coding exons of genes not sequenced to sufficient depth by next generation sequencing were sequenced by di-deoxy sequencing. No other potential disease-causing variants were found to segregate with disease in TCD-G. The Asp477Gly mutation was not present in Irish controls, but was found in a second Irish family provisionally diagnosed with choroideremia, bringing the combined maximum two-point LOD score to 5.3. Mutations in RPE65 are a known cause of recessive Leber congenital amaurosis (LCA) and recessive RP, but no dominant mutations have been reported. Protein modeling suggests that the Asp477Gly mutation may destabilize protein folding, and mutant RPE65 protein migrates marginally faster on SDS-PAGE, compared with wild type. Gene therapy for LCA patients with RPE65 mutations has shown great promise, raising the possibility of related therapies for dominant-acting mutations in this gene.
retinitis pigmentosa; choroideremia; RPE65; exome capture; next-generation sequencing
The goal of the Human Microbiome Project (HMP) is to generate a comprehensive catalog of human-associated microorganisms including reference genomes representing the most common species. Toward this goal, the HMP has characterized the microbial communities at 18 body habitats in a cohort of over 200 healthy volunteers using 16S rRNA gene (16S) sequencing and has generated nearly 1,000 reference genomes from human-associated microorganisms. To determine how well current reference genome collections capture the diversity observed among the healthy microbiome and to guide isolation and future sequencing of microbiome members, we compared the HMP’s 16S data sets to several reference 16S collections to create a ‘most wanted’ list of taxa for sequencing. Our analysis revealed that the diversity of commonly occurring taxa within the HMP cohort microbiome is relatively modest, few novel taxa are represented by these OTUs and many common taxa among HMP volunteers recur across different populations of healthy humans. Taken together, these results suggest that it should be possible to perform whole-genome sequencing on a large fraction of the human microbiome, including the ‘most wanted’, and that these sequences should serve to support microbiome studies across multiple cohorts. Also, in stark contrast to other taxa, the ‘most wanted’ organisms are poorly represented among culture collections suggesting that novel culture- and single-cell-based methods will be required to isolate these organisms for sequencing.
Enterococci are among the leading causes of hospital-acquired infections in the United States and Europe, with Enterococcus faecalis and Enterococcus faecium being the two most common species isolated from enterococcal infections. In the last decade, the proportion of enterococcal infections caused by E. faecium has steadily increased compared to other Enterococcus species. Although the underlying mechanism for the gradual replacement of E. faecalis by E. faecium in the hospital environment is not yet understood, many studies using genotyping and phylogenetic analysis have shown the emergence of a globally dispersed polyclonal subcluster of E. faecium strains in clinical environments. Systematic study of the molecular epidemiology and pathogenesis of E. faecium has been hindered by the lack of closed, complete E. faecium genomes that can be used as references.
In this study, we report the complete genome sequence of the E. faecium strain TX16, also known as DO, which belongs to multilocus sequence type (ST) 18, and was the first E. faecium strain ever sequenced. Whole genome comparison of the TX16 genome with 21 E. faecium draft genomes confirmed that most clinical, outbreak, and hospital-associated (HA) strains (including STs 16, 17, 18, and 78), in addition to strains of non-hospital origin, group in the same clade (referred to as the HA clade). By phylogenetic and gene content similarity analyses, these strains are evolutionarily considerably more closely related to each other than to isolates in the community-associated (CA) clade, with approximately a 3–4% average nucleotide sequence difference between the two clades at the core genome level. Our study also revealed that many genomic loci in the TX16 genome are unique to the HA clade. A total of 380 ORFs in TX16 are HA-clade specific, and antibiotic resistance genes are enriched in HA-clade strains. Mobile elements such as IS16 and transposons were also found almost exclusively in HA strains, as previously reported.
Our findings, along with other studies, show that HA clonal lineages harbor specific genetic elements as well as sequence differences in the core genome, which may confer selection advantages over the more heterogeneous CA E. faecium isolates. Which of these differences are important for the success of specific E. faecium lineages in the hospital environment remains to be determined.
We tested the hypothesis that Crohn’s disease (CD)-related genetic polymorphisms involved in host innate immunity are associated with shifts in human ileum-associated microbial composition in a cross-sectional analysis of human ileal samples. Sanger sequencing of the bacterial 16S ribosomal RNA (rRNA) gene and 454 sequencing of 16S rRNA gene hypervariable regions (V1–V3 and V3–V5) were conducted on macroscopically disease-unaffected ileal biopsies collected from 52 ileal CD, 58 ulcerative colitis and 60 control patients without inflammatory bowel diseases (IBD) undergoing initial surgical resection. These subjects also were genotyped for the three major NOD2 risk alleles (Leu1007fs, R702W, G908R) and the ATG16L1 risk allele (T300A). The samples were linked to clinical metadata, including body mass index, smoking status and Clostridium difficile infection. The sequences were classified into seven phyla/subphyla categories using the Naïve Bayesian Classifier of the Ribosomal Database Project. Centered log ratio transformation of six predominant categories was included as the dependent variable in the permutation-based MANCOVA for the overall composition with stepwise variable selection. Polymerase chain reaction (PCR) assays were conducted to measure the relative frequencies of the Clostridium coccoides – Eubacterium rectale group and the Faecalibacterium prausnitzii spp. Empiric logit transformations of the relative frequencies of these two microbial groups were included in permutation-based ANCOVA. Regardless of sequencing method, IBD phenotype, Clostridium difficile infection and NOD2 genotype were selected as associated (FDR ≤0.05) with shifts in overall microbial composition. IBD phenotype and NOD2 genotype were also selected as associated with shifts in the relative frequency of the C. coccoides – E. rectale group. IBD phenotype, smoking and IBD medications were selected as associated with shifts in the relative frequency of F. prausnitzii spp.
These results indicate that the effects of genetic and environmental factors on IBD are mediated at least in part by the enteric microbiota.
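The two transformations named in the methods above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline. The pseudocount of 0.5 for zero counts and the 0.5 offset in the logit are common conventions assumed here, and the category counts are made up.

```python
from math import log


def centered_log_ratio(counts, pseudocount=0.5):
    """Centered log ratio (CLR) transform of one compositional sample.

    Each value is the log of a component divided by the geometric mean of
    all components. The pseudocount (an assumption) avoids log(0).
    """
    logs = [log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def empirical_logit(successes, total, offset=0.5):
    """Empirical logit of a proportion; the offset (an assumption) keeps
    the value finite when the proportion is 0 or 1."""
    return log((successes + offset) / (total - successes + offset))


# Hypothetical read counts for six predominant taxonomic categories.
category_counts = [120, 45, 30, 3, 1, 0]
clr_values = centered_log_ratio(category_counts)
```

CLR values sum to zero within each sample. This removes the unit-sum constraint of relative abundances before they are used as dependent variables in a multivariate model.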
The aim of this study was to integrate human clinical, genotype, mRNA microarray and 16S rRNA sequence data collected on 84 subjects with ileal Crohn’s disease or ulcerative colitis, or control patients without inflammatory bowel diseases (IBD), in order to interrogate how host-microbial interactions are perturbed in IBD. Ex vivo ileal mucosal biopsies were collected from the disease-unaffected proximal margin of the ileum resected from patients who were undergoing initial intestinal surgery. Both RNA and DNA were extracted from the mucosal biopsy samples. Patients were genotyped for the three major NOD2 variants (Leu1007fs, R702W, and G908R) and the ATG16L1 T300A variant. Whole human genome mRNA expression profiles were generated using Agilent microarrays. Microbial composition profiles were determined by 454 pyrosequencing of the V3–V5 hypervariable region of the bacterial 16S rRNA gene. The results of permutation-based multivariate analysis of variance and covariance (MANCOVA) support the hypothesis that host mucosal Paneth cell and xenobiotic metabolism genes play an important role in host-microbial interactions.
Unexplained fever (UF) is a common problem in children under 3 years old. Although virus infection is suspected to be the cause of most of these fevers, a comprehensive analysis of viruses in samples from children with fever and healthy controls is important for establishing a relationship between viruses and UF. We used unbiased, deep sequencing to analyze 176 nasopharyngeal swabs (NP) and plasma samples from children with UF and afebrile controls, generating an average of 4.6 million sequences per sample. An analysis pipeline was developed to detect viral sequences, which resulted in the identification of sequences from 25 viral genera. These genera included expected pathogens, such as adenoviruses, enteroviruses, and roseoloviruses, plus viruses with unknown pathogenicity. Viruses that were unexpected in NP and plasma samples, such as the astrovirus MLB-2, were also detected. Sequencing allowed identification of virus subtype for some viruses, including roseoloviruses. Highly sensitive PCR assays detected low levels of viruses that were not detected in approximately 5 million sequences, but greater sequencing depth improved sensitivity. On average, NP and plasma samples from febrile children contained 1.5- to 5-fold more viral sequences, respectively, than samples from afebrile children. Samples from febrile children contained a broader range of viral genera and contained multiple viral genera more frequently than samples from children without fever. Differences between febrile and afebrile groups were most striking in the plasma samples, where detection of viral sequence may be associated with a disseminated infection. These data indicate that virus infection is associated with UF. Further studies are important in order to establish the range of viral pathogens associated with fever and to understand the role of viral infection in fever.
Ultimately these studies may improve the medical treatment of children with UF by helping avoid antibiotic therapy for children with viral infections.
The Human Microbiome Project (HMP) aims to characterize the microbial communities of 18 body sites from healthy individuals. To accomplish this, the HMP generated two types of shotgun data: reference shotgun sequences isolated from different anatomical sites on the human body and shotgun metagenomic sequences from the microbial communities of each site. The alignment strategy for characterizing these metagenomic communities using available reference sequences is important to the success of HMP data analysis. Six next-generation aligners were used to align a community of known composition against a database comprising reference organisms known to be present in that community. All aligners reported nearly complete genome coverage (>97%) for strains with over 6X depth of coverage; however, they differed in speed, memory requirements and ease-of-use issues such as database size limitations and supported mapping strategies. The selected aligner was tested across a range of parameters to maximize sensitivity while maintaining a low false positive rate. We found that constraining alignment length had more impact on sensitivity than did constraining similarity in all cases tested. However, when reference species were replaced with phylogenetic neighbors, similarity began to play a larger role in detection. We also show that choosing the top hit randomly when multiple, equally strong mappings are available increases overall sensitivity at the expense of taxonomic resolution. The results of this study identified a strategy that was used to map over 3 terabases of microbial sequence against a database of more than 5,000 reference genomes in just over a month.
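The filtering and tie-breaking ideas described in the abstract above can be illustrated with a minimal Python sketch. It is not the HMP pipeline: the `Hit` record, the function names, and the threshold values (0.75 for alignment length fraction, 0.80 for identity) are hypothetical choices made only for illustration. The sketch keeps hits that satisfy a minimum alignment length fraction and a minimum similarity, then picks one reference at random when several hits tie for the best score.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One read-to-reference alignment (hypothetical record layout)."""
    read_id: str
    ref_id: str
    read_len: int   # length of the read in bases
    aln_len: int    # number of aligned bases
    matches: int    # number of identical aligned bases
    score: float    # aligner score; higher is better


def passes(hit, min_len_frac, min_identity):
    """Apply the two constraints: alignment length fraction and similarity."""
    if hit.aln_len <= 0 or hit.read_len <= 0:
        return False
    len_frac = hit.aln_len / hit.read_len
    identity = hit.matches / hit.aln_len
    return len_frac >= min_len_frac and identity >= min_identity


def assign_reads(hits, min_len_frac=0.75, min_identity=0.80, seed=0):
    """Map each read to one reference; ties for the top score are broken at random.

    The default thresholds are illustrative assumptions, not published values.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    by_read = defaultdict(list)
    for hit in hits:
        if passes(hit, min_len_frac, min_identity):
            by_read[hit.read_id].append(hit)

    assignment = {}
    for read_id, candidates in by_read.items():
        best = max(c.score for c in candidates)
        top = [c for c in candidates if c.score == best]
        assignment[read_id] = rng.choice(top).ref_id
    return assignment


example_hits = [
    Hit("r1", "refA", 100, 95, 93, 180.0),  # ties with refB
    Hit("r1", "refB", 100, 95, 93, 180.0),
    Hit("r2", "refA", 100, 40, 40, 80.0),   # alignment too short
    Hit("r3", "refC", 100, 90, 60, 70.0),   # identity too low
    Hit("r4", "refC", 100, 100, 99, 195.0),
]
example_assignment = assign_reads(example_hits)
```

In this sketch a read with equally strong mappings is still counted (which favors sensitivity), but the reference it is credited to is arbitrary among the tied candidates, which is the loss of taxonomic resolution the abstract mentions.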
The human gut harbors thousands of bacterial taxa. A profusion of metagenomic sequence data has been generated from human stool samples in the last few years, raising the question of whether more taxa remain to be identified. We assessed metagenomic data generated by the Human Microbiome Project Consortium to determine if novel taxa remain to be discovered in stool samples from healthy individuals. To do this, we established a rigorous bioinformatics pipeline that uses sequence data from multiple platforms (Illumina GAIIX and Roche 454 FLX Titanium) and approaches (whole-genome shotgun and 16S rDNA amplicons) to validate novel taxa. We applied this approach to stool samples from 11 healthy subjects collected as part of the Human Microbiome Project. We discovered several low-abundance, novel bacterial taxa, which span three major phyla in the bacterial tree of life. We determined that these taxa are present in a larger set of Human Microbiome Project subjects and are found in two sampling sites (Houston and St. Louis). We show that the number of false-positive novel sequences (primarily chimeric sequences) would have been two orders of magnitude higher than the true number of novel taxa without validation using multiple datasets, highlighting the importance of establishing rigorous standards for the identification of novel taxa in metagenomic data. The majority of novel sequences are related to the recently discovered genus Barnesiella, further encouraging efforts to characterize the members of this genus and to study their roles in the microbial communities of the gut. A better understanding of the effects of less-abundant bacteria is important as we seek to understand the complex gut microbiome in healthy individuals and link changes in the microbiome to disease.
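The cross-dataset validation idea in the abstract above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the function name `validate_candidates`, the dataset labels, and the rule that a candidate must be supported by every dataset (or by at least `min_datasets` of them) are all hypothetical.

```python
from collections import Counter


def validate_candidates(candidates_by_dataset, min_datasets=None):
    """Return candidate taxa supported by at least `min_datasets` datasets.

    `candidates_by_dataset` maps a dataset label to the set of candidate
    taxon identifiers observed in that dataset. By default a candidate
    must appear in every dataset (an assumption made for this sketch).
    """
    if min_datasets is None:
        min_datasets = len(candidates_by_dataset)
    support = Counter()
    for candidates in candidates_by_dataset.values():
        support.update(set(candidates))  # count each dataset at most once
    return {taxon for taxon, n in support.items() if n >= min_datasets}


# Hypothetical example: chimeric artifacts tend to appear in one dataset only.
example_datasets = {
    "illumina_wgs": {"taxon1", "taxon2", "taxon3", "chimera1"},
    "454_wgs": {"taxon1", "taxon2", "taxon3"},
    "16s_amplicon": {"taxon1", "taxon2", "chimera2"},
}
strict = validate_candidates(example_datasets)
relaxed = validate_candidates(example_datasets, min_datasets=2)
```

Requiring agreement across independent platforms and approaches discards candidates that arise from platform-specific artifacts, at the cost of missing taxa that one approach cannot detect.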
Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences—the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The ‘environmental packages’ apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.
Lactobacillus-dominated vaginal microbiotas are associated with reproductive health and STI resistance in women, whereas altered microbiotas are associated with bacterial vaginosis (BV), STI risk and poor reproductive outcomes. Putative vaginal taxa have been observed in male first-catch urine, urethral swab and coronal sulcus (CS) specimens but the significance of these observations is unclear. We used 16S rRNA sequencing to characterize the microbiota of the CS and urine collected from 18 adolescent men over three consecutive months. CS microbiotas of most participants were more stable than their urine microbiotas and the composition of CS microbiotas was strongly influenced by circumcision. BV-associated taxa, including Atopobium, Megasphaera, Mobiluncus, Prevotella and Gemella, were detected in CS specimens from sexually experienced and inexperienced participants. In contrast, urine primarily contained taxa that were not abundant in CS specimens. Lactobacillus and Streptococcus were major urine taxa but their abundances were inversely correlated. In contrast, Sneathia, Mycoplasma and Ureaplasma were only found in urine from sexually active participants. Thus, the CS and urine support stable and distinct bacterial communities. Finally, our results suggest that the penis and the urethra can be colonized by a variety of BV-associated taxa and that some of these colonizations result from partnered sexual activity.