Smoking has been identified in observational studies as a risk factor for bacterial vaginosis (BV), a condition defined in part by depletion of Lactobacillus spp. The anti-estrogenic effect of smoking and trace amounts of benzo[a]pyrene diol epoxide (BPDE) may predispose women to BV. BPDE increases bacteriophage induction in Lactobacillus spp. and is found in the vaginal secretions of smokers. We compared the vaginal microbiota of smokers and non-smokers and followed microbiota changes in a smoking cessation pilot study.
In 2010–2011, 20 smokers and 20 non-smokers were recruited to a cross-sectional study (Phase A) and 9 smokers were enrolled and followed for a 12-week smoking cessation program (Phase B). Phase B included weekly behavioral counseling and nicotine patches to encourage smoking cessation. In both phases, participants self-collected mid-vaginal swabs (daily, Phase B) and completed behavioral surveys. Vaginal bacterial composition was characterized by pyrosequencing of barcoded 16S rRNA genes (V1-V3 regions). Vaginal smears were assigned Nugent Gram stain scores. Smoking status was evaluated (weekly, Phase B) using the semi-quantitative NicAlert® saliva cotinine test and carbon monoxide (CO) exhalation.
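The abstract does not restate how Nugent scores are computed. As a sketch, assuming the standard Nugent et al. (1991) criteria (which the study presumably followed but does not describe here), the score sums points for three Gram-stain morphotypes, each graded semi-quantitatively per oil-immersion field, giving a 0–10 scale in which 0–3 is usually read as normal, 4–6 as intermediate and 7–10 as BV. The 0–4 grading convention and the function names are illustrative assumptions, not the study's code.

```python
"""Nugent Gram-stain score under the commonly cited Nugent et al. (1991) criteria (assumed)."""

# Assumed semi-quantitative grades per oil-immersion field:
# 0 = none, 1 = <1, 2 = 1-4, 3 = 5-30, 4 = >30 morphotypes
LACTOBACILLUS_POINTS = {4: 0, 3: 1, 2: 2, 1: 3, 0: 4}  # fewer lactobacilli -> more points
GARDNERELLA_POINTS = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}    # Gardnerella/Bacteroides morphotypes
MOBILUNCUS_POINTS = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2}     # curved Gram-variable rods


def nugent_score(lacto: int, gardnerella: int, mobiluncus: int) -> int:
    """Return the 0-10 Nugent score from three morphotype grades (each 0-4)."""
    for grade in (lacto, gardnerella, mobiluncus):
        if grade not in range(5):
            raise ValueError(f"grade must be 0-4, got {grade}")
    return (LACTOBACILLUS_POINTS[lacto]
            + GARDNERELLA_POINTS[gardnerella]
            + MOBILUNCUS_POINTS[mobiluncus])


def nugent_category(score: int) -> str:
    """Map a Nugent score to the usual interpretive bands."""
    if score <= 3:
        return "normal"
    if score <= 6:
        return "intermediate"
    return "BV"


if __name__ == "__main__":
    s = nugent_score(lacto=1, gardnerella=4, mobiluncus=3)
    print(s, nugent_category(s))
```

Scoring Lactobacillus inversely is what lets a single number track the shift from a Lactobacillus-dominated smear toward the anaerobe-rich CST-IV state described in the results.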
In Phase A, there was a significant trend toward increasing saliva cotinine and CO exhalation with elevated Nugent scores (P < 0.005). Vaginal microbiota clustered into three community state types (CSTs): two dominated by Lactobacillus (L. iners, L. crispatus), and one lacking significant numbers of Lactobacillus spp. and characterized by anaerobes (termed CST-IV). Women observed in the low-Lactobacillus CST-IV state had roughly 25-fold higher odds of being smokers than those dominated by L. crispatus (aOR: 25.61, 95% CI: 1.03–636.61). Four women completed Phase B. Of the three who entered smoking cessation with high Nugent scores, one switched from CST-IV to an L. iners-dominated profile, with a concomitant drop in Nugent scores that coincided with completion of nicotine patches. The other two fluctuated between CST-IV and L. iners-dominated CSTs. The fourth woman had low Nugent scores with L. crispatus-dominated CSTs throughout.
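The adjusted odds ratio for CST-IV carries a very wide interval. Assuming it is a Wald-type interval on the log-odds scale (the usual output of logistic regression, though the abstract does not say how it was computed), the interval is symmetric in log space: the geometric mean of its bounds should recover the point estimate, and its width gives the implied standard error. The helper name is hypothetical.

```python
import math


def implied_log_scale(lower: float, upper: float, z: float = 1.96) -> tuple:
    """Return (geometric-mean point estimate, SE of log OR) implied by a Wald-type CI.

    Assumes the interval was built as exp(log(OR) +/- z * SE), i.e. symmetric
    on the log scale.
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    point = math.exp((log_lo + log_hi) / 2)  # centre of the interval in log space
    se_log = (log_hi - log_lo) / (2 * z)     # half-width divided by the z quantile
    return point, se_log


# Bounds as reported for CST-IV vs. L. crispatus-dominated women
point, se_log = implied_log_scale(1.03, 636.61)
print(f"implied aOR = {point:.2f}, SE(log aOR) = {se_log:.2f}")
```

The geometric mean of 1.03 and 636.61 comes out at about 25.6, consistent with the reported aOR, and the implied standard error of the log odds ratio is roughly 1.64, which reflects the small number of women in each group.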
Smokers had a lower proportion of vaginal Lactobacillus spp. compared to non-smokers. Smoking cessation should be investigated as an adjunct to reducing recurrent BV. Larger studies are needed to confirm these findings.
Vaginal microbiota; Smoking cessation; Cigarette; 16S rRNA gene analysis; Bacterial vaginosis
The bacterial genus Borrelia (phylum Spirochaetes) consists of two groups of pathogens represented respectively by B. burgdorferi, the agent of Lyme borreliosis, and B. hermsii, the agent of tick-borne relapsing fever. The number of publicly available Borrelia genomic sequences is growing rapidly with the discovery and sequencing of Borrelia strains worldwide. However, there is a lack of dedicated online databases to facilitate comparative analyses of Borrelia genomes.
We have developed BorreliaBase, an online database for comparative browsing of Borrelia genomes. The database is currently populated with sequences from 35 genomes of eight Lyme borreliosis (LB) group Borrelia species and seven relapsing fever (RF) group Borrelia species. Distinct from genome repositories and aggregator databases, BorreliaBase serves manually curated comparative-genomic data including genome-based phylogeny, genome synteny, and sequence alignments of orthologous genes and intergenic spacers.
With a genome phylogeny at its center, BorreliaBase allows online identification of hypervariable lipoprotein genes, potential regulatory elements, and recombination footprints by providing evolution-based expectations of sequence variability at each genomic locus. The phylo-centric design of BorreliaBase (http://borreliabase.org) is a novel model for interactive browsing and comparative analysis of bacterial genomes online.
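BorreliaBase's own implementation is not described here. As a rough illustration of flagging loci whose variability exceeds an expectation, the sketch below computes nucleotide diversity (π, the mean pairwise proportion of differing sites) for each locus alignment and flags loci above a multiple of an expected value. In BorreliaBase the expectation is derived from the genome phylogeny; here `expected` and the threshold `fold` are hypothetical single numbers, and the toy sequences are invented.

```python
from itertools import combinations

VALID = set("ACGT")


def nucleotide_diversity(alignment: list) -> float:
    """Mean pairwise proportion of differing sites (pi) across an aligned locus.

    Columns where either sequence of a pair has a gap or ambiguous base are
    skipped for that pair.
    """
    per_pair = []
    for a, b in combinations(alignment, 2):
        compared = mismatched = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in VALID and y in VALID:
                compared += 1
                mismatched += x != y
        if compared:
            per_pair.append(mismatched / compared)
    return sum(per_pair) / len(per_pair) if per_pair else 0.0


def flag_hypervariable(loci: dict, expected: float, fold: float = 3.0) -> list:
    """Names of loci whose pi exceeds `fold` times the (hypothetical) expected pi."""
    return [name for name, aln in loci.items()
            if nucleotide_diversity(aln) > fold * expected]


# Toy alignments, invented for illustration only
loci = {
    "locus_conserved": ["ACGTACGT", "ACGTACGT", "ACGTACGA"],
    "locus_variable": ["ACGTACGT", "TCGAACCT", "AGGTTCGA"],
}
print(flag_hypervariable(loci, expected=0.1))
```

A tree-aware expectation would scale `expected` per locus by the evolutionary distance among the sampled genomes, so that loci sampled across distant species are not flagged merely for spanning a deep split.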
Lyme disease; Vector-borne relapsing fever; Genome browser; Recombination; Population genomics
High-throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), and informed by interactions with numerous collaborating scientists. It includes mappings to terms from other data standards initiatives, including the Genomic Standards Consortium’s minimal information (MIxS) checklists, NCBI’s BioSample/BioProjects checklists, and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards.
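As an illustration of how the four groups of fields could be modeled and checked, the sketch below uses a Python dataclass. The field names are loosely modeled on MIxS/BioSample-style attributes such as `collection_date` and `geo_loc_name`, but they are assumptions, not the published GSCID/BRC field list; the date and location formats are likewise assumed, and the ontology mappings (OBI, MIxS) are out of scope.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class SampleMetadata:
    """Hypothetical subset of project/sample fields, grouped as in the standard's description."""
    # Organism or environmental source of the specimen
    organism: str
    host: Optional[str]
    isolation_source: str
    # Spatial-temporal information about the isolation event
    collection_date: str  # assumed ISO 8601: "YYYY", "YYYY-MM" or "YYYY-MM-DD"
    geo_loc_name: str     # assumed "country: region" convention
    # Phenotypic characteristics of the isolate, e.g. {"vancomycin": "resistant"}
    phenotypes: Dict[str, str] = field(default_factory=dict)
    # Project leadership and support
    project_lead: str = ""
    funding_source: str = ""

    def validate(self) -> List[str]:
        """Return human-readable problems; an empty list means the record passed."""
        problems = []
        if not self.organism.strip():
            problems.append("organism is required")
        try:
            parts = [int(p) for p in self.collection_date.split("-")]
            # Pad a missing month/day with 1 so partial dates are accepted
            date(*(parts + [1] * (3 - len(parts))))
        except (ValueError, TypeError):
            problems.append(f"collection_date is not ISO 8601: {self.collection_date!r}")
        if ":" not in self.geo_loc_name:
            problems.append("geo_loc_name should look like 'country: region'")
        return problems


# Illustrative values only
record = SampleMetadata(organism="Staphylococcus aureus", host="Homo sapiens",
                        isolation_source="wound swab", collection_date="2011-06",
                        geo_loc_name="USA: Maryland")
print(record.validate())
```

Keeping validation next to the field definitions is one way to get the "organized, clear, and consistent" records the standard calls for; project-specific fields could be added by subclassing.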
The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant.
The function of the appendix is largely unknown, but its microbiota likely contributes to that function. Alterations in the microbiota may contribute to appendicitis, but conventional culture studies have not yielded conclusive information. We conducted a pilot, culture-independent 16S rRNA-based microbiota study of paired appendix and rectal samples.
We collected appendix and rectal swabs from 21 children undergoing appendectomy, six with normal appendices and fifteen with appendicitis (nine perforated). After DNA extraction, we amplified and sequenced 16S rRNA genes and analyzed sequences using CLoVR. We identified organisms differing in relative abundance using ANOVA (p<0.05) by location (appendix vs. rectum), disease (appendicitis vs. normal), and disease severity (perforated vs. non-perforated).
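The abstract names the statistical test but not the preprocessing. A minimal sketch, assuming per-sample read counts for each taxon are first converted to relative abundances and a one-way ANOVA is then run per taxon across groups (for example appendix vs. rectum); this is not the CLoVR pipeline itself, and the toy values are hypothetical. The sketch returns the F statistic and degrees of freedom; the p-value would come from the F distribution, for example `scipy.stats.f.sf(F, df_between, df_within)`.

```python
from typing import Dict, List, Tuple


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert one sample's per-taxon read counts into proportions of the sample total."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def one_way_anova_f(groups: List[List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for one taxon's abundances across groups."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    if ss_within == 0:
        # No within-group spread: F is infinite if the means differ, undefined otherwise
        return (float("inf") if ss_between > 0 else float("nan")), df_between, df_within
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Hypothetical relative abundances of one taxon in two sample groups
f_stat, df_b, df_w = one_way_anova_f([[0.030, 0.045, 0.025], [0.010, 0.012, 0.008]])
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) df")
```

Running this per taxon across all 290 taxa produces many tests, so the number of nominally significant taxa at p < 0.05 should be read with that in mind.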
We identified 290 taxa in the study's samples. Three taxa were significantly increased in normal appendices vs. normal rectal samples: Fusibacter (p = 0.009), Selenomonas (p = 0.026), and Peptostreptococcus (p = 0.049). Five taxa were increased in abundance in normal vs. diseased appendices: Paenibacillaceae (p = 0.005), Acidobacteriaceae GP4 (p = 0.019), Pseudonocardinae (p = 0.019), Bergeyella (p = 0.019), and Rhizobium (p = 0.045). Twelve taxa were increased in the appendices of appendicitis patients vs. normal appendices: Peptostreptococcus (p = 0.0003), Bilophila (p = 0.0004), Bulleidia (p = 0.012), Fusobacterium (p = 0.018), Parvimonas (p = 0.003), Mogibacterium (p = 0.012), Aminobacterium (p = 0.019), Proteus (p = 0.028), Actinomycineae (p = 0.028), Anaerovorax (p = 0.041), Anaerofilum (p = 0.045), and Porphyromonas (p = 0.010). Five taxa were increased in the appendices of patients with perforated vs. non-perforated appendicitis: Bulleidia (p = 0.004), Fusibacter (p = 0.005), Prevotella (p = 0.021), Porphyromonas (p = 0.030), and Dialister (p = 0.035). Three taxa were increased in rectal samples of patients with appendicitis compared with patients with normal appendices: Bulleidia (p = 0.034), Dialister (p = 0.003), and Porphyromonas (p = 0.026).
Specific taxa are more abundant in normal appendices compared to the rectum, suggesting that a distinctive appendix microbiota exists. Taxa with altered abundance in diseased and severely diseased (perforated) samples may contribute to appendicitis pathogenesis, and may provide microbial signatures in the rectum useful for guiding both treatment and diagnosis of appendicitis.
Lyme disease is caused by spirochete bacteria from the Borrelia burgdorferi sensu lato (B. burgdorferi s.l.) species complex. To reconstruct the evolution of B. burgdorferi s.l. and identify the genomic basis of its human virulence, we compared the genomes of 23 B. burgdorferi s.l. isolates from Europe and the United States, including B. burgdorferi sensu stricto (B. burgdorferi s.s., 14 isolates), B. afzelii (2), B. garinii (2), B. “bavariensis” (1), B. spielmanii (1), B. valaisiana (1), B. bissettii (1), and B. “finlandensis” (1).
Robust B. burgdorferi s.s. and B. burgdorferi s.l. phylogenies were obtained using genome-wide single-nucleotide polymorphisms, despite recombination. Phylogeny-based pan-genome analysis showed that the rate of gene acquisition was higher between species than within species, suggesting adaptive speciation. Strong positive natural selection drives the sequence evolution of lipoproteins, including chromosomally-encoded genes 0102 and 0404, cp26-encoded ospC and b08, and lp54-encoded dbpA, a07, a22, a33, a53, a65. Computer simulations predicted rapid adaptive radiation of genomic groups as population size increases.
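The phylogenies were built from genome-wide SNPs. As a first step toward a distance-based tree, the sketch below counts pairwise allele differences between isolates at a shared, ordered set of SNP positions, skipping missing calls. It is a simplified illustration, not the study's phylogenetic method, and it ignores recombination; the isolate names and alleles are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def snp_distance_matrix(snps: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Pairwise count of differing alleles at shared SNP positions.

    `snps` maps isolate name -> alleles at the same ordered list of genome-wide
    SNP positions; 'N' marks a missing call and is skipped for that pair.
    """
    if len({len(s) for s in snps.values()}) > 1:
        raise ValueError("every isolate must have a call (or N) at every SNP position")
    dist = {}
    for (a, seq_a), (b, seq_b) in combinations(snps.items(), 2):
        dist[(a, b)] = sum(1 for x, y in zip(seq_a, seq_b)
                           if x != "N" and y != "N" and x != y)
    return dist


# Hypothetical isolates and alleles, for illustration only
example = {"iso1": "ACGTA", "iso2": "ACGAA", "iso3": "TCGAN"}
for pair, d in snp_distance_matrix(example).items():
    print(pair, d)
```

A matrix like this can be passed to a distance-based tree method such as neighbor joining.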
Intra- and inter-specific pan-genome sizes of B. burgdorferi s.l. expand linearly with phylogenetic diversity. Yet gene-acquisition rates in B. burgdorferi s.l. are among the lowest in bacterial pathogens, resulting in high genome stability and few lineage-specific genes. Genome adaptation of B. burgdorferi s.l. is driven predominantly by copy-number and sequence variations of lipoprotein genes. New genomic groups are likely to emerge if the current trend of B. burgdorferi s.l. population expansion continues.
Borrelia burgdorferi; Lyme borreliosis; Pan-genome; Single-nucleotide polymorphisms; Phylogenetic tree; Genome evolution simulation
Crohn's disease (CD) is an inflammatory bowel disease of complex etiology, and dysbiosis of the gut microbiota has been implicated in the chronic immune-mediated inflammation associated with CD. Here we combined shotgun metagenomic and metaproteomic approaches to identify potential functional signatures of CD in stool samples from six twin pairs that were either healthy or had CD in the ileum (ICD) or colon (CCD). Integration of these omics approaches revealed several genes, proteins, and pathways that primarily differentiated ICD from healthy subjects, including depletion of many proteins in ICD. In addition, the ICD phenotype was associated with alterations in bacterial carbohydrate metabolism, bacterial-host interactions, and human host-secreted enzymes. This eco-systems biology approach underscores the link between the gut microbiota and functional alterations in the pathophysiology of Crohn's disease and aids in the identification of novel diagnostic targets and disease-specific biomarkers.
Chronic wounds contain complex polymicrobial communities of sessile organisms that have been underappreciated because of limitations of standard culture techniques. The aim of this work was to combine recently developed next-generation investigative techniques to comprehensively describe the microbial characteristics of chronic wounds. Tissue samples were obtained from 15 patients with chronic wounds presenting to the Johns Hopkins Wound Center. Standard bacteriological cultures demonstrated an average of 3 common bacterial species in wound samples. By contrast, high-throughput pyrosequencing revealed increased bacterial diversity with an average of 17 genera in each wound. Data from microbial community profiling of chronic wounds were compared with published sequence analyses of bacteria from normal skin. Increased proportions of anaerobes, Gram-negative rods and Gram-positive cocci were found in chronic wounds. In addition, chronic wounds had significantly lower populations of Propionibacterium compared to normal skin. Using epifluorescence microscopy, wound bacteria were visualized in highly organized thick confluent biofilms or as scattered individual bacterial cells. Fluorescent in-situ hybridization allowed visualization of Staphylococcus aureus cells in a wound sample. Quorum sensing molecules were measured by bioassay to evaluate signaling patterns amongst bacteria in the wounds. A range of autoinducer-2 activities was detected in the wound samples. Collectively, these data provide new insights into the identity, organization, and behavior of bacteria in chronic wounds. Such information may provide important clues to effective future strategies in wound healing.
Chronic Wound; Microbiome; Biofilm; Microbiology; Pyrosequencing; Epifluorescence Microscopy; Fluorescent In-Situ Hybridization; Quorum Sensing
Obesity has been linked to the human gut microbiota; however, the contribution of gut bacterial species to the obese phenotype remains controversial because of conflicting results from studies in different populations. To explore the possible dysbiosis of gut microbiota in obesity and its metabolic complications, we studied men and women over a range of body mass indices from the Old Order Amish sect, a culturally homogeneous Caucasian population of Central European ancestry. We characterized the gut microbiota in 310 subjects by deep pyrosequencing of bar-coded PCR amplicons from the V1–V3 region of the 16S rRNA gene. Three communities of interacting bacteria were identified in the gut microbiota, analogous to previously identified gut enterotypes. Neither BMI nor any metabolic syndrome trait was associated with a particular gut community. Network analysis identified twenty-two bacterial species and four OTUs that were either positively or inversely correlated with metabolic syndrome traits, suggesting that certain members of the gut microbiota may play a role in these metabolic derangements.
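Bar-coded amplicons from many subjects are pooled in one pyrosequencing run and sorted back to samples by barcode. A minimal demultiplexing sketch, assuming each read begins with a sample barcode and allowing a hypothetical tolerance of one mismatch; the study's barcode design and error tolerance are not given in the abstract, and the barcodes and reads below are invented.

```python
from typing import Dict, Iterable, List, Optional


def mismatches(a: str, b: str) -> int:
    """Hamming distance, counting any length difference as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def demultiplex(reads: Iterable[str], barcodes: Dict[str, str],
                max_mismatches: int = 1) -> Dict[Optional[str], List[str]]:
    """Bin reads by their 5' barcode and strip it; unassignable reads go under None."""
    bins: Dict[Optional[str], List[str]] = {s: [] for s in barcodes.values()}
    bins[None] = []
    for read in reads:
        # Score the read's prefix against every barcode, best match first
        scored = sorted((mismatches(read[:len(bc)], bc), sample, len(bc))
                        for bc, sample in barcodes.items())
        best, sample, bc_len = scored[0]
        ambiguous = len(scored) > 1 and scored[1][0] == best
        if best <= max_mismatches and not ambiguous:
            bins[sample].append(read[bc_len:])
        else:
            bins[None].append(read)
    return bins


# Hypothetical barcodes and reads
barcodes = {"ACGT": "s1", "TTTT": "s2"}
reads = ["ACGTGGGG", "ACGAGGGG", "TTTTCC", "GGGGAA"]
print(demultiplex(reads, barcodes))
```

Rejecting ties rather than guessing keeps a sequencing error from silently moving a read into the wrong subject's profile.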
It has been known for decades that human Lyme disease is caused by the three spirochete species Borrelia burgdorferi, Borrelia afzelii, and Borrelia garinii. Recently, Borrelia valaisiana, Borrelia spielmanii, and Borrelia bissettii have been associated with Lyme disease. We report the complete genome sequences of B. valaisiana VS116, B. spielmanii A14S, and B. bissettii DN127.
Human Lyme disease is commonly caused by several species of spirochetes in the Borrelia genus. In Eurasia these species are largely Borrelia afzelii, B. garinii, B. burgdorferi, and B. bavariensis sp. nov. Whole-genome sequencing is an excellent tool for investigating and understanding the influence of bacterial diversity on the pathogenesis and etiology of Lyme disease. We report here the whole-genome sequences of four isolates from two of the Borrelia species that cause human Lyme disease, B. afzelii isolates ACA-1 and PKo and B. garinii isolates PBr and Far04.
Methicillin-resistant Staphylococcus aureus (MRSA) strains are leading causes of hospital-acquired infections in the United States, and clonal cluster 5 (CC5) is the predominant lineage responsible for these infections. Since 2002, there have been 12 cases of vancomycin-resistant S. aureus (VRSA) infection in the United States—all CC5 strains. To understand this genetic background and what distinguishes it from other lineages, we generated and analyzed high-quality draft genome sequences for all available VRSA strains. Sequence comparisons show unambiguously that each strain independently acquired Tn1546 and that all VRSA strains last shared a common ancestor over 50 years ago, well before the occurrence of vancomycin resistance in this species. In contrast to existing hypotheses on what predisposes this lineage to acquire Tn1546, the barrier posed by restriction systems appears to be intact in most VRSA strains. However, VRSA (and other CC5) strains were found to possess a constellation of traits that appears to be optimized for proliferation in precisely the types of polymicrobic infection where transfer could occur. They lack a bacteriocin operon that would be predicted to limit the occurrence of non-CC5 strains in mixed infection and harbor a cluster of unique superantigens and lipoproteins to confound host immunity. A frameshift in dprA, which in other microbes influences uptake of foreign DNA, may also make this lineage conducive to foreign DNA acquisition.
Invasive methicillin-resistant Staphylococcus aureus (MRSA) infection now ranks among the leading causes of death in the United States. Vancomycin is a key last-line bactericidal drug for treating these infections. However, since 2002, vancomycin resistance has entered this species. Each of the 12 cases of vancomycin-resistant S. aureus (VRSA) known to date was believed to represent a new acquisition of the vancomycin resistance transposon Tn1546 from enterococcal donors. All acquisitions of Tn1546 so far have occurred in MRSA strains of the clonal cluster 5 genetic background, the lineage most commonly responsible for hospital-acquired MRSA infection. To understand the nature of these strains, we determined and examined the genome sequences of all available VRSA. Genome comparison identified candidate features that position strains of this lineage well for acquiring antibiotic resistance in mixed infections.
Lyme disease is the most common tick-borne human illness in North America. Understanding the molecular pathogenesis, population structure and epizootic spread of the North American Lyme agent, Borrelia burgdorferi sensu stricto, will require a much better understanding of the natural diversity of its genome. Towards this end we present a comparative analysis of the nucleotide sequences of the numerous plasmids of B. burgdorferi isolates B31, N40, JD1 and 297. These strains were chosen because they include the three most commonly studied laboratory strains, and because they represent different major genetic lineages and so are informative regarding the genetic diversity and evolution of this organism. A unique feature of Borrelia genomes is that they carry a large number of linear and circular plasmids, and this work shows that strains N40, JD1, 297 and B31 carry related but non-identical sets of 16, 20, 19 and 21 plasmids, respectively, that comprise 33–40% of their genomes. We deduce that there are at least 28 plasmid compatibility types among the four strains. The ∼900 kbp linear chromosomes of B. burgdorferi are evolutionarily exceptionally stable, except for a short ≤20 kbp plasmid-like section at the right end. A few of the plasmids, including the linear lp54 and circular cp26, are also very stable. We show here that the other plasmids, especially the linear ones, are considerably more variable. Nearly all of the linear plasmids have undergone one or more substantial inter-plasmid rearrangements since their last common ancestor. In spite of these rearrangements and differences in plasmid contents, the overall gene complement of the different isolates has remained relatively constant.
Accurate protein identification in large-scale proteomics experiments relies upon a detailed, accurate protein catalogue, which is derived from predictions of open reading frames based on genome sequence data. Integration of mass spectrometry-based proteomics data with computational proteome predictions from environmental metagenomic sequences has been challenging because of the variable overlap between proteomic datasets and corresponding short-read nucleotide sequence data. In this study, we have benchmarked several strategies for increasing microbial peptide spectral matching in metaproteomic datasets using protein predictions generated from matched metagenomic sequences from the same human fecal samples. Additionally, we investigated the impact of mass spectrometry-based filters (high mass accuracy, delta correlation), and de novo peptide sequencing on the number and robustness of peptide-spectrum assignments in these complex datasets. In summary, we find that high mass accuracy peptide measurements searched against non-assembled reads from DNA sequencing of the same samples significantly increased identifiable proteins without sacrificing accuracy.
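The high-mass-accuracy filter compares each observed precursor m/z with the value predicted for a candidate peptide, expressed in parts per million (ppm). The sketch below assumes unmodified residues (no fixed cysteine modification) and a hypothetical 10 ppm tolerance; the study's actual tolerances and its delta-correlation filter are not covered, and the function names are illustrative.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids, unmodified
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.007276  # mass of the charge-carrying proton


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def precursor_mz(seq: str, charge: int) -> float:
    """Theoretical m/z of the [M + zH]z+ precursor ion."""
    return (peptide_mass(seq) + charge * PROTON) / charge


def ppm_error(observed: float, theoretical: float) -> float:
    return (observed - theoretical) / theoretical * 1e6


def passes_mass_filter(seq: str, observed_mz: float, charge: int,
                       tol_ppm: float = 10.0) -> bool:
    """Keep a peptide-spectrum match only if the precursor is within tol_ppm."""
    return abs(ppm_error(observed_mz, precursor_mz(seq, charge))) <= tol_ppm


print(round(peptide_mass("PEPTIDE"), 4))
print(passes_mass_filter("PEPTIDE", 400.6880, 2))
```

A tight tolerance shrinks the set of candidate peptides per spectrum, which matters when the search database is built from many unassembled metagenomic reads.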
Human Lyme disease is caused by a number of related Borrelia burgdorferi sensu lato species. We report here the complete genome sequence of Borrelia sp. isolate SV1 from Finland. This isolate is to date the closest known relative of B. burgdorferi sensu stricto, but it is sufficiently genetically distinct from that species that it and its close relatives warrant consideration as a new species. We suggest that this isolate be named “Borrelia finlandensis.”
Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi, from strain B31, has been available for more than a decade and has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.
The composition of the oral microbiota from 10 individuals with healthy oral tissues was determined using culture-independent techniques. From each individual, 26 specimens, each from different oral sites at a single point in time, were collected and pooled. An eleventh pool was constructed using portions of the subgingival specimens from all 10 individuals. The 16S rRNA gene was amplified using broad-range bacterial primers, and clone libraries from the individual and subgingival pools were constructed. From a total of 11 368 high-quality, non-chimeric, near full-length sequences, 247 species-level phylotypes (using a 99% sequence identity threshold) and 9 bacteria phyla were identified. At least 15 bacterial genera were conserved among all 10 individuals, with significant interindividual differences at the species and strain level. Comparisons of these oral bacterial sequences to near full-length sequences found previously in the large intestines and feces of other healthy individuals suggest that the mouth and intestinal tract harbor distinct sets of bacteria. Co-occurrence analysis demonstrated significant segregation of taxa when community membership was examined at the level of genus, but not at the level of species, suggesting that ecologically-significant, competitive interactions are more apparent at a broader taxonomic level than species. This study is one of the more comprehensive, high-resolution analyses of bacterial diversity within the healthy human mouth to date, and highlights the value of tools from macroecology for enhancing our understanding of bacterial ecology in human health.
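Species-level phylotypes were defined at 99% sequence identity. A minimal greedy-clustering sketch, assuming the sequences are already aligned to equal length and that identity is computed over columns where neither sequence has a gap; the study's alignment and clustering software is not named in the abstract. Greedy results depend on input order, and common tools pre-sort sequences (for example by length or abundance) before clustering.

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns where neither sequence has a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def greedy_cluster(aligned: Dict[str, str], threshold: float = 99.0) -> List[List[str]]:
    """Assign each sequence to the first centroid it matches at >= threshold identity."""
    clusters = []  # list of (centroid_sequence, member_ids)
    for seq_id, seq in aligned.items():
        for centroid, members in clusters:
            if percent_identity(seq, centroid) >= threshold:
                members.append(seq_id)
                break
        else:
            # No existing centroid is close enough: this sequence seeds a new phylotype
            clusters.append((seq, [seq_id]))
    return [members for _, members in clusters]


# Hypothetical 100-column aligned reads
base = "A" * 100
reads = {"r1": base, "r2": "C" + base[1:], "r3": "GG" + base[2:]}
print(greedy_cluster(reads))
```

Here `r2` differs from the first centroid at 1 of 100 columns (99% identity) and joins it, while `r3` differs at 2 columns and seeds its own phylotype.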
oral microbiota; ribosomal RNA sequences; human microbial ecology
Microbial pathogens have evolved sophisticated mechanisms for evasion of host innate and adaptive immunities. PFam54 is the largest paralogous gene family in the genomes of Borrelia burgdorferi, the Lyme disease bacterium. One member of PFam54, complement regulator-acquiring surface protein 1 (BbCRASP-1), is able to abort the alternative pathway of complement activation by binding human complement regulator factor H (FH). The gene coding for BbCRASP-1 lies in a tandem array of PFam54 genes in the B. burgdorferi genome, apparently the result of repeated gene duplications. To help elucidate the functions of the large number of PFam54 genes, we performed phylogenomic and structural analyses of the PFam54 gene array from ten B. burgdorferi genomes. Analyses based on gene tree, genome synteny, and structural models revealed rapid adaptive evolution of this array through gene duplication, gene loss, and functional diversification. Individual PFam54 genes, however, do not show the high intra-population sequence polymorphism typical of genes that provide evasion of adaptive immunity. PFam54 members able to bind human FH are not monophyletic, suggesting that human FH affinity, however strong, is an incidental rather than the main function of these PFam54 proteins. The large number of PFam54 genes in any single B. burgdorferi genome may target different innate-immunity proteins of a single host species or the same immune protein in a variety of host species. Genetic variability of the PFam54 gene array suggests that universally present PFam54 lineages such as BBA64, BBA65, BBA66, and BBA73 may be better candidates for the development of broad-spectrum vaccines or drugs than strain-restricted lineages such as BbCRASP-1.
Borrelia burgdorferi; Complement-Regulator Acquiring Surface Protein-1; Lyme disease; Factor H; immune evasion; cspA
Caldicellulosiruptor saccharolyticus is an extremely thermophilic, gram-positive anaerobe which ferments cellulose-, hemicellulose- and pectin-containing biomass to acetate, CO2, and hydrogen. Its broad substrate range, high hydrogen-producing capacity, and ability to coutilize glucose and xylose make this bacterium an attractive candidate for microbial bioenergy production. Here, the complete genome sequence of C. saccharolyticus, consisting of a 2,970,275-bp circular chromosome encoding 2,679 predicted proteins, is described. Analysis of the genome revealed that C. saccharolyticus has an extensive polysaccharide-hydrolyzing capacity for cellulose, hemicellulose, pectin, and starch, coupled to a large number of ABC transporters for monomeric and oligomeric sugar uptake. The components of the Embden-Meyerhof and nonoxidative pentose phosphate pathways are all present; however, there is no evidence that an Entner-Doudoroff pathway is present. Catabolic pathways for a range of sugars, including rhamnose, fucose, arabinose, glucuronate, fructose, and galactose, were identified. These pathways lead to the production of NADH and reduced ferredoxin. NADH and reduced ferredoxin are subsequently used by two distinct hydrogenases to generate hydrogen. Whole-genome transcriptome analysis revealed that there is significant upregulation of the glycolytic pathway and an ABC-type sugar transporter during growth on glucose and xylose, indicating that C. saccharolyticus coferments these sugars unimpeded by glucose-based catabolite repression. The capacity to simultaneously process and utilize a range of carbohydrates associated with biomass feedstocks is a highly desirable feature of this lignocellulose-utilizing, biofuel-producing bacterium.
Whole-genome sequencing has been skewed toward bacterial pathogens as a consequence of the prioritization of medical and veterinary diseases. However, it is becoming clear that in order to accurately measure genetic variation within and between pathogenic groups, multiple isolates, as well as commensal species, must be sequenced. This study examined the pangenomic content of Escherichia coli. Six distinct E. coli pathovars can be distinguished using molecular or phenotypic markers, but only two of the six pathovars have been subjected to any genome sequencing previously. Thus, this report provides a seminal description of the genomic contents and unique features of three unsequenced pathovars, enterotoxigenic E. coli, enteropathogenic E. coli, and enteroaggregative E. coli. We also determined the first genome sequence of a human commensal E. coli isolate, E. coli HS, which will undoubtedly provide a new baseline from which workers can examine the evolution of pathogenic E. coli. Comparison of 17 E. coli genomes, 8 of which are new, resulted in identification of ∼2,200 genes conserved in all isolates. We were also able to identify genes that were isolate and pathovar specific. Fewer pathovar-specific genes were identified than anticipated, suggesting that each isolate may have independently developed virulence capabilities. Pangenome calculations indicate that E. coli genomic diversity represents an open pangenome model containing a reservoir of more than 13,000 genes, many of which may be uncharacterized but important virulence factors. This comparative study of the species E. coli, while descriptive, should provide the basis for future functional work on this important group of pathogens.
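Core and pan-genome sizes reduce to set operations once genes have been grouped into families across genomes (the hard part, not shown). The sketch below assumes each genome is given as a set of hypothetical family identifiers and also reports how the pan-genome grows as genomes are added, which is the curve used to judge whether a pan-genome is open.

```python
from typing import Dict, List, Set


def pangenome_summary(genomes: Dict[str, Set[str]]) -> dict:
    """Core (in every genome), pan (in any genome) and genome-specific gene families."""
    families = list(genomes.values())
    core = set.intersection(*families)
    pan = set.union(*families)
    specific = {}
    for name, fams in genomes.items():
        # Families seen in any other genome are not specific to this one
        others = set().union(*(f for n, f in genomes.items() if n != name))
        specific[name] = fams - others
    return {"core": core, "pan": pan, "specific": specific}


def pan_accumulation(genomes: Dict[str, Set[str]], order: List[str]) -> List[int]:
    """Pan-genome size after adding genomes one at a time in the given order."""
    seen: Set[str] = set()
    sizes = []
    for name in order:
        seen |= genomes[name]
        sizes.append(len(seen))
    return sizes


# Hypothetical gene-family sets for three genomes
genomes = {"g1": {"a", "b", "c"}, "g2": {"a", "b", "d"}, "g3": {"a", "e"}}
summary = pangenome_summary(genomes)
print(sorted(summary["core"]), len(summary["pan"]))
print(pan_accumulation(genomes, ["g1", "g2", "g3"]))
```

In an open pan-genome the accumulation curve keeps rising as genomes are added instead of leveling off, because most new genomes contribute some families not seen before.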
The extent of chromosome rearrangements in Pyrococcus isolates from marine hydrothermal vents in Vulcano Island, Italy, was evaluated by high-throughput genomic methods. The results illustrate the dynamic nature of the genomes of the genus Pyrococcus and raise the possibility of a connection between rapidly changing environmental conditions and adaptive genomic properties.
The plant cell wall, which consists of a highly complex array of interconnecting polysaccharides, is the most abundant source of organic carbon in the biosphere. Microorganisms that degrade the plant cell wall synthesize an extensive portfolio of hydrolytic enzymes that display highly complex molecular architectures. To unravel the intricate repertoire of plant cell wall-degrading enzymes synthesized by the saprophytic soil bacterium Cellvibrio japonicus, we sequenced and analyzed its genome, which predicts that the bacterium contains the complete repertoire of enzymes required to degrade plant cell wall and storage polysaccharides. Approximately one-third of these putative proteins (57) are predicted to contain carbohydrate binding modules derived from 13 of the 49 known families. Sequence analysis reveals approximately 130 predicted glycoside hydrolases that target the major structural and storage plant polysaccharides. In common with that of the colonic prokaryote Bacteroides thetaiotaomicron, the genome of C. japonicus is predicted to encode a large number of GH43 enzymes, suggesting that the extensive arabinose decorations appended to pectins and xylans may represent a major nutrient source, not just for intestinal bacteria but also for microorganisms that occupy terrestrial ecosystems. The results presented here predict that C. japonicus possesses an extensive range of glycoside hydrolases, lyases, and esterases. Most importantly, the genome of C. japonicus is remarkably similar to that of the gram-negative marine bacterium, Saccharophagus degradans 2-40T. Approximately 50% of the predicted C. japonicus plant-degradative apparatus appears to be shared with S. degradans, consistent with the utilization of plant-derived complex carbohydrates as a major substrate by both organisms.
The accurate description of a microbial community is an important first step in understanding the roles of its components in ecosystem function. A method for surveying microbial communities termed serial analysis of rRNA genes (SARD) is described here. Through a series of molecular cloning steps, short DNA sequence tags are recovered from the fifth variable (V5) region of the prokaryotic 16S rRNA genes from microbial communities. These tags are ligated to form concatemers composed of 20 to 40 tags, which are cloned and identified by DNA sequencing. Four agricultural soil samples were profiled with SARD to assess the method's utility. A total of 37,008 SARD tags comprising 3,127 unique sequences were identified. A comparison of duplicate profiles from one soil genomic DNA preparation revealed that the method was highly reproducible. The large numbers of singleton tags, together with nonparametric richness estimates, indicated that a significant amount of sequence tag diversity remained undetected with this level of sampling. The abundance classes of the observed tags were scale-free and conformed to a power law distribution. Numerically, the majority of the total tags observed belonged to abundance classes that were each present at less than 1% of the community. Over 99% of the unique tags individually made up less than 1% of the community. Therefore, from either a numerical or diversity standpoint, taxa with low abundance comprised a significant proportion of the microbial communities examined and could potentially make a large contribution to ecosystem function. SARD may provide a means to explore the ecological roles of these rare members of microbial communities in qualitative and quantitative terms.
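The singleton-based richness argument and the "less than 1% of the community" figures can both be computed from a table of tag abundances. The sketch uses the bias-corrected Chao1 estimator as one example of a nonparametric richness estimate; the abstract does not say which estimators were used, and the function names and toy counts are hypothetical.

```python
from collections import Counter


def chao1(tag_counts: Counter) -> float:
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    s_obs = sum(1 for n in tag_counts.values() if n > 0)
    f1 = sum(1 for n in tag_counts.values() if n == 1)  # singleton tags
    f2 = sum(1 for n in tag_counts.values() if n == 2)  # doubleton tags
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def fraction_rare(tag_counts: Counter, cutoff: float = 0.01) -> float:
    """Fraction of unique tags that each make up less than `cutoff` of all tags."""
    total = sum(tag_counts.values())
    if total == 0:
        return 0.0
    rare = sum(1 for n in tag_counts.values() if n / total < cutoff)
    return rare / len(tag_counts)


# Hypothetical tag counts
tags = Counter({"t1": 1, "t2": 1, "t3": 1, "t4": 2, "t5": 5})
print(chao1(tags), fraction_rare(tags, cutoff=0.15))
```

Many singletons relative to doubletons push the estimate well above the observed count, which is the pattern behind the conclusion that much tag diversity remained undetected.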
Arthrobacter is among the most frequently isolated indigenous aerobic bacterial genera found in soils. Members of the genus are metabolically and ecologically diverse and can survive in environmentally harsh conditions for extended periods of time. The genome of Arthrobacter aurescens strain TC1, originally isolated from soil at an atrazine spill site, is composed of a single 4,597,686-base-pair (bp) circular chromosome and two circular plasmids, pTC1 and pTC2, of 408,237 bp and 300,725 bp, respectively. Over 66% of the 4,702 open reading frames (ORFs) in the TC1 genome could be assigned a putative function, and 13.2% (623 genes) appear to be unique to this bacterium, suggesting niche specialization. The genome of TC1 is most similar to those of Tropheryma, Leifsonia, Streptomyces, and Corynebacterium glutamicum. Analyses suggest that A. aurescens TC1 has expanded its metabolic abilities by duplicating catabolic genes and by funneling metabolic intermediates generated by plasmid-borne genes into chromosomally encoded pathways. The data presented here suggest that Arthrobacter's environmental prevalence may be due to its ability to survive stressful conditions induced by starvation, ionizing radiation, oxygen radicals, and toxic chemicals.
Soil systems contain the greatest diversity of microorganisms on earth, with 5,000–10,000 species of microorganisms per gram of soil. Arthrobacter sp. strains have a primitive life cycle and are among the most frequently isolated indigenous soil bacteria. They are found in common and deep subsurface soils, in arctic ice, and in environments contaminated with industrial chemicals and radioactive materials. To better understand how these bacteria survive in environmentally harsh conditions, the authors used a structural genomics approach to identify genes involved in soil survival of Arthrobacter aurescens strain TC1, a bacterium originally isolated for its ability to degrade the herbicide atrazine. They found that the genome of this bacterium comprises a single circular chromosome and two plasmids that encode a large number of proteins involved in responses to stresses such as starvation, desiccation, oxygen radicals, and toxic chemicals. A. aurescens' metabolic versatility is due in part to duplicated catabolic genes and to its ability to funnel plasmid-derived intermediates into chromosomally encoded pathways. Arthrobacter's array of genes for surviving stressful conditions, together with its ability to produce a temperature-tolerant “cyst”-like resting cell, enables this soil microorganism to survive and prosper in a variety of environmental conditions.
In the present study, the chromosomes of two members of the Thermotogales were compared. A whole-genome alignment of Thermotoga maritima MSB8 and Thermotoga neapolitana NS-E revealed numerous large-scale DNA rearrangements, most of which are associated with CRISPR DNA repeats and/or tRNA genes. These rearrangements do not include the putative origin of DNA replication but occur within the same replichore, i.e., the same replicating half of the chromosome (delimited by the replication origin and terminus). Based on cumulative GC skew analysis, both the T. maritima and T. neapolitana lineages contain one or two major inverted DNA segments. In addition, PCR amplification and sequence analysis of the DNA joints associated with the major rearrangements showed that the overall chromosome architecture is conserved at most DNA joints in other strains of T. neapolitana. Taken together, these results suggest that the observed chromosomal rearrangements in the Thermotogales likely occurred by successive inversions after divergence from a common ancestor and before strain diversification. Finally, sequence analysis shows that size polymorphisms in the CRISPR-associated DNA joints can be explained by expansion, and possibly contraction, of the repeat-spacer unit, providing a tool for discerning the relatedness of strains from different geographic locations.
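Cumulative GC skew exploits the tendency of the leading strand to be G-rich. The per-window skew (G − C)/(G + C) changes sign at the origin and terminus, so its running sum typically reaches a minimum near the origin and a maximum near the terminus. An inversion within a replichore flips the local skew and appears as a stretch of reversed slope in the curve. The following is a minimal Python sketch of the general technique, not the authors' pipeline. The window size and function names are illustrative assumptions, and it treats the circular chromosome as a linear string.

```python
def cumulative_gc_skew(seq: str, window: int = 1000) -> list[float]:
    """Running sum of per-window GC skew, (G - C) / (G + C)."""
    seq = seq.upper()
    running, curve = 0.0, []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:  # a window with no G or C contributes no skew
            running += (g - c) / (g + c)
        curve.append(running)
    return curve


def origin_terminus_estimate(seq: str, window: int = 1000) -> tuple[int, int]:
    """Positions of the curve's minimum (putative origin) and maximum (putative terminus)."""
    curve = cumulative_gc_skew(seq, window)

    def boundary(i: int) -> int:
        # curve[i] is the running total at the end of window i
        return min((i + 1) * window, len(seq))

    return boundary(curve.index(min(curve))), boundary(curve.index(max(curve)))


# Toy sequence: C-rich, then G-rich, then C-rich again
toy = "C" * 2000 + "G" * 3000 + "C" * 1000
print(cumulative_gc_skew(toy))        # [-1.0, -2.0, -1.0, 0.0, 1.0, 0.0]
print(origin_terminus_estimate(toy))  # (2000, 5000)
```

On a real genome, an inverted segment would appear between the minimum and maximum as a run of windows in which the curve falls instead of rising (or the reverse).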
Clustered regularly interspaced short palindromic repeats (CRISPRs) are a family of DNA direct repeats found in many prokaryotic genomes. The repeats, typically 21–37 bp long, show weak dyad symmetry and are separated by regularly sized, nonrepetitive spacer sequences. Four CRISPR-associated (Cas) protein families, designated Cas1 to Cas4, are strictly associated with CRISPR elements and always occur near a repeat cluster. Some spacers originate from mobile genetic elements and are thought to confer “immunity” against the elements that harbor these sequences. In the present study, we systematically investigated uncharacterized proteins encoded in the vicinity of CRISPRs and found many additional protein families that are strictly associated with CRISPR loci across multiple prokaryotic species. Multiple sequence alignments and hidden Markov models were built for 45 Cas protein families. These models identify family members with high sensitivity and selectivity, and they classify DevR and DevS, key developmental regulators in Myxococcus xanthus, as Cas proteins. These identifications show that CRISPR/cas gene regions can be quite large, with up to 20 different tandemly arranged cas genes next to a repeat cluster or filling the region between two repeat clusters. Distinctive subsets of the collection of Cas proteins recur in phylogenetically distant species and correlate with characteristic repeat periodicity. The analyses presented here support earlier proposals that these units are mobile, and they suggest that loci of different subtypes likely interact with one another as well as with host cell defensive, replicative, and regulatory systems. This analysis makes clear that CRISPR/cas loci are larger, more complex, and more heterogeneous than previously appreciated.
Clustered regularly interspaced short palindromic repeats (CRISPRs) are a class of DNA repeats found in nearly half of all bacterial and archaeal genomes. These repeat regions have a remarkably regular structure: unique sequences of constant size, called spacers, sit between each pair of repeats. The repeats do not encode proteins. Instead, they appear to be transcribed and processed into small RNAs that may have a number of functions, including resistance to any phage (i.e., a virus that infects bacteria) whose sequence matches a spacer. Spacers change rapidly as microbial strains evolve. This work describes 41 new CRISPR-associated (cas) gene families that, like the four previously known families, are always found near these repeats. It shows that CRISPR systems fall into different classes, each with its own repeat pattern, gene set, and species range. Most of these systems appear to be gained and lost rather rapidly by their host genomes. These possibly beneficial mobile genetic elements may play an important role in driving prokaryotic evolution.
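To make the repeat-spacer layout and the spacer-matching idea concrete, here is a minimal Python sketch. It assumes that the repeat consensus is already known and that every copy matches it exactly. Real arrays often contain degenerate repeats, so practical detection tools tolerate mismatches. The function names, the 10-bp toy repeat, and the toy sequences are all hypothetical. The sketch splits an array into its spacers and then reports which spacers occur, on either strand, in a phage genome. This is the kind of sequence match the text links to phage resistance.

```python
def extract_spacers(array_seq: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`."""
    parts = array_seq.split(repeat)
    # parts[0] precedes the first repeat and parts[-1] follows the last,
    # so only the interior pieces are spacers
    return [p for p in parts[1:-1] if p]


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def matching_spacers(spacers: list[str], phage_genome: str) -> list[str]:
    """Spacers found verbatim on either strand of the phage genome."""
    rc = reverse_complement(phage_genome)
    return [s for s in spacers if s in phage_genome or s in rc]


# Hypothetical toy array: flank + (repeat + spacer) x 2 + repeat + flank
REPEAT = "GTTTTAGAGC"
array_seq = "AAA" + REPEAT + "ACGTACGTAC" + REPEAT + "TTGGCCAATT" + REPEAT + "CCC"
spacers = extract_spacers(array_seq, REPEAT)
print(spacers)  # ['ACGTACGTAC', 'TTGGCCAATT']

# The toy phage carries the second spacer on its opposite strand
print(matching_spacers(spacers, "GGGGAATTGGCCAAGGGG"))  # ['TTGGCCAATT']
```

Checking both strands matters because a spacer may correspond to either strand of the invading DNA.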