Conceived and designed the experiments: YWL MH TDM FLR. Performed the experiments: YWL MH. Analyzed the data: YWL RS MH TDM MF SP CH DL FLR. Contributed reagents/materials/analysis tools: RS MF RE DC. Wrote the paper: YWL RS HM KW.
The impaired mucociliary clearance in individuals with Cystic Fibrosis (CF) enables opportunistic pathogens to colonize CF lungs. Here we show that Rothia mucilaginosa is a common CF opportunist that was present in 83% of our patient cohort, almost as prevalent as Pseudomonas aeruginosa (89%). Sequencing of lung microbial metagenomes identified unique R. mucilaginosa strains in each patient, presumably due to evolution within the lung. The de novo assembly of a near-complete R. mucilaginosa (CF1E) genome illuminated a number of potential physiological adaptations to the CF lung, including antibiotic resistance, utilization of extracellular lactate, and modification of the type I restriction-modification system. Metabolic characteristics predicted from the metagenomes suggested R. mucilaginosa have adapted to live within the microaerophilic surface of the mucus layer in CF lungs. The results also highlight the remarkable evolutionary and ecological similarities of many CF pathogens; further examination of these similarities has the potential to guide patient care and treatment.
Cystic fibrosis (CF) is a genetic disease caused by mutation of the cystic fibrosis transmembrane conductance regulator (CFTR) gene. In CF lungs, the defective CFTR protein affects trans-epithelial ion transport and consequently leads to the accumulation of thick and static mucus. The resultant hypoxic microenvironment encourages the colonization of opportunistic microbes, viruses, and fungi, causing acute and chronic infection. A few of the most commonly isolated pathogens are Pseudomonas aeruginosa, Staphylococcus aureus, Haemophilus influenzae, and Burkholderia cepacia. However, an increasing number of microbial species have been detected in the CF airway using culture-independent methods such as metagenomic sequencing. Metagenomics is a powerful approach that has been used to successfully characterize the microbial and viral communities in CF individuals. These types of studies have illuminated the complexity of microbial and viral communities, captured the vast diversity of functions encoded by these organisms, and have been used to trace the evolution of whole genomes.
Previous sequencing of CF metagenomes revealed the presence of Rothia mucilaginosa at relatively high abundances in most patients. R. mucilaginosa was first isolated from milk in 1900 as Micrococcus mucilaginosus. It was later re-isolated and further studied by Bergan et al. in 1970, and renamed Stomatococcus mucilaginosus in 1982 based on its 16S rDNA and biochemical characteristics. A recent study comparing S. mucilaginosus to Rothia dentocariosa and another unknown species (later known as Rothia nasimurium) led to the reclassification of S. mucilaginosus as R. mucilaginosa.
R. mucilaginosa is an encapsulated, Gram-positive non-motile coccus (arranged in clusters) belonging to the phylum Actinobacteria. It has variable catalase activity, reduces nitrate, and hydrolyses aesculin. It is a facultative anaerobe commonly found in the human oral cavity and upper respiratory tract, and occasionally the gastrointestinal tract, small intestinal epithelial lining, tongue, teeth, colostrum, breast milk, and dental plaques. Although R. mucilaginosa is commonly regarded as normal flora of the oral cavity and upper respiratory tract, its association with a wide range of diseases (Table S1) highlights its potential as an opportunistic pathogen, especially in immuno-compromised patients.
At the genus level, Rothia has been reported by Tunney et al. as an aerobic species that can be isolated from CF sputum and pediatric bronchoalveolar lavage (BAL) samples. It has also been detected under anaerobic culturing conditions and via 16S rRNA gene surveys. Typically R. dentocariosa was the main species identified in these studies. In addition, Bittar et al. and Guss et al. have characterized R. mucilaginosa as a “newly” emerging CF pathogen. Even so, R. mucilaginosa is usually treated as part of the normal oral microbiota in the clinical lab. As a result, the presence of R. mucilaginosa in CF lungs may be under-reported and the significance of infection underestimated.
Here we confirm that R. mucilaginosa is present and metabolically active in the lungs of CF patients. Comparisons with a non-CF reference genome revealed the presence of unique R. mucilaginosa strains in each patient. A near-complete genome was reconstructed from the metagenomic reads of one patient; comparison of these sequence data with a non-CF reference genome enabled the identification of unique genomic features that may have facilitated adaptation to the lung environment.
Mutations in the CFTR locus affect proper ion transport in lung epithelial cells, impairing the clearance of mucus in the airways and encouraging microbial colonization and persistence. Until recently, most laboratory and clinical microbiology focused on only a few pathogens, particularly P. aeruginosa. This “one mutation, one pathogen” model of the CF ecosystem is being replaced by a “polyphysiology, polymicrobial” view that is expected to improve treatment until gene therapy is able to fix the underlying genetic cause.
An important step in understanding the CF lung ecosystem, with the ultimate goal of eliminating microbes or altering their pathogenicity, is to determine which microbes are present and the ways in which their survival depends on local chemistry. An initial step in this direction is to use metagenomic and metatranscriptomic data previously generated from microbes and viruses present in CF sputum samples . Metagenomic data provide information on which microbes and viruses are present, and their metabolic capabilities, while metatranscriptomic data provide information on which organisms are metabolically active . Such metabolism data will enable predictions of local lung chemistry that may impose patient-specific selective pressures on the microbes. Data from eighteen microbiomes existing in six CF patients with different health statuses were analyzed here (Supporting Information S1; Table S2).
The metagenomic data showed that 15/18 (83%) sputa contained R. mucilaginosa and 16/18 (89%) sputa contained the common CF pathogen P. aeruginosa. Although P. aeruginosa was present in a greater number of samples, its abundance was lower than that of R. mucilaginosa in 11 of the 14 samples where these species co-existed. Both species abundances ranged from 1% to 62% (Figure 1). The relative percentages of these two opportunistic pathogens varied between patients and within the same patient as their health status changed. The results show no obvious pattern of synergy or competition between the two pathogens.
A health status of ‘Ex’ (for exacerbation) indicates a stark decline in lung function that is typically treated with intravenous antibiotics. Thus, between a health status of ‘Ex’ and ‘Tr’, patients will have been given antibiotics in addition to those that are often prescribed as continued therapy. The abundance data in Figure 1 indicate that these exacerbation-associated antibiotic treatments did little to permanently exclude R. mucilaginosa from CF lung communities. Patients CF1, CF4, CF6, CF7, and CF8 all had appreciable abundances of R. mucilaginosa by the last sampling time point. Only the R. mucilaginosa population in CF5 did not recover from antibiotic treatment by the last sampling time point; however, as this patient was only followed for 21 days (compared to 17–58 for the other patients), it is possible R. mucilaginosa could still rebound from antibiotic treatment. These results indicate that R. mucilaginosa is able to survive the typical CF antibiotic treatment, as is the main CF pathogen P. aeruginosa.
The presence of R. mucilaginosa DNA in sputum samples (as detected by metagenome sequencing) could be explained by its abundance in the oral cavity and subsequent contamination of the sputum during collection. However, this is unlikely for several reasons. Previous studies have indicated little contamination of sampled sputa with oral inhabitants, and the presence of Rothia has been confirmed in CF lungs. Examination of lung tissue samples was the best way to definitively determine the presence of R. mucilaginosa in CF lungs from our cohort. Five to six lung tissue sections from explanted lungs of each of four transplant patients were screened for Rothia-related microbes using 16S rDNA-targeted PCR and sequencing. One of the four patients was positive for Rothia (Supporting Information S3), indicating this bacterium is indeed present within lung airways. Unfortunately, this patient was not available for metagenome sequencing. The R. mucilaginosa population present in the oral cavity may serve as a reservoir and “stepping stone” for lower respiratory infection, as described in many chronic respiratory infections such as CF and chronic obstructive pulmonary disease.
The presence of R. mucilaginosa DNA in sputum or lung tissues does not necessarily indicate this bacterium is metabolically active in the lung environment. Examination of a metatranscriptome dataset indicated that mRNAs and rRNAs are being produced by Rothia species (Supporting Information S2), which suggests this bacterium is metabolically active in the CF lung.
Longitudinal studies of P. aeruginosa, Burkholderia dolosa, and Staphylococcus aureus within and between CF patients have shown evolutionary adaptation to the CF lung. Here, we define adaptation as a process where mutations that alter pathogen behavior (in this case, metabolism) become fixed in response to specific environmental pressures, e.g. the availability of nutrients, oxygen, or redox potential. The power of metagenomic data lies in the ability to uncover the genetic mutations underlying these adaptations, which occur over long periods of selection. Characterizing these mutations thus enables us to infer which selection pressures are strongest in the CF lung, whether they be the dynamic lung physiology, immune system surveillance, and/or antibiotic treatment.
We found evidence for similar evolutionary adaptation in R. mucilaginosa. The metagenomic sequences from each sample were mapped separately against the reference genome R. mucilaginosa DY-18 (GI: 283457089), originally isolated from persistent apical periodontitis lesions. As shown in Figure 2, the mapped sequences reveal gaps where portions of the reference genome sequence were not covered by metagenomic reads (i.e., were absent) in the CF-derived datasets (gap patterns >5 kbp shown in Figure 2; Table S3). The out-group in Figure 2 is due to low coverage of R. mucilaginosa reads in the metagenomes of these patient samples (<1X coverage; Table S4). Most of the gaps occurred in regions of low GC content (Figure 2), which most likely represent genes acquired by DY-18 via horizontal gene transfer.
The gap patterns were most different between patients, indicating unique R. mucilaginosa strains exist in each patient. Within each patient, differences in gap patterns between time points were less numerous, but their existence indicates that the genome of R. mucilaginosa has been evolving independently in each patient. Combined with similar findings for P. aeruginosa  and S. aureus , this suggests that essentially every CF patient harbors a unique strain of R. mucilaginosa that evolves in the lung. If each strain also has a unique antibiotic resistance profile, then CF treatment will need to be tailored to the particular strain present in each patient.
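The gap-calling idea described above (stretches of the reference with no mapped metagenomic reads, reported when they reach roughly 5 kbp) can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used in the study: the per-base depth list, the function name `find_gaps`, and the use of an inclusive length threshold are all hypothetical choices.

```python
from typing import List, Tuple


def find_gaps(depth: List[int], min_len: int = 5000) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals of zero-coverage runs.

    depth   -- per-base read depth along the reference (assumed input)
    min_len -- minimum run length to report (assumed inclusive threshold)
    """
    gaps = []
    run_start = None  # start index of the current zero-depth run, if any
    for pos, d in enumerate(depth):
        if d == 0:
            if run_start is None:
                run_start = pos
        else:
            # A covered base closes any open run; keep it if long enough.
            if run_start is not None and pos - run_start >= min_len:
                gaps.append((run_start, pos))
            run_start = None
    # Handle a run that extends to the end of the reference.
    if run_start is not None and len(depth) - run_start >= min_len:
        gaps.append((run_start, len(depth)))
    return gaps


if __name__ == "__main__":
    # Toy reference of 30 bases with a small threshold for readability.
    toy_depth = [3] * 10 + [0] * 6 + [2] * 4 + [0] * 2 + [1] * 3 + [0] * 5
    print(find_gaps(toy_depth, min_len=5))
```

Comparing the interval lists produced for different samples is one simple way to see whether gap patterns differ more between patients than between time points within a patient.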
The metagenome from CF1E had over 40,000 reads mapping to the reference genome, indicating enough data may be present to reconstruct the full genome of the R. mucilaginosa strain present. All CF1E metagenomic reads were assembled de novo into 996 contigs with an N50 value (weighted median value of all contigs) of 11,178 bp. Contigs were aligned against the reference genome R. mucilaginosa DY-18 using nucmer, resulting in one single scaffold built from 181 contigs with an 8.8-fold average sequencing depth (Figure 3). The CF1E R. mucilaginosa genome scaffold was then annotated using the RAST server (Genome ID: 43675.9) and compared to DY-18 that had been re-annotated using the same pipeline.
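For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows one common way to compute it: sort contig lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions; assemblers may differ in tie-breaking details.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig where the cumulative sum reaches half the total.
        if 2 * running >= total:
            return length
    return ordered[-1]  # not reached for positive lengths; kept for completeness


if __name__ == "__main__":
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```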
The CF1E genome scaffold consists of one circular chromosome of 2,278,618 bp with a GC content of 59.6%. Only large indels are reported here and SNPs were not examined. No large rearrangements were detected between CF1E and the reference genome DY-18. Phylogenetic analysis of the 16S rDNA loci indicated CF1E and the reference strain DY-18 are close relatives (Supporting Information S3), which is consistent with their average pairwise nucleotide identity of 85%. The sequence reads were relatively equally distributed across the genome, except at the multi-copy rRNA genes and in the highly conserved rhs region (Figure 3).
The rRNA operons assembled into one single contig (contig173) and had an average coverage depth of approximately 3.5 times the average depth of coverage for the rest of the scaffold (i.e. 31X versus 8.8X). The sequence from this contig was used to fill in three gaps that were predicted to correspond to the rRNA operons, based on alignment with the reference genome (Figure 3).
The rearrangement hot spot (rhs) gene region (Figure 3: Region V marked *) also had a high average coverage of 39X. The primary structure of Rhs proteins consists of an N-terminal domain, a “core” domain, a hyperconserved domain, and a DPxGL motif followed by a C-terminus that varies between strains and species . Previous studies have shown that rhs genes play a role in competition between strains or species, similar to the contact-dependent growth inhibition (CDI) system . The variable C-termini of Rhs proteins have toxin activities, and the small genes that typically follow rhs genes are thought to encode proteins that provide immunity to the toxins. Kung et al. (2012) showed that the rhs-CT in P. aeruginosa delivers toxins to eukaryotic cells, activating the inflammasome . The high coverage of the conserved rhs region suggests that rhs is present in high abundance in the CF microbial community. It is possible that the rhs system is widely used by CF microbes for (i) cell-to-cell interactions and communication, particularly for biofilm formation, (ii) direct antagonistic effects on the growth or viability of competitors, and/or (iii) attacking cells of the host-immune system. Additional experimental studies are needed to further assess these possibilities.
The high coverage rhs region in the CF1E genome scaffold included an rhs gene sequence related to one of the two rhs genes of DY-18 (RMDY18_19250). However, there is an apparent gap in the scaffold sequence, beginning 24 amino acids upstream of the DPxGL motif of the encoded Rhs protein. In the DY-18 reference genome, this gap corresponds to the coding sequence for the toxic C-terminal region of Rhs, and the beginning of the gene encoding the RhsI immunity protein. Assuming the presence of multiple rhs-CT/rhsI modules in the metagenome, assembling this region will be challenging.
RAST predicted 1,739 gene products belonging to 248 function subsystems (Table S5). The most abundant functions included biosynthesis and degradation of amino acids and derivatives, protein metabolism, cofactor/vitamin/prosthetic group/pigment biosynthesis and metabolism, and carbohydrate metabolism (Table S6). Thirty-seven ORFs present in the DY-18 genome and absent in the CF1E scaffold are listed in Table 1 (DY-18 specific). Genes only present in CF1E are listed in Table 2 (CF1E specific). Genome regions specific to only one of the two strains ranged from multiple kbp (mostly in gene coding regions) to a few nucleotides in non-coding regions. Additional analyses were performed on several of the genomic regions unique to CF1E; regions were chosen for their potential influences on CF-lung specific evolution of niche utilization and antibiotic resistance.
Phages are an important source of genes in microbial communities. The CRISPRs found in R. mucilaginosa CF1E may record previous attacks by phages and plasmids that these cells were able to resist. In order to identify these phage perpetrators, spacer sequences were compared against all host-associated and environmental viromes in MyMgDB. One of the spacers was identified in two human oral cavity viromes, whereas none of the spacers were similar to sequences from other environmental viromes (Table S9). The results suggest these bacteria may have been exposed to phages found in the oral cavity, which suggests cells may have existed in this environment prior to opportunistic infection of the CF lungs. Because these spacer sequences did not match phages in the virome sequenced from the same sample, the phages to which R. mucilaginosa is resistant are not present, or are below the detection limit, in this sample. However, if temperate phages dominate in the CF lung as in the human gut virome, this result is expected because the virome would be largely composed of free-living viruses. It is also possible that these CRISPRs do not protect the cells against phage infection, but are involved in a CRISPR-dependent modulation of biofilm formation, as described previously in P. aeruginosa. Biofilm formation has been shown to be important for persistent bacterial infection of CF lungs, as well as an overall decline in lung function. Therefore, the role of these CRISPRs in the pathogenesis of CF1E and other CF lung isolates should be explored further.
The metagenomic and genomic analyses presented here suggest that R. mucilaginosa is a common inhabitant of CF lungs, and that it evolves and adapts to each patient’s lung environment over the course of a persistent infection. Genomic analysis of CF1E highlighted many potential adaptations: multiple genes encoding L-lactate dehydrogenases (LDHs) that could enable utilization of lactate, many multi-drug efflux pumps for antibiotic resistance, and the modification of rhs elements and the type I restriction system. Alterations of the type I restriction system have the potential to influence horizontal transfer of genes. The CF1E genomic sequence indicates extensive phage-host interactions, including the acquisition of a phage lysin and changing CRISPR elements.
Based on these potential metabolic adaptations, we hypothesize that R. mucilaginosa lives in the microaerophilic surface of the viscous mucus layer that is characteristic of CF airways (Figure 4). Under this hypothesis, cytochrome c-dependent LDH would enable R. mucilaginosa to use extracellular lactate. However, this process would require oxygen, which is more readily available at the surface of the mucus layer (e.g., from the blood). As the oxygen level is depleted, metabolism could be supported by fermentation and anaerobic respiration with nitrate as an alternative electron acceptor, as observed in P. aeruginosa. Persistence in low oxygen environments would also allow for evasion of antibiotics and ROS activity. In addition, R. mucilaginosa carries a low-pH induced ferrous ion (Fe2+) transporter along with heme and hemin uptake and utilization systems. Co-occurring CF pathogens including P. aeruginosa and S. maltophilia are known to synthesize redox active phenazines that are able to reduce Fe3+ to Fe2+, potentially giving R. mucilaginosa access to Fe2+ in the low pH sputum where the ferrous ion transporter is induced.
The results presented here highlight the similar evolutionary trajectories and ecological niches of several species of bacteria that colonize the CF lung. These similarities are remarkable because each bacterial species starts with different genetic material: P. aeruginosa has a relatively large genome (>6 Mbp), whereas R. mucilaginosa has only a 2 Mbp genome (Table S10). These findings suggest that obtaining strain specific genome data can illuminate patient-specific bacterial inhabitants of CF patients. This specific information enables predictions to be made regarding the bacteria’s physiological adaptations in each patient, which would further enable physicians to optimize antibiotic treatments.
Induced sputum samples were collected from CF volunteers at the Adult CF Clinic (San Diego, CA, United States) by expectoration. All collection was approved by the University of California Institutional Review Board (HRPP 081500) and San Diego State University Institutional Review Board (SDSU IRB#2121). Written informed consent was provided by study participants and/or their legal guardians. Fresh CF sputum samples were processed as described in . In brief, sputum samples were homogenized, bacterial cells were pelleted by centrifugation, and pellets were repeatedly washed and then treated with DNase to remove human DNA prior to extraction of bacterial DNA.
A total of 18 microbiomes were previously sequenced using Roche-454 GSFLX. The data were downloaded from the NCBI sequence read archive (Accession # SRP009392). Reads that were duplicates or of low quality were removed using PRINSEQ, and those that matched human-derived sequences were removed using DeconSeq. Sequence reads with similarity to the phylum Chordata and to vector or synthetic sequences were identified by BLASTn against the NCBI nucleotide database (threshold of 40% identity over at least 60% query coverage), and removed from the metagenomes. A detailed description of sample processing and preliminary analyses of these datasets has been published.
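The identity and query-coverage thresholds used for the BLASTn screen can be illustrated with a short filter over tabular BLAST output. This is a hedged sketch rather than the authors' pipeline. It assumes the standard 12-column tabular layout (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score), assumes the hits passed in have already been restricted to Chordata, vector, or synthetic subjects, and computes query coverage from a single alignment's query span. The function name `reads_to_remove` and the `query_lengths` mapping are hypothetical.

```python
from typing import Dict, Iterable, Set


def reads_to_remove(
    blast_lines: Iterable[str],
    query_lengths: Dict[str, int],
    min_identity: float = 40.0,
    min_coverage: float = 60.0,
) -> Set[str]:
    """Collect read IDs with a hit meeting both identity and coverage thresholds."""
    flagged = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        qseqid = fields[0]
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        # Percent of the read covered by this single alignment (assumed definition).
        aligned = abs(qend - qstart) + 1
        coverage = 100.0 * aligned / query_lengths[qseqid]
        if pident >= min_identity and coverage >= min_coverage:
            flagged.add(qseqid)
    return flagged


if __name__ == "__main__":
    hits = [
        "read1\tsubjA\t95.0\t80\t4\t0\t1\t80\t100\t179\t1e-30\t140",
        "read2\tsubjB\t99.0\t30\t0\t0\t1\t30\t5\t34\t1e-10\t60",
        "read3\tsubjA\t35.0\t90\t58\t0\t1\t90\t10\t99\t1e-05\t40",
    ]
    lengths = {"read1": 100, "read2": 100, "read3": 100}
    print(sorted(reads_to_remove(hits, lengths)))
```

In the toy input only read1 satisfies both thresholds: read2 covers 30% of the query and read3 has 35% identity.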
The processed metagenomic reads were mapped to the Rothia mucilaginosa DY-18 (GI: 283457089) reference genome using a modified version of BWA-SW 0.5.9. The coverage values based on the reference mapping are shown in Table S4.
The metagenomic reads from CF1E were de novo assembled using the Newbler software version 2.6 with ≥35 bp overlap and ≥95% identity. All resultant contigs were aligned to the reference genome (R. mucilaginosa DY-18) using nucmer with its -maxmatch option (using all anchor matches regardless of their uniqueness). This option will allow repetitive or multi-copy sequences (e.g., rRNA operons) to assemble into a single contig, enabling that contig to be subsequently mapped to more than one genomic region. All alignments were examined manually. Full length contigs were ordered based on their coordinates on the reference alignment, and this ordering was used with an in-house Perl script to build the final scaffold containing 181 contigs.
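The ordering step described above (placing contigs by their reference coordinates and allowing a multi-copy contig to be used at more than one location) might look roughly like the sketch below. It is not the in-house Perl script, which is not shown in the text. The `Placement` record, the `build_scaffold` function, the choice to pad uncovered reference positions with `N`, and the decision not to trim overlaps are all assumptions made for illustration.

```python
from typing import Dict, Iterable, NamedTuple


class Placement(NamedTuple):
    """One contig-to-reference alignment (hypothetical record layout)."""
    contig_id: str
    ref_start: int  # 1-based start on the reference
    ref_end: int    # 1-based inclusive end on the reference
    reverse: bool   # True if the contig aligns to the reverse strand


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_scaffold(
    placements: Iterable[Placement],
    contigs: Dict[str, str],
    gap_char: str = "N",
) -> str:
    """Concatenate contigs in reference order, padding uncovered spans."""
    parts = []
    prev_end = 0  # last reference position covered so far
    for p in sorted(placements, key=lambda pl: pl.ref_start):
        gap = p.ref_start - prev_end - 1
        if gap > 0:
            parts.append(gap_char * gap)  # assumed padding for uncovered bases
        seq = contigs[p.contig_id]
        parts.append(revcomp(seq) if p.reverse else seq)
        prev_end = max(prev_end, p.ref_end)
    return "".join(parts)


if __name__ == "__main__":
    toy_contigs = {"c1": "AAAA", "c2": "CCG", "rrn": "GT"}
    toy_placements = [
        Placement("c2", 8, 10, True),
        Placement("c1", 1, 4, False),
        Placement("rrn", 5, 6, False),   # multi-copy contig, first location
        Placement("rrn", 12, 13, False), # same contig reused at a second location
    ]
    print(build_scaffold(toy_placements, toy_contigs))
```

Reusing the `rrn` contig at two coordinates mirrors how a single collapsed rRNA contig can fill several reference regions; a production script would also need to resolve overlapping placements, which this sketch leaves untouched.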
The CF1E scaffold was annotated using the RAST web annotation service with the latest FIGfams version 57 (Genome ID: 43675.9). In order to allow a direct comparison, the reference genomes of R. mucilaginosa DY-18 and R. mucilaginosa M508 (downloaded from the Genome OnLine Database) were also annotated using the same pipeline. CRISPR loci were identified using CRISPRFinder. The spacers between the repeats were extracted and compared to the virome sequenced from the same sample (downloaded from NCBI, accession SRX090639) and to other viromes in MyMgDB.
DNA was extracted from 5–6 homogenized lung tissues from explanted lungs of four transplant patients using the Macherey-Nagel Nucleospin Tissue Kit (Macherey-Nagel, Bethlehem, PA) with the Gram-positive variation that included an overnight proteinase K digestion. Extracted DNA was amplified using Actinobacteria-targeted PCR primers (Rothia_1F: 5′-GGGACATTCCACGTTTTCCG-3′, Rothia_1R: 5′-TCCTATGAGTCCCCACCATT-3′) that encompass a 322 bp region of the 16S rRNA gene including the hypervariable regions 6–7. Two of the four patients were positive for Actinobacteria; right lower and lingular (left) lobes for Lung 9, and lingular lobe for Lung 7 (Supporting Information S3). The PCR products were purified and sequenced. Sequencing of the three partial 16S gene fragments indicated Rothia was present in lungs from one of the four CF patients.
Dot Plot matrix view of the alignment of CF1E type I restriction modification system (subunit M, R, S) against DY-18.
Diseases associated with R. mucilaginosa.
Microbiomes used in this study. Clinical status was designated as exacerbation (prior to systemic antibiotic treatment), on treatment (during systemic antibiotic treatment), post treatment (upon completion of systemic antibiotic treatment) or stable (when clinically stable and at their clinical and physiological baseline). The samples collected during exacerbation were designated as Day 0 sample, and the times between samples are cumulatively calculated from Day 0.
Annotation of the gaps ≥5 kbp in the metagenomic reference mapping against R. mucilaginosa DY-18 for patients (A) CF1, (B) CF6, (C) CF7, and (D) CF8. Refer to Table S2 for detailed patient sample information.
Statistics from BWA mapping of metagenomic reads against the reference genome R. mucilaginosa DY-18.
General features of the CF1E R. mucilaginosa scaffold, DY-18 reference genome, and M508 draft genome.
Subsystem feature counts of R. mucilaginosa CF1E, DY-18, and M508.
Sequence identities of the genes encoding the Type I restriction modification system in CF1E and DY-18, determined using BLAST. Identity values were calculated over alignments with >97% query length coverage.
CRISPR positions in the CF1E genome scaffold.
Identification of the spacer sequences in CF1E CRISPR structure from human- and environmental-viral metagenomes at 100% length coverage and ≥90% identity (≤2 mismatches).
A comparison of putative adaptations and predicted metabolisms of R. mucilaginosa and P. aeruginosa that are hypothesized to enable persistence in the CF lung, based on literature and genomic data.
Genes that are missing from the CF1E genome scaffold but present in the DY-18 reference. Genes are considered missing when the gap is within a contig.
Genes present in the CF1E genome scaffold but missing in the reference genome DY-18.
Isolation source and references of sequences extracted and used in the 16S phylogenetic analysis.
Protein-coding genes used for multilocus phylogenetic inference.
Genes missing from the CF1E genome scaffold, based on contig mapping to the reference genome DY18.
Additional sample information.
Rothia mucilaginosa in cystic fibrosis community metatranscriptomes.
Additional genome characteristics of R. mucilaginosa and phylogenetic analysis of Rothia spp. associated with cystic fibrosis.
This work was supported by the National Institutes of Health and Cystic Fibrosis Research Inc. through grants (1 R01 GM095384-01 and CFRI #09-002) awarded to Forest Rohwer and grant U54 AI065359 awarded to Christopher Hayes and David Low. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.