C.A.D. and J.W.K. contributed equally to this article and should be considered co-first authors.
Cryptococcus gattii recently emerged as the causative agent of cryptococcosis in healthy individuals in western North America, despite previous characterization of the fungus as a pathogen in tropical or subtropical regions. As a foundation to study the genetics of virulence in this pathogen, we sequenced the genomes of a strain (WM276) representing the predominant global molecular type (VGI) and a clinical strain (R265) of the major genotype (VGIIa) causing disease in North America. We compared these C. gattii genomes with each other and with the genomes of representative strains of the two varieties of Cryptococcus neoformans that generally cause disease in immunocompromised people. Our comparisons included chromosome alignments, analysis of gene content and gene family evolution, and comparative genome hybridization (CGH). These studies revealed that the genomes of the two representative C. gattii strains (genotypes VGI and VGIIa) are colinear for the majority of chromosomes, with some minor rearrangements. However, multiortholog phylogenetic analysis and an evaluation of gene/sequence conservation support the existence of speciation within the C. gattii complex. More extensive chromosome rearrangements were observed upon comparison of the C. gattii and the C. neoformans genomes. Finally, CGH revealed considerable variation in clinical and environmental isolates as well as changes in chromosome copy numbers in C. gattii isolates displaying fluconazole heteroresistance.
Isolates of Cryptococcus gattii are currently causing an outbreak of cryptococcosis in western North America, and most of the cases occurred in the absence of coinfection with HIV. This pattern is therefore in stark contrast to the current global burden of one million annual cases of cryptococcosis, caused by the related species Cryptococcus neoformans, in the HIV/AIDS population. The genome sequences of two outbreak-associated major genotypes of C. gattii reported here provide insights into genome variation within and between cryptococcal species. These sequences also provide a resource to further evaluate the epidemiology of cryptococcal disease and to evaluate the role of pathogen genes in the differential interactions of C. gattii and C. neoformans with immunocompromised and immunocompetent hosts.
Fungal pathogens of humans mainly cause disease in people with underlying immune defects due to AIDS, cancer, or immunosuppressive therapy. For example, strains of Cryptococcus neoformans emerged as life-threatening agents of fungal meningitis, with an increase in the number of global cases coincidentally occurring with the HIV/AIDS epidemic (1, 2). These pathogens are responsible for an estimated one million cases of cryptococcal meningitis globally per year in AIDS patients, leading to approximately 625,000 deaths (3). In contrast, the closely related species Cryptococcus gattii has the distinct ability to cause disease in otherwise healthy people, although C. gattii infections have also been reported in immunocompromised patients, including HIV/AIDS and organ transplant patients (4–6). C. gattii was initially classified as a variety of C. neoformans but has recently been recognized as a separate species (7). The two species share all of the major recognized virulence factors such as production of a polysaccharide capsule, melanin deposition in the cell wall, and robust growth at 37°C, but they differ in a number of traits (reviewed in reference 8). For example, unlike those of C. neoformans, C. gattii strains can assimilate d-proline, d-tryptophan, and l-malic acid, use glycine as both carbon and nitrogen sources, and resist growth inhibition by cycloheximide and canavanine. Clinically, C. gattii infections result in a higher incidence of lung and brain granulomas and neurological complications, and these infections often require prolonged treatment with antifungal drugs compared with those caused by C. neoformans (8).
The ability of C. gattii to infect both immunocompromised and immunocompetent individuals has been dramatically demonstrated by the emergence of cryptococcosis on Vancouver Island in British Columbia (BC), Canada, over the last 10 years. Specifically, there have been >240 reported human cases (mainly in immunocompetent people) and hundreds of animal cases of cryptococcosis since 1999 (9–11). With an average annual incidence of 5.8 per million persons, this is a much higher rate of infection than that found in the rest of the world, and 19 of the human cases were fatal despite aggressive antifungal therapy (10). The outbreak observed on Vancouver Island has now spread to the British Columbia mainland as well as the Pacific Northwest region of the United States (9, 12–16). The outbreaks and increased incidence of cryptococcosis in a temperate climatic zone suggest a fundamental shift in the ecological adaptation of this pathogen.
Extensive environmental sampling revealed that C. gattii is present in the air and soil and on multiple tree species in British Columbia (9, 16–18). Previous characterization of C. gattii isolates suggested that the species was primarily a tropical or subtropical pathogen and revealed that the natural habitat of C. gattii in Australia was in tree species such as the eucalypts (19–21). Molecular typing of clinical and environmental isolates by PCR fingerprinting, amplified fragment length polymorphism analysis, and multilocus sequence typing identified four molecular types within the C. gattii complex (VGI, VGII, VGIII, and VGIV), with VGI being the most commonly isolated genotype worldwide (22–25). The VGII molecular type is predominant among clinical and environmental isolates from Vancouver Island, with further identification of VGIIa and VGIIb subtypes among the isolates. The VGIIa subtype is predominant in the environment and responsible for the majority of cases of cryptococcosis, although fatal disease has also been caused by VGI and VGIIb strains (10, 26). In a disquieting trend of continual expansion in the United States, a virulent and novel subtype, VGIIc, has emerged in Oregon and is now contributing to illness in the region along with the VGIIa subtype (13).
In this report, we present the genome sequences of strains representing VGI and VGIIa and compare these genomes with each other and with the sequenced genome of a serotype D strain (JEC21) of the opportunistic species C. neoformans (27). We also describe comparative genome hybridization (CGH) experiments that extend the analysis of the genomes to include C. gattii outbreak strains with different levels of virulence as well as strains resistant to high levels of the antifungal drug fluconazole. Together, these studies reveal extensive rearrangements between the representative C. gattii and C. neoformans genomes that may have contributed to sexual isolation and speciation. Additionally, considerable variation was observed between and within molecular types of C. gattii, and this finding is consistent with the existence of separate species within the C. gattii complex. Overall, the C. gattii sequences provide a reference platform for studying virulence and for further detailed epidemiological characterization of isolates causing the unusual emergence of this pathogen in North America.
The C. gattii strain WM276, which represents the VGI molecular type commonly isolated worldwide, was sequenced at 6.5× coverage and assembled into 14 chromosomes with a combined size of 18.4 Mb (see Table S1a and b in the supplemental material). The assembly of the genome was supported by a physical map constructed by fingerprinting bacterial artificial chromosome (BAC) clones and by the inclusion of sequence reads from the ends of the BAC clones. This strategy, together with several rounds of finishing and gap closure, improved the quality of the genome assembly and resulted in a high level of completion. As a result, only eight internal gaps remain in the sequence. Telomeric sequences composed of repeats of the motif (TTAGGGG)n were identified at both ends of eight chromosomes and at only one end of four additional chromosomes; telomeric sequences have yet to be identified for the remaining two chromosomes (see Table S1b in the supplemental material). With the assembled genome sequence, we identified and annotated 6,565 potential open reading frames (ORFs; excluding pseudogenes). These ORFs were identified and annotated using Pegasys (28), which includes the Atlas database (29) and Apollo (30, 31) for the curators to interpret gene models and make decisions based on the evidence presented to them. The evidence was built from gene prediction models, BLASTp (32) against nonredundant (nr) protein resources, and tBLASTx against fungal dbEST (33) sequences. This annotation processing pipeline also included the identification and the location of retroelements (see annotations in Data Set S1, tab A, in the supplemental material). With regard to the latter elements, the centromeres in the WM276 chromosomes can be defined by the clustering of the retrotransposons TCN1 and TCN6 at these locations, as previously described for the genomes of the C. neoformans strains JEC21 and B3501A (27).
We also identified the rRNA gene cluster on chromosome 2 (positions 1516460 to 1924811; ~0.4 Mb) but excluded rRNA and other RNA-carrying genes from the annotations, in keeping with a focus on protein-encoding genes.
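The telomere survey described above can be illustrated with a minimal sketch. It assumes that a telomeric end appears as tandem copies of TTAGGGG or, on the opposite strand, its reverse complement CCCCTAA. The function names, the 200-base window, and the minimum of three tandem repeats are illustrative assumptions, not parameters reported for the WM276 assembly.

```python
import re

MOTIF = "TTAGGGG"
# Reverse complement of the motif, as it would read on the opposite strand.
RC_MOTIF = "CCCCTAA"


def has_telomere(end_seq: str, min_repeats: int = 3) -> bool:
    """Return True if end_seq holds at least min_repeats tandem motif copies."""
    pattern = (
        rf"(?:{MOTIF}){{{min_repeats},}}|(?:{RC_MOTIF}){{{min_repeats},}}"
    )
    return re.search(pattern, end_seq.upper()) is not None


def telomere_status(chrom_seq: str, window: int = 200) -> tuple:
    """Check the first and last `window` bases of one chromosome sequence."""
    return has_telomere(chrom_seq[:window]), has_telomere(chrom_seq[-window:])


# Toy sequences (hypothetical, not taken from the assembly).
both_ends = "CCCCTAA" * 4 + "ACGT" * 100 + "TTAGGGG" * 4
one_end = "ACGT" * 100 + "TTAGGGG" * 4
print(telomere_status(both_ends))  # (True, True)
print(telomere_status(one_end))    # (False, True)
```

Applied to each assembled chromosome, a check of this kind would sort chromosomes into those with telomeric repeats at both ends, at one end, or at neither end.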
The WM276 genome and the detailed manual annotations have been employed as the reference genome for C. gattii and used in the annotation of the C. gattii R265 genome. The genome of this strain, which represents the VGIIa subtype, responsible for the majority of cases of cryptococcosis in British Columbia, was sequenced to 6.5× coverage (see Materials and Methods in Text S1 and sequencing statistics in Table S1c in the supplemental material). Automated annotation of the assembled R265 sequence was performed, and the annotation information obtained from the WM276 genome was employed to compare gene models. The R265 genome assembly was also supported by a BAC physical map and BAC end sequences.
To obtain an overview of the organizations of the C. gattii genomes for strains WM276 and R265, we aligned their chromosomes using the nucleic acid sequence comparison tool Cross-Match (http://www.phrap.org), a Smith-Waterman local alignment algorithm (34, 35); alignments were visualized using the tool XMatchView (http://www.bcgsc.ca/platform/bioinfo/software/xmatchview). Cross-Match analysis of these two genomes revealed that the majority of chromosomes were colinear, as illustrated for chromosome 4 in Fig. 1A. However, some rearranged chromosomes and regions of inversion were also present between the two genomes (see Text S1, p. 19 in the supplemental material). This is in striking contrast to the more substantial genome rearrangements observed in the comparisons with the serotype D genome, as shown in Fig. 1B and described below. The Cross-Match analysis also revealed a higher-than-expected overall nucleotide sequence divergence of 7.6% between the VGI and VGII genomes. This observation suggests potential speciation within the C. gattii complex, although we note that the percent divergence is 10 to 15% between the neoformans and grubii varieties of C. neoformans (36). Perhaps these observations indicate that these C. neoformans varieties should be separate species.
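To make the divergence figure concrete, the following sketch computes percent nucleotide divergence from a pairwise alignment given as two gapped strings of equal length. This section does not describe how the 7.6% value was derived from the Cross-Match output, so ignoring gap columns is an assumption made here for illustration, and the function name is hypothetical.

```python
def percent_divergence(aln_a: str, aln_b: str) -> float:
    """Percent of aligned (non-gap) columns at which two sequences differ."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are excluded from the count
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * mismatches / compared


# Toy alignment: one mismatch in ten aligned columns.
print(percent_divergence("ACGTACGTAC", "ACGTACGAAC"))  # 10.0
```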
We compared the gene contents of the C. gattii strains by mapping syntenic orthologs between the two strains. First, the assemblies were aligned using NUCmer (37), and alignments separated by less than 200 bases were merged into syntenic regions. Next, gene coordinates were transferred from WM276 to R265 using the base-to-base correspondence of the alignments. Upon mapping 6,120 genes from WM276 to 6,102 genes in R265, we found some 2:1 and 1:2 mappings that appeared mostly to be splits or merges of genes (see Data Set S1, tab B, in the supplemental material). A total of 445 WM276 genes did not map to the R265 genome, but closer inspection allowed the alignment of 291 loci to R265 regions that did not contain good gene structures. This group had a small average polypeptide size (205 amino acids [aa]) compared to the overall average (522 aa), and the small size and lack of conservation suggest that these are dubious genes. The remaining 154 genes did not fall into alignment with regions of the R265 genome. The full set of 6,210 genes that were predicted in R265 also included 108 that were annotated uniquely for this genome.
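The merging step described above can be sketched as follows. The sketch assumes each alignment is a forward-strand block with reference and query coordinates, and that blocks are merged only when the gap to the previous block is under 200 bases on both sequences. The tuple layout and function name are illustrative and do not reproduce NUCmer's output format.

```python
from typing import List, Tuple

# (ref_start, ref_end, qry_start, qry_end); a hypothetical layout for this sketch.
Block = Tuple[int, int, int, int]


def merge_blocks(blocks: List[Block], max_gap: int = 200) -> List[Block]:
    """Merge consecutive alignment blocks separated by fewer than max_gap bases."""
    merged: List[Block] = []
    for block in sorted(blocks):
        if merged:
            r0, r1, q0, q1 = merged[-1]
            ref_gap = block[0] - r1
            qry_gap = block[2] - q1
            # Assumption: only colinear, non-overlapping blocks are merged.
            if 0 <= ref_gap < max_gap and 0 <= qry_gap < max_gap:
                merged[-1] = (r0, max(r1, block[1]), q0, max(q1, block[3]))
                continue
        merged.append(block)
    return merged


blocks = [(1, 1000, 1, 1000), (1100, 2000, 1150, 2050), (5000, 6000, 9000, 10000)]
print(merge_blocks(blocks))
# [(1, 2000, 1, 2050), (5000, 6000, 9000, 10000)]
```

Once syntenic regions are defined this way, gene coordinates can be transferred between the assemblies using the base-to-base correspondence within each region.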
The majority of genes specific to one genome or the other encoded hypothetical proteins. However, some specific examples of differential gene content between the two C. gattii strains included the genes encoding the predicted Argonaute proteins Ago1 and Ago2 (CGB_D9320W and CGB_D9160C, respectively), which were present in WM276 but not R265; this gene deficiency in R265 was not associated with gaps in the R265 genome assembly (see Text S1, p. 20 and Table S2 in the supplemental material). The highly conserved Argonaute proteins are part of the RNA-induced silencing complex (RISC) (38) and function in RNA interference and related phenomena. Another example of a difference in gene content is an 880-bp deletion that removes upstream sequence and the coding region for 209 N-terminal amino acids of the pheromone receptor-like protein Cpr2 (CNBG_5530) in R265; this gene is intact in WM276 (CGB_A1720W) (see Text S1, p. 21 and Table S2 in the supplemental material) (39). The WM276 ortholog CGB_A1730W that lies downstream of this locus is also absent from R265. Aside from these differences, the CPR2 flanking region was conserved in both of the C. gattii strains, suggesting that the Cpr2 gene may be under selective pressure. In this regard, it is possible that selective loss of some gene functions may contribute to the higher virulence of the R265 strain relative to that of WM276 (24, 40). It has been proposed, for example, that pathogens may become adapted to the selective pressure of a host niche through inactivation of so-called antivirulence genes (41). Identification of the compendium of antivirulence genes for a particular pathogen could lead to a better understanding of the emergence of novel virulence traits, perhaps including factors relevant to the emergence of cryptococcosis in western North America in the case of C. gattii. Overall, the observed gene differences between the two C. gattii strains will provide opportunities for functional examination of their contributions to virulence differences.
The sequences of the WM276 chromosomes were each aligned with the sequences of chromosomes of the C. neoformans strain B3501A (27) using the Cross-Match approach. At the whole-genome level, the WM276 genome shows 87.0% identity to the B3501A genome, and the C. gattii R265 genome has 85.6% identity with that of B3501A. In contrast to the alignments observed for the C. gattii strains, which showed extensive conservation of synteny (described above and see Text S1, p. 19 in the supplemental material), many rearrangements were observed for the comparisons of the C. gattii and C. neoformans chromosomes (see Text S1, p. 22 and 23 in the supplemental material). In particular, a striking three-part chromosomal rearrangement was observed that involved chromosomes 4, 9, and 10 in the two strains; the rearrangements that resulted in chromosome 4 are illustrated in Fig. 1B. Within the chromosomes from each genome, two out of six breakpoints involved in the rearrangements were associated with the retrotransposons TCN1 and TCN6. For example, we found TCN1 and TCN6 elements at the junctions within WM276 chromosomes 4 and 10 at the locations where sequences aligned to B3501A chromosomes 4 and 9 were juxtaposed. These junctions are located within the centromeres of these chromosomes, a finding consistent with the highly repetitive nature of these regions. We hypothesize that three steps would be needed to rearrange the chromosomes in the manner observed in Fig. 1B starting from common ancestral sequences and that one of the steps may have involved the TCN1/TCN6 retroelements at the centromeres of the respective chromosomes. These rearrangements were also observed when the R265 genome was compared with that of B3501A, indicating that the chromosome rearrangements are ancient in the C. gattii lineage (data not shown).
The differences in chromosome arrangements between C. gattii and C. neoformans may have functional significance with regard to virulence, and it is possible that the observed rearrangements are indicative of dynamic genome changes that contributed to speciation. The mating type locus (MAT) has been associated with virulence in C. neoformans (42, 43), and this locus was found on rearranged chromosome 9 in both of the C. gattii genomes. In general, chromosomal rearrangements, segmental duplications and whole-chromosome copy number variations have been described for C. neoformans and are well documented in other fungi (44–51). Genomic rearrangements can serve as direct targets for natural selection and can accumulate in different lineages to contribute to the genotypic divergence and speciation through inhibition of proper pairing and recombination of rearranged chromosomes (52). Certainly, the presence of the rearrangements in both of the C. gattii genomes supports the idea that these chromosomal changes contributed to speciation within cryptococci.
To assess gene content differences, we also compared the C. gattii and C. neoformans genomes by reciprocal BLAST (32) of their gene sets and identified genes exclusively found in JEC21 (254 genes) or WM276 (565 genes) (see Data Set S1, tabs C and D, in the supplemental material). The JEC21 genome was used in this analysis because of the detailed gene annotations available for this genome and because of the nearly identical gene sets between JEC21 and B3501A (27). The majority of the genes encoded hypothetical proteins. However, examples of orthologs found in JEC21 but absent in WM276 included those encoding the 60S ribosomal protein L31, a haloacid dehalogenase, an inositol/phosphatidylinositol kinase, an alpha-l-arabinofuranosidase, and a sphingosine-1-phosphate phosphatase (see Data Set S1, tab C, in the supplemental material). Examples of orthologs found in WM276 but absent in JEC21 included those encoding the Rad51-like DNA repair protein and several enzymes such as phenylacrylic acid decarboxylase, arsenate reductase, 6-phosphogluconolactonase, haloalkanoic acid dehalogenase, and isochorismatase (see Data Set S1, tab D, in the supplemental material). The functional significance of these differences in gene contents warrants further investigation; in particular, the function of the isochorismatase is interesting because this enzyme has a predicted role in catecholic siderophore/secondary metabolite biosynthesis (53–55). In addition, the isochorismatase gene was part of a deletion associated with loss of virulence in a WM276 mutant, as described below.
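A reciprocal BLAST comparison of this kind is often implemented as a reciprocal-best-hit filter, sketched below. The exact criteria used to call a gene exclusive to one genome are not given in this section, so the best-hit rule, the function names, and the toy gene identifiers are all assumptions made for illustration.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bitscore) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (gene_a, gene_b) pairs that are each other's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return {(a, b) for a, b in ab.items() if ba.get(b) == a}


def genes_without_partner(all_genes, pairs, side):
    """Genes with no reciprocal partner; side 0 is genome A, side 1 is genome B."""
    return set(all_genes) - {pair[side] for pair in pairs}


# Hypothetical identifiers and scores, not real BLAST results.
a_vs_b = [("wm_1", "jec_1", 500.0), ("wm_1", "jec_2", 90.0),
          ("wm_2", "jec_2", 300.0), ("wm_3", "jec_2", 120.0)]
b_vs_a = [("jec_1", "wm_1", 498.0), ("jec_2", "wm_2", 305.0),
          ("jec_3", "wm_1", 60.0)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(sorted(pairs))
print(genes_without_partner(["wm_1", "wm_2", "wm_3"], pairs, 0))
print(genes_without_partner(["jec_1", "jec_2", "jec_3"], pairs, 1))
```

In this toy case, wm_3 and jec_3 have hits in the other genome but no reciprocal partner, which is why a bare best-hit rule would usually be combined with score or coverage thresholds before a gene is called genome specific.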
Cognate isochorismatase orthologs of the WM276 protein were found in C. gattii strain R265 and C. neoformans var. grubii strain H99 but not in C. neoformans var. neoformans. Although the latter finding agrees with the inability of C. neoformans var. neoformans to produce siderophores (56, 57), divergent paralogs with weaker similarity to the isochorismatase domain were present; this property has not been investigated in C. gattii to our knowledge. Moreover, catecholic siderophore biosynthesis has not been well characterized in fungi. While a BLAST query of the nonredundant database with the WM276 ortholog using default criteria did not find any significantly similar proteins in other basidiomycetes or Saccharomyces cerevisiae, we found significant hits to proteins in other ascomycete fungi such as Aspergillus spp. (E value = 5E−58) and Botryotinia (E value = 8E−68).
All of the predicted protein sequences from the five Cryptococcus genomes (WM276, R265, JEC21, B3501A, and H99) were clustered into orthologous groups using OrthoMCL (58) (see Materials and Methods in Text S1 in the supplemental material). OrthoMCL includes recent paralogs within ortholog groups as within-species BLAST hits that are reciprocally better than between-species hits. Single-copy orthologs were identified as the clusters with exactly one member per species. Following the identification of 5,171 groups of single-copy orthologs conserved among the five strains, alignments for each of these groups were concatenated (2,817,121 characters), and a phylogenetic tree was generated using the maximum likelihood analysis method implemented in PhyML3.0 (59). We calculated the time since divergence of C. neoformans var. grubii versus C. gattii to be ~34 million years (myr) based on the average branch length between C. neoformans var. grubii representative strain H99 and the C. gattii strains in the phylogenetic tree (0.1355) and assuming a commonly utilized neutral mutation rate of 2E−9 per nucleotide per year for protein-coding genes (60, 61). The multiortholog-based ultrametric tree was then generated by the PATHd8 algorithm (62) using the age estimate of 34 myr for the most recent common ancestor of the C. gattii and C. neoformans var. grubii lineages (Fig. 2A; see also Materials and Methods in Text S1 in the supplemental material). The divergence between the VGI and VGII C. gattii strains WM276 and R265 was found to be 12.4 myr, and this result supports speciation between these molecular types. In fact, phylogenetic analysis of a selection of globally collected isolates indicates considerable genetic variation within the Cryptococcus species complex, which suggests that the C. gattii molecular types should be considered individual varieties, if not species (24, 63).
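The ~34 myr estimate can be reproduced with a short calculation. The sketch assumes that the pairwise branch length accumulates along both lineages since their common ancestor, so that time equals distance divided by twice the mutation rate. That factor of two is inferred from the reported numbers rather than stated in the text, and the function name is illustrative.

```python
def divergence_time_years(distance: float, rate_per_site_per_year: float) -> float:
    """Time since divergence, assuming substitutions accumulate on both lineages."""
    return distance / (2.0 * rate_per_site_per_year)


# Values reported in the text: branch length 0.1355, rate 2E-9 per site per year.
t = divergence_time_years(0.1355, 2e-9)
print(f"{t / 1e6:.1f} myr")  # 33.9 myr
```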
To examine phylogenetic relationships in the broader context of other basidiomycete fungi, we also carried out phylogenetic analysis with the five Cryptococcus genomes and other basidiomycete genomes, including the human pathogen Malassezia globosa (64), the plant pathogen Ustilago maydis (65), the wood-rotting fungus Phanerochaete chrysosporium (66), and the mushroom Coprinus cinereus (67). The ascomycetous fungus Saccharomyces cerevisiae was included as a distantly related outgroup taxon. Following identification of 1,519 single-copy ortholog groups conserved among the 10 genomes, alignments for each of these groups were concatenated (837,857 characters), and a phylogenetic tree was generated by maximum likelihood analysis, as described above. The resultant phylogenetic tree was calibrated based on a recent estimate of ~500 million years of divergence between ascomycetous and basidiomycetous fungi (68) (Fig. 2B). According to this calibration, the two C. gattii molecular types would have diverged about 11 myr ago, an estimate similar to the one based solely on the Cryptococcus phylogeny described above. It appears that the cryptococci, representatives of the class Tremellomycetes, diverged about 291 myr ago from the common ancestor of Phanerochaete chrysosporium and Coprinus cinereus (class Agaricomycetes). We also carried out analysis of the evolution of gene families identified among the five cryptococci and the other fungi mentioned above, and we compared the mitochondrial gene contents of all of the sequenced Cryptococcus genomes (see Text S1 in the supplemental material).
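The selection of single-copy ortholog groups that underlies both trees can be sketched as a simple filter: keep a group only if it has exactly one member from every genome in the comparison. The input layout, group identifiers, and gene names below are hypothetical and do not reproduce OrthoMCL's output format.

```python
from collections import Counter


def single_copy_groups(groups, species):
    """Keep groups with exactly one gene from each listed species.

    groups: dict mapping group id -> list of (species, gene_id) tuples.
    """
    wanted = set(species)
    kept = {}
    for group_id, members in groups.items():
        counts = Counter(sp for sp, _ in members)
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            kept[group_id] = members
    return kept


species = ["WM276", "R265", "JEC21"]
groups = {
    "grp1": [("WM276", "a1"), ("R265", "b1"), ("JEC21", "c1")],  # single copy
    "grp2": [("WM276", "a2"), ("WM276", "a3"), ("R265", "b2"), ("JEC21", "c2")],  # paralog
    "grp3": [("WM276", "a4"), ("R265", "b3")],  # missing one species
}
print(list(single_copy_groups(groups, species)))  # ['grp1']
```

Requiring one member per genome becomes stricter as genomes are added, which is consistent with the drop from 5,171 groups for five genomes to 1,519 groups for ten.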
Comparative genome hybridization (CGH) studies were performed with the WM276 genome to begin an analysis of genome variation within the VGI molecular type of C. gattii. A whole-genome tiling array was initially employed to characterize a transformant of strain WM276 (WM276gfp2) that displayed a change in chromosome 11, as discovered by electrophoretic karyotyping (see Text S1, p. 24 in the supplemental material). This strain, which was generated by transformation with a gene encoding the green fluorescent protein, was also found to be avirulent in a mouse inhalation model of cryptococcosis (data not shown). CGH analysis revealed that a telomeric region of ~75 kb was missing on chromosome 11 in strain WM276gfp2 (see Text S1, p. 25 and 26 in the supplemental material). This region contained 24 genes, including a number of putative sugar transporters and glycosyl hydrolases (see Data Set S1, tab I, in the supplemental material). The potential for phenotypic consequences of the deletion was demonstrated by confirming that the mutant had a growth defect on raffinose, as expected from the deletion of the invertase gene CGB_K4300C (data not shown). The deleted region also encoded an arsenite transporter, a peptidyl-prolyl cis-trans-isomerase, a Ras GTPase, an isochorismatase, and a copper-exporting ATPase. As mentioned above, the cognate WM276 isochorismatase ortholog was not found in the C. neoformans var. neoformans genomes. Overall, this analysis with the WM276 genome illustrated the utility of CGH to characterize genome variability, although additional work is needed to determine whether the loss of specific genes and/or mutations elsewhere in the WM276gfp2 genome account for the virulence defect. The number of variant genes in each C. gattii strain analyzed by CGH (including the clinical, environmental, and fluconazole-resistant strains) is indicated in Table S3 in the supplemental material.
An extensive collection of clinical and environmental isolates has been obtained as part of the analysis of the C. gattii outbreak on Vancouver Island (26). A survey of selected isolates of the VGI and VGII molecular types revealed differences in virulence in the mouse inhalation model of cryptococcosis (Fig. 3). For example, the virulence of strains R794 and KB3864 in the VGI set of strains was attenuated in comparison to WM276, and we therefore analyzed the genomes of these isolates by CGH with the WM276 tiling array. Examples of regions of difference (insertions/deletions/sequence divergence) between WM276 and the clinical isolate R794 are shown in Text S1, p. 27 in the supplemental material (all chromosomes are shown in Text S1, p. 26 in the supplemental material), and variant regions are described in Data Set S1, tab J, in the supplemental material. A substantial number of variant regions were found on different chromosomes, and many of these were at telomeric and subtelomeric regions. The genes that were deleted or highly diverged encoded, for example, an alpha-glucosidase (CGB_E0010C), an inositol oxygenase (CGB_G2390W), a hexose transport-related protein (CGB_H0010C), and myo-inositol transporters (encoded on three different chromosomes: chr7, CGB_G2420C; chr10, CGB_J2530W; and chr12, CGB_L0070C). We also found a substantial number of genome differences in the comparison of the environmental isolate KB3864 with WM276 (see Text S1, p. 26 and Data Set S1, tab K, in the supplemental material). Some of these differences were the same as those found in R794, including variation in the regions containing genes encoding the alpha-glucosidase, the hexose transport-related protein, and a myo-inositol transporter.
Variations that potentially impact inositol metabolism are interesting, considering that this metabolite is found in high concentrations in the brain. Given the predilection of Cryptococcus for the central nervous system, the pathogen could potentially utilize myo-inositol (a stereoisomer of inositol) as a sole carbon source through conversion to glucuronic acid by the action of myo-inositol oxygenase (MIOX) (69). Variations in myo-inositol transporters in the VGI isolates are also noteworthy because myo-inositol transport has been implicated in mating and virulence (70). Additionally, transcriptome studies revealed that the transcript for myo-inositol phosphate synthase (MYO1) is abundant in vivo and that an inositol/phosphatidylinositol phosphatase is upregulated upon phagocytosis of C. gattii by rat peritoneal macrophages (71–73). Recently, it was demonstrated that phosphatidylinositol 4-kinase is required for survival ex vivo in the hostile cerebrospinal fluid environment and within macrophages and for full virulence (74).
Attenuated virulence of the VGIIb environmental strain RB28 was observed relative to VGIIa strain R265 in the mouse model (Fig. 3). In addition, attenuated virulence in the VGIIb clinical strain R272 had been demonstrated by other research groups and in our previous studies of the immune response to C. gattii (24, 40, 75, 76). The tiling array for the VGIIa genome of strain R265 was therefore used in CGH experiments to examine the genomes of these VGIIb subtype strains. Examples of regions of difference are shown in Text S1, p. 28 in the supplemental material (all chromosomes are shown in Text S1, p. 29 in the supplemental material). A smaller number of differences were identified in the tiling array than in the VGI analysis, and the variant regions in strains R272 and RB28 are listed in Data Set S1, tabs L and M, respectively, in the supplemental material. Genes encoding a putative oxidoreductase and a hexose carrier protein were either deleted or highly diverged in both strains. Genes specifically deleted or diverged in strain R272 encoded a putative endoribonuclease L-PSP, a TPR domain-containing protein, and a tartrate transporter. Genes specifically amplified in strain RB28 encoded a putative 2,4-dichlorophenoxyacetate alpha-ketoglutarate dioxygenase, a deoxyribose-phosphate aldolase, and a beta-1,4-glucosidase. Overall, this analysis revealed extensive variation between strains, thus precluding simple explanations of differences in virulence based on genome content.
The WM276 and R265 tiling arrays were also used in CGH experiments to examine genome changes in isolates of C. gattii from Canada, Australia, and India that showed heteroresistance to 64 µg/ml of fluconazole. Heteroresistance is defined as the phenotypic manifestation of both drug resistance and susceptibility in mixed populations of a single clinical isolate (77). The two resistant VGI strains R1413F and R1412F each showed a different pattern of genome change compared with those of their parental strains. For strain R1413F, chromosomes 9 and 11 appeared to have an elevated copy number of ~1.4 (based on a log2 ratio of ~0.5), and chromosome 10 appeared to be disomic (log2 ratio of ~1.0) (Fig. 4). These results suggest that the strain may contain a mixed population of cells, with different copy numbers present for the indicated chromosomes. Similarly, entire chromosomes 2, 9, and 10 have elevated copy numbers in the R1412F strain, and chromosomes 1 and 13 have elevated copy numbers for specific chromosomal segments. The latter chromosomes may have segmental duplications or more complicated rearrangements (e.g., translocations or the formation of isochromosomes). Surprisingly, this strain also had a reduced copy number for chromosome 14. One explanation for this observation is that the baseline ploidy of the strain may be diploid, and some chromosomes (e.g., 2, 9, and 10) may have copy numbers above 2N while chromosome 14 is present in a single copy. This conclusion is supported by fluorescence-activated cell sorting (FACS) experiments (78) that indicate that R1412F has a diploid character (see Text S1, p. 31 in the supplemental material). Variant regions in VGI fluconazole-resistant isolates are listed in Data Set S1, tabs N and O, in the supplemental material.
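The relation between the reported log2 ratios and copy numbers can be checked with a short sketch. It assumes a haploid baseline of one copy and, for the mixed-population interpretation, that cells carry either one or two copies of the chromosome. Both functions are illustrative and are not part of the CGH analysis pipeline used in the study.

```python
def copy_number(log2_ratio: float, baseline: float = 1.0) -> float:
    """Apparent copy number implied by a CGH log2 ratio."""
    return baseline * 2.0 ** log2_ratio


def disomic_fraction(log2_ratio: float) -> float:
    """Fraction of cells with two copies, assuming all others have one copy."""
    return 2.0 ** log2_ratio - 1.0


print(round(copy_number(0.5), 2))       # 1.41
print(copy_number(1.0))                 # 2.0
print(round(disomic_fraction(0.5), 2))  # 0.41
```

Under these assumptions, a log2 ratio of ~0.5 would correspond to roughly 40% of cells being disomic for that chromosome, which is one way to read the mixed-population interpretation given for strain R1413F.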
The four VGII isolates that showed heteroresistance to 64 µg/ml of fluconazole also each had a different pattern of chromosome changes relative to those of the parental strains. Three of the strains showed relatively simple changes, with R1401F displaying an elevated copy number for a portion of chromosome 1 (see Text S1, p. 30 in the supplemental material), R1346F showing an increased copy number for all of chromosome 3 (see Text S1, p. 30), and R1402F having elevated copy numbers for chromosomes 1 and 10 (Fig. 4). Variant regions in fluconazole-resistant VGII isolates are listed in Data Set S1, tabs P to S, in the supplemental material.
FACS analysis supported the conclusion that R1346F, R1401F, and R1402F were primarily haploid (see Text S1, p. 32 and 33 in the supplemental material). Strain R1347F had a more complicated hybridization pattern, with log2 ratios above 0 for all of chromosomes 3, 4, 5, 7, 8, 9, 10, and 14 and for segments of chromosomes 1 and 2 (Fig. 4). In contrast, log2 ratios below 0 were observed for chromosomes 6, 11, and 12, indicating that changes in ploidy had occurred relative to that of the parental strain R1347. FACS analysis of R1347F supported this conclusion because a mixed population of cells with ploidies above 2N was observed (see Text S1, p. 32 in the supplemental material). Overall, these results indicate that heteroresistance to fluconazole in strains of both the VGI and VGII molecular types of C. gattii is associated with changes in chromosome copy number and ploidy.
Disomic chromosomes have previously been identified in clinical isolates of C. neoformans var. grubii, and disomy has been recently associated with fluconazole heteroresistance in strains of both varieties of C. neoformans (49, 51). In particular, fluconazole resistance was attributed to an elevated copy number of chromosome 1 carrying the genes AFR1 (ATP binding cassette [ABC] transporter; major exporter of azoles) and ERG11 (cytochrome P450 lanosterol 14α-demethylase; target of fluconazole). We found an elevated copy number for chromosome 1 in four of the six strains that we examined, a result consistent with amplification of the azole-transporter gene AFR1 on this chromosome. We hypothesize that this may contribute in part to the observed fluconazole resistance, but amplification of chromosomes other than chromosomes 1 and 2 (carrying AFR1 and ERG11, respectively) suggests that other mechanisms of fluconazole resistance that are independent of either AFR1 or ERG11 may occur in the strains. We should also note that changes in chromosome copy number may be a more general response to selective pressure because increased ploidy has recently been described in C. neoformans during the process of giant cell formation in infected animals (79, 80).
The genome sequences of the VGI and VGIIa genotypes of C. gattii revealed that the majority of the 14 chromosomes are colinear, with some minor rearrangements, but that the strains show considerable variation in gene content and overall sequence identity. In addition, multiortholog phylogenetic analysis supports the existence of speciation within the C. gattii complex. CGH analysis also revealed considerable variation in clinical and environmental isolates as well as changes in chromosome copy numbers and ploidy in C. gattii isolates displaying fluconazole heteroresistance. The genome sequences and the comparative studies reported here provide an opportunity to further define virulence functions that are distinct from or similar to those of the better-characterized sibling species C. neoformans. In particular, the genome sequences support detailed examinations of cryptococcal traits that might eventually explain the predilection of C. gattii for immunocompetent hosts, in contrast to the predilection of C. neoformans for immunocompromised people. For example, the genomes will facilitate the analysis of the high intracellular proliferation rate observed in C. gattii strains from the outbreak (13, 75) as well as the finding that C. gattii strains induce less protective inflammation in mice than C. neoformans strains (40).
The whole-genome shotgun sequence assembler ARACHNE (81) was used to assemble Sanger sequencing reads for the WM276 genome, and the assembly was improved using information from bacterial artificial chromosome (BAC)-based physical maps (82, 83). Automated annotation of the WM276 genome was performed with an in-house genome annotation algorithm called Pegasys (28), based on comparisons with annotated gene models and expressed sequence tags (ESTs) for C. neoformans strain JEC21, followed by manual curation with the genome browser and editor Apollo (30, 31). A summary of the sequencing details is presented in Table S1 in the supplemental material, and GenBank accession numbers are given below.
The sequence of the R265 genome was obtained using Sanger sequencing and assembled with ARACHNE (81). The R265 genes were annotated primarily by transferring annotations from WM276 and also calling a small number of novel genes. R265 was aligned to WM276 using PatternHunter (84), and gene calls in aligned blocks were mapped from WM276 to R265 using an in-house mapping program. To call genes specific to R265, candidate gene structures were identified using GENEID (85), FGENESH (86), and GLEAN (87), and the resulting 108 genes were supported by predictions of a Pfam protein domain or alignment with an EST sequence.
PCR assays were employed to examine the AGO1, AGO2, CPR2, and CGB_A1730W genes in C. gattii strains (see Table S2 in the supplemental material). Additional details on DNA isolation, primer sequences, and PCR conditions, as well as sequence analysis, are provided in Text S1 in the supplemental material. Text S1 also contains a table of strains and lists of the software and data sources employed in the work.
Orthology data sets were generated, including 10-way clusters among selected basidiomycetes and S. cerevisiae and 5-way clusters among the sequenced Cryptococcus spp. For each data set, all predicted protein sequences from the appropriate genomes were searched against each other with BLASTP (32) and clustered into orthologous groups using OrthoMCL (58) with the default criteria (E value < 1E−5). Among the five Cryptococcus strains, 5,171 single-copy orthologs were identified as the clusters with exactly one member per species. Multiple sequence alignments were constructed with MUSCLE (88), and the alignments were trimmed using a heuristic method implemented in trimAl (89), with the automated option that selects optimal parameters to trim the input alignment. Alignments for all 5,171 clusters were concatenated into a single file containing 2,817,121 characters and converted to the Phylip format. Phylogenetic analysis of the five Cryptococcus strains was performed using the maximum likelihood method of PhyML 3.0 (59) implemented in SeaView 4 (90). The JTT (Jones, Taylor, Thornton) amino acid substitution model (91) was used, along with the tree topology search operation that combines NNI (Nearest Neighbor Interchange) and SPR (Subtree Pruning and Regrafting) moves. The proportion of invariable sites and the substitution rate categories were optimized by the program, and gaps were treated as unknown characters. The starting tree to be refined by the maximum likelihood algorithm was a distance-based BIONJ (BIO Neighbor Joining) tree estimated by the program (59). Statistical support for phylogenetic grouping was assessed by approximate likelihood ratio tests based on a Shimodaira-Hasegawa-like procedure (SH-aLRT) (92) and by bootstrap analysis (500 resamplings). An ultrametric tree was generated using PATHd8 (62), with the maximum likelihood tree as a starting point and fixing the age of the most recent common ancestor (MRCA) involved in the C. neoformans (H99)-C. gattii split at 34 myr. This age was derived using the neutral mutation rate of 2E−9 per nucleotide per year for protein-coding genes. While PATHd8 does not assume that a molecular clock exists, it runs a clock test, allowing for substitution rate variation along all lineages. Molecular clock tests indicated that the clock was rejected at three-fourths of the nodes at a confidence level of 0.95. PATHd8 parameters are provided in Materials and Methods of Text S1 in the supplemental material. This file also provides details of the methods used for testing the phylogenetic relationships of the five Cryptococcus strains with S. cerevisiae, U. maydis, C. cinereus, M. globosa, and P. chrysosporium.
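The selection of single-copy orthologs described above (clusters with exactly one member per species) can be sketched as follows. This is an illustration only, not the code used in the study: the input layout, a mapping from cluster identifier to (species, gene) pairs, is an assumption rather than the OrthoMCL output format, and the species labels and gene identifiers are hypothetical placeholders.

```python
from collections import Counter


def single_copy_orthologs(clusters, species):
    """Keep clusters that contain exactly one gene from every listed species.

    clusters: dict mapping cluster id -> list of (species, gene_id) tuples
              (assumed layout, not the native OrthoMCL format).
    species:  iterable of species labels that must each appear exactly once.
    """
    wanted = set(species)
    selected = {}
    for cluster_id, members in clusters.items():
        counts = Counter(sp for sp, _ in members)
        # Require every species to be present, none extra, and no paralogs.
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            selected[cluster_id] = {sp: gene for sp, gene in members}
    return selected


if __name__ == "__main__":
    # Toy clusters with placeholder labels; not data from the study.
    toy_clusters = {
        "c1": [("A", "a1"), ("B", "b1"), ("C", "c1")],               # single copy
        "c2": [("A", "a2"), ("A", "a3"), ("B", "b2"), ("C", "c2")],  # paralog in A
        "c3": [("A", "a4"), ("B", "b3")],                            # missing in C
    }
    print(single_copy_orthologs(toy_clusters, ["A", "B", "C"]))
```

Only clusters that pass this filter would then be aligned, trimmed, and concatenated, so that each species contributes one sequence to every alignment block.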
The MCL (Markov Cluster) algorithm was used to globally identify gene families in the fungal genomes in our data set. MCL detects proteins with very similar domain architectures rather than attempting to detect each domain individually, thus accurately assigning proteins (even ones with different domain structures) into distinct multigene families (93). The algorithm CAFE (computational analysis of gene family evolution) was used to detect significant gene family size changes between any two lineages (94, 95). For additional details, see Text S1 in the supplemental material.
Two representative strains each from Canada (R.B.-13, R.B.-14), Australia (RAM-15, VPB571-058), and India (B-5765, B-5788) were used to study heteroresistance to fluconazole and for CGH and FACS analysis. The methods for isolation of heteroresistant strains of C. gattii have been described (96), and additional details of strain processing, CGH analysis, and FACS are provided in Text S1 in the supplemental material (and in reference 49). Note that we have redesignated the previously isolated C. gattii strains (in parentheses) as follows: R-1346 (R.B.-13), R-1347 (R.B.-14), R-1401 (RAM-15), R-1402 (VPB571-058), R-1412 (B-5765), and R-1413 (B-5788). For FACS analysis, cells of fluconazole-resistant C. gattii strains were grown in yeast extract-peptone-dextrose (YPD) medium with fluconazole, harvested from liquid medium at log phase, and processed for flow cytometry as described previously (78, 97).
Two virulence assays were performed using a mouse model of cryptococcosis. In the first, the virulence of the C. neoformans var. grubii and C. gattii strains expressing green fluorescent protein (GFP) fusions was tested using 10 C57BL/6 female mice per strain. The second assay analyzed the virulence of VGI and VGII strains of C. gattii. In this experiment, 10 A/JCr female mice were inoculated intranasally with 5 × 10^4 cells of each strain (RB28, R794, and KB3864) and monitored for illness over 2 months. The experiment also included the VGI strain WM276, the VGIIa strain R265, and the VGIIb strain R272. The virulence data for these three strains were previously reported (40), as described in the legend to Fig. 3. Survival data were analyzed using Kaplan-Meier curves, and the groups of mice infected with different strains were compared by using the log rank test to assess the statistical significance of differences in virulence. Additional details are provided in Text S1 in the supplemental material.
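For readers unfamiliar with the Kaplan-Meier approach mentioned above, the following minimal sketch shows how a survival curve is estimated from per-animal observation times. It is a generic product-limit estimator written for illustration, not the software used in this study; the function name and the toy data are hypothetical, and the log rank comparison between groups is not shown.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times:  day of death or of last observation for each animal.
    events: True if the animal reached the endpoint on that day,
            False if it was censored (e.g., alive at the end of the experiment).
    Returns a list of (day, survival probability) at each day with an event.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for day in sorted(set(times)):
        deaths = sum(1 for t, e in zip(times, events) if t == day and e)
        leaving = sum(1 for t in times if t == day)  # deaths plus censored
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((day, survival))
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Toy data for five animals (not results from the study):
    # four reach the endpoint, one is censored at day 60.
    days = [10, 12, 12, 20, 60]
    observed = [True, True, True, True, False]
    for day, s in kaplan_meier(days, observed):
        print(day, round(s, 2))
```

Censored animals reduce the number at risk without lowering the survival estimate, which is why the curve in this toy example stops at the last observed event rather than dropping to zero.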
The virulence assays employing mice were carried out in strict accordance with the guidelines of the Canadian Council on Animal Care. The protocol for the assays was approved by the University of British Columbia Committee on Animal Care (protocol A07-0117).
Sequences were deposited in GenBank with the following accession numbers: chr1, CP000286; chr2, CP000287; chr3, CP000288; chr4, CP000289; chr5, CP000290; chr6, CP000291; chr7, CP000292; chr8, CP000293; chr9, CP000294; chr10, CP000295; chr11, CP000296; chr12, CP000297; chr13, CP000298; and chr14, CP000299. The assembled R265 genome and annotations were submitted to GenBank (project accession number AAFP01000000).
We thank Thomas Sharpton (Gladstone Institute, CA) and Scott DiGuistini (Vancouver, British Columbia, Canada) for unconditional advice on the gene family evolution analysis. We also thank the Cryptococcal working group and the British Columbia Centre for Disease Control, especially Eleni Galanis, for advice, as well as Chris Walsh, Han Hao, and Brett Finlay for additional assistance.
The sequencing and annotation of the WM276 genome were funded by Genome Canada and Genome British Columbia and by grants from the National Institute of Allergy and Infectious Diseases (R01 AI053721) and the Canadian Institutes of Health Research to J.W.K. Sequencing and annotation of the R265 genome at the Broad Institute were supported by the National Human Genome Research Institute (grant U54HG003067). Additional support was obtained from an R01 grant (AI50113) to J.H. and an intramural program grant to K.J.K.-C. and A.V. from the National Institute of Allergy and Infectious Diseases and the National Institutes of Health (NIH/NIAID, MD), the National Health and Medical Research Council Australia Research grant 990738 to W.M. for the construction of physical maps of the C. gattii genomes, and a grant to D.C. by the Howard Hughes Medical Institute under the International Scholars Program (55000640).
Citation D’Souza, C. A., J. W. Kronstad, G. Taylor, R. Warren, M. Yuen, et al. 2011. Genome variation in Cryptococcus gattii, an emerging pathogen of immunocompetent hosts. mBio 2(1):e00342-10. doi:10.1128/mBio.00342-10.