Seven major clades of the Fusarium genus and the incongruence of gene trees
The ML trees inferred from each of the concatenated parts of the rRNA cluster region (rDNA cluster), β-tub, EF-1α, and lys2 are displayed in Figures , , and , respectively. The tree topologies inferred from the different genes were not consistent with each other. However, all of the gene trees supported the classification of Fusarium species into 7 major clades, namely, clades I to VII. Most of the support values for these clades were very high (bootstrap probability, BP, greater than 95%). The exceptions were clade I (71% BP) and clade II (75% BP) in the β-tub ML tree, clade V (88% BP) and clade VII (55% BP) in the rDNA cluster ML tree, and clade VII (<50% BP) in the lys2 ML tree. Our ML trees are the first to clearly resolve 7 major clades within the Fusarium genus.
Figure 1 Maximum likelihood trees for the Fusarium genus and related genera inferred from the rDNA cluster including 18S rDNA, ITS1, 5.8S rDNA and 28S rDNA. The GTR + I + Γ model was used as the model for nucleotide substitution. Branch lengths are proportional to the estimated number of nucleotide substitutions.
Figure 2 Maximum likelihood trees for the Fusarium genus and related genera inferred from β-tub. The GTR + I + Γ model was used as the model for nucleotide substitution. Branch lengths are proportional to the estimated number of nucleotide substitutions.
Figure 3 Maximum likelihood trees for the Fusarium genus and related genera inferred from EF-1α. The GTR + I + Γ model was used as the model for nucleotide substitution. Branch lengths are proportional to the estimated number of nucleotide substitutions.
Figure 4 Maximum likelihood trees for the Fusarium genus and related genera inferred from lys2. The GTR + I + Γ model was used as the model for nucleotide substitution. Branch lengths are proportional to the estimated number of nucleotide substitutions.
Many taxonomic studies based on morphological characters have reported that some "sections", each comprising closely related species, share "synapomorphic" character states. Clade I consists of F. larvarum and F. merismoides, which belong to different "sections", namely, Eupionnotes and Arachnites, respectively. Although the β-tub and EF-1α sequences supported the monophyly of F. merismoides, the rDNA cluster supported the paraphyly of this species. Clades II, III, and IV each consist of a single species, namely, F. dimerum, F. solani, and F. decemcellulare, respectively. Clade V contains 2 "sections": Elegans, which consists of F. oxysporum, and Liseola, which consists of F. subglutinans, F. proliferatum, and F. verticillioides. Clade VI consists of F. lateritium, F. avenaceum, and F. tricinctum, which belong to different "sections", namely, Lateritium, Roseum, and Sporotrichiella, respectively. The paraphyly of F. avenaceum and F. lateritium was supported by all the genes. Clade VII contains 4 "sections" with 9 species: Eupionnotes, consisting of F. incarnatum; Gibbosum, consisting of F. equiseti and F. acuminatum; Discolor, consisting of F. graminearum and F. culmorum; and Sporotrichiella, consisting of F. poae, F. kyusyuense, F. sporotrichioides, and F. langsethiae. Our ML trees, inferred separately from the rDNA cluster and each of the 3 genes, indicate that the species within each of clades I to VII are closely related to each other.
Some taxonomic groups that were not identified using the morphological species concept have already been revealed by previous molecular phylogenetic analyses. O'Donnell et al. [14] showed that there are species complexes comprising more than 2 species, such as the Gibberella fujikuroi species complex. O'Donnell et al. [14] and O'Donnell and Cigelnik [14] showed that some sections proposed in morphological studies, such as Sporotrichiella, form paraphyletic or polyphyletic groups. This study supported their results. Moreover, our results revealed some new taxonomic groups, such as clades I and VI. The close relationships within these two clades are discussed below, in the section "Verification of monophyly of the sections and species of the Fusarium genus".
Evaluation of incongruence of the gene trees
The phylogenetic relationships among clades I to VII differ among the gene trees, as described above. To evaluate this incongruence, we compared the log-likelihood scores of alternative relationships among clades I to VII (Figure ) (see Additional file 1). The absolute log-likelihood scores of the ML trees, and the differences (± 1 SE) between the log-likelihood score of each ML tree and those of the alternative trees, are displayed in Table . No significant differences were observed among the EF-1α, rDNA cluster, and β-tub ML trees, but all of these differed from the lys2 ML tree (SH test, p < 0.001).
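The SH comparison above can be illustrated with a short sketch. This is not the authors' pipeline: it assumes that the per-site log-likelihoods of every candidate topology have already been computed by an ML program, and it uses RELL resampling (re-summing those stored site values rather than re-optimizing each tree). The function name `sh_test` and its parameters are hypothetical.

```python
import numpy as np


def sh_test(site_lnl, n_boot=1000, seed=0):
    """Shimodaira-Hasegawa test with RELL bootstrap resampling (sketch).

    site_lnl: array of shape (n_trees, n_sites) holding the per-site
    log-likelihood of each candidate topology (assumed precomputed).
    Returns one p-value per topology.
    """
    site_lnl = np.asarray(site_lnl, dtype=float)
    n_trees, n_sites = site_lnl.shape
    totals = site_lnl.sum(axis=1)
    # Observed statistic: how far each tree falls below the best tree.
    delta_obs = totals.max() - totals

    # RELL: draw bootstrap column counts and re-weight the stored site lnL.
    rng = np.random.default_rng(seed)
    weights = rng.multinomial(n_sites, np.full(n_sites, 1.0 / n_sites), size=n_boot)
    boot = site_lnl @ weights.T  # shape (n_trees, n_boot)

    # Centre each tree's bootstrap totals (the SH least-favourable null).
    boot = boot - boot.mean(axis=1, keepdims=True)
    delta_boot = boot.max(axis=0) - boot  # shape (n_trees, n_boot)

    # p-value: fraction of replicates at least as extreme as observed.
    return (delta_boot >= delta_obs[:, None]).mean(axis=1)
```

The best tree always receives p = 1, and a tree whose total log-likelihood falls far below the best relative to the bootstrap spread receives a p-value near 0, which is the pattern reported above for the lys2 topology against the other trees.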
The comparison of the tree topologies for relationships of clades I to VII based on each gene
What is the reason for the differences in tree topology observed among the genes? One possibility is that they are artefacts of tree inference. Another possibility is that the genes have different evolutionary histories. In the former case, the main factors that can mislead tree inference include (1) long-branch attraction, (2) nucleotide and amino acid compositional bias, and (3) convergent evolution.
Long-branch attraction occurs mainly as a consequence of rapidly evolving sites, and removing such sites from the analysis can reduce its effects [18]. Accordingly, we excluded the fast-evolving sites of lys2. After the 3rd codon positions or the synonymous substitution sites had been excluded, the tree topology remained essentially the same as that of the ML tree displayed in Figure . Furthermore, even when the ML tree was inferred from the 2nd codon positions only, the tree topology remained essentially the same as that of the ML tree displayed in Figure . Therefore, long-branch attraction is unlikely to explain the incongruence.
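Excluding codon positions before re-inferring a tree, as done above for lys2, amounts to a simple column filter on an in-frame coding alignment. The following is a minimal sketch, assuming the alignment is held as a hypothetical `{name: sequence}` dictionary whose rows start at codon position 1; the function names are illustrative only.

```python
def select_codon_positions(seq, keep):
    """Return the characters of an in-frame coding sequence at the given
    1-based codon positions (e.g. keep={1, 2} drops all 3rd positions)."""
    if len(seq) % 3 != 0:
        raise ValueError("coding alignment length must be a multiple of 3")
    # i % 3 + 1 maps column index 0, 1, 2, 3, ... to codon position 1, 2, 3, 1, ...
    return "".join(ch for i, ch in enumerate(seq) if i % 3 + 1 in keep)


def filter_alignment(alignment, keep):
    """Apply select_codon_positions to every row of a {name: sequence} alignment."""
    return {name: select_codon_positions(seq, keep) for name, seq in alignment.items()}


# Hypothetical strains that differ only at 3rd codon positions.
aln = {"strainA": "ATGGCCAAA", "strainB": "ATGGCTAAG"}
print(filter_alignment(aln, {1, 2}))  # 1st + 2nd positions only
print(filter_alignment(aln, {2}))     # 2nd positions only
```

In this toy example the two strains become identical once the 3rd positions are removed, which is exactly the kind of fast-evolving signal the exclusion is meant to strip out.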
Extreme compositional bias of nucleotides or amino acids can strongly mislead tree inference [19]. Therefore, we examined the compositional bias of lys2 nucleotides and amino acids in the lineages within our dataset. No compositional bias was observed in the amino acid sequences or the combined 1st codon position sites. Compositional bias of the 3rd codon position sites was detected in F. lateritium (MAFF235344 and MAFF840045), and of the combined 1st codon position sites in F. tricinctum (CBS393.93, ATCC38183, and MAFF235551). Both of these species belong to clade VI. Because the detected bias is confined to strains of a single clade, it is unlikely that nucleotide compositional bias can explain the phylogenetically "misleading" relationships observed among clades I to VII in the ML trees displayed in Figures , , and .
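The text does not specify which compositional test was used. One common approach, sketched here as an assumption, is a per-sequence chi-square statistic that compares each sequence's base counts with the pooled base frequencies of the whole alignment; sequences exceeding the 5% critical value for 3 degrees of freedom (7.815) are flagged. The function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"
# Upper 5% point of the chi-square distribution with 3 degrees of freedom.
CHI2_CRIT_DF3_05 = 7.815


def composition_chi2(seqs):
    """Per-sequence chi-square statistic of base counts vs. pooled frequencies.

    seqs: {name: sequence}; characters other than A, C, G, T are ignored.
    """
    counts = {name: Counter(b for b in s.upper() if b in BASES) for name, s in seqs.items()}
    pooled = Counter()
    for c in counts.values():
        pooled.update(c)
    total = sum(pooled[b] for b in BASES)
    freqs = {b: pooled[b] / total for b in BASES}
    stats = {}
    for name, c in counts.items():
        n = sum(c[b] for b in BASES)
        if n == 0:  # no scorable bases: nothing to test
            stats[name] = 0.0
            continue
        stats[name] = sum(
            (c[b] - n * freqs[b]) ** 2 / (n * freqs[b]) for b in BASES if freqs[b] > 0
        )
    return stats


def flag_biased(seqs, critical=CHI2_CRIT_DF3_05):
    """Names of sequences whose composition deviates at roughly the 5% level."""
    return sorted(name for name, x in composition_chi2(seqs).items() if x > critical)
```

Because the pooled frequencies include every sequence, this is only an approximate homogeneity check; it would typically be run separately on, for example, 3rd codon positions and on combined 1st and 2nd positions.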
Although molecular phylogenetics is founded principally on the neutral theory of molecular evolution [21], convergent evolution is known to occur at the molecular level and can mislead the reconstruction of phylogenetic trees [22]. Because natural selection generally acts at the level of amino acid sequences, synonymous substitutions are unlikely to be affected by convergence. Therefore, we inferred the phylogenetic tree using only the 3rd codon positions, because substitutions at these sites are mainly synonymous. However, the lys2 ML tree inferred using only the 3rd codon positions was essentially the same as the tree presented in Figure (data not shown).
For these 3 reasons, it is unlikely that the incongruence between the lys2 ML tree and the other gene trees was due to an analytical artefact. Instead, the differences in tree topology may reflect differences in the evolutionary histories of the genes. Therefore, we excluded lys2 when we reconstructed the species tree.
Evaluation of genetic markers for phylogenetic reconstruction
To reconstruct the phylogenetic tree accurately, we selected genes with an adequate evolutionary rate. ML trees based on each individual gene or rDNA region are displayed in Figures , and and Additional files 2. The substitution rates of the 3 rDNA regions (5.8S, 18S, and 28S) were all slow. Although the substitution rate of ITS1 was faster than those of the rDNA genes, its sequence was very short (101 bp). Consequently, several Fusarium species had identical sequences in each of the 4 rDNA regions when the regions were considered individually. When the 3 rDNA genes and ITS1 were combined into a cluster, most of the species could be distinguished from the nucleotide sequence data. However, because only small differences were observed among species, some relationships among species within the same clade remained unresolved (Figure ). In contrast, the nucleotide substitution rates of the protein-coding genes, namely, β-tub, EF-1α, and lys2, were high, and each of these genes resolved well the relationships among conspecific strains or closely related species within the same clade of I to VII (the lower taxa) (Figures , and). However, for β-tub and EF-1α, the alignment of amino acid sequences among Fusarium species indicated that almost all of the amino acid substitutions were singletons and that parsimony-informative sites were few (data not shown). Therefore, almost all of the nucleotide substitutions in these genes among Fusarium species were synonymous.
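The distinction between singleton and parsimony-informative sites used above can be made concrete with a small column classifier. This is a sketch under the standard definition (a column is parsimony-informative when at least two different states each occur in at least two sequences); the function name and the choice of missing-data symbols are assumptions, not taken from the paper.

```python
from collections import Counter


def classify_sites(seqs, missing="-?"):
    """Count constant, singleton, and parsimony-informative alignment columns.

    A variable column is parsimony-informative when at least two different
    states each occur in at least two sequences; otherwise it is counted as
    a singleton site. Characters in `missing` are ignored (for protein data
    "X" could be added; "N" must not be, since it denotes asparagine).
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    counts = {"constant": 0, "singleton": 0, "informative": 0}
    for column in zip(*seqs):
        states = Counter(ch for ch in column if ch not in missing)
        if len(states) <= 1:
            counts["constant"] += 1  # includes all-missing columns
        elif sum(1 for n in states.values() if n >= 2) >= 2:
            counts["informative"] += 1
        else:
            counts["singleton"] += 1
    return counts


# Hypothetical 4-residue protein fragments from four strains.
print(classify_sites(["MKVA", "MKIA", "MRIA", "MRVS"]))
```

Here the last column (A, A, A, S) is variable but carries no grouping information under parsimony, which is the situation described above for most amino acid changes in β-tub and EF-1α.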
Pairwise comparisons of the substitution distances for the nucleotide and amino acid sequences are displayed in Figure (Additional file 6). The distances in these graphs were divided into 4 groups: between conspecific Fusarium strains, between Fusarium strains of different species in the same clade (I to VII), between Fusarium strains in different clades, and between strains of Fusarium species and strains of other genera. The numbers of non-synonymous substitutions in β-tub and EF-1α were too few to reconstruct the phylogenetic relationships, especially among clades I to VII and among Fusarium and its related genera (the higher taxa) (Figure ). The plots for the other 4 regions, namely, the rDNA cluster, the synonymous substitutions of β-tub and EF-1α, and the introns within EF-1α, fell into 2 groups: the synonymous substitutions of β-tub/the introns within EF-1α, and the rDNA cluster/the synonymous substitutions of EF-1α (Figure ). The plots of the former group (Figures and ) were almost parallel to the x-axis for the higher taxa. This implies that these 2 regions are completely saturated for comparisons among the higher taxa, although their substitutions remain useful for comparisons among the lower taxa. The plots of the latter group (Figures and ) are not linear but still increase steadily, and the substitution rates in this group are not particularly slow. This result suggests that these substitutions can provide phylogenetic information for both the higher and lower taxa.
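What saturation looks like can be illustrated with a simpler model than the dN/dS comparison in Figure 5. The sketch below swaps in the Jukes-Cantor (JC69) model as an assumption: the observed proportion of differing sites p approaches a ceiling of 0.75 as the true distance d grows, so a plot of p against d flattens in the same way as the nearly horizontal plots described above for the higher taxa. Function names are hypothetical.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gap columns skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69(p):
    """Jukes-Cantor corrected distance; undefined (returned as inf) once p >= 0.75."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def expected_p(d):
    """Expected observed p-distance under JC69 for a true distance d (inverse of jc69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# As d grows, the observed p-distance levels off near 0.75: the saturation signature.
for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(d, round(expected_p(d), 3))
```

Once p is close to its ceiling, small sampling differences in p translate into very large, unreliable changes in the corrected distance, which is why saturated regions carry little information about deep divergences.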
Figure 5 Comparisons of the evolutionary distances for each gene. Number of non-synonymous substitutions per non-synonymous site (dN) plotted against the number of synonymous substitutions per synonymous site (dS) for all of the strains in this study.
One of the aims of this study was to provide a comprehensive description of the phylogenetic relationships among Fusarium species and closely related species, at both the higher and lower taxonomic levels. Therefore, we needed the information provided by different classes of substitutions in multiple genes to reconstruct the phylogenetic tree. In this study, we used all of the substitution information obtained from the rDNA cluster and the β-tub and EF-1α genes. Substitutions within lys2 were removed from our dataset because, as shown above, this locus is not suitable for the phylogenetic analysis of Fusarium and its related species. Because the information provided by substitutions varied among sites, we adjusted the weighting of some nucleotide sites by using a partition model, as described in the Materials and Methods.
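A partition model of this kind is usually supplied to the ML program as a list of column ranges. The sketch below writes lines in the RAxML-style partition-file syntax, where `\3` selects every third column; this is an assumption for illustration, the actual scheme and software are those given in the Materials and Methods, and the locus names and coordinates in the example are hypothetical.

```python
def raxml_partitions(loci):
    """Build RAxML-style partition lines (illustrative sketch).

    loci: list of (name, start, end, kind) with 1-based inclusive coordinates.
    kind "noncoding" -> one partition (e.g. rDNA, introns);
    kind "coding" -> codon positions 1+2 in one partition and position 3 in
    another (the region is assumed to start at codon position 1).
    """
    lines = []
    for name, start, end, kind in loci:
        if kind == "noncoding":
            lines.append(f"DNA, {name} = {start}-{end}")
        elif kind == "coding":
            # "\3" means every third column, starting at the given site.
            lines.append(f"DNA, {name}_pos12 = {start}-{end}\\3, {start + 1}-{end}\\3")
            lines.append(f"DNA, {name}_pos3 = {start + 2}-{end}\\3")
        else:
            raise ValueError(f"unknown partition kind: {kind}")
    return "\n".join(lines)


# Hypothetical coordinates in a concatenated alignment.
print(raxml_partitions([
    ("rDNA", 1, 1500, "noncoding"),
    ("btub_exon", 1501, 2400, "coding"),
    ("ef1a_intron", 2401, 2600, "noncoding"),
]))
```

Separating 3rd codon positions from 1st and 2nd positions lets each class have its own rate and substitution parameters, which is one way to account for the site-to-site differences in information content described above.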
Phylogenetic relationships among clades of the Fusarium genus
The ML tree based on the combined data from the rDNA cluster and the β-tub and EF-1α genes is displayed in Figure . BP values greater than 75% are shown at the nodes. Our ML tree showed that M. nivale does not belong to any of the 7 major clades of Fusarium species, and that this species diverged rapidly, within a short period, from the common ancestor of Fusarium species and N. crassa. Therefore, our analyses supported the previously described separation of M. nivale from the Fusarium genus.
Figure 6 Maximum likelihood tree of the Fusarium genus and related genera inferred from the combined sequences of the rDNA cluster and the β-tub and EF-1α genes. The different tempos and modes of nucleotide substitution were taken into account.
] proposed that closely related sections share the same teleomorph genus, and O'Donnell et al. [25] indicated that all Fusarium species with teleomorphs belonging to the genus Gibberella form a clade known as the Gibberella clade. In this study, all of the examined species with teleomorphs belonging to the genus Gibberella clustered in a super-clade consisting of clades V to VII (Figure ). Our results thus support the classification of the Gibberella clade.
Marasas et al. [1] and Pitt and Hocking [2] reviewed the many kinds of mycotoxins produced by Fusarium species and the hazards these toxins pose to human and animal health. Trichothecenes and fumonisins are among the main mycotoxins that naturally contaminate agricultural crops. Our ML tree indicated that most trichothecene-producing Fusarium species clustered in a super-clade consisting of clades VI and VII (Figure ). This relationship suggests that a common ancestor of this super-clade acquired the capacity to produce trichothecenes, and that some species subsequently lost this capacity.
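The paper does not describe a formal ancestral-state analysis. As a hedged illustration of how a single-origin hypothesis could be checked, the sketch below applies Fitch parsimony to a toy rooted binary tree with trichothecene production coded as 1 (present) or 0 (absent); the taxa, tree shapes, and states are hypothetical. Unordered Fitch parsimony counts the minimum number of state changes but does not by itself distinguish gains from losses.

```python
def fitch(tree, states):
    """Fitch parsimony on a rooted binary tree written as nested 2-tuples.

    tree: a leaf name (str) or a pair (left_subtree, right_subtree).
    states: {leaf_name: state}.
    Returns (candidate state set at this node, minimum number of changes).
    """
    if isinstance(tree, str):  # leaf: its observed state
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:  # children can agree: no extra change needed
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one change required


# Hypothetical taxa: producers t1, t2 form a clade apart from non-producers t3, t4.
print(fitch((("t1", "t2"), ("t3", "t4")), {"t1": 1, "t2": 1, "t3": 0, "t4": 0}))
# A scattered distribution on a ladder-like tree needs more changes.
print(fitch(((("t1", "t2"), "t3"), "t4"), {"t1": 1, "t2": 0, "t3": 1, "t4": 0}))
```

A minimum of one change is consistent with a single acquisition in a common ancestor; additional changes inside a producing clade would correspond to the losses suggested above. Likelihood-based ancestral reconstruction would be the more rigorous choice for real data.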
Verification of monophyly of the sections and species of the Fusarium genus
In our ML tree (Figure ), many sections that have previously been defined on the basis of morphological characteristics were not recovered as monophyletic. In this analysis, only Discolor (100% BP) was supported by a BP value greater than 75%. The sections other than Discolor that include 2 or more species in this study, namely Arachnites, Eupionnotes, Gibbosum, and Sporotrichiella, formed paraphyletic or polyphyletic groups. These results are consistent with previous studies reporting that taxonomic groups traditionally classified using the morphological species concept are not always recovered in molecular phylogenetic analyses of the Fusarium genus. In particular, species of the Eupionnotes section were grouped under the morphological species concept on the basis of section-specific characteristics, such as a very slow growth rate on potato dextrose agar (PDA; Eiken), a yeast-like appearance due to the absence of aerial mycelia, the absence of microconidia, and small macroconidia generally having only 1 to 2 septa [5]. Our results indicated that these characteristics are not synapomorphies shared exclusively by species in this section, such as F. dimerum and F. merismoides. Moreover, morphological characteristics such as a slow growth rate, colony color on PDA viewed from below (white to tan), the absence of microconidia, and small macroconidia appear in the basal lineages containing the species of clades I and II and M. nivale. Therefore, we consider these characteristics to be ancestral states that have been retained in the basal lineages as symplesiomorphies. It is difficult to distinguish among synapomorphies, symplesiomorphies, and convergently derived characters. A single species recognized on the basis of morphology often comprises several species that are recognized on the basis of molecular phylogeny.
Although the morphological species concept does not reflect the phylogeny of the Fusarium genus, this does not imply that morphological characteristics are not useful for identification and taxonomy. On the contrary, species recognition based on morphological characteristics remains useful for identifying unknown species, because morphological characteristics can be applied to any species, not only those of the Fusarium genus but also those of other fungi [3]. Isolates can be initially classified on the basis of morphological similarity, with the awareness that sections are in fact artificial groupings. Thus, it is still necessary to use recognition methods based on morphological characteristics in combination with phylogenetic recognition methods.
Although our ML tree was highly resolved overall, the relationships among species that together form a species complex could not be resolved. Previous molecular phylogenetic analyses have suggested that a species of the Liseola section, as defined by traditional taxonomy based on the morphological species concept, actually includes multiple species, and that the newly defined species, which are recognized mainly by mating type, constitute the Gibberella fujikuroi species complex [15]. The ML tree obtained in this study (Figure ) shows intermingled, nested relationships among species within the Liseola section. The monophyly of F. subglutinans, F. proliferatum, and F. verticillioides, as recognized under the traditional morphological species concept, was not recovered. When we recognized species based on the newer taxonomic system (see Table : species re-identified by molecular methods), our ML tree could not resolve the phylogenetic relationships among the re-identified species in the Gibberella fujikuroi species complex, except for the sister relationship between F. phyllophilum and F. fujikuroi. This difficulty in resolving phylogenetic relationships within the complex is probably caused by rapid divergence events, as implied by the short internal branches within this species complex in our ML tree.
Strains of the genus Fusarium and Fusarium-related species used in this study
Furthermore, our study detected another intermingled, nested structure: a species complex comprising the F. avenaceum/F. tricinctum/F. lateritium clade (clade VI in Figures and ). Regarding F. avenaceum, affinities with F. acuminatum have been suggested by previous morphological and molecular studies [8]. However, other molecular studies have suggested that F. avenaceum is more closely related to F. tricinctum
than to F. acuminatum
]. Moreover, the ML tree in this study suggested the presence of an F. avenaceum clade (Figure ). The phylogenetic hypothesis of a sister-species relationship between F. avenaceum and F. acuminatum was rejected by our ML tree in Figure . Further studies with more strains of each of the species within these complexes are required to clarify these taxonomic ambiguities.
Adaptive evolution of the lys2 gene within the Fusarium genus
The topology of the lys2 tree was very different from those of the other trees (Figure , , , and Table ), and it indicated that the Fusarium genus is paraphyletic. Interestingly, further investigation using the branch-site model detected many branches with evidence of positive selection (likelihood ratio test, p < 0.001), indicated by bold branches in our lys2 tree (Figure ). In general, incongruence between gene trees and species trees is caused by one of the following three processes: (1) ancestral polymorphism and incomplete lineage sorting, (2) gene duplication, and (3) horizontal gene transfer. We briefly describe each of these hypotheses, together with its difficulties, in the following paragraphs.
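The likelihood ratio test mentioned above compares a null model, in which ω on the foreground branch is fixed at 1, with an alternative in which it may exceed 1 (as in the branch-site test implemented in PAML's codeml). The paper does not state which reference distribution was used, so the following is a sketch under that assumption: it converts two log-likelihoods into the test statistic 2ΔlnL and a p-value from either χ² with 1 degree of freedom (conservative) or the 50:50 mixture of a point mass at 0 and χ²₁. The log-likelihood values in the example are hypothetical.

```python
import math


def branch_site_lrt(lnl_null, lnl_alt, mixture=False):
    """Likelihood ratio test for a branch-site comparison (sketch).

    Returns (2 * delta lnL, p-value). By default the p-value comes from a
    chi-square distribution with 1 df; mixture=True uses the 50:50 mixture
    of a point mass at 0 and chi-square with 1 df.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of chi-square with 1 df: P(X >= x) = erfc(sqrt(x / 2)).
    p_chi2_1 = math.erfc(math.sqrt(stat / 2.0))
    if mixture:
        return stat, (1.0 if stat == 0.0 else 0.5 * p_chi2_1)
    return stat, p_chi2_1


# Hypothetical log-likelihoods for one foreground branch.
stat, p = branch_site_lrt(lnl_null=-1000.0, lnl_alt=-994.0)
print(stat, p, p < 0.001)
```

With a threshold of p < 0.001, the statistic must exceed about 10.83 under χ²₁, i.e. the alternative model must improve the log-likelihood by more than about 5.4 units.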
Under the first hypothesis, polymorphism existed in the ancestral population of the Fusarium genus and its related genera, and some alleles were positively selected and finally fixed independently in each lineage. However, this requires the ancestral polymorphism to have been maintained for a very long time, possibly several hundred million years. For the 18S rDNA gene, the average number of nucleotide substitutions between Fusarium and M. nivale is 17.0, and that between Fusarium and N. crassa is 30.9. These values are comparable to the number of differences observed between the human and chicken 18S rDNA genes (24 substitutions), lineages that are thought to have diverged approximately 320 million years ago [31]. In contrast, the average number of nucleotide substitutions among Fusarium species is 1.9. The large differences among the three genera and the small differences within the Fusarium genus indicate that a long time elapsed between the split of the three genera and the emergence of the most recent common ancestor of the Fusarium genus.
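The comparison above can be made concrete by expressing the between-genera 18S rDNA distances, and the human-chicken distance, relative to the average distance within Fusarium. These are simple ratios of the numbers given in the text; no molecular clock calibration is implied.

```latex
\frac{17.0}{1.9} \approx 8.9 \;(\textit{Fusarium}\text{ vs. } \textit{M. nivale}), \qquad
\frac{30.9}{1.9} \approx 16.3 \;(\textit{Fusarium}\text{ vs. } \textit{N. crassa}), \qquad
\frac{24}{1.9} \approx 12.6 \;(\text{human vs. chicken})
```

That is, the divergence among Fusarium species is roughly an order of magnitude smaller than the divergence separating Fusarium from its related genera.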
The second hypothesis is that a lys2 gene duplication event occurred in the common ancestor of the Fusarium genus and its related genera, allowing the amino acid sequences of the copies in these groups to change radically. In each lineage, 1 copy may have been positively selected, while the other may have become a non-functional pseudogene that was eventually purged from the genome. We found additional lys2 copies in the genomes of F. oxysporum and Nectria haematococca (anamorph: F. solani), but not in the genomes of F. verticillioides and Gibberella zeae (anamorph: F. graminearum); these genome sequences were obtained from GenBank and the Fusarium Comparative Database (http://www.broadinstitute.org/annotation/genome/fusarium_group/MultiHome.html). We reconstructed the gene tree of lys2 with sequences from this study and from GenBank (Figure ). The tree suggests that gene duplications may have occurred repeatedly. However, according to our analysis using the branch model, the additional lys2 gene copies may still be subject to purifying selection (data not shown). Therefore, it is unlikely that 1 copy was completely purged from the genome in each lineage after the alternative copy had been positively selected.
Figure 7 Maximum likelihood tree inferred from lys2. The GTR + I + Γ model was used as the model of nucleotide substitution. Bootstrap probability (BP; 1000 replicates) values greater than 75% are shown on the nodes. Branch lengths are proportional to the estimated number of nucleotide substitutions.
The third hypothesis is that horizontal gene transfer has repeatedly occurred from the genome of 1 lineage to another. Horizontal gene transfer between fungal genomes has previously been observed in various fungal species, including F. oxysporum
]. We can infer when and where such events might have occurred during the evolutionary history of lys2 by comparing the species tree (Figure ) with the gene tree (Figure ). The first horizontal gene transfer appears to have occurred from N. crassa (or M. nivale) to the Fusarium species in clade III. A second horizontal gene transfer appears to have occurred from clade III to clades I and V. The hypothesis of repeated gene transfer events explains not only the positive selection but also the distant phylogenetic relationship between the original lys2 gene and the additional lys2 copies found only in F. oxysporum and N. haematococca (Figure ). However, the mechanism by which the new gene copy completely replaced the original copy in the recipient genomes remains unclear.
Of these three hypotheses, the first (ancestral polymorphism and incomplete lineage sorting) is unlikely. As mentioned above, although the other two hypotheses can partially explain the incongruence of the gene trees and the positive selection on lys2, difficulties remain with each of them. Hence, our results do not allow us to identify the cause.
The lys2 gene is fungus-specific and is related to the synthesis of lysine [15]. Our results indicate that this gene has been subjected to positive selection within the Fusarium genus. Therefore, lysine metabolism is expected to be similar among the species within clades I, III, and V, and among those within clades II, IV, VI, and VII. However, this should be confirmed by biochemical experiments. An et al. [15] and Watanabe et al. [16] did not report multiple copies of lys2 in the genomes of fungal genera other than Fusarium, such as Aspergillus, Byssochlamys, and Saccharomyces. Therefore, a search for other genera carrying multiple copies of lys2 in their genomes is required to understand the diversity of lysine-metabolizing systems in fungi. At the same time, we should note the difficulty of estimating ω ratios (the number of non-synonymous substitutions per non-synonymous site divided by the number of synonymous substitutions per synonymous site; dN/dS) in such a divergent taxon. The detection of positive selection based on ω ratios is highly sensitive to the saturation of synonymous substitutions. Because the relative evolutionary rate of lys2 is high among the genetic markers used in this study [34], the synonymous substitutions of lys2 may already have been saturated. However, multiple synonymous substitutions at the same sites (multiple hits) were well estimated by the ML method with the codon substitution model, which effectively corrects for the effect of multiple hits (Additional files 7). Moreover, we applied a strict criterion for statistical significance (p < 0.001) to minimize the risk of overestimating positive selection. Therefore, we consider it unlikely that the detected signals of positive selection are false positives.
Further understanding of the evolutionary processes of lys2 and other genes is very important. To date, the full genome sequences of Fusarium are available for only 4 species of the Gibberella clade. The full genome sequences from other clades of Fusarium may elucidate the evolutionary processes of the genome, and such studies are currently in progress.