We hypothesized that some amino acid substitutions in conserved proteins that are strongly fixed by critical functional roles would show lineage-specific distributions. As an example of an archetypal conserved eukaryotic protein we considered the active site of β-tubulin. Our analysis identified one amino acid substitution—β-tubulin F224—which was highly lineage specific. Investigation of β-tubulin for other phylogenetically restricted amino acids identified several with apparent specificity for well-defined phylogenetic groups. Intriguingly, none showed specificity for “supergroups” other than the unikonts. To understand why, we analysed the β-tubulin Neighbor-Net and demonstrated a fundamental division between core β-tubulins (plant-like) and divergent β-tubulins (animal and fungal). F224 was almost completely restricted to the core β-tubulins, while divergent β-tubulins possessed Y224. Thus, our specific example offers insight into the restrictions associated with the co-evolution of β-tubulin during the radiation of eukaryotes, underlining a fundamental dichotomy between F-type, core β-tubulins and Y-type, divergent β-tubulins. More broadly our study provides proof of principle for the taxonomic utility of critical amino acids in the active sites of conserved proteins.
The online version of this article (doi:10.1007/s00239-010-9338-y) contains supplementary material, which is available to authorized users.
Phylogeny reconstruction methods aim to establish common descent between taxa, and advances not only in molecular sequence data generation but also in phylogenetic analysis tools and techniques have led to considerable progress in the quest to better understand evolution. Unfortunately, phylogenies reconstructed from different markers can exhibit discrepancies. These can arise when the underlying marker has undergone complex evolutionary processes, such as parallel or convergent evolution, that render the signals left behind ambiguous. Among many conceivable markers, those that are based on “rare molecular events” have the potential of being good markers for phylogeny reconstruction. Those used to date include gene fusion, recombination, insertion or deletion, SINEs and LINEs (Rokas and Holland 2000). These can have advantages over traditional phylogenetic comparisons which sometimes conflict depending on the sequence chosen (Edgcomb et al. 2001; Stechmann and Cavalier-Smith 2002; Steenkamp et al. 2006; Van de Peer et al. 2000). Once found, such rare event markers give rise to straightforward and readily interpretable phylogenetic analyses. Confounding such analyses are the risks of multiple similar events rather than a single instance, and the possibility of reversion. These risks are reduced as more such events are discovered and analyzed, but such markers remain difficult to identify and are by definition scarce.
Single amino acid polymorphisms in even the most evolutionarily conserved proteins are not rare; substitutions in coding sequences are the coin of protein evolution. The selection pressure on particular amino acids critical for structure and function to remain invariant can, though, be enormous making transitions in some amino acids in the functional regions of conserved proteins rare. When such amino acids do overcome this intense selection pressure and mutate to a different amino acid, particularly when the change is not a conservative one, it may only be possible by way of commensurate and contemporaneous changes elsewhere in the protein or its partner proteins, potentially enabled by alternative strong selective pressures. In this study, we focussed initially on the active site of the canonical eukaryotic cytoskeletal GTPase β-tubulin as a prospective source of amino acids under intense conservative selective pressure. We considered whether transition of amino acids within the active site would show lineage-specific distributions useful for phylogenetic analysis and therefore might be associated with functional differences observed between taxa including microtubule dynamics and pharmacological susceptibility.
Microtubules are a defining feature of eukaryotes representing key components of the cytoskeleton and mitotic spindle. They are built from repeating αβ-tubulin heterodimers and under physiological conditions α- and β-tubulin bind one molecule of GTP each. While the nucleotide bound to α-tubulin is non-exchangeable, the intrinsic GTPase activity of β-tubulin catalyses the hydrolysis of GTP bound to this site. This enzymatic activity is a prerequisite for the dynamic instability of microtubules, which in turn influences their biological function (Downing and Nogales 1999). As a consequence, GTP analogues which bind to β-tubulin and inhibit its activity cause disruption of microtubule function (Muraoka et al. 1999).
All eukaryotic lineages so far discovered possess β-tubulin, and the amino acids of the GTP binding site of β-tubulin are closely conserved, with many showing no polymorphism at all across the eukaryotic tree. Although a number of three-dimensional structures have been solved for tubulin of mammalian origin, including bovine αβ-tubulin (Lowe et al. 2001; Nettles et al. 2004; Nogales et al. 1998; Ravelli et al. 2004) and human γ-tubulin (Aldaz et al. 2005), no such data are yet available for tubulins from other eukaryotic lineages. Based on what we know from bovine β-tubulin, the amino acid residues in the GTP-binding site of β-tubulin that make direct contact with the nucleotide are particularly highly conserved, which is unsurprising given that the correct binding of the nucleotide is essential for β-tubulin function. Residues that are involved in aligning the nucleotide in the correct orientation include N206 (Fygenson et al. 2004) and N228, which participate in hydrogen bonding with the exocyclic amino and oxo groups of GTP; a string of mostly polar amino acids (S174, N179, E183) that interact with the ribose; and a group of mostly lipophilic residues (I16, L227, V231) which form a hydrophobic pocket accommodating the nucleobase (Fig. 1) (Lowe et al. 2001). All of these residues are strictly conserved across eukaryotes.
One incompletely conserved residue that makes direct contact with GTP in the bovine β-tubulin structure is Y224 (Fig. 1). This residue is particularly critical for the correct orientation of GTP as it participates in a complex network of interactions with the nucleotide (e.g. hydrogen bonding with 2′-OH, π–π stacking with the nucleobase) (Lowe et al. 2001). From the sequence data available, this residue is conserved in the β-tubulins of animals, fungi, the choanozoa, amoebozoa and alveolates (Table 1). Intriguingly, in plants, Y224 is replaced with F224. In this context, this is not a conservative change: the absence of the aromatic hydroxy group compared to the mammalian protein means that the pattern of protein-nucleotide interactions (in particular those involving hydrogen bonding and π–π stacking of the GTP nucleobase) is significantly different.
In order to determine the distribution of the F224 substitution, we interrogated a subset of β-tubulins from 15 discrete lineages which encompass most of the eukaryotic diversity so far described (Keeling et al. 2005). We found that only green plants, discicristates, jakobids, haptophytes and cryptophytes possessed F224 (Table 1) and that the unikonts, which comprise animals, fungi, choanozoa and amoebozoa, did not. Since this lineage-specific distribution made residue 224 a potentially valuable marker for resolving relationships deep in the eukaryotic phylogeny, we tested whether it held more generally by interrogating the Uniprot database, which provides a complete and nonredundant database of known, distinct β-tubulin sequences. Uniprot now contains over 2,000 distinct β-tubulin sequences spanning the major phyla of eukaryotes. Importantly, for organisms which possess multiple β-tubulin isotypes, all of the organism’s β-tubulin isotypes encoded by the same nucleus were found to have the same amino acid at position 224.
An interesting case was noted, however: the cryptomonad Guillardia theta, which has two nuclei as a result of a presumed secondary endosymbiosis of a red algal cell into a non-photosynthetic host. The minor nucleus, or nucleomorph, retains its own complement of actin and tubulin genes, the function of which is enigmatic (Douglas et al. 2001; Keeling et al. 1999). The β-tubulin genes of the G. theta host nucleus encode an F224 type protein while the nucleomorph β-tubulin genes encode a Y224 type protein, consistent with different eukaryotic lineages for the host and endosymbiont.
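The isotype consistency check, and the reason it has to be keyed by nucleus rather than by organism, can be illustrated with a short Python sketch. This is an illustration only and not the procedure used in the study: the record layout and the function names (`residues_by_key`, `mixed_entries`) are assumptions, and the "hypothetical plant" isotypes are invented. Only the two G. theta entries reflect a result reported in the text (F224 for the host nucleus, Y224 for the nucleomorph).

```python
from collections import defaultdict

# Each record: (organism, genome compartment, residue at position 224).
# The G. theta states follow the text; the plant records are invented.
RECORDS = [
    ("Guillardia theta", "host nucleus", "F"),
    ("Guillardia theta", "nucleomorph", "Y"),
    ("hypothetical plant", "nucleus", "F"),  # invented isotype 1
    ("hypothetical plant", "nucleus", "F"),  # invented isotype 2
]


def residues_by_key(records, key):
    """Collect the set of residue-224 states seen for each grouping key."""
    seen = defaultdict(set)
    for record in records:
        seen[key(record)].add(record[2])
    return dict(seen)


def mixed_entries(grouped):
    """Keep only the groups in which more than one state was observed."""
    return {k: states for k, states in grouped.items() if len(states) > 1}


per_genome = residues_by_key(RECORDS, key=lambda r: (r[0], r[1]))
per_organism = residues_by_key(RECORDS, key=lambda r: r[0])

print("mixed per genome:", mixed_entries(per_genome))
print("mixed per organism:", mixed_entries(per_organism))
```

Keyed by genome compartment, no group is mixed; keyed by organism alone, G. theta appears to carry both states, which is exactly the case the nucleomorph explains.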
The change of state from tyrosine to phenylalanine can be accomplished by a single nonsynonymous point mutation (nsSNP), so the fact that identity of the β-tubulin 224 amino acid does not vary stochastically across the eukaryotic tree is remarkable. It implies considerable evolutionary restraint which is consistent with the critical functional role of this amino acid for the correct binding of GTP. However, β-tubulins are extremely well conserved across their entire sequence and thus a similar situation may occur with other amino acids. We therefore investigated whether other amino acids of β-tubulin might be lineage specific.
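That a single point mutation suffices can be verified directly from the standard genetic code, in which tyrosine is encoded by TAT and TAC and phenylalanine by TTT and TTC. The Python sketch below enumerates the one-nucleotide paths between the two codon sets; the function names are ours and serve only for illustration.

```python
from itertools import product

# Standard genetic code (DNA alphabet) for the two residues.
TYR_CODONS = ("TAT", "TAC")
PHE_CODONS = ("TTT", "TTC")


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("codons must have equal length")
    return sum(x != y for x, y in zip(a, b))


def single_step_paths(src_codons, dst_codons):
    """List (source, target, codon position, change) for one-nucleotide paths."""
    paths = []
    for src, dst in product(src_codons, dst_codons):
        if hamming(src, dst) == 1:
            pos = next(i for i in range(3) if src[i] != dst[i])
            paths.append((src, dst, pos + 1, f"{src[pos]}>{dst[pos]}"))
    return paths


paths = single_step_paths(TYR_CODONS, PHE_CODONS)
for path in paths:
    print(path)
```

Both paths (TAT to TTT and TAC to TTC) involve the same change, an A to T transversion at the second codon position, so either tyrosine codon is one step away from a phenylalanine codon.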
A straightforward way to screen for phylogenetic markers is to evaluate their transitions on an evolutionary tree. To this end, we generated a tree based on the one put forward by Keeling, which proposes 5 discrete supergroups encompassing the tree of life (Keeling 2007; Keeling et al. 2005), but then merged the Chromoalveolates and Rhizaria in light of subsequent publications (Burki et al. 2007; Hackett et al. 2007) (Fig. 2). Extending our analysis of 224 to an amino acid by amino acid analysis of the whole of β-tubulin indicated that whereas several of the phylogenetic groups could be recovered by specific amino acids, this was not the case for supergroups other than the unikonts (and the opisthokonts within them). In doing so, however, we identified a number of amino acids, insertions and deletions in the β-tubulin sequences which appeared to have lineage-specific distributions (notably amino acids 157, 211, 224, 232, 296, 357, 363, 389) and thus might be good candidate phylogenetic markers.
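A position-by-position screen of this kind can be sketched in a few lines of Python. The sketch is a minimal illustration under stated assumptions, not the method used here: the function `lineage_specific_columns` is hypothetical, the criterion (every group internally invariant, at least two groups differing) is one simple choice among several, and the five-column toy alignment is invented. Column numbers refer to alignment columns, not to β-tubulin residue numbering, and a gap character is treated as just another state so that insertions and deletions can also be flagged.

```python
from collections import defaultdict


def lineage_specific_columns(aligned, groups):
    """Find columns that are fixed within every group but differ between groups.

    aligned: dict mapping sequence name -> aligned sequence (equal lengths)
    groups:  dict mapping sequence name -> group label
    Returns {1-based column: {group label: fixed residue}}.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    hits = {}
    for col in range(lengths.pop()):
        states = defaultdict(set)
        for name, seq in aligned.items():
            states[groups[name]].add(seq[col])
        if any(len(s) != 1 for s in states.values()):
            continue  # at least one group is polymorphic at this column
        fixed = {group: next(iter(s)) for group, s in states.items()}
        if len(set(fixed.values())) > 1:
            hits[col + 1] = fixed
    return hits


# Invented toy alignment; the names and residues are placeholders.
TOY_ALIGNMENT = {
    "plant_a": "ASFGV", "plant_b": "ATFGV",
    "animal_a": "ASYGV", "animal_b": "ASYGI",
    "fungus_a": "ATYGV", "fungus_b": "ATYGV",
}
TOY_GROUPS = {name: name.split("_")[0] for name in TOY_ALIGNMENT}

hits = lineage_specific_columns(TOY_ALIGNMENT, TOY_GROUPS)
print(hits)
```

In the toy data only column 3 qualifies: columns 1 and 4 are invariant across all sequences, while columns 2 and 5 vary within a group.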
Outside of the green plants, discicristates, jakobids, haptophytes and cryptophytes, all β-tubulins currently sequenced (including all the bacterial β-tubulins, the BtubBs, of Prosthecobacter) (Schlieper et al. 2005) express Y224, with just five exceptions, all of which are found within the Fungi. In three cases (Geotrichum candidum, Microbotryum violaceum and Sporidiobolus pararoseus), H224 is expressed in place of Y224; in a fourth, Yarrowia lipolytica, I224 replaces Y224. In the fifth case, a symbiotic fungus of plant roots does express a β-tubulin with an F224 substitution. This fungus is Paxillus fumigatus, a common ectomycorrhizal fungus, which is particularly associated with mutualistic growth on the roots of forest plants.
The anomalous presence of F224 in the β-tubulin of P. fumigatus, combined with its close relationship with plant roots led us to consider the possibility that the β-tubulin was conferred horizontally from plants. In basic local alignment searches (BLAST) (Altschul et al. 1990) of the P. fumigatus β-tubulin sequence against the Uniprot and nr databases, the top hundred homologies were restricted to fungi, animals and choanozoa, indicating a likely lineage from fungal rather than plant β-tubulins. Plants and plant roots are associated with the production of anti-microtubule drugs, in particular taxols, vinca alkaloids and colcemids, all of which bind in close proximity or juxtaposition to the F224 site (Downing and Nogales 1999) and would therefore represent a powerful selective force on a root endosymbiont to evolve a more plant-like β-tubulin. Although the P. fumigatus protein is most similar to other fungal β-tubulins, if only those residues which show an evolutionary bias in distribution are considered when comparing P. fumigatus to other fungi and to plants, the β-tubulin of P. fumigatus is plant-like at some key residues. For instance, the amino acids 224, 231, 259, 260, 315 are normally Y, V, M, V, V for fungi, choanozoa and animals but are F, I, L, I, A for plants—P. fumigatus is F, I, L, I, A at these residues.
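The comparison at these five residues can be written out explicitly. The Python sketch below uses only the residue states quoted in the text; the dictionary and function names are ours and serve only for illustration.

```python
# Residue states at the five positions, as quoted in the text.
POSITIONS = (224, 231, 259, 260, 315)
FUNGAL_ANIMAL_PROFILE = dict(zip(POSITIONS, "YVMVV"))  # fungi, choanozoa, animals
PLANT_PROFILE = dict(zip(POSITIONS, "FILIA"))          # plants
P_FUMIGATUS = dict(zip(POSITIONS, "FILIA"))            # Paxillus fumigatus


def count_matches(query, profile):
    """Count the positions at which the query shares the profile's residue."""
    return sum(query[pos] == residue for pos, residue in profile.items())


plant_matches = count_matches(P_FUMIGATUS, PLANT_PROFILE)
fungal_matches = count_matches(P_FUMIGATUS, FUNGAL_ANIMAL_PROFILE)
print(f"plant-like at {plant_matches} of {len(POSITIONS)} positions")
print(f"fungal/animal-like at {fungal_matches} of {len(POSITIONS)} positions")
```

At these five positions the P. fumigatus protein matches the plant profile throughout and the fungal, choanozoan and animal profile nowhere, even though its overall sequence is most similar to other fungal β-tubulins.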
The most similar β-tubulin sequence to that of P. fumigatus in the database is the β-2-tubulin from another Boletales fungus, Suillus bovinus. Neighbor-Net analysis suggests these two β-tubulins to be a highly divergent subgroup of β-tubulins with no relationship to plant tubulin (Supplemental Fig. 1). The S. bovinus β-tubulin is also plant-like at several of these residues but importantly retains Y224. Since we could find no direct evidence for even partial gene conversion, we investigated whether the similarity of these two β-tubulin sequences might give some insight into the requirements to allow switching between F224 and Y224. The sequences of S. bovinus and P. fumigatus are plant-like at several residues (Fig. 3), but only at residues 224 and 260 does P. fumigatus possess a plant-like amino acid when S. bovinus does not. This implies a role for the V to I substitution at 260 in enabling the switch from Y224 to F224. The sequence either side of F224 is also remarkable in P. fumigatus. These residues are normally highly conserved in evolution, and residue 223 is normally either S or T. P. fumigatus is highly unusual in substituting an R at this position. Similarly, residue 225 is almost invariably a G but is replaced by D in P. fumigatus. These residues also differ from those of S. bovinus (themselves divergent from other fungi), and it may therefore be that these very unusual substitutions are also part of the requirement for the Y224 to F224 transition.
Although amino acid 224 was group specific, when mapped to the supergroup tree the supergroups were not recovered completely. The unikonts were homogeneously Y224 whereas F224 and Y224 were present in all the other supergroups (see Fig. 2). To help understand why, we performed a Neighbor-Net analysis (Bryant and Moulton 2004) on the same 37 β-tubulins. As can be clearly seen, the Neighbor-Net in Fig. 4 is overall rather netted in appearance, suggesting that the β-tubulins might simultaneously support conflicting evolutionary scenarios. However, the β-tubulins are resolved into two distinct groups: one group of less divergent or “core” β-tubulins and a second group of more divergent β-tubulins (both groups are marked by a correspondingly labelled arc in Fig. 4). F-type β-tubulins (marked blue) are almost completely restricted to the core β-tubulins, but this group also contains some Y-type β-tubulins (marked red). Conversely, the part of the network connecting the divergent β-tubulins is almost exclusively composed of Y-type β-tubulins.
To gain more insight into the two distinct parts of the network in Fig. 4, we repeated our Neighbor-Net analysis, excluding either all divergent β-tubulins (supplemental Fig. 2a) or all core β-tubulins (supplemental Fig. 2b). Not surprisingly, the Neighbor-Net of the divergent β-tubulins remains highly netted; however, the Neighbor-Net of the core β-tubulins now resolves. In common with the aforementioned amino acid analysis, both Neighbor-Nets recover several of the recognized groups. Notably, within the divergent β-tubulins (supplemental Fig. 2b) the animal, choanozoa and fungi groups are recovered correctly individually, and together as the well-defined opisthokont group. Within the core group of β-tubulins all groups are recovered correctly by the network, while F-type β-tubulins group to the exclusion of Y-type β-tubulins (supplemental Fig. 2a). Interestingly, except for the unikonts, the supergroups are separated. In particular, the rhizaria are clearly split between the foraminifera, which are divergent, and the cercozoa, which are core. The plantae are split similarly: the red algae are divergent, while the green plants are core.
One F-type β-tubulin (Goniomonas) segregates with the divergent β-tubulins indicating that F224 is not an absolute restraint on β-tubulin diversification. It is important to note, however, that this β-tubulin does not associate with any of the other groups in the divergent part of the network and that it has a clear relationship with the other cryptophyte in our analysis (Guillardia theta—host) which groups with the core tubulins. Indeed, it is even possible to recover F-type β-tubulins as a group including Goniomonas. If this is done and the Neighbor-Net analysis is repeated on F-type and Y-type β-tubulins respectively, then recovery of phylogenetic groupings is as with the subanalyses of core and divergent β-tubulins (supplemental Figs. 3a and 3b).
A major feature of the tree proposed by Keeling (Keeling 2007; Keeling et al. 2005) is its unresolved root, implying that there is uncertainty regarding how the supergroups are related. This uncertainty correlates with the nettedness of the β-tubulin Neighbor-Net (Fig. 4), which is indicative of conflicting information relating to heredity. Such conflict is potentially driven by environmental pressures on the β-tubulin molecule driving convergent and parallel evolution. Nevertheless, group recovery is remarkably good, and the clear resolution of more and less divergent groups of β-tubulins tempts speculation that the ancestral state may lie closer to the core β-tubulins than the divergent ones and indeed that, within the core group, some groups such as the jakobids may possess β-tubulin with a sequence rather closer to the ancestral state than the β-tubulins of other groups, although this would need to be explored using alternative means.
Taken together, our data suggest a fundamental dichotomy in β-tubulins, with core F-type β-tubulins being possessed by plant-like lineages and diverging Y-type β-tubulins by animals and fungi. The functional importance of this dichotomy is yet to be experimentally tested across the predicted range of organisms. Two areas in particular suggest themselves in which these differences may have functional consequences. First, the dynamic instability of microtubules is a characteristic closely linked to GTP hydrolysis. Microtubules incorporating core F-type β-tubulin may be more dynamic than those which incorporate divergent Y-type β-tubulins (Hush et al. 1994; Moore et al. 1997; Shaw et al. 2003). Second, some drug susceptibility profiles may also correlate. For instance, some core F-type β-tubulins (plant and trypanosome) are more refractory to colcemids than Y-type (animal) β-tubulins; the colcemid binding site has been mapped to this area of β-tubulin, and substitution mutations of nearby amino acids (213, 226, 236) have already been associated with colcemid resistance (Hari et al. 2003).
In conclusion, our analysis identified an amino acid substitution in the active site of a conserved protein with good discriminative power. It can be argued that single amino acid substitutions which occur very rarely because they are under strong selective pressure not to are likely to evolve convergently under a strong functional positive pressure rather than neutrally, so that in such cases any instances of loss of homoplasy may be informative. In the case of the identified F224 substitution, homoplasy arose in a root symbiont and it is interesting to speculate whether other closely symbiotic and lichenized fungi will prove similar exceptions. As far as we are aware, our analysis of the distribution of F224 is not inconsistent with the rare molecular events elucidated to date such as the TS-DHFR gene fusion (giving rise to the opisthokonts) and the EF1α 12 bp insertion. However, a tree based on these and the F224 transition as a single rare event would split the haptophytes and cryptophytes from the rest of the chromoalveolata and rhizaria, the jakobids and euglenoids from other members of the excavatae and the green and brown algae from the red algae. Thus F224 may have arisen independently in each of these superkingdoms, perhaps as a result of convergent selection pressure exerted in similar niches on cell size, shape, motility or mode of replication. In this context it is interesting to note that none of the groups with F-type β-tubulins have centrioles or defined spindle poles, a phenotypic aspect which could certainly be envisaged as a fulcrum for selective pressure in some ecological niches (Delattre and Felix 2009). To investigate this possibility we performed a Neighbor-Net analysis of the analyzed eukaryotic β-tubulins, and the inferred network clearly segregated these into more and less divergent subsets. Interestingly, although lineages were conserved on the network, several of the supergroups were clearly separated by it.
The more divergent β-tubulins were almost exclusively Y224, which may indicate that the presence of F224 in the GTP binding site acts as a restraint on β-tubulin diversification that can only be overcome by substitution. If this is the case and F224 were the ancestral state, then environmental pressure might act as a driver for convergent evolution at this residue in diverse lineages. Additional corroborating datasets from other highly evolutionarily restricted single amino acid polymorphisms or rare molecular events should help to discriminate these possibilities in the future. Similar analyses of other essential and evolutionarily conserved proteins are, of course, likely to reveal single amino acid polymorphisms which may provide an important but finite pool of markers for further dissection of the tree of life. Regardless of the eventual resolution of the tree of life, the observation of very stable and lineage-specific biochemical adaptation in the functional site of β-tubulin remains an important one in our understanding of the evolution of one of the defining structural proteins of eukaryotes.
Below is the link to the electronic supplementary material.
A Neighbor-Net of the considered β-tubulins does not support horizontal gene transfer as an explanation for the presence of F224. The figure shows that the P. fumigatus β-tubulin is not generally plant-like but is quite divergent from other fungal β-tubulins, and that the Boletales fungus Suillus bovinus possesses a similar fungal β-tubulin but with the Y-type version of 224. (PDF 32 kb)
Neighbor-Nets restricted to the groups of “core” β-tubulins (2a) and “divergent” β-tubulins (2b) (respective LSFits of 99.86 and 99.71). In both networks, recovered kingdoms are marked by an arc which is labelled by the kingdom’s name. To help clarify, the kingdom and species names are coloured similarly. (PDF 245 kb)
Neighbor-Nets restricted to the F-type (3a) and Y-type (3b) β-tubulins, respectively, give good recovery of the phylogenetic kingdoms. Recovered kingdoms are marked by an arc which is labelled by the kingdom’s name. (PDF 153 kb)
Thanks to Keith Gull for his initial input into the analyses and to Simon Topp whose MSc thesis provided context to the work. Thanks also to Richard Luduena and Tom Cavalier-Smith for helpful discussion, to Vincent Moulton, Francisco Ayala, Bill Wickstead, Enrico Coen and Clive Lloyd for critical reading of the manuscript. QW was supported by a UEA School of Computing Sciences studentship during her PhD studies from which most of her contribution was drawn.
Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
All authors contributed to the study design, data analysis and writing of the paper.