|Home | About | Journals | Submit | Contact Us | Français|
Positive selection in the α-crystallin domain (ACD) of the chloroplast small heat shock protein (CPsHSP) gene was found in a previous study and was suggested to be related to the ecological adaptation of Rhododendron species in the subgenus Hymenanthes. Consequently, it was of interest to examine whether gene duplication and subsequent divergence have occurred in other sHSP genes, for example class I cytosolic sHSP genes (CT1sHSPs) in Rhododendron in Taiwan, where many endemic species have evolved as a result of habitat differentiation.
A phylogeny of CT1sHSP amino acid sequences was built from Rhododendron, Arabidopsis thaliana, Oryza sativa, Populus trichocarpa, Vitis vinifera and other species for elucidation of the phylogenetic relationships among CT1sHSPs. Phylogenies of Rhododendron CT1sHSP nucleotide and amino acid sequences were generated for positive selection and functional divergence analysis, respectively. Positively selected sites and amino acid differences between types of Rhododendron CT1sHSPs were mapped onto the wheat sHSP16·9 protein structure. Average genetic distance (Dxy) and dN/dS ratios between types of Rhododendron CT1sHSP genes were analysed using sliding window analysis. Gene conversion was also assessed between types of Rhododendron CT1sHSPs.
Two types of Rhododendron CT1sHSP were identified. A high level of genetic similarity and diversity within and flanking the ACD, respectively, between types of Rhododendron CT1sHSP were found. Main differences between the two types of Rhododendron CT1sHSPs were: (1) increased hydrophobicity by two positively selected amino acid sites and a seven-amino-acid insertion in the N-terminal arm; and (2) increased structural flexibility and solubility by a seven-amino-acid insertion in the N-terminal arm and one positively selected amino acid site in the C-terminal extension.
Functional conservation of the ACD of Rhododendron CT1sHSP genes was inferred because of strong purifying selection. However, sequence variations flanking the ACD in Rhododendron CT1sHSP gene duplicates may have resulted in functional divergence and played important roles in chaperon function enhancement.
During a plant's life cycle, a variety of environmental factors influence development and growth. Plant cells synthesize multiple families of small heat shock proteins (sHSPs) in order to protect themselves against environmental stresses (Vierling, 1991). It is thought that sHSPs function as molecular chaperones, preventing intracellular proteins from degradation and inappropriate aggregations in an ATP-independent manner (Lee et al., 1997; Lee and Vierling, 2000; van Montfort et al., 2002; Sun et al., 2002). Plant sHSPs show unusual abundance and diversity and are unique in expressing both cytosolic and organellar sHSPs (Scharf et al., 2001; Sun et al., 2002; Siddique et al., 2008). These diverse and abundant classes of sHSPs allow plants quickly to adapt to continually changing physical conditions such as temperature, light intensity and oxidative stress (Sun et al., 2002; Sundby et al., 2005). At least seven different plant sHSP subfamilies were found in the Arabidopsis thaliana genome (Scharf et al., 2001). Recently, genomic surveys of A. thaliana, Oryza sativa and Populus trichocarpa have found one additional mitochondrial and three additional cytosolic classes of sHSP (CTsHSP) (Waters et al., 2008). Among the 11 classes of plant sHSPs, five were found to be localized in plastids, the endoplasmic reticulum, the peroxisome, and the mitochondria (MI and MII) along with six localized in the cytosol (CI–CVI) (Waters et al., 2008). A similar finding of additional classes of sHSPs was also described to be present in the cytosol by Siddique et al. (2008). Six class I cytosolic small heat shock protein (CT1sHSP) gene members found in A. thaliana constituted the largest CTsHSP subfamily (Siddique et al., 2008; Waters et al., 2008). It was estimated that class I and class II CTsHSPs in the cytosol share only approx. 50 % identity in the α-crystallin domain (ACD) and are estimated to have diverged over 400 Mya (Waters and Vierling, 1999). Organellar forms of sHSPs in plants were all derived from one nuclear-encoded CTsHSP during the evolution of land plants (Waters and Vierling, 1999).
Most new genes are thought to arise via duplication events (Ohno, 1970). Ohno's theory hypothesizes that the duplicates are mostly silenced by degenerative mutations due to a redundancy in function. Although the probability of duplicates acquiring novel and advantageous functions is lower than duplicates being silenced (Lynch and Force, 2000), the classical model of the fates of duplicate genes proposed by Ohno is coming under increasing scrutiny (Massingham et al., 2001). Preservation of duplicate genes is believed to be a common fate, particularly for genes that contain multiple regulatory regions. This process is possibly promoted at an early stage of the gene duplication process by positive selection (Lynch and Force, 2000; Moore and Purugganan, 2005), resulting in the maintenance of multiple gene copies in the genome (Ohta, 1987; Clark, 1994; Nowak et al., 1997; Wagner, 1999).
Duplicates can be preserved through the process of neofunctionalization and subfunctionalization. The ancestral gene possibly possesses more than one function, and degenerative mutations occur in each copy after duplication. This procedure may follow the duplication–degeneration–complementation (DDC) model of gene evolution (Force et al., 1999). Subsequently, subfunctionalization comes into play with the functional divergence of gene duplicates. The fitness of duplicated genes is the most important factor determining their fixation (Kondrashov and Kondrashov, 2006). Individuals from different populations carry multiple copies of duplicated genes so as to enhance their fitness in the process of adaptation to environmental heterogeneity across broad geographical scales. It was recently proposed that selective neutral duplications must be very rare, and preservation of duplicates is related to their immediate impact on fitness (Kondrashov and Kondrashov, 2006). A new model of the fate of duplicate genes was proposed by He and Zhang (2005), in which duplicate genes undergo rapid subfunctionalization and evolve new functions that are not possessed by the ancestral gene. Yang et al. (2006) further proposed that all of the models on the fate of duplicate genes are complementary to each other. Functional divergence of different alleles of the same gene may also be considered to be one form of subfunctionalization (Adams et al., 2003; Adams and Wendel, 2005).
Nuclear-encoded sHSPs exhibit a high degree of conservation in three consensus regions (Waters and Vierling, 1999). Near the carboxy-terminal end of CT1sHSPs is a consensus sHSP region I, whereas a consensus sHSP region II is located in the middle of CT1sHSP sequences after the more variable N-terminal class I region. Sequences encompassing sHSP consensus regions I and II exhibit homology to consensus regions found in other families of low-molecular-weight HSPs and the ACD (Sun et al., 2002). Selective constraints were proposed to be exerted on sHSP consensus region II of species including Funaria hygrometrica and angiosperms (Vierling, 1991; Waters, 1995). Lee et al. (1997) reported that sHSP consensus region II is highly conservative in function, and is involved in hydrophobic interactions with denatured substrates.
In Taiwan, species of Rhododendron (Ericaceae) include R. rubropunctatum, R. morii, R. pseudochrysanthum, R. hyperythrum, R. formosanum, R. ellipticum, R. kawakamii, R. ovatum, R. mariesii, R. noriakianum, R. nakaharae, R. simsii, R. kanehirai, R. breviperulatum, R. rubropilosum and R. oldhamii. Endemic Taiwanese Rhododendron constitutes two major groups classified in either the subgenus Hymenanthes or the subgenus Tsutsusi. The subgenus Tsutsusi contains the endemics R. oldhamii, R. noriakianum, R. kanehirai, R. nakaharae and R. breviperulatum. The subgenus Hymenanthes contains R. rubropunctatum, R. morii, R. pseudochrysanthum, R. hyperythrum and R. formosanum. Endemic Rhododendron species in the subgenus Hymenanthes are mainly distributed on high peaks in northern, central and southern Taiwan with similar habitats in the temperate zone, and the speciation of these taxa was related to climatic changes and population fragmentation (Chung et al., 2007). Species in the subgenus Hymenanthes show only minor morphological differences. Endemic Rhododendron species in the subgenus Tsutsusi are found in different ecological zones, from R. kanehirai limited to river banks in northern Taiwan to R. rubropilosum distributed on sunny mountain slopes. These species display different shapes, which vary from small shrubs (R. breviperulatum) to 3- to approx. 4-m-tall shrubs (R. oldhamii). Most of these Taiwanese Rhododendron species have a limited distribution range, but some are widespread. Taiwanese Rhododendron species occur from lowland areas to the subalpine regions of the Yushan massif, indicating a high degree of adaptation in the genus.
CT1sHSPs originated very early in evolutionary history of plants and selective forces may have played an important role in relation to protection against environmental stress during the course of evolution of this gene. Positive selection in the ACD of the chloroplast small heat shock protein (CPsHSP) gene has been shown in species of the subgenus Hymenanthes relative to species of the subgenus Tustsusi of Rhododendron. This selection was suggested to be related to the ecological adaptation of species in the subgenus Hymenanthes (Wu et al., 2007). Gene duplication of CT1sHSP has been found in A. thaliana, O. sativa and P. trichocarpa (Siddique et al., 2008; Waters et al., 2008). Consequently, it was of interest to examine whether gene duplication has occurred in CT1sHSPs in a plant system such as Rhododendron in Taiwan, where many endemic species have evolved due to climatic changes or habitat differentiation. If gene duplication does occur, what are the fates of gene duplicates? The aim of the present study was to understand whether the molecular evolution of CT1sHSP duplicates in Rhododendron involved: (1) positive selection as a factor accounting for the functional divergence; and (2) gene duplicates under the siege of selective constraints that restrained functional divergence or gene conversion involved in homogenizing sequence variations resulting in functional redundancy or complementary duplicates.
Rhododendron specimens used in this investigation for CT1sHSP gene cloning are listed in Table 1. Amino acid sequences of CT1sHSPs from A. thaliana, O. sativa, P. trichocarpa and Vitis vinifera acquired from the respective genome database (A. thaliana: http://www.arabidopsis.org/Blast; O. sativa: http://tigrblast.tigr.org; P. trichocarpa: http://genome.jgi-psf.org/Poptr1_1/Poptr1_1.home.html; and V. vinifera: http://www.cns.fr/cgi-bin/ggb/vitis/gbrowse/vitis/) and from other plant species acquired from the National Center for Biotechnology Information (NCBI) were used for comparison. Four CT1sHSP amino acid sequences of Funaria hygrometrica as well as class II and III CTsHSPs of A. thaliana, O. sativa and P. trichocarpa were used as outgroups. Total DNA was extracted from ground leaf powder according to a modified hexadecetyltrimethyl ammonium bromide (CTAB) procedure (Doyle and Doyle, 1987). DNA was precipitated with ethanol, and after washing with 70 % ethanol it was dissolved in 200 µL TE buffer (pH 8·0) and stored at –20 °C. The DNA concentration was determined for each sample using the GeneQuant II RNA–DNA Calculator (Amersham Biosciences, Little Chalfont, UK).
PCR amplification of partial sequences of the CT1sHSPs was performed with degenerate primers derived from the amino acid alignment (5′-TTCGACCCGTTCTGCGACGATNTGNGA-3′ and 5′-GCCTGAGATNTCDATNGCCTTTAC-3′). The concept developed in CODEHOP was used to design the degenerate primers (Rose et al., 1998) by taking advantage of the conserved amino acid sequences in the CT1sHSPs (Waters and Vierling, 1999). Amplifications were performed on a DNA Programmable Thermal Cycler (PTC-100; MJ Research, Waltham, MA, USA): initial denaturation at 94 °C for 3 min followed by 42 cycles of 1 min at 94 °C, 1 min annealing at 53 °C, 90 s at 72 °C and a subsequent final extension for 10 min at 72 °C. The PCR mixture (50 µL) contained 50 mm KCl, 1·5 mm MgCl2, 0·001 % gelatin, 10 mm Tris-HCl (pH 8·3), 100 µm dNTPs, 0·2 µm primer, 20 ng template DNA, 1 µg RNase and 0·5 U Taq polymerase (Amersham Pharmacia Biotech, Taipei, Taiwan). PCR products were cloned with a yT&A cloning kit (Yeastern Biotech, Taipei, Taiwan) following the manufacturer's protocol. Plasmids containing the PCR product were further screened using colony PCR and purified with a QiaGen kit (Qiagen). Subsequently, three plasmid clones were sequenced in both directions using a Taq Dye Dideoxy Terminator Cycle Sequencing Kit (Applied Biosystems) and a model ABI373A automated sequencer (Applied Biosystems). For sequencing, the M13 forward and reverse primers were used for amplification. All sequence polymorphisms were visually rechecked from the chromatograms.
All sequences were deposited in the NCBI nucleotide sequence database under the following accession numbers: FM172919–FM172932 (type 2 CT1sHSPs) and FM172933–FM172948 (type 1 CT1sHSPs).
Consensus sequences were generated by aligning three cloned CT1sHSP partial sequences for each species of Rhododendron using the program CLUSTAL X (Thompson et al., 1997). Consensus sequences were subsequently translated into amino acid sequences with the aid of the BCM search launcher in six open reading frames, with the +1 frame resulting in amino acid sequences with no intron or stop codon. The deduced amino acid sequences were used in a BLASTP search, which resulted in high similarities to CT1sHSP gene sequences with very low E values. The deduced amino acid sequences of Rhododendron and those of CT1sHSP sequences acquired from specific databases and GenBank were aligned for the phylogenetic analysis. The aligned amino acids in Rhododendron were then used for the manual alignment of DNA sequences for CODEML analysis using PAML version 4.0 (Yang, 2007). DNA sequence divergence and amino acid sequence divergence were estimated based on Kimura's two-parameter (K2P) model for DNA (Kimura, 1980) and the Jones, Taylor and Thornton (JTT) distance matrix for amino acids (Jones et al., 1992) using MEGA version 4.0 (Tamura et al., 2007). Sliding window analysis for average genetic distance (Dxy) between each type of CT1sHSP gene was calculated using DnaSP version 4.5 (Rozas et al., 2003). Sliding window analysis of dN/dS ratios was done using K-Estimator version 6.1 (Comeron, 1999).
A neighbour-joining (NJ) tree was generated based on the aligned amino acids of CT1sHSP sequences from Rhododendron and amino acid sequences of A. thaliana, O. sativa, P. trichocarpa and V. vinifera obtained from the respective genome database together with amino acid sequences of other species acquired from GenBank. The aligned amino acid sequences are available online as Supplementary Material. The NJ tree was generated according to the amino acid JTT matrix (Jones et al., 1992) and bootstrap values for the gene tree were produced using the ‘bootstrap’ function of MEGA based on 1000 replications (Tamura et al., 2007). For CODEML analysis aligned DNA sequences based on the aligned amino acids of Rhododendron CT1sHSP were first generated. Hierarchical likelihood ratio tests, implemented in ModelTest version 3.7 (Posada and Crandall, 1998, 2001) identified the TrN + I + G plus a gamma-shaped (G, a gamma distribution shape parameter α of 0·5949) model as the best fitting model. These parameters were used to generate an NJ tree based on nucleotide sequences of Rhododendron CT1sHSPs using MEGA for CODEML analysis (Fig. 1A).
The NJ tree of CT1sHSP gene nucleotide sequences in Rhododendron was subjected to analysis by the CODEML program within the PAML package (Yang, 2007). To assess if the evolutionary rates or non-synonymous/synonymous substitution rate ratios (ω = dN/dS) of the CT1sHSPs differed between duplicated CT1sHSP gene sequences of Rhododendron, a likelihood ratio test (LRT) was applied (Yang and Nielsen, 1998). To test the hypothesis of equal evolutionary rates between the two types of CT1sHSPs at the nucleotide level, one-ratio and two-ratio models were compared using the type 2 CT1sHSP lineage as the background branch against the type 1 CT1sHSP lineage. Clade model D (Bielawski and Yang, 2004) was also applied to the CT1sHSP data in order to understand what proportion of the codon sites in foreground branch (type 1 CT1sHSP) evolved with ω > 1 and its ω value. The LRT was used to evaluate whether clade model D (k = 3) has a significantly better fit for the codon site with ω > 1 in comparison with site model M3 (k = 2 and k = 3, respectively) and clade model D (k = 2).
Molecular adaptive evolution at individual sites along the CT1sHSP sequences was investigated using the site models of the CODEML program implemented in PAML (Yang, 2007). A range of codon models were evaluated for measuring selection and nested models were compared via the LRT statistic (Yang and Nielsen, 1998), 2δL, against the χ2 distribution with degrees of freedom equal to the difference in the number of parameters. Site models that allow for heterogeneous non-synonymous/synonymous substitution rate ratios (ω = dN/dS) among sites [models M0, M1a, M2a, M3 (k = 2), M3 (k = 3), M7 and M8] (Yang et al., 2000; Swanson et al., 2003) were used to test for diversifying selection at individual sites. The LRT was used to evaluate whether M2a, M3 (k = 3) and M8 had a significantly better fit for the codon site with ω > 1 in comparison with M1a, M0 and M7, respectively. We accepted the presence of positive selection when a significantly better fit of the selective model was found and the posterior probabilities according to either naïve empirical Bayes (NEB) or Bayes empirical Bayes (BEB) analysis for the sites of interest exceeded the 90 % level (Yang et al., 2005).
Positions of the sites under positive selection which were detected by site model (models M2a, M3 and M8) and the amino acid differences between two types of CT1sHSP including the insertion/deletion were mapped onto the tertiary structure of wheat sHSP16·9 (van Montfort et al., 2001). The tertiary structure of wheat HSP16·9 in ASN.1 format was downloaded from the ENTREZ STRUCTURES database at NCBI and corresponding amino acids were illustrated with Cn3D version 4.1 (Hogue, 1997).
Analysis for possible gene conversion events was carried out using the GeneConv algorithm (Sawyer, 1989) implemented in the program RDP2 (Martin et al., 2005). Default settings were used for the gene conversion analysis.
A maximum-likelihood (ML) analysis (Gu2001) was performed with DIVERGE version 2.0 software (Gu and Velden, 2002), using the phylogenetic tree generated from amino acid alignment of Rhododendron CT1sHSPs with addition of four outgroup sequences from F. hygrometrica (Fig. 1B). DIVERGE calculates a theta ML value that indicates the level of functional divergence between proteins in different clusters of the tree, and a posterior probability to trace the amino acid positions that are likely to be responsible for the functional divergence between proteins in both clusters (types 1 and 2 of CT1sHSPs).
To gain insight into the evolutionary relationships of CT1sHSPs, an NJ tree was constructed using amino acid sequences conceptually translated from the cloned nucleotide sequences of CT1sHSP genes from Rhododendron with addition of amino acid sequences of CT1sHSPs from a number of monocots and dicots, and sequences from F. hygrometrica acquired from specific genome databases or from GenBank (Fig. 2). Class II and III CTsHSPs from A. thaliana, O. sativa and P. trichocarpa were also included in the analysis. Two major groups of CT1sHSPs were found. The constructed gene tree clearly indicates that two types of CT1sHSPs can be found in Rhododendron species, corresponding to the two major groups of CT1sHSP observed and together with CT1sHSPs of other species grouped separately from other classes of CTsHSPs. However, no type 2 CT1sHSP genes from R. formosanum or R. noriakianum were obtained. Although bootstrap support within each type of Rhododendron CT1sHSPs was low, the separation of two distinct clusters is clear. Moreover, bootstrap values for the two types of Rhododendron CT1sHSPs separate from other CT1sHSPs were relatively high (74 for type 2 CT1sHSPs and 98 for type 1 CT1sHSPs, Fig. 2). Two types of Rhododendron CT1sHSPs were clustered with a portion of the A. thaliana, Helianthus annuus, Nicotiana tabacum, V. vinifera and P. trichocarpa CT1sHSPs in two separate groups, indicating that these two types of Rhododendron CT1sHSPs are indeed different (Fig. 2). Moreover, monocot CT1sHSPs were only found in one of the two groups of CT1sHSPs. It is most probable that the two types of Rhododendron CT1sHSP genes are paralogues as the amino acid sequences of each type of CT1sHSP from different Rhododendron species clustered together (Fig. 2). The topology of the gene tree indicated that gene duplication occurred before speciation of these Rhododendron species and they may have been derived from the same most recent common ancestor. Moreover, the duplication event of the CT1sHSP gene resulting in two major groups of CT1sHSPs occurred not only in Rhododendron but also in A. thaliana, H. annuus, N. tabacum, V. vinifera and P. trichocarpa. However, only one group of CT1sHSP was found for the species in grass members. Therefore, duplication into two major groups of CT1sHSP genes may have occurred after eudicot speciation (Fig. 2). Nevertheless, CT1sHSP gene duplication within each group is also found for both the eudicot and grass families.
The most notable difference in the two types of CT1sHSPs in Rhododendron was a 21-bp insertion (corresponding to seven amino acids) when comparing the type 1 CT1sHSP gene to type 2 CT1sHSP DNA sequences (Fig. 3). The insertion was located between the α2 and α3 helices. Insertion of a stretch of amino acids is also observed in group I CT1sHSPs in many other species (Supplementary Data, available online). There were also many differences in amino acids found throughout the aligned sequences of these two types of CT1sHSPs (Fig. 3). The sequences obtained for the type 1 Rhododendron CT1sHSP gene consists of 432 nucleotides and codes for 144 amino acids. The type 2 Rhododendron CT1sHSP gene consists of 411 nucleotides and codes for 137 amino acids. Levels of sequence divergence of nucleotides and amino acids were compared within and between the two types of Rhododendron CT1sHSPs (Table 2). The two types of Rhododendron CT1sHSPs differed considerably at both the nucleotide (26·7 %) and amino acid (24·2 %) levels (Table 2). Rather small differences were observed at both the nucleotide and the amino acid levels within each type of Rhododendron CT1sHSP, with a greater variation found among type 2 Rhododendron CT1sHSPs (Table 2). A sliding window comparison between the two types of Rhododendron CT1sHSPs showed greater levels of divergence in average genetic distance (Dxy) and ω in the N-terminal region flanking the ACD (Fig. 4).
We tested for a difference in the evolutionary rate (or non-synonymous/synonymous substitution rate ratios) with the one- and two-ratio models using the CODEML analysis implemented in PAML, and the results showed no significant difference in the evolutionary rate when comparing the two- vs. one-ratio model with the LRT (2δL = 2·7147, P = 0·0994) (Table 3). However, type 1 Rhododendron CT1sHSPs showed a mean ω value (0·0883) greater than the value observed for type 2 Rhododendron CT1sHSPs (0·0515), in line with the general increase of non-synonymous substitution rates observed after gene duplication events (Bielawski and Yang, 2003; Lynch and Conery, 2000).
Gene conversion is any process that results in a segment of DNA being copied onto another segment of DNA (Sawyer, 1989). It is thought that gene conversion is an important force in evolution (Hilliker et al., 1994). Using the GeneConv algorithm implemented in the RPD2 software, no gene conversion events were detected in the 30 aligned sequences of the two types of Rhododendron CT1sHSP genes.
A higher non-synonymous/synonymous substitution rate was found for type 1 Rhododendron CT1sHSPs, and hence it is interesting to understand how much of the proportion of codon sites evolved with ω > 1 and its value. Clade model D found 2·4 % of codon sites in type 1 of Rhododendron CT1sHSP had ω > 1 with a mean ω value of 2·73521 (Table 3). It is therefore interesting to understand which codon evolved with ω > 1. Subsequently, three site model comparisons [M1a vs. M2a, M0 vs. M3 (k = 3) and M7 vs. M8] were used to test positive selection of the two types of CT1sHSPs in Rhododendron, which three LRTs to be constructed. Four positively selected sites were detected by the M3 (k = 3) and M8 models when compared with the M0 and M7 models based on NEB posterior probability, three which were significant. Model M8 detected one significant positively selected site (site 9) based on BEB posterior probability with significant LRT in comparison with the M7 model. Site model M2a also detected site 9 as a positively selected site (ω = 4·2115) based on BEB, although the LRT was not significant. Therefore, at least one site (site 9) was consistently detected as a significant positively selected site based on BEB. Nevertheless, all models that allowed detection of positive selection revealed that only a very minor proportion of amino acid sites evolved under positive selection and more than 90 % of sites were under strong purifying selection (Table 3). Interestingly, the significant positively selected sites detected with site models M2a, M3 and M8 were all located in the sequences flanking the ACD.
Wheat HSP16·9 was used as template for positioning the sites under positive selection and the site differences between the two types of Rhododendron CT1sHSPs (Fig. 5). Amino acid residues detected under positive selection in Rhododendron CT1sHSPs corresponded to residues Ala18 and Phe21 in the N-terminal arm and Gln148 in the C-terminal extension of the wheat HSP16·9. No intramolecular interaction of these sites with other residues was found according to van Montfort et al. (2001). Some amino acid residues that distinguished two types of Rhododendron CT1sHSPs are located at or surrounding the α2 helix in the N-terminal arm and some are in the ACD. The seven-amino-acid indel that distinguished the two types of Rhododendron CT1sHSPs were localized between the α2 and α3 helices. The variable region surrounding the N-terminal arm indicated that this region is likely to have higher structural variability.
The Gu2001 LRT in the DIVERGE algorithm was used to examine whether proteins from different clusters in the phylogenetic tree have different functional constraints, and whether they have functionally diverged. Low posterior probabilities of functional divergence across aligned amino acid sequences were displayed (data not shown) and the level of functional divergence between proteins was relatively low (theta ML value = 0·202944, P = 0·5382) according to Wang and Gu (2001), when type 1 and type 2 Rhododendron CT1sHSPs were compared. This result suggests that these two types of Rhododendron CT1sHSP are likely to have similar and complementary biochemical functions.
We have identified a duplication event of the CT1sHSP gene within Rhododendron. CT1sHSPs belonging to the grass family found only in one of the two groups suggests that the duplication of CT1sHSP genes occurred independently in the eudicot and grass families (Fig. 2). However, CT1sHSP gene duplication is probably a common phenomenon in both the eudicot and grass families because more than one cluster of CT1sHSPs were found in members of these two families (Fig. 2). Functional divergence occurring after gene duplication is a common evolutionary event at the molecular level (Zhang et al., 1998; Long, 2001; Yang et al., 2002). A large family of sHSPs is encoded by multigene families and each gene family exhibits a variety of diverse functions (Waters and Vierling, 1999). Different clusters of CT1sHSP found in species such as A. thaliana, O. sativa, P. trichocarpa and V. vinifera indicate frequent events of gene duplication during the evolutionary history of the CT1sHSP gene. Duplicate copies of CT1sHSPs in Rhododendron are highly divergent at both the nucleotide (26·7 %) and amino acid (24·2 %) levels, and a phylogenetic analysis placed one copy from each species in separate clusters. This evidence is consistent with an ancient duplication in the ancestral lineage resulting in two paralogues. Both copies are probably functional because highly similar matches with very low E values can be found in GenBank through a BLASTP search. Moreover, both copies contain no stop codon, and deletion of a stretch of 21 bp in the type 2 CT1sHSP did not alter the reading frame.
Neutral mutation plays a major role in the evolution of new gene functions due to relaxation of selective constraints and environmental changes (Kimura, 1983). In contrast, positive Darwinian selection accelerates the fixation of advantageous mutations, and as a consequence functional divergence occurs (Yang and Bielawski, 2000). By comparing the rates of non-synonymous and synonymous substitutions, adaptive molecular evolution driven by positive selection can be suggested; however, most amino acids in a functional protein are under structural and functional constraints, and adaptive evolution probably affects only a few sites in an episodic fashion (Yang et al., 2000). Positive selection can be tested at the individual codon site with the CODEML program. Estimating the ratio of dN/dS in a single parameter model (M0) revealed a low overall ω ratio of 0·085 (Table 3). This is indicative of fairly strong purifying selection across sites. Six comparisons were performed for positive selection, M1a vs. M2a, M0 vs. M3 (k = 3) and M7 vs. M8, and comparisons between model D (k =2) vs. model D (k = 3), model D (k = 3) vs. M3 (k =2) and model D (k = 3) vs. M3 (k = 3), in which the M1a, M0, M3 (k = 2), M7 and model D (k =2) are the null models. The models that allow for detection of positive selection found that only a very minor proportion of amino acid sites are under positive selection (Table 3), and only site 9 was consistently detected as significantly positively selected based on BEB, and provide evidence for strong purifying selection of the ACD in CT1sHSP sequences. Sites 12 and 141 were detected as significantly positively selected by the M3 (k = 3) and M8 models based on NEB. If all alternative models significantly differ from the null model and different comparisons find the same sites under positive selection, then there is strong support that positive selection has acted on those sites (Yang et al., 2000). The inability to detect positively selected sites with significant LRT may have been related to the influence of strong purifying selection throughout the ACD. Overall, the result suggests positive selection in the sequences flanking the ACD but strong purifying selection in the ACD, which might have played a role in maintenance of the function of the ACD in Rhododendron CT1sHSPs.
The reason for the failure to find a signal for positive selection in the ACD may possibly be caused by the force of purifying selection in the ACD. In each of the pairwise comparisons performed on CT1sHSP nucleotide sequences, the dS value was greater than the dN value, suggesting that the evolution of CT1sHSP has been dominated by purifying selection (data not shown). It is known that the ACD displays the highest degree of conservation within the sHSP family (de Jong et al., 1993). Both types of Rhododendron CT1sHSP displayed low nucleotide and amino acid variations but larger diversity existed between the two types of CT1sHSP (Table 2). Most of the variations found between the two types of Rhododendron CT1sHSP were contributed by non-synonymous substitutions. In the ACD, dN/dS ratios were relatively low compared with the flanking N-terminal region, which suggests that strong purifying selection acted on sequences of the ACD (Fig. 4B). These results, however, are not surprising, as pairwise comparisons might not detect positive selection if the selective pressure is heterogeneous across the protein (Bielawski and Yang, 2003), or when high rates of substitutions at synonymous sites lead to an underestimation of the value of dN. Strong purifying selection in the ACD can also be supported by the result of the Gu2001 analysis, which is indicative of similar biochemical functions of the two types of CT1sHSP in Rhododendron.
In multigene families, gene conversion plays an important role as it causes the exchange of genetic material between related sequences (Schimenti, 1994; Posada et al., 2002). Gene conversion is an evolutionary change that can act to homogenize genes through concerted evolution or to introduce novelty among homologous genes. No case here showed more similar relationships among paralogous genes than orthologous genes in any species, suggesting that no recent gene conversion has occurred, and this was also suggested by a statistical analysis with the GeneConv algorithm. Failure to detect gene conversion was also reported for different sHSP genes in A. thaliana and P. trichocarpa (Waters et al., 2008). However, gene conversion among sHSP genes of rice has been reported (Waters et al., 2008). Detection of no gene conversions between the ACD of the two types of Rhododendron CT1sHSPs suggests that it is not a factor accounting for the conservation of the ACD in Rhododendron CT1sHSPs. Therefore, gene conversion cannot explain the great similarities observed between the ACD of different types of Rhododendron CT1sHSP sequences, which may have resulted from purifying selection. No evidence for recent positive selection and abundant evidence of purifying selection among the CT1sHSPs were reported (Waters et al., 2008). Consequently, maintenance of functional similarity on the ACD of different CT1sHSP paralogues in A. thaliana, P. trichocarpa, V. vinifera and Rhododendron could have been manifested via strong purifying selection.
The DIVERGE algorithm suggested no functional divergence between the two types of Rhododendron CT1sHSPs despite the obvious amino acid differences flanking the ACD region. Gene duplication is thought to be a major contributor to the evolution of the molecular network (Korbel et al., 2008). The newly generated paralogues may escape from the selective constraints in order to acquire a new function or complementary to the function of the counterpart (Force et al., 1999). The plant CT1sHSP (wheat HSP16·9) consists of two discs, each comprising six α-crystallin domains organized in a trimer of dimers, folded as a dodecamer (van Montfort et al., 2001). There are four substrate binding sites found in the wheat HSP16·9 dimers: the C-terminal binding groove and N-terminal arm interaction site on opposite sides of the α-crystallin β-sandwich, the hydrophobic N-terminal arm and the IXI/V motif in the C-terminal extension (van Montfort et al., 2001).
There are only three codons of Rhododendron CT1sHSPs detected as positively selected sites; however, numerous codon differences among the two types of Rhododendron CT1sHSPs were observed in the alignment, indicating relaxation of selective constraints in the regions flanking the ACD. This indicates that neofunctionalization or subfunctionalization may have occurred in the duplicated CT1sHSPs of Rhododendron. Protection of substrate involving multiple sites in the sHSP and the N-terminal arm is important for both substrate specificity and chaperon activity (Basha et al., 2006). Moreover, protection efficiency is related to the level of hydrophobicity in the N-terminal arm (Basha et al., 2006). The changes from Trp or Ala to Cys (site 9) and Cys to Phe (site 12) all demonstrated an increase in the level of hydrophobicity. However, no change in the level of hydrophobicity for Gln or Glu to Asp (site 141, located in the IXI/V motif) was observed. This result indicated the importance of an increase in the level of hydrophobicity in the N-terminal arm for CT1sHSP in Rhododendron and is consistent with that reported by Basha et al. (2006). In contrast to the high sequence diversity of the N-terminal arms of the sHSP, the C-terminal extensions play an important role in stabilization of quaternary structure and are typically a short, polar segment that is unstructured and highly flexible and protrudes from the oligomeric structure (van Montfort et al., 2001; Morris et al., 2008). Both the polarity and the flexibility of the C-terminal extensions are important for the maintenance of sHsp solubility and for complexation with its target protein (Morris et al., 2008). Glu in the C-terminal extension has been reported to be important for the structural and functional integrity of sHSP (Morris et al., 2008). Changes from Glu or Gln to Asp did not alter the level of hydrophobicity, but the smaller Asp rotates easily and adds flexibility to the protein chain (Huang and Nau, 2003). The change of residue 141 from Glu or Gln to Asp in some sequences of both types of Rhododendron CT1sHSPs under positive selection suggests increased flexibility and solubility of CT1sHSP and subsequently its enhancement of chaperon activity. These amino acid changes in the N-terminal arm and C-terminal extension are indicative of general enhancement of chaperon activity in both types of Rhododendron CT1sHSPs. In contrast to the positive selection in the sequences flanking the ACD, the ACD was negatively selected. Functionally important protein-coding regions of the duplication genes tend to be conserved and are presumed to be under purifying selection to reduce genetic costs (Redon et al., 2006; Korbel et al., 2008).
The seven-amino-acid insertion between the α2 and α3 helices is specific to the type 1 Rhododendron CT1sHSPs and a similar amino acid insertion is also commonly observed among group I CT1sHSPs in other plant species (Supplementary Material). Moreover, comparing the pea HSP18·1 with its closely related dodecameric wheat HSP16·9 found a six-amino-acid insertion in the N-terminal arm and this insertion provided better substrate protection efficiency due to increased hydrophobicity and structural flexibility (Basha et al., 2006). Therefore, it is likely that the seven-amino-acid insertion in the N-terminal arm of the type 1 CT1sHSPs in Rhododendron could reflect structural differences in the efficiency of substrate binding of the dimer. Studies have shown that protein structure affects substrate-binding efficiency, and even alters the specificities of substrates and the biochemical function of the protein (Owen et al., 2002; Baden et al., 2008). The two types of Rhododendron CT1sHSPs are most likely to differ in the chaperon activities due to differences in the substrate-binding efficiencies resulting from structural alterations and increased hydrophobicity contributed by amino acids such as Val and Ala in the unique insertion of the type 1 Rhododendron CT1sHSP. The seven-amino-acid insertion is probably paving the foundation for simultaneous maintenance of two types of CT1sHSPs in Rhododendron and other eudicots due to changes in substrate binding as well as substrate protection efficiency. In general, the divergence of the two types of Rhododendron CT1sHSPs is mostly affected in their protein folding in order to gain more efficiency in chaperon activity and change the substrate specificity. The identification of these protein paralogues is important because they may represent newly arisen proteins that are evolving new perspectives in function (Waters et al., 2008).
It is thought that an increase in the non-synonymous substitution rate immediately after gene duplication is a common consequence of gene duplication. The increase in the non-synonymous substitution rate is probably due to relaxation of selective pressures or the action of positive selection (Mazet and Shimeld, 2002). Gene duplication is thought to permit one copy to maintain its original function, and the other copy is then free to accumulate mutations that provide raw material for functional innovations (Ohno, 1970; Henikoff et al., 1997). It is likely that over time all but one gene copy will be silenced by random mutations. In this study, two types of CT1sHSP displayed greater levels of genetic diversity in the N-terminal region flanking the ACD. The variation of the amino acid sequences between the two types of class I CTsHSPs in the N-terminal arm is likely to influence chaperon activity by varying the efficiency of the substrate binding and/or substrate specificity. The amino acid changes in the C-terminal extension would alter the protein structure affecting protein solubility. The structural variation of the duplicate genes by the amino acid changes flanking the ACD is directed by positive selection. However, the level of similarity within the ACD was high between the two types of CT1sHSP, which might have been caused by the high selective constraints that resulted in strong purifying selection acting on this region of CT1sHSPs. Several lines of evidence support the inference of purifying selection of the ACD during evolution after duplication. The two types of CT1sHSP should have existed long enough for the accumulation of mutations in the ACD that would have impaired or revolutionized the function of the second copy of the CT1sHSP in Rhododendron. Therefore, tremendous pressure of purifying selection must have been exerted on the ACD region of the two types of Rhododendron CT1sHSP genes.
We dedicate this work to the memory of Dr Yu-Pin Cheng who passed away in a car accident during a field collection trip. We are grateful to one anonymous reviewer and Dr J. S. (Pat) Heslop-Harrison for helpful comments on an earlier version of the manuscript. This study was financially supported by the National Science Council (grant number: NSC97-2313-B-003-002-MY3), Executive Yuan, Taiwan.