Gene duplication provides raw genetic material for evolutionary novelty and adaptation. The evolutionary fate of duplicated transcription factor genes has been less studied, although transcription factor genes play important roles in many biological processes. TFIIAγ is a small subunit of TFIIA, one of the general transcription factors required by RNA polymerase II. Previous studies identified two TFIIAγ-like genes in the rice genome and found that these genes either conferred resistance to rice bacterial blight or could be induced by pathogen invasion, raising the question of their functional divergence and evolutionary fates after gene duplication.
We reconstructed the evolutionary history of the TFIIAγ genes from the main lineages of angiosperms and demonstrated that two TFIIAγ genes (TFIIAγ1 and TFIIAγ5) arose from a whole genome duplication that happened in the common ancestor of grasses. Likelihood-based analyses with branch, codon, and branch-site models showed no evidence of positive selection but a signature of relaxed selective constraint after the TFIIAγ duplication. In particular, we found that the nonsynonymous/synonymous rate ratio (ω = dN/dS) of the TFIIAγ1 sequences was two times higher than that of the TFIIAγ5 sequences, indicating highly asymmetric rates of protein evolution in the rice tribe and its relatives, with an accelerated rate in the TFIIAγ1 gene. Our expression data and EST database search further indicated that, after the whole genome duplication, the expression of the TFIIAγ1 gene was significantly reduced while TFIIAγ5 remained constitutively expressed and maintained the ancestral role as a subunit of the TFIIA complex.
The evolutionary fate of the TFIIAγ duplicates is not consistent with the neofunctionalization model, which predicts that one of the duplicated genes acquires a new function because of positive Darwinian selection. Instead, we suggest that subfunctionalization might be involved in TFIIAγ evolution in grasses. The fact that both the TFIIAγ1 and TFIIAγ5 genes were effectively involved in the response to biotic or abiotic factors might be explained by either the Dykhuizen-Hartl effect or the buffering hypothesis.
Transcription factors form large families in the genomes of most eukaryotic organisms. They often act as switches between discrete developmental programs and play important roles in many biological processes in plants, such as developmental regulation, control of metabolic pathways, and response to environmental stimuli and harmful stress [2,3]. Unlike regulatory transcription factors, general transcription factors are conserved proteins that are used by organisms as diverse as human, rat, Drosophila, and yeast to initiate mRNA synthesis. TFIIA is one of the general eukaryotic transcription factors required by RNA polymerase II and has been demonstrated to stimulate transcription by stabilizing TBP binding to the TATA box and by regulating TBP or TFIID dimerization to accelerate DNA binding [4,5]. All three polypeptides in TFIIA, including the small subunit (TFIIAγ), show high sequence and structural conservation across different organisms, highlighting their significance in eukaryotic transcription [6,7]. Recent studies showed that there are two TFIIAγ-like genes in the rice genome, in contrast to Arabidopsis, where only one copy was found. Sequence comparison indicated that the two rice TFIIAγ-like genes have 85.5% identity at the amino acid level and share high degrees of nucleotide and amino acid sequence similarity with the Arabidopsis TFIIAγ-like gene [7,8]. Interestingly, a mutant (V39E substitution) of the copy on rice chromosome 5 (xa5) was confirmed to confer resistance to rice bacterial blight [7,8], and the other copy on chromosome 1 (TFIIAγ1) was found to be highly expressed when induced by pathogen invasion.
Gene duplication is widely recognized as a major evolutionary force shaping genome evolution, and provides raw genetic material for evolutionary novelty and adaptation [10,11]. Duplication of transcription factor genes has been investigated recently, but almost all studies focused on regulatory transcription factors (e.g., [12-16]) and little is known about the evolution of basic transcription factor duplicates. The duplication and divergence of the TFIIAγ gene in rice, and the involvement of both copies in the response to rice bacterial blight, raise a few interesting questions. First, was the new function of disease resistance facilitated by the redundancy of the TFIIAγ gene, as suggested by a previous study? Evidence shows that gene duplication might contribute to the ability of plants to mount a defense response against disease and herbivory through the functional diversification of genes, but empirical studies are still scarce in plants [17,18]. Second, when did the TFIIAγ duplication happen, and which model fits the fate of the duplicated genes? The classic models of gene duplication predict that one of the duplicated genes is either lost by accumulation of deleterious mutations (pseudogenization or nonfunctionalization) [19,20] or acquires a new function because of positive Darwinian selection (neofunctionalization) [11,21]. Additional possible fates of the duplicated genes have also been proposed, including maintenance of the ancestral function by both copies (redundancy) and subdivision of the ancestral function between copies (subfunctionalization and subneofunctionalization) [21-25]. Jiang et al. (2006) suggested that duplication of the TFIIAγ gene in rice gave rise to a new function for disease resistance during evolution. This hypothesis, however, remains to be tested with empirical molecular data. Molecular evolutionary analyses have been used successfully to test the alternative explanations for the retention and evolution of duplicated genes (e.g., [14,16,26-29]). 
Reconstructing the phylogenetic relationships of the TFIIAγ genes will help elucidate the duplication history of the two TFIIAγ copies and further reveal their evolutionary fates after the duplication.
Finally, we ask what role selection plays in the evolution of the duplicated TFIIAγ genes. Has there been any change in the strength and mode of selection acting on the duplicate genes? What is the relative importance of relaxation of purifying selection and positive selection in the evolution of the TFIIAγ genes? Previous studies often treated relaxation of purifying selection as the null hypothesis, but positive selection after gene duplication has been well demonstrated (e.g., [28,30,31]). Several current statistical methods provide effective ways to evaluate the role of positive selection following gene duplication and allow more specific cases to be addressed [28,32,33].
In the present study, we investigate the molecular evolution of the general transcription factor TFIIAγ in grasses, including a dense sampling of species of the rice tribe (Oryzeae). Based on the TFIIAγ gene phylogeny, we found that the duplication event giving rise to TFIIAγ1 and TFIIAγ5 happened in the common ancestor of extant grasses. Our molecular evolutionary analyses and likelihood ratio tests revealed a relaxation of selective constraint on the TFIIAγ genes following gene duplication and an acceleration of TFIIAγ1 gene evolution. In conjunction with expression data, we demonstrated that both TFIIAγ genes were functional after the duplication and under strong selective constraint in Oryzeae and its relatives, providing no evidence that either gene evolved new functions or became a pseudogene despite their long-term coexistence for at least 50 million years. Instead, the evolutionary fates of the two TFIIAγ genes could be explained either by the Dykhuizen-Hartl effect, which predicts that one of the duplicate genes evolves under relaxed purifying selection and later conveys a selective advantage under particular environments, or by the buffering hypothesis, which suggests that selection for a buffering effect is a mechanism for duplicate gene preservation after whole genome duplication.
The rice tribe (Oryzeae) includes approximately 12 genera and more than 70 species distributed across the tropical and temperate regions of the world [35,36]. In this study, we sampled 13 diploid species that represent the main lineages of Oryzeae, including six Oryza species, two Leersia species, and one species from each of the other five genera in the tribe (Figure 1; Additional file 1). One species in the tribe Ehrharteae, which is sister to Oryzeae, Ehrharta erecta, was used as an outgroup [35,37]. To infer the duplication event of the two TFIIAγ genes, we selected an additional 12 monocots and 24 dicots to generate the phylogenetic tree of the TFIIAγ genes. In total, 30 sequences were isolated here and the remaining sequences were extracted from GenBank by BLAST searches. Detailed information on the species, the sequences, and their GenBank accession numbers is listed in Additional file 1.
On the basis of the TFIIAγ-like sequences from rice, wheat and maize [Additional file 1], we designed two pairs of universal PCR primers to amplify the TFIIAγ genes: the forward primers P1 (5'-TTCgAgCTSTACMggMggTC-3') or P3 (5'-ATggCCACCTTCgAgCTSTA-3') and the reverse primers P2 (5'-AggCCACRATCTTCACCTTg-3') or P4 (5'-TCRCAggCCACRATCTTCAC-3'). The regions amplified and the locations of the PCR and internal primers (P7 and P8) are shown in Additional file 2. Genomic DNA was extracted from fresh young leaves or silica-gel dried leaves using the CTAB method as described previously. PCR amplification was performed in a 25 μl reaction volume using exTaq polymerase (TaKaRa, Dalian, China). The cycling procedure was 2 min of pre-denaturation, followed by 35 cycles of denaturation at 94°C for 45 s, annealing at 56°C for 45 s and extension at 68°C for 8 min, and a final extension of 10 min. PCR products were run on a 1.2% agarose gel and all bands were excised under UV light, purified using a Dingguo gel purification kit (Dingguo, Beijing, China), and sequenced using the ET Terminator Kit (Amersham Pharmacia Biotech). All the PCR products were cloned into pGEM T-easy vectors (Promega, Madison, WI, USA) and at least 6 independent clones were sequenced. The purified fragments were also sequenced directly for confirmation. If more than one copy was isolated from one species, we first constructed a phylogeny including all the copies. If multiple copies from the same species clustered together, one copy was randomly selected for further analysis.
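The primers above contain IUPAC ambiguity codes (S, M, R), so each one stands for a small pool of concrete oligonucleotides. As a minimal sketch, assuming the standard IUPAC nucleotide code table and treating the lowercase g in the primer strings as an ordinary G, the pool size of each primer can be enumerated as follows. The function names `degeneracy` and `expand` are hypothetical helpers introduced only for this illustration.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n

def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

# Primer sequences as given in the text (5' to 3').
primers = {
    "P1": "TTCgAgCTSTACMggMggTC",
    "P2": "AggCCACRATCTTCACCTTg",
    "P3": "ATggCCACCTTCgAgCTSTA",
    "P4": "TCRCAggCCACRATCTTCAC",
}

for name, seq in primers.items():
    print(name, "length:", len(seq), "pool size:", degeneracy(seq))
```

A higher pool size means a more permissive primer, which is one plausible reason for offering two alternative primers in each direction.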
Reverse transcriptase-polymerase chain reaction (RT-PCR) was performed to investigate whether the two TFIIAγ genes differ in expression. Total RNA was extracted from fresh leaves of eight species and young panicles of three species of Oryzeae [Additional file 1] using Plant RNA Reagent (Invitrogen, Carlsbad, California, USA). First-strand cDNA was reverse-transcribed with an oligo dT20 primer. Subsequent detection was performed by PCR using the upstream primer P3 and the downstream copy-specific primers P7 (5'-AYARWAACCTTgCTCTTgACTTgg-3') and P8 (5'-gACNNTAACCTTgCTCTTCACCTSA-3') (P7 for the TFIIAγ5 copy and P8 for the TFIIAγ1 copy). The actin gene was used as a control with primers ACT-59F (5'-AggCTggTTTCgCTggggATgATg-3') and ACTIN-764R (5'-ggACCTCggggCACCTgAACCTCT-3'). The PCR procedure was 2 min of pre-denaturation at 94°C, 35 cycles of denaturation at 94°C for 30 s, annealing at 54°C for 30 s, and extension at 72°C for 1 min, with a final extension of 10 min at 72°C. RT-PCR products were confirmed by sequencing.
In the EST database search, all hits from Poaceae species with an e-value lower than 1e-10 were collected. The retrieved sequences were aligned with the rice TFIIAγ1 and TFIIAγ5 genes. A neighbor-joining phylogeny divided all the sequences into two classes, corresponding to the TFIIAγ1 and TFIIAγ5 clades, respectively. We used the number of hits as an indicator of the expression level of the two copies, because a highly expressed gene has a greater chance of being picked from a cDNA library than a lowly expressed gene [39,40].
Sequences were aligned using a combination of methods implemented in BioEdit and ClustalX 1.81, with further manual refinements. The unalignable intron regions were excluded from the analyses. The GC content of all three codon positions and pairwise synonymous and nonsynonymous distances were calculated with MEGA3.1. Codon usage bias of the sequences was estimated by the effective number of codons (ENC), which varies between 20 and 61; the lower the value, the more biased the codon usage. We used Tajima's relative rate test, as implemented in MEGA3.1, to test for rate variation between the two TFIIAγ genes using Ananas comosus as the outgroup. To visualize conservation and check the rate variation along the TFIIAγ sequences, a sliding window analysis was performed with the K-estimator program. Given the relatively short length of the TFIIAγ genes, we used a window size of 10 amino acids (30 bp) and a step size of 3 amino acids (9 bp) in the sliding window analysis. Poaceae species were used in the sliding window analysis. To avoid sampling bias, only Oryza punctata was used to represent the Oryzeae species.
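The window layout described above (10 codons wide, advancing 3 codons at a time) can be sketched as follows. This is only an illustration and not the K-estimator program: it computes the raw proportion of differing nucleotides per window and the Jukes-Cantor corrected distance, whereas the actual analysis also partitioned changes into dN and dS. The function names are hypothetical.

```python
import math

def sliding_windows(n_codons, window=10, step=3):
    """Return (start, end) codon index pairs for each window; end is exclusive."""
    return [(s, s + window) for s in range(0, n_codons - window + 1, step)]

def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)

def jc_distance(p):
    """Jukes-Cantor correction of a nucleotide p-distance (requires p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def windowed_divergence(seq_a, seq_b, window=10, step=3):
    """Jukes-Cantor distance in each codon window of two aligned coding sequences."""
    n_codons = len(seq_a) // 3
    result = []
    for start, end in sliding_windows(n_codons, window, step):
        p = p_distance(seq_a[3 * start:3 * end], seq_b[3 * start:3 * end])
        result.append((start, jc_distance(p)))
    return result

# A coding sequence of 261 bp (87 codons) yields 26 overlapping windows.
print(len(sliding_windows(87)))
```

Because consecutive windows share 7 of their 10 codons, neighbouring estimates are strongly correlated; the profile is useful for locating conserved and variable regions rather than for independent statistical tests.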
Phylogenetic analyses were performed using the maximum likelihood (ML) method, implemented in PAUP 4.0b10, and Bayesian inference (BI) with MrBayes v.3.1.2. For ML, heuristic searches were run with random taxon addition and tree bisection-reconnection swapping for 100 replications. The reliability of branches was evaluated by 500 bootstrap replications. In each bootstrap heuristic search replication, the same parameter settings were used, except that the number of heuristic search replications was set to 10. In the ML and BI analyses, the best nucleotide substitution model for each data set was selected using Modeltest 3.7 with the corrected Akaike information criterion. In BI with the GTR+I+G model, the Markov chain Monte Carlo (MCMC) analysis was run for 1,000,000 generations, sampled every 100 generations. The first 250,000 generations were discarded as burn-in.
We generated a phylogenetic tree of all TFIIAγ or TFIIAγ-like sequences to explore the duplication history of the two duplicates. For this purpose, we used only coding sequences to construct the gene tree because the intron sequences of TFIIAγ1 and TFIIAγ5 could not be aligned with each other. The phylogenetic tree was rooted with the TFIIAγ-like genes of Liriodendron tulipifera and Persea americana, which belong to two families (Magnoliaceae and Lauraceae) of the basal angiosperms.
The ratio of nonsynonymous to synonymous substitution rates (dN/dS or ω) is an effective measure for detecting selection on a gene or gene region. If the ratio is significantly less than 1 (ω < 1), purifying selection is inferred, while positive selection is inferred if the ratio is significantly greater than 1 (ω > 1). An estimate of the ratio close to 1 (ω ≈ 1) is consistent with neutral evolution. To explore the selective processes acting on the TFIIAγ genes, we performed likelihood-based analyses using the codeml program of PAML version 4. We first tested whether the average ω ratio differed among lineages of the gene tree by using branch models that allow ω to vary among lineages and assign different ω ratios to the branches before and after the duplication event. The one ratio model (M0) assumes a single ω for all branches and all sites, whereas the other models allow for different ω ratios among branches of the tree. The free ratio model (Mf) assumes an independent ω ratio for each branch of the tree. The two ratio model (M2r) assigns one ω ratio to all branches predating the duplication event (ω0), and another ratio to all branches postdating the duplication event (ωd1 = ωd5 = ω1 = ω5). The three ratio model (M3r) assigns one ratio to all branches predating the gene duplication (ω0) and two further ratios to the branches of TFIIAγ1 (ωd1 = ω1) and TFIIAγ5 (ωd5 = ω5), respectively, following the duplication event. A more complex model, the four ratio model (M4r), assumes four independent ω ratios: one ratio for all branches predating the gene duplication (ω0), one ratio for the branches immediately following the duplication (ωd1 = ωd5), and the last two for the branches leading to TFIIAγ1 (ω1) and TFIIAγ5 (ω5) of grass species, respectively. 
Finally, the five ratio model (M5r) extends M4r to allow the ω ratios to differ between the TFIIAγ1 and TFIIAγ5 branches immediately postdating the duplication (ωd1 ≠ ωd5) (Figure 1; Table 1). A likelihood ratio test (LRT) was conducted to determine whether there was a statistically significant difference between two nested models. If the LRT is significant, the null hypothesis that the two models do not differ is rejected, and the model with the higher likelihood value is taken to be the better model [28,52].
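The likelihood ratio test compares twice the log-likelihood difference between two nested models (2ΔL) with a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. The following is a minimal sketch using only the Python standard library; it assumes that each additional ω class in the branch models contributes one free parameter, so that M5r versus M4r has one degree of freedom. The function names are hypothetical and this is not part of PAML.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def lrt_pvalue(two_delta_l, df):
    """P value of a likelihood ratio test given the statistic 2*(lnL1 - lnL0)."""
    return chi2_sf(two_delta_l, df)

# Number of omega parameters in each branch model, as described in the text.
N_OMEGA = {"M0": 1, "M2r": 2, "M3r": 3, "M4r": 4, "M5r": 5}

df = N_OMEGA["M5r"] - N_OMEGA["M4r"]
print(df, round(lrt_pvalue(2.12, df), 3))
```

With the statistic 2ΔL = 2.12 reported for the M5r versus M4r comparison, this gives a P value of approximately 0.145. A non-positive statistic, which can arise when the more complex model fails to reach a higher likelihood, is treated as giving P = 1.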
We next used site-specific models to examine whether particular amino acid residues were subject to positive selection, because the ω ratio is seldom greater than 1 when averaged over all sites. Nested codon models [28,54] were applied. In addition to the one ratio model (M0), the nearly neutral model (M1) classifies all sites into two categories, one under strict constraint (0 < ω < 1) and the other evolving neutrally (ω = 1). The positive selection model (M2) is based on M1 and assumes a third category under positive selection (ω > 1). The discrete model (M3) classifies all sites into several categories, each with a different ω ratio. The beta model (M7) assumes a beta distribution of the ω ratios, and the beta&ω model (M8) extends M7 with an additional category whose ω ratio is estimated from the data. The models assuming positive selection, M8 and M2, are compared with the null models M7 and M1, respectively. Positive selection is inferred if the LRT is significant and there are sites with ω > 1. A comparison between M3 and M0 tests whether the ω ratio is homogeneous across different parts of the gene.
We further applied the branch-site models A and B to test for sites potentially under positive selection on the TFIIAγ1 and TFIIAγ5 branches, respectively. Model A assumes 0 < ω0 < 1 and ω1 = 1 and was compared with the nearly neutral model (M1), while model B treats ω0 and ω1 as free parameters to be estimated and was compared with the discrete model (M3).
Using genomic DNA, we cloned and sequenced two TFIIAγ genes from all sampled Oryzeae species except Leersia tisserantii, for which only TFIIAγ1 was isolated, mainly because the second intron of TFIIAγ5 in this species was too long to be amplified successfully by exTaq DNA polymerase. However, using a cDNA template and an internal primer, we obtained the coding region of TFIIAγ5 and the first intron sequence for this species. Two TFIIAγ copies were also isolated and sequenced from other Poaceae species, including Ehrharta erecta, Zea mays and Sorghum bicolor. Only a single TFIIAγ-like gene was isolated from both Cyperus rotundus and Zingiber officinale despite various attempts, including optimization of the PCR amplification and different combinations of upstream and downstream primers. All the TFIIAγ genes obtained in this study have three exons and two introns, with about 261 bp of coding sequence. The downloaded TFIIAγ-like sequences are cDNAs with the full coding region. The TFIIAγ1-like sequences of rice, maize and sorghum were 327 bp in length, 9 bp (three codons) longer than the sequences of the grass TFIIAγ5-like gene and those from the remaining species outside Poaceae. In Oryzeae, sequence length ranged from 1.3 to 1.8 kb for TFIIAγ1 and from 2.5 to 5.5 kb for TFIIAγ5. The first intron is about 70 to 100 bp in length for both genes, whereas the length of the second intron varied greatly [Additional file 2]. In the coding regions, there are no indels between the two copies, which can be aligned perfectly. We did not find the V39E substitution that allows TFIIAγ5 (xa5) to confer resistance to rice bacterial blight in any Oryzeae species, indicating that this mutation arose within O. sativa. The GC contents for the total and the three individual codon positions were similar within the same gene, but the GC content at the third position (GC3) was higher in TFIIAγ1 than in TFIIAγ5 (75.9% vs. 70.1%, P < 0.001) [Additional file 3]. 
Estimates of codon usage showed that TFIIAγ5 had a significantly lower ENC value than TFIIAγ1 (42.9 vs. 48.5, P < 0.001), paralleling its higher expression level in grasses (see below).
The alignment of all the coding sequences was 318 bp in length including gaps. Of these sites, 152 were parsimony informative. The Bayesian phylogeny indicated that all monocot species except Zostera marina of Zosteraceae formed a monophyletic group, which forms a polytomy with the other angiosperm clades. Such an unresolved relationship reflects the current understanding of angiosperm phylogeny, in which the position of monocots relative to many other basal angiosperms is not fully resolved. Notably, all the TFIIAγ sequences from the Poaceae species formed two clades supported by Bayesian posterior probability > 90, one consisting of TFIIAγ1 homologs and the other of TFIIAγ5 homologs (Figure 1). All the Oryzeae species and most grass species outside Oryzeae have two distinct types of TFIIAγ sequences that fell into the two clades. In some grass species, only one TFIIAγ-like sequence was isolated, which clustered with either the TFIIAγ1-like or the TFIIAγ5-like clade. In contrast, a single TFIIAγ-like copy was found in two species from families closely related to Poaceae, Cyperus rotundus of Cyperaceae and Zingiber officinale of Zingiberaceae. Moreover, the monocot clade is sister to the TFIIAγ-like sequences from the remaining angiosperm species (Figure 1). ML analyses produced similar tree topologies [Additional file 4]. These observations indicate that the duplication event giving rise to TFIIAγ1 and TFIIAγ5 occurred in the ancestor of Poaceae or before the divergence of Poaceae.
We performed a sliding window analysis by calculating the nucleotide divergence of the entire sequence with the JC model (K), and the divergence at nonsynonymous (dN) and synonymous (dS) sites. The dN values for both genes were lower than those of dS (dN/dS < 1) in almost all sliding windows, but all three parameters fluctuated across the genes (Figure 2). The conserved regions in TFIIAγ1 are different from those in TFIIAγ5, and some sites in TFIIAγ1 might have experienced relaxation of selective constraints, with elevated dN/dS values relative to those of TFIIAγ5 (Figure 2A and 2B). In addition, both the K and dN values of TFIIAγ1 were higher than those of TFIIAγ5, suggesting a higher rate of evolution in the TFIIAγ1 genes. To detect the potential impact of intergenic conversion on molecular evolution, we further calculated the parameters between the two paralogs (Figure 2C). We did not find a significant difference in evolutionary rates between the two domains in which heterogeneity occurred across the sequences. It is evident that sequence differentiation was low around the functional regions (e.g., the region that interacts with TBP), which is inconsistent with the pattern expected under gene conversion, in which sequence divergence would occur around the functional site.
The relative rate test was used to compare the TFIIAγ1 and TFIIAγ5 sequences from the main lineages in grasses in relation to the TFIIAγ-like sequence from Ananas comosus of the family Bromeliaceae, which is closely related to Poaceae. For all paralogs from the 12 species tested, TFIIAγ1 evolved 1.14 to 1.34 times faster than TFIIAγ5 (Table 2). The tests were statistically significant or marginally significant for six of the 12 species. When the more distantly related species Zingiber officinale was used as the outgroup, the results were similar in that TFIIAγ1 evolved faster than TFIIAγ5 in all 12 species, though the tests were not significant (Table 2). We calculated the synonymous and nonsynonymous substitution rates of TFIIAγ1 and TFIIAγ5 between the Oryzeae species and found that the average dN value of TFIIAγ1 was significantly higher than that of TFIIAγ5 (0.033 vs. 0.011, P < 0.001); the pairwise dS values of TFIIAγ1 and TFIIAγ5 also differed significantly (0.155 vs. 0.131, P = 0.001). The accelerated dN in TFIIAγ1 is obvious in the amino acid alignments of the two genes, in which 21 sites had amino acid changes in TFIIAγ1 in contrast to 14 sites in TFIIAγ5 [Additional file 5]. The overall ω (dN/dS) values for both genes were far below 1 (0.213 for TFIIAγ1 and 0.084 for TFIIAγ5), indicating that both genes were subject to selective constraint, but that the constraint on TFIIAγ5 was stronger.
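The logic of Tajima's relative rate test used above can be sketched in a few lines. Given two ingroup sequences and an outgroup, the test counts sites where only the first ingroup sequence differs (m1) and sites where only the second differs (m2); under equal rates the two counts are expected to be equal, and (m1 - m2)^2 / (m1 + m2) is compared with a chi-square distribution with one degree of freedom. This is a minimal illustration, not the MEGA3.1 implementation; the function name and the toy sequences are hypothetical.

```python
import math

def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's relative rate test on three aligned sequences.

    Returns (m1, m2, chi_square, p_value), where m1 and m2 count sites at
    which only seq1 or only seq2 differs from the other two sequences.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to equal length")
    m1 = m2 = 0
    for a, b, o in zip(seq1, seq2, outgroup):
        if "-" in (a, b, o):
            continue                 # skip alignment gaps
        if a != b and b == o:
            m1 += 1                  # change unique to seq1
        elif a != b and a == o:
            m2 += 1                  # change unique to seq2
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi_square = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return m1, m2, chi_square, p_value

# Toy alignment: seq1 carries four unique changes, seq2 carries one.
out = "A" * 12
s1 = "A" + "G" * 4 + "A" * 7
s2 = "C" + "A" * 11
print(tajima_relative_rate(s1, s2, out))
```

Sites where all three sequences differ are ignored, because they do not indicate which ingroup lineage changed. With short genes such as TFIIAγ the counts are small, which may explain why some comparisons were only marginally significant.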
We used different likelihood ratio tests to examine whether the ω ratios varied among lineages and, in particular, whether there was any increase in the ω ratio after the TFIIAγ duplication. The free ratio (Mf) and two ratio (M2r) models both have significantly higher likelihood scores than the one ratio model (M0), rejecting the null hypothesis that the TFIIAγ-like genes have evolved at constant rates along branches (Table 1). However, branch-specific ω values under the Mf model were all lower than one (ranging from 0 to 0.513), suggesting that purifying selection or constraint on the amino acid sequence best explains the evolution of TFIIAγ-like genes in angiosperms. The two ratio model, with ω0 = 0.046 for all branches before the TFIIAγ duplication and ωd1 = ωd5 = ω1 = ω5 = 0.077 for the branches after the duplication, fits the data significantly better than the one ratio model (M2r vs. M0, 2ΔL = 10.38, P < 0.001), indicative of a significant increase in the ω value following the duplication event. We further compared models M3r and M2r to test the assumption that the two TFIIAγ genes were under the same selective constraints after the duplication event. The likelihood of model M3r was significantly better than that of M2r (2ΔL = 12.84, P < 0.0001), suggesting that different selective pressures act on the two TFIIAγ genes, with stronger purifying selection in TFIIAγ5 (ωd5 = ω5 = 0.060) than in TFIIAγ1 (ωd1 = ω1 = 0.118). Finally, the comparison between M5r and M4r indicated that the ω ratios of the two branches immediately following the duplication event were not significantly different from each other (2ΔL = 2.12, P = 0.145) (Table 1), implying that the asymmetric rates of TFIIAγ evolution arose mainly after the diversification of grasses.
Given that the selective constraints on the TFIIAγ genes were relaxed after duplication and that the genes either confer disease resistance or are induced by pathogens in cultivated rice, it is interesting to ask whether the relaxation involved any accelerated evolution and whether any amino acid residue is potentially under positive selection. Because the branch model test averages the ω ratios across all sites and is a very conservative test of positive selection, we applied site-specific and branch-site models to the TFIIAγ dataset. As shown in Table 3, the site-specific models indicate that the TFIIAγ genes were under strong purifying selection, with ω = 0.055 in the one-ratio model (M0). The discrete model (M3) was significantly better than M0 (2ΔL = 193.62, P < 0.001), indicating that the ω ratio was not homogeneous among sites along the sequence. This is also obvious in the sliding window analysis (Figure 2) and the amino acid alignment of the TFIIAγ genes [Additional file 5]. Models M2 and M8, which assume positive selection, were not significantly better than the null models M1 and M7 (for M1 vs. M2, 2ΔL = 0.0, P = 1.0; for M7 vs. M8, 2ΔL = 0.0, P = 1.0), and no site was found to be under positive selection by Bayes Empirical Bayes (BEB) inference using a probability criterion of 95%. Thus, the nearly neutral model explained the data better. In model M1, about 94% of the codons are under strict constraint (ω = 0.030), and the other 6% of the codons are under neutral evolution (ω = 1.0) (Table 3).
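To make the M1 estimates above concrete, the average ω implied by a site-class model is the proportion-weighted mean of the class-specific ω values. The short sketch below uses the rounded proportions quoted in the text (about 94% and 6%), so the result is only approximate; the function name is hypothetical and the calculation is an illustration, not an output of codeml.

```python
def mean_omega(site_classes):
    """Proportion-weighted mean omega over (proportion, omega) site classes."""
    total = sum(p for p, _ in site_classes)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("site class proportions must sum to 1")
    return sum(p * w for p, w in site_classes)

# Nearly neutral model (M1): rounded estimates quoted in the text.
m1_classes = [(0.94, 0.030), (0.06, 1.0)]
print(round(mean_omega(m1_classes), 4))
```

Even a small class of neutrally evolving codons contributes most of the weighted average here, which illustrates why a single gene-wide ω can hide strong heterogeneity among sites.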
We further tested for evidence of positive selection on the two TFIIAγ genes separately using branch-site models (Table 3). Branch-site models A and B specifying the TFIIAγ1 branch as the foreground branch were not significantly better than the null models M1 (2ΔL = 0.1, P = 0.95) and M3 (2ΔL = -44.92, P = 1.0). In analyses of the TFIIAγ5 branch, however, model A was significantly better than the null model (2ΔL = 13.66, P < 0.001) with an ω ratio greater than 1, but model B was not significantly better than the null model (Table 3). We checked the inferred positively selected site (90T) across all protein sequences and found that it was fixed in both copies, with all TFIIAγ1 genes having T and all TFIIAγ5 genes having Q [Additional file 5]. This observation suggests that positive selection is unlikely to be acting on either copy in grasses. Alternatively, this site might have experienced positive selection immediately after the duplication of the TFIIAγ gene in the ancestor of grasses and then been fixed under strong purifying selection in grasses. It should be noted that the TFIIAγ5 gene was highly expressed, with a significantly lower ENC than the TFIIAγ1 gene [Additional file 3]. Therefore, the ω value greater than one at site 90 of the TFIIAγ5 gene might be caused by a low dS value rather than by positive selection, because synonymous sites are likely to be under negative selection in highly expressed genes due to codon usage bias.
Two rounds of RT-PCR were performed to determine the expression of the TFIIAγ1 and TFIIAγ5 genes in species of the tribe Oryzeae. In the first round, equal amounts of template cDNA were added to the TFIIAγ1 and TFIIAγ5 reactions. The expression of TFIIAγ5 was detected in all the leaves and young panicles, while the expression of TFIIAγ1 was weaker than that of TFIIAγ5 for most expected bands, and was almost invisible in O. officinalis, O. australiensis and Leersia tisserantii (Figure 3). The weaker bands of TFIIAγ1 indicated that it was expressed at a lower level than TFIIAγ5. After a second round of PCR, the expected bands appeared in all the species. To rule out contamination, all RT-PCR products of TFIIAγ1 and TFIIAγ5 were confirmed by sequencing, and the resulting sequences were identical to the coding regions of the genomic sequences in each species. These results showed that both copies were expressed in the leaves and young panicles of Oryzeae species, but that TFIIAγ5 was expressed at a higher level.
The different expression levels of the two TFIIAγ genes were further confirmed by a GenBank EST database search using the rice TFIIAγ1 and TFIIAγ5 sequences. Both copies were found in rice, maize and sorghum, but the hits for TFIIAγ5 far outnumbered those for the TFIIAγ1 copy in rice and maize [Additional file 6]. In several other Poaceae species, only the TFIIAγ5 copy was found. The low number of hits indicated that TFIIAγ1 expression was much lower than that of TFIIAγ5, consistent with our RT-PCR findings. In addition, TFIIAγ5 matches were found in all types of cDNA libraries, including callus, mature or immature tissue, stressed or unstressed tissue, and libraries from different developmental stages, whereas the TFIIAγ1 hits appeared mainly in drought-stressed tissue, pollen, immature, meristematic and mixed libraries [Additional file 6]. These observations suggest that TFIIAγ5 might be constitutively expressed, whereas TFIIAγ1 might be expressed under stress induction or in specific tissues.
This study identified two TFIIAγ genes in all Oryzeae species and in representative grass species, which formed two monophyletic clades corresponding to the rice TFIIAγ1 and TFIIAγ5 genes, whereas only a single copy was found in the remaining monocot and angiosperm species. Phylogenetic analyses of all the TFIIAγ-like sequences indicated that the duplication of TFIIAγ into TFIIAγ1 and TFIIAγ5 occurred before the divergence of rice and maize (Figure 1). This implies that the duplication event that gave rise to the TFIIAγ1 and TFIIAγ5 genes might have occurred before the common ancestor of extant grasses, because rice (subfamily Ehrhartoideae) and maize (subfamily Panicoideae) are two distantly related lineages in the grass family [58,59].
It has been demonstrated that the rice genome experienced two large-scale duplications: a whole genome duplication that occurred about 70 MYA, and an additional segmental duplication involving chromosomes 11 and 12 that happened 5 ~ 21 MYA [60-62]. Previous studies found that the location of the two rice TFIIAγ genes corresponded to a large-scale duplication of a portion of rice chromosomes 1 and 5 [7,8]. To determine whether the timing of the duplication event leading to TFIIAγ1/TFIIAγ5 is consistent with the whole genome duplication around 70 MYA, we calculated the synonymous distance (dS) between TFIIAγ orthologs and paralogs for rice and maize by the method of Nei and Gojobori (1986). The dS distances between the TFIIAγ orthologs were 0.388 for rice and 0.457 for maize and those between the paralogs of rice and maize were 0.592 (TFIIAγ1) and 0.497 (TFIIAγ5), respectively. According to a molecular clock assuming that rice and maize diverged 50 MYA, the TFIIAγ1 and TFIIAγ5 paralogs diverged about 54 ~ 76 MYA. This date coincides with the estimated divergence of Poaceae 55 ~ 77 MYA [58,59]. Wang et al. (2005) identified 10 large duplicated blocks arising from the whole genome duplication, including two blocks involving chromosomes 1 and 5. Our further search of the rice genome found that the two rice TFIIAγ genes are located on block 10 as defined by Wang et al. (2005). Therefore, the TFIIAγ duplication lies within a large duplicated segment of the rice genome and most likely arose from a whole genome duplication event that is assumed to have occurred before the divergence of Poaceae [60-62].
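The dating step above can be illustrated with a short calculation. The following is a minimal sketch, not the authors' script: it assumes a strict molecular clock calibrated on the rice-maize split at 50 MYA, and it tries every pairing of the reported paralog and ortholog dS values because the text does not state which pairing was used. The function and constant names are hypothetical.

```python
from itertools import product

# Synonymous distances (dS) quoted in the text above.
ORTHOLOG_DS = (0.388, 0.457)  # rice-maize ortholog comparisons
PARALOG_DS = (0.592, 0.497)   # TFIIAγ1-TFIIAγ5 paralog comparisons
CALIBRATION_MYA = 50.0        # assumed rice-maize divergence time


def divergence_time(ds_paralog, ds_ortholog, calibration_mya=CALIBRATION_MYA):
    """Date a duplication under a strict molecular clock.

    rate = ds_ortholog / (2 * calibration) and time = ds_paralog / (2 * rate),
    which simplifies to calibration * ds_paralog / ds_ortholog.
    """
    if ds_ortholog <= 0:
        raise ValueError("ortholog dS must be positive")
    return calibration_mya * ds_paralog / ds_ortholog


def dating_range(paralog_ds=PARALOG_DS, ortholog_ds=ORTHOLOG_DS):
    """Return (youngest, oldest) estimate over every paralog/ortholog pairing."""
    times = [divergence_time(p, o) for p, o in product(paralog_ds, ortholog_ds)]
    return min(times), max(times)


if __name__ == "__main__":
    youngest, oldest = dating_range()
    print(f"Estimated duplication age: {youngest:.0f} to {oldest:.0f} MYA")
```

Under these assumptions the extremes of the four pairings are roughly 54 and 76 MYA, in line with the interval reported above; the estimate scales linearly with the calibration date, so a different rice-maize divergence time shifts the whole range proportionally.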
Our timing of the TFIIAγ duplication suggests that the TFIIAγ1 and TFIIAγ5 paralogs have been maintained in the grass genome for a considerable amount of time (at least 50 million years). This implies that selection rather than random drift is responsible for the retention of both TFIIAγ copies during grass evolution, because most gene duplicates have a short lifespan (within a few million years after duplication) before one copy is silenced or deleted (pseudogenization). It has been well established that gene duplication is often followed by an elevated rate of protein evolution and that a large proportion of duplicate pairs display asymmetric evolution, i.e., one of the duplicates evolves much faster than the other [19,29,63-65]. Conant and Wagner (2003) analyzed four completely sequenced genomes and found that 20% - 30% of duplicate gene pairs showed asymmetric evolution in the amino acid sequence and, in particular, that the greater this asymmetry, the greater the dN/dS ratio in a gene pair, indicating that most asymmetric divergence might be caused by relaxed selective constraints on one of the duplicates. In good agreement with previous studies, we found significantly higher ω ratios for branches arising from the duplication event in the rice tribe and its relatives, suggesting weaker purifying selection on the duplicate genes during the diversification of grasses after the duplication event. Moreover, the ω ratios of the TFIIAγ1 sequences are two times higher than those of the TFIIAγ5 sequences, consistent with the results of relative rate tests in which TFIIAγ1 evolved faster than TFIIAγ5 (Table 2). Such asymmetric evolution of the TFIIAγ duplicates reflects an acceleration of the evolutionary rate of TFIIAγ1 relative to TFIIAγ5. Our likelihood-based analyses with both branch and codon models showed no evidence of positive selection but a signature of relaxed selective constraint after the TFIIAγ duplication and a subsequent acceleration of the TFIIAγ1 gene.
The low ω values (0.060 ~ 0.118) across the branches leading to both TFIIAγ duplicates also suggest that strong selective constraints remain on the two copies after the duplication, with TFIIAγ1 evolving under weaker selective constraint in grass species.
The fate of duplicated genes has been hotly debated since Ohno (1970), and several hypotheses have been proposed to explain the preservation of both copies, including neofunctionalization, subfunctionalization [21,24], subneofunctionalization and some other models (see review in Semon and Wolfe 2007). Based on sequence analyses and expression data, Iyer and McCouch (2004) found that the recessive mutation at the TFIIAγ5 locus for resistance to rice bacterial blight did not affect the essential function of the TFIIAγ gene and hypothesized that TFIIAγ5 functioned both as a general transcription factor and as a resistance gene (xa5) in rice, which was further demonstrated by a subsequent complementation test and 3-D structure prediction. We conducted a secondary structure prediction of the TFIIAγ1 and TFIIAγ5 proteins of grass species and found little difference in the secondary structures between the two copies [Additional file 5]. These observations, in combination with our molecular evolutionary analyses (Tables 1 and 3), demonstrated that both TFIIAγ genes were functional and under selective constraint in Oryzeae and its relatives. Thus, pseudogenization is unlikely to be involved in TFIIAγ evolution. Because duplicates can be retained when extra amounts of protein or RNA products are in high demand, as for rRNAs and histones, the retention of both TFIIAγ copies might be attributed partly to the importance of TFIIAγ as a component of TFIIA, a general transcription factor needed in all polymerase II transcription [4,5].
Jiang et al. (2006) investigated the expression patterns of the two TFIIAγ genes in rice and indicated that the TFIIAγ1 gene was not expressed in young panicles, in contrast to TFIIAγ5, which was expressed in all organs tested (leaf, stem, panicle, and root). In our study on O. sativa, O. punctata and Z. latifolia, however, the expression of TFIIAγ1 was detected in both leaves and young panicles, but the expression level was much lower than that of the TFIIAγ5 gene (Figure 3). These observations, in conjunction with our expression data, indicate that after whole genome duplication, the expression of the TFIIAγ1 copy was significantly reduced while TFIIAγ5 remained constitutively expressed and maintained the ancestral role as a subunit of the TFIIA complex. Consequently, it seems that subfunctionalization might be involved in TFIIAγ evolution in grasses. The case of the TFIIAγ genes agrees with the previous notion that subfunctionalization would lead to functional specialization when one of the duplicate genes became better at performing the original function of the progenitor gene. Nevertheless, the possibility that positive selection acted on some specific sites immediately after the duplication of the TFIIAγ gene in the ancestor of grasses cannot be excluded entirely, given the short length of the TFIIAγ gene and the limited inference power of the methods in our case.
One important point for the evolution of the TFIIAγ genes is the evidence that both the TFIIAγ1 and TFIIAγ5 genes were effectively involved in response to biotic or abiotic factors. In addition to the xa5 mutation that leads to resistance to rice bacterial blight, a recent study documented that the expression of TFIIAγ1 could rise to 400-fold above normal upon infection by a specific bacterial race (PXO99A) that causes blight disease. Our EST database search also found the frequent presence of the TFIIAγ1 gene in drought-stressed cDNA libraries in both rice and sorghum, implying its inducibility by drought stress [Additional file 6]. As pointed out by previous authors, gene redundancy might create a subtle fitness advantage that is only evident in particular stages of the life cycle or under particular environments [25,68,69]. Therefore, the fate of the duplicated TFIIAγ genes can alternatively be explained by the Dykhuizen-Hartl effect [31,34], which predicts that one of the duplicate genes evolves under relaxed purifying selection and that the fixed mutations later convey a selective advantage in a novel environment or genetic background. It is noted that the V39E substitution in the α-helix domain of TFIIAγ5 was confined to some varieties of O. sativa, suggestive of its recent emergence [7,8] [Additional file 5].
The involvement of the duplicated TFIIAγ genes in adversity response could also be explained by the buffering hypothesis, which suggests that selection for a buffering effect was a mechanism for duplicate gene preservation after whole genome duplication. By exploring the footprints of selection associated with genome duplication in Arabidopsis ecotypes and rice subspecies, Chapman et al. (2006) found that functional buffering might be important against genetic turbulence after genome duplication and could continue to act ~60 million years later. Retention of duplicate genes, particularly for complex genes and gene networks, plays a critical role in the genetic robustness of biological systems [22,25,27,70,71]. TFIIA is a complex consisting of three polypeptides and has recently been assumed to be tightly regulated, with a particular role in differentiation and development. Further biochemical and molecular investigations of the respective functions of, and the interactions between, TFIIAγ and the other two components will be required to better understand the biology of the transcription factor TFIIA and to provide useful insights into the evolution of TFIIAγ and its counterparts.
Based on phylogenetic reconstruction of the TFIIAγ genes from the main lineages of angiosperms, we demonstrated that two TFIIAγ genes (TFIIAγ1 and TFIIAγ5) arose from a whole genome duplication that happened in the common ancestor of grasses. Likelihood-based analyses with different models showed no evidence of positive selection but a signature of relaxed selective constraint after the TFIIAγ duplication. In particular, the nonsynonymous/synonymous rate ratio (ω = dN/dS) of the TFIIAγ1 sequences was two times higher than that of the TFIIAγ5 sequences, indicating highly asymmetric rates of protein evolution in the rice tribe and its relatives. Our expression data and EST database search further indicated that after whole genome duplication, the expression of the TFIIAγ1 gene was significantly reduced while TFIIAγ5 remained constitutively expressed and maintained the ancestral role as a subunit of the TFIIA complex. These observations are not consistent with the neofunctionalization model, which predicts that one of the duplicated genes acquires a new function, and instead imply that subfunctionalization might be involved in TFIIAγ evolution in grasses. The fact that both the TFIIAγ1 and TFIIAγ5 genes were effectively involved in response to biotic or abiotic factors might be explained by either the Dykhuizen-Hartl effect or the buffering hypothesis.
TBP: TATA-binding protein; ENC: effective number of codons; EST: expressed sequence tags; ML: maximum likelihood; BI: Bayesian inference; MCMC: Markov chain Monte Carlo.
SG and HZS designed the research and outlined the manuscript together. HZS performed the research. HZS and SG analyzed and interpreted the data. SG and HZS wrote the paper. Both authors have read and approved the final manuscript.
TFIIAγ-like sequences included in this study.
Gene structure and the location of primers. Universal forward (P1 and P3) and reverse (P2 and P4) primers are shown above the genes and the copy-specific internal sequencing primers (P7 and P8) are shown below the gene. Exons are shown in boxes and the shaded boxes are coding regions.
GC contents (%) and ENC of TFIIAγ1 and TFIIAγ5 in Oryzeae species and their relatives.
Maximum likelihood tree using GTR+I +G model of evolution. Bootstrap values > 50% are shown above branches.
Amino acid alignment of the TFIIAγ genes. The secondary (2D) structure at the bottom was predicted by PredictProtein (http://www.predictprotein.org/) using O. sativa sequences as references. H represents the alpha helix and E the beta strand.
EST hits of grass TFIIAγ genes in GenBank EST database.
We thank Qihui Zhu, Xin-Hui Zou, Yan-Hua Yang and other members of Ge's group for their help during the experiments and data analyses. We also thank Frank White and Bin Yang of Kansas State University, USA, for providing useful information during the early stage of this study. We are grateful to the International Rice Research Institute (Los Banos, Philippines) for providing leaf and seed samples. This study was supported by the National Basic Research Program of China (2007CB815704), the National Natural Science Foundation of China (30990240 and 30430030), and grants from the Chinese Academy of Sciences.