|Home | About | Journals | Submit | Contact Us | Français|
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Polyploidy has played a prominent role in shaping the genomic architecture of the angiosperms. Through allopolyploidization, several modern Gossypium (cotton) species contain two divergent, although largely redundant genomes. Owing to this redundancy, these genomes can play host to an array of evolutionary processes that act on duplicate genes.
We compared homoeolog (genes duplicated by polyploidy) contributions to the transcriptome of a natural allopolyploid and a synthetic interspecific F1 hybrid, both derived from a merger between diploid species from the Gossypium A-genome and D-genome groups. Relative levels of A- and D-genome contributions to the petal transcriptome were determined for 1,383 gene pairs. This comparison permitted partitioning of homoeolog expression biases into those arising from genomic merger and those resulting from polyploidy. Within allopolyploid Gossypium, approximately 24% of the genes with biased (unequal contributions from the two homoeologous copies) expression patterns are inferred to have arisen as a consequence of genomic merger, indicating that a substantial fraction of homoeolog expression biases occur instantaneously with hybridization. The remaining 76% of biased homoeologs reflect long-term evolutionary forces, such as duplicate gene neofunctionalization and subfunctionalization. Finally, we observed a greater number of genes biased toward the paternal D-genome and that expression biases have tended to increases during allopolyploid evolution.
Our results indicate that allopolyploidization entails significant homoeolog expression modulation, both immediately as a consequence of genomic merger, and secondarily as a result of long-term evolutionary transformations in duplicate gene expression.
A hallmark of angiosperm genome organization is gene redundancy. Redundant genome segments have been identified in the composition and architecture of modern-day angiosperm genomes suggesting one or more ancient genome duplication events [1-3]. This has led to considerable interest in the evolution of the resulting duplicated genes. A key issue has been the identification of factors that enhance the retention of duplicate gene pairs and their potential for adaptive diversification or subfunctionalization (the partitioning of ancestral function). Mechanisms such as the maintenance of gene dosage and epistatic interactions [4,5] and epigenetically regulated expression subfunctionalization [6,7] have been implicated in aiding duplicate gene retention. These processes describe mechanisms of retention for ancient duplicate genes and naturally lead to questions about the evolutionary behavior of duplicate gene pairs in more recently formed polyploid species.
Members of the cotton genus provide a phylogenetic framework to study the evolution of duplicate gene expression in recent polyploids because five diverse allopolyploid species are thought to have diverged from a single allopolyploidization event , and models of the ancestral diploid progenitor species (denoted by A2 and D5) have been identified (Figure (Figure1A).1A). In addition, extensive genomic resources, such as comprehensive expressed sequence tag (EST) libraries , microarray platforms [10,11], and BAC libraries  have greatly extended research capabilities. Synthesis of an F1 hybrid, combining the A- and D-genome diploid model species, offers the opportunity to untangle the effects of genomic merger from those arising from genome doubling and subsequent evolutionary change. This phylogenetic framework facilitates the study of gene expression from co-resident genomes on two temporal scales, from the onset of hybridization to a longer-term evolutionary timeframe encompassed by the natural allotetraploid species.
Adams et al  demonstrated that homoeolog expression in allotetraploid cotton has been strongly influenced by developmentally regulated, organ-specific silencing, resulting in subfunctionalization of the aggregate ancestral expression profile. This subfunctionalization may occur immediately after polyploidization or may arise over a longer period of evolutionary resolution [13,14]. The net effect is a process that appears to impose a form of selective retention on both homoeologs. Thus, expression subfunctionalization leads to prolonged duplicate gene retention, which may in turn enhance the potential for spatial, temporal, or functional divergence of duplicated genes.
Here we employ a novel microarray technology, which uses homoeolog specific probe sets, to assess the relative contribution of 1,383 homoeologous gene pairs to the transcriptome of natural allopolyploid Gossypium hirsutum and a synthetic, diploid F1 hybrid (denoted as AD1 and F1, respectively). We show that the two genomes contribute unequally to the total transcriptome of the allopolyploid. By comparing these entities we demonstrate that, for a substantial fraction of the genome, homoeolog expression biases occur immediately with the onset of genomic merger. In addition, a greater number of homoeolog expression biases appear in allopolyploid cotton that likely were not instigated by genomic merger. These findings indicate that upon allopolyploid formation, homoeolog expression biases happen in two, distinct temporal phases.
We analyzed the relative A- and D-genome contributions to the transcriptome of a synthetic F1 hybrid and AD1 allotetraploid cotton. This was done by comparing these mixed transcriptomes with the A2 and D5 model progenitors as well as with a 1:1 mix of A2 and D5 (Figure (Figure1A).1A). In total, 7,574 homoeolog-specific probe sets (around 33% of all possible) representing 1,383 unique EST contigs (hereafter referred to as genes) were identified as being reciprocally diagnostic with respect to identifying A- and D-genome specific expression in the F1 hybrid and allotetraploid cotton. Thus, using conservative measures (false discovery rate (FDR) ≤ 0.05), we recovered 1,383 diagnostic genes, representing 2.6% (see ) to 4% (J Hawkins, personal communication) of the genic content of the cotton genome. As expected, a principal component analysis on the natural log differences of A- and D-genome expression distinguished among all accessions, placing the AD1, F1, and 1:1 mix values intermediate between A2 and D5 along the first axis (see Figure S1 in Additional file 1). This indicates that the homoeolog-specific probes have performed as designed, and can be expected to yield useful estimates of A- and D-genome contributions to the transcriptome. Furthermore, quantitative mass-spectrometry validation of 12 homoeologous gene pairs from AD1 and 13 homoeologous gene pairs from F1 indicate that our findings regarding homoeolog-specific expression are reproducible (comparisons between platforms yielded R2 values of 0.37 and 0.39 and p-values of 0.035 and 0.022, for AD1 and F1, respectively; see Figure S2 in Additional file 1).
For each gene, a linear model was fit to the three replicate measures of relative A- and D-genome contributions. Using FDR corrected p-values (FDR ≤ 0.15) from this model, each gene from the AD1 and F1 samples was categorized as 'A-biased' (log ratio ((ln(Aprobe) – ln(Dprobe)) statistically greater than 1:1 mix), 'D-biased' (log ratio statistically less than 1:1 mix) or 'Equivalent' (log ratio not statistically different from 1:1 mix); see Figure Figure2A.2A. This categorization system is a rudimentary representation of the spectrum of homoeolog expression values, however, all categorizations presented here are based on known reference samples, which mitigates the effects of differential hybridization among homoeolog-specific probe pairs. In addition, this categorization is a statistical description of genome-specific transcript ratios and not a declaration of biological relevance (as pertaining to phenotype) of biases, which are unknown at present. Using this approach, many diagnostic gene pairs (29.9% (414 out of 1,383) of AD1 and 69.5% (961 out of 1,383) of F1) were inferred to be equivalently expressed in petals. We infer that these gene pairs showed no statistically significant change in homoeologous (or allelic for the F1 hybrid) contribution to the transcriptome relative to the in vitro mid-parent value. Among those genes exhibiting biased expression, there was an approximately 1.3x and 2.5x overrepresentation of the D-genome biased genes in petal tissues of AD1 and F1, respectively (Figure 2A, B). In addition, we detected 46 AD1 and 6 F1 genes that appear to be A-genome silenced and 69 AD1 and 5 F1 genes that are D-genome silenced, indicating a significant increase in silencing in the AD1 allopolyploid in both the A- and D-genomes. For a limited sampling of genes, expression biases comparable to those above have been demonstrated previously in cotton [6,13,14,16].
The comparison between the artificially synthesized F1 hybrid and the 1–2 MY old natural allopolyploid, G. hirsutum (AD1), allows us to assess the role genomic merger plays in the allopolyploidization process [14,17]. The inclusion of model A- and D-genome diploid progenitors facilitates inference of ancestral expression states and, hence, the directionality and pace of expression evolution (Figure 1A, B). An additional temporal dimension to the analysis concerns homoeolog-specific expression biases detected in the AD1 allopolyploid that were also detected in the F1 hybrid (Figure 2B, C). This is demonstrated by both the sizable set of shared genes found within all expression categories (Figure (Figure2A)2A) and the positive correlation (Pearson's r = 0.391; p-value < 2.2 × 10-16) between estimates of genomic contribution in the F1 hybrid compared with those from the allopolyploid (Figure (Figure2C).2C). Overall around 24% (235 out of 969) of the genes with an A- or D-genome expression bias in the polyploid are also found to be biased in the same direction in the F1 hybrid. This indicates that a significant portion of the expression evolution associated with allopolyploidization may have accompanied the initial genomic merger.
An additional directional trend in the data is a tendency for the allopolyploid genes to exhibit more extreme expression biases (Figure (Figure2D).2D). Both the A- and D-genome biased genes demonstrate a greater number of more extremely biased AD1 genes, when compared with the F1 (1 and 18 gene(s), respectively, for shared A- and D-biased sets). In addition, paired t-test for equality between AD1 and F1 values confirm that the differences in means between AD1 and F1 are significantly different for D-biased genes (AD1 mean = -0.45 and F1 mean = -0.36; p-value = 6.63 × 10-5), and marginally non-significant for A-biased genes (AD1 mean = 0.46 and F1 mean = 0.37; p-value = 0.07). Thus, for genes with immediate expression biases toward one parental Gossypium genome, stabilization and evolution of the allopolyploid genome preferentially continues to enhance this initial bias.
It has long been thought that gene and genome duplication may serve as a key source of evolutionary innovation [18-23]. Recently, studies from a diverse array of organisms have demonstrated that gene duplication stimulates a variety of evolutionary outcomes [6,18,19,24-30]. These studies have demonstrated that following duplication, genes may evolve rapidly both at the sequence level and in their expression profile. It is thought that much of this change occurs as a result of the relaxation of purifying selection that occurs following duplication [18-20,22]. During this period of relaxed selection, duplicate genes either find new roles (neofunctionalize), partition ancestral roles (subfunctionalize) or accumulate deleterious mutations and decay as pseudogenes. These processes are thought to occur on an evolutionary timescale measured in thousands to millions of years; for example, it has been estimated that the average half-life of duplicate gene pairs is of the order of 3 to 7 MY for mammals, invertebrates, and plants . Here we have demonstrated that expression divergence among many genes duplicated by allopolyploidy (AD1) is already apparent at the stage of interspecific genomic merger between two genomes (F1). These genes, with conserved homoeologous biases between an ancient allotetraploid and modern F1 hybrid, represent the proportion of loci we might expect to have immediately experienced expression alteration at the time of allopolyploid origin 1 to 2 MYA. These data indicate that the critical parameter 'time to subfunctionalization' [18,19], may actually be zero for a significant fraction of the genome in allopolyploid plants. Thus, we conclude that during allopolyploidization, genomic merger per se plays a crucial and persistent role in determining subsequent evolutionary trajectories in homoeolog expression patterns.
In addition to the foregoing set of genes inferred to have experienced instantaneous expression alteration as a consequence of genomic merger, an even larger class of genes did not exhibit shared expression biases in the F1 hybrid and AD1 allopolyploid. Specifically, 76% of the genes that displayed biased expression in AD1 were not biased in the intergenomic F1. Reciprocally, about 44% (187 out of 422) of the genes with biased expression in the F1 were not biased in AD1. These differences of expression bias may reflect (1) additional expression evolution in allopolyploid cotton since the interspecific genomic merger via the mechanisms of neo-, sub-, and non-functionalization [18,20], (2) differences between the parents of the F1 hybrid and the actual diploid progenitors of AD1 (Figure (Figure1A);1A); that is, the extant diploids are good models but they are not the actual progenitors of allopolyploid Gossypium, or (3) elimination of initial genome specific biases during chromosome doubling or subsequent evolution of the natural AD1 allopolyploid.
It has been shown that genes belonging to some functional categories are retained, following duplication, at a higher than expected rate . As a corollary, it might be expected that gene function could also affect the likelihood of retention of expression bias. To explore this, we asked if genes from particular Gene Ontology (GO)  categories were over- or under-contributing to particular expression bias classes within the F1 hybrid and AD1 allopolyploid. Using the Blast2GO software , only two GO categories were found to be significantly over-represented and none were under-represented (FDR ≤ 0.05; data not shown). Both significant GO categories were inclusive high-level biological processes (cofactor metabolic process (GO:0051186); and coenzyme metabolic process (GO:0006732)), and were contained within the equivalently expressed genes from the F1 hybrid. We had, however, only limited power (that is, small numbers of genes within GO categories) to detect distortions between the observed and expected frequencies of GO categories. Thus, within our subset of analyzed genes, gene classification does not appear to be a strong predictor of the direction or degree of genome-specific bias, although the strength of this conclusion may be limited by our current sample size.
Taken together, these data indicate that a significant proportion (around 24%) of duplicate gene expression evolution, ascribed to allopolyploid cotton, could have been generated immediately during allopolyploid formation by genetic and epigenetic factors associated with interspecific genomic merger [4,5,33]. In addition, following allopolyploidy formation, subsequent duplicate gene evolution plays a large role in shaping homoeolog expression patterns. Thus, both immediate and long-term evolutionary processes contribute to homoeologous expression patterns. Based on this we speculate that expression-induced evolutionary novelty in allopolyploids occurs in two distinct modes: first, an immediate, massive, and saltational disruption of ancestral expression patterns accompanying the polyploidization process; and then a second, more gradual phase of expression evolution mediated by the mechanisms of duplicate gene evolution embodied in the traditional models [18,20] of the race between duplicate gene preservation and pseudogenization.
The signature of paleopolyploidy (ancient polyploidy) can be found in the genomes of all angiosperms [1-3,34-36]. In addition, a high proportion (30% to 50%) of paleologs (duplicate gene pairs arising from a paleopolyploidy event) can be retained for millions of years [3,18,34]. Adams and Wendel  have shown that A- and D-genome allelic pairs at the Adh locus display reciprocal silencing across multiple tissues in two Gossypium F1 hybrids. Thus, upon genomic merger ancestral gene expression domains are immediately partitioned and purifying selection is placed on both duplicate gene pairs, thereby increasing the probabilities of co-retention. To the extent that the results of Adams and Wendel  are mirrored by the present analysis, we have demonstrated that, in petals, around 17% (235 out of 1,383; Figure Figure2A)2A) of the homoeologous gene pairs studied could potentially fit this model, by having been found to be biased immediately in the F1 and by having that bias retained throughout allopolyploidy. If we extrapolate this finding to the entire Gossypium genome, it would indicate that, following polyploidization, a large number of homoeologs could be retained by 'instantaneous subfunctionalization', occurring solely from the initial effects of genomic merger. Furthermore, given that these biases appear to have been maintained for about 1 to 2 MY following polyploidization, this immediate form of expression bias may play an underappreciated role in the retention of duplicate genes following whole genome duplication .
An intriguing aspect of the expression bias data is that for both natural AD1 allopolyploid and the interspecific F1 hybrid, a greater number of genes exhibited a D-genome bias than the reverse (Figure 2A, B). This bias favors the paternal D-genome genome, and stands in contrast to the recently reported A-genome bias described for ovular tissue . To the best of the authors' knowledge, this is the most extensive example of widespread paternal dominance. When considered in light of the results of Yang et al , our data suggest that neither Gossypium genome is globally dominant with respect to expression, but that instead, each genome may have local dominance in certain tissue types or developmental stages. This finding confirms previous results in Gossypium [6,11] but differs from recent analysis of allotetraploid Arabidopsis . In the latter study, leaf and flower bud tissues from a synthetic Arabidopsis allotetraploid were shown to exhibit dominance favoring only its A. arenosa parent, with genome-wide suppression of the A. thaliana parental contribution. In the tissues that have been studied in Gossypium and Arabidopsis, it appears that both species demonstrate biased parental contributions to the transcriptome, however, in Gossypium these biases can favor either parental genome, whereas in Arabidoposis only the A. arenosa parent has demonstrated dominance. These findings reflect the importance and perhaps ad hoc nature of specific genomic combinations and their interactions during allopolyploidization.
A notable observation in the present study is that genes showing biased expression patterns, tend to have more extreme biases in the AD1 allopolyploid (Figure (Figure2D),2D), including a much larger number of silenced genes (115 total), when compared with the F1 (11 total). One possible explanation for enhancement of genome-specific expression in allopolyploid cotton could be that immediately acting epigenetic effects become evolutionarily stabilized, either by natural selection or neutral processes. If this stabilization process is predisposed (through neutral or adaptive mechanisms) toward enhancing the initial expression bias, the result would be evolution toward a more extreme bias. This amplification of expression bias, which to our knowledge has not been described previously, may represent an additional factor underlying the genesis of phenotypic novelty in allopolyploid species.
These results extend previous findings of homoeolog expression biases in hybrid and allotetraploid cotton [6,11,13,14,16]. By employing microarray technology to analyze a large number of genes, we describe the general phenomenon of genomic expression bias in both a modern synthetic F1 hybrid and an ancient allotetraploid. Furthermore, for petal tissues, these biases favor the parental D-genome and have become more extreme in the allotetraploid when compared with the F1 hybrid. By comparing homoeolog contributions to the transcriptome from the F1 hybrid and AD1 allotetraploid, it was possible determine the role of genomic merger in producing homoeolog expression biases. Given this comparison, we have shown that a significant fraction of the expression biases found in the allotetraploid is likely initiated immediately by genomic merger. A still larger fraction of the expression biases is inferred to have arisen from long-term evolutionary processes, thus implicating two temporally distinct phases of expression evolution following allopolyploidization.
Three replicate blocks of four Gossypium accessions (A2 | D5 | A2♀ X D5♂ F1 | AD1; Table Table1)1) were grown in the Pohl Conservatory at Iowa State University, Ames, IA. These four accessions include representatives of both diploid progenitor genomes (A- and D-genomes) of natural allopolyploid cotton, their synthetic F1 hybrid, and an allotetraploid, respectively  (Figure (Figure1A).1A). Petals from all four accessions were harvested on the day of anthesis and three biological replicates were generated by pooling tissues from a minimum of eight flowers obtained from three individuals, or alternatively from a minimum of three flowers from a single individual if multiple individuals were not available (applicable only to F1 hybrid). RNA extractions were performed following a modified hot borate procedure optimized for Gossypium . All RNA samples were quantified and visually assessed for degradation and DNA contamination via a Bioanalyzer (Agilent Technologies, Santa Clara, CA). From each pair of A2 and D5 replicates, an equimolar RNA mix (1:1 mix) was made. RNA samples were sent to NimbleGen Systems (Madison, WI), for cDNA synthesis, labeling, and hybridization to 15 microarrays, following proprietary protocols.
We have designed and implemented a novel microarray platform capable of measuring homoeolog-specific expression in Gossypium species (Figure (Figure1B).1B). The utility of this design has been demonstrated with our first-generation arrays , but rapid developments in the depth of cotton EST resources, EST assembly quality, and microarray probe density enabled us to create a second-generation platform, which was used in this study. A description of the microarray design can be found in Additional file 1. This second-generation platform features oligonucleotide probe-pairs near 35 bases in length differing by an A- or D-genome homoeolog-specific single nucleotide polymorphism (SNP) at their middle base (Figure (Figure1C,1C, box). Thus, the microarray platform has the ability to measure expression levels separately for each homoeolog, detected by the corresponding homoeolog-specific probe.
Raw data values for each microarray were natural log transformed, median centered, and scale normalized across all arrays prior to analysis. For each homoeolog probe pair the difference of natural logs of the A- and D-homoeolog-specific probe was calculated ((ln(Aprobe) – ln(Dprobe); hereafter referred to as log ratio). Using this approach, positive values indicate an A-genome bias, whereas negative values indicate a D-genome bias. A linear model including effects for replication and genotype was fit to the log ratio data from each probe to identify the subset of probes that diagnostically detected homoeolog-specific expression. This was done by filtering for only those probes in which the log ratio for A2 was significantly (FDR ≤ 0.05; see ) and appreciably greater (fold change of at least 1.5) than the 1:1 mix of A2 and D5, and the 1:1 mix log ratio was significantly and appreciably greater than D5 (A2 > 1:1 mix > D5). The resulting, empirically identified, probe sets can diagnose homoeolog-specific expression levels within transcriptionally mixed A- and D-genome hybrid and allopolyploid RNA samples.
Following the identification of all diagnostic probes, contig-level log ratio values were determined by calculating a robust average of the log ratio values from all diagnostic probe sets within a contig using Tukey's Biweight method. A linear model including effects for replication and genotype was fit to this contig-level data, allowing the estimation of all possible contrasts between A2, D5, 1:1 mix, AD1, and the F1 hybrid. The contrasts between the AD1 and F1 samples and the 1:1 mix allow us to diagnose change relative to the in vitro mid-parent value of the A2 and D5 diploids. In addition, these contrasts account for the specific hybridization kinetics of each probe, when faced with a genomically mixed transcript pool. This is useful, as it can factor out non-linear competitive interactions that may occur as a result of the interaction between A- and D-genome transcripts.
Given the distributions of p-values from AD1 versus 1:1 mix and F1 versus 1:1 mix contrasts, we estimated the expected number of true null hypotheses, using the procedure of Nettleton et al . It was determined that approximately 495 and 884 genes were true nulls, and thus not statistically different in mean log ratio from the 1:1 mix, for AD1 and F1, respectively. Using these estimates from the AD1 versus 1:1 mix and F1 versus 1:1 mix contrasts, we selected an FDR threshold for significance  of 0.15 to strike a reasonable balance between the expected number of false positives and false negatives. FDR significance thresholds of 0.05 and 0.10 were examined as well and can be found in Table S1A, B in Additional file 1.
Using the A2 and D5 diploids as a reference measure of pure A- or D-genome expression gives us the ability to discover cases of genome-specific gene silencing in both AD1 and F1. These putative cases of silencing can be detected as log ratio values that are greater than or equal to the A2 diploid parent or less than or equal to the D5 diploid parent. Using this definition of silencing, we were able to detect gene silencing in both the AD1 and F1 accessions.
Validation of our microarray results was performed for 13 randomly selected homoeologous gene pairs using Sequenom quantitative mass-spectrometry following the methods of Stupar and Springer . Aliquots of RNA transcripts used for microarray hybridizations were analyzed for A- and D-genome contributions to the transcriptome for AD1 and F1 samples (the validation design can be found in Additional file 1). Briefly, the Sequenom technology amplifies A- and D-derived cDNA transcripts in parallel, and then quantifies relative homoeolog abundance based on matrix-assisted laser desorption/ionization time-of-flight mass-spectrometry. All Sequenom assays were conducted at the University of Minnesota Genotyping Facility.
LF, JU, and JW designed the research. JU and JW designed the microarray platform. LF and JU performed the research. LF and DN analyzed the data. LF and JW drafted the manuscript. All authors participated in editing the manuscript and approved the final version.
This file includes additional information about microarray construction and the design of the Sequenom validation experiment. In addition, there are two tables (Table S1A and B) detailing the results of alternative q-value thresholds and two figures (Figures S1 and S2), including a principal component analysis of all expression data and the results of the Sequenom mass spectrometry validation experiment.
We thank J Stewart and D Stelly for generation and contribution of the F1 hybrid used in this study. B Stupar and N Springer offered invaluable advice and technical support regarding the Sequenom microarray platform. This research was supported by a grant from the National Science Foundation (to JW) and a grant from the US Department of Agriculture (to JW and JU), and by an Iowa State University Plant Sciences Institute Fellowship (to LF).