Plants have been colorfully labeled the “big kahuna of polyploidization” (Sémon and Wolfe, 2007). The lineages leading to the two preeminent models for plant genetics – Arabidopsis (a eudicot) and maize (a monocot) – each show evidence of multiple independent whole genome duplications (Figure ) since monocots and eudicots diverged approximately 120 million years ago (Soltis et al., 2009). Recent evidence suggests at least two additional, shared, whole genome duplications prior to the monocot/eudicot split (Jiao et al., 2011). The cumulative ploidy numbers relative to a pre-seed plant ancestor are listed in parentheses in Figure . Whole genome duplication creates duplicate, potentially redundant, copies of all the genes within a genome. The loss of these duplicate copies from the genomes of ancient polyploid species is known as fractionation (Langham et al., 2004) and – over evolutionary time scales – the majority of genes duplicated by polyploidy will be reduced back to a single copy. If fractionation did not occur, an ancestral genome of 10,000 genes would grow to an unrealistically large 640,000 genes in maize, and 1.44 million genes in Brassica rapa.
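The arithmetic behind these totals can be sketched as a simple multiplication of the ancestral gene count by the cumulative ploidy of each lineage. In the minimal Python sketch below, the multipliers 64 and 144 are not quoted from the figure; they are back-calculated from the totals given above (640,000 / 10,000 and 1,440,000 / 10,000), and the function name is purely illustrative.

```python
# Illustrative helper (hypothetical name): gene count expected if no
# duplicate copy were ever lost after polyploidy.
def genes_without_fractionation(ancestral_genes: int, cumulative_ploidy: int) -> int:
    return ancestral_genes * cumulative_ploidy


ANCESTRAL_GENES = 10_000

# Assumption: multipliers back-calculated from the totals quoted in the text,
# not read from the figure itself.
CUMULATIVE_PLOIDY = {"maize": 64, "Brassica rapa": 144}

for species, ploidy in CUMULATIVE_PLOIDY.items():
    total = genes_without_fractionation(ANCESTRAL_GENES, ploidy)
    print(f"{species}: {total:,} genes")
```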
Some classes of genes, particularly those encoding organelle-targeted proteins, preferentially revert to single copy status following whole genome duplications (Duarte et al., 2010). However, other classes of genes – such as subunits of large multiprotein complexes, transcription factors, and signal transduction machinery – tend to resist fractionation following whole genome duplication (Blanc and Wolfe, 2004; Seoighe and Gehring, 2004; Maere et al., 2005). This observation has been explained by the Gene Dosage Hypothesis (Birchler and Veitia, 2007), which predicts that fractionation of genes encoding proteins involved in dose-sensitive interactions will be selected against, because the loss of either gene copy is expected to throw the dosage of that gene pair’s product out of balance with its interaction partners, which also tend to remain duplicated. The influence of gene dosage constraints on post-tetraploidy genome evolution has been well reviewed (Sémon and Wolfe, 2007; Edger and Pires, 2009; Freeling, 2009; Birchler and Veitia, 2010). A previous study of multiple sequential tetraploidies in the Arabidopsis lineage found a general tendency for genes retained following one tetraploidy to also be retained following a second one (Seoighe and Gehring, 2004).
Since the divergence of the Arabidopsis and grape lineages, Arabidopsis has experienced two additional rounds of whole genome duplication. The rate of duplicate gene retention for transcription factors after single polyploidies has been observed to be approximately 25% (Blanc and Wolfe, 2004; Seoighe and Gehring, 2004). If no mitigation of gene dosage occurred, our expectation after two rounds of whole genome duplication is that Arabidopsis should contain approximately 156% as many transcription factor-encoding genes as grape, that is, about 56% more (1.25 × 1.25 ≈ 1.56). However, a detailed annotation of transcription factors using conserved protein domains found that the number of transcription factors in the Arabidopsis genome is only 25.4% greater than the number found in grape (Lang et al., 2010). The fitness cost of changes in relative gene dosage must, to some extent, be mitigated over multiple whole genome duplications or the genomes of plants would long ago have become over-burdened with genes encoding life’s most complicated machines.
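The expectation above follows from compounding the per-round retention rate. The sketch below is a minimal illustration, assuming that every round retains the same fraction of duplicates and that each retained duplicate adds exactly one gene; the function names are hypothetical. The second function inverts the calculation to show what uniform per-round retention rate would reproduce the observed 25.4% excess, which is an illustrative back-calculation rather than a measured quantity.

```python
def expected_fold_increase(retention_rate: float, rounds: int) -> float:
    """Fold change in gene number if each round of whole genome duplication
    retains the same fraction of duplicates (assumption of this sketch)."""
    return (1.0 + retention_rate) ** rounds


def implied_retention_rate(fold_increase: float, rounds: int) -> float:
    """Uniform per-round retention rate that would yield the given fold change."""
    return fold_increase ** (1.0 / rounds) - 1.0


# Values taken from the text: 25% retention per polyploidy, two rounds,
# and an observed excess of 25.4% in Arabidopsis relative to grape.
expected = expected_fold_increase(0.25, 2)
implied = implied_retention_rate(1.254, 2)

print(f"expected fold increase without mitigation: {expected:.4f}")
print(f"per-round retention implied by the observed excess: {implied:.3f}")
```

Under these assumptions the expected fold increase is 1.5625, whereas the observed excess corresponds to a uniform per-round retention of roughly 12%, about half the rate reported for single polyploidies.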
This paper provides evidence that duplicate genes do not equally maintain their progenitor’s preference for duplicate gene retention. Duplicate genes produced by whole genome duplication are not equivalent. Parental genomes originating from different species within a polyploid almost immediately differentiate into dominant and non-dominant subgenomes (Chang et al., 2010), and these expression differences are preserved for millions of years (Flagel and Wendel, 2010; Schnable et al., 2011a). Bias in gene loss between duplicate regions (fractionation bias) has been observed in Arabidopsis (Thomas et al., 2006) and maize (Woodhouse et al., 2010) and seems to be a general rule for whole genome duplications in organisms ranging from Paramecium to fish (Sankoff et al., 2010). Bias in fractionation and genome dominance are linked because genes on the underexpressed, non-dominant subgenome are expected to matter less to purifying selection and dosage constraints (Schnable et al., 2011a). In maize, genes with known mutant phenotypes are indeed preferentially found on the dominant subgenome (Schnable and Freeling, 2011). Because bias in expression predicts which subgenome will experience more fractionation following polyploidy, either subgenome identity or the expression patterns of individual gene pairs may also predict which copy of a duplicate gene pair will be more prone to duplicate gene retention in future polyploidies.
We addressed the mitigation of gene dosage constraints using two experimental systems: the grasses and the crucifers. Both clades have roughly parallel histories of polyploidy among species with sequenced genomes (Figure ; Table ). Both grasses and crucifers contain a more ancient whole genome duplication that is shared by all sequenced species in the clade (Bowers et al., 2003; Paterson et al., 2004), and in both clades one well-studied species with a sequenced genome has experienced a subsequent whole genome duplication – maize in the grasses (Gaut and Doebley, 1997) and B. rapa in the crucifers (Lysak et al.,
2005). In both cases many duplicate genes retained from the older clade-wide polyploidy did not retain additional duplicate copies in the subsequent lineage-specific polyploidy. We were therefore able to carry out parallel experiments to identify characteristics associated with preferential retention. It was possible to control, to some extent, for the effect of protein function by focusing on pairs of duplicate genes retained in the clade-wide polyploidy that had different fates in the subsequent lineage-specific polyploidy. A model is proposed to explain how the duplicate copies of dose-sensitive genes escape preferential retention in later polyploidies.
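The selection of gene pairs with different fates can be sketched as a simple filter. The Python sketch below is only an illustration of the idea, not the procedure used in this study: the record layout, field names, gene identifiers, and copy counts are all invented for the example. It keeps ancient duplicate pairs in which exactly one member was retained in more than one copy after the later lineage-specific polyploidy.

```python
from typing import List, NamedTuple


class AncientPair(NamedTuple):
    """Hypothetical record: one duplicate pair from the clade-wide polyploidy,
    with the copy number of each member after the lineage-specific polyploidy."""
    gene_a: str
    copies_a: int
    gene_b: str
    copies_b: int


def discordant_pairs(pairs: List[AncientPair]) -> List[AncientPair]:
    """Keep pairs in which exactly one member is retained in more than one copy."""
    return [p for p in pairs if (p.copies_a > 1) != (p.copies_b > 1)]


# Toy records with invented identifiers and copy counts.
toy_pairs = [
    AncientPair("geneA1", 2, "geneA2", 1),  # different fates: kept
    AncientPair("geneB1", 2, "geneB2", 2),  # both retained: dropped
    AncientPair("geneC1", 1, "geneC2", 1),  # both fractionated: dropped
]

for pair in discordant_pairs(toy_pairs):
    print(pair.gene_a, pair.gene_b)
```

Because both members of such a pair descend from the same ancestral gene, comparing them holds protein function roughly constant while the retention outcome varies.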