Whole-genome duplications (WGDs) have occurred in the lineages of plants [1], animals [2], [3] and fungi [4], [5], with possible consequences including evolution of novel or modified gene functions [6], [7], [8], [9], and/or provision of “buffer capacity” [10], [11] or genetic redundancy that increases genetic robustness [12], [13], [14], [15], [16], [17]. Genome duplication may also increase opportunities for nonreciprocal recombination [18], [19], [20], permitting or causing duplicated genes to evolve in concert for a period of time. Rapid DNA loss and restructuring of low-copy DNA [21], [22], [23], [24], retrotransposon activation [25], [26], [27] and epigenetic changes [28], [29], [30], [31], [32], [33] following WGD may further provide materials for evolutionary change.
In addition to WGDs, genes may be duplicated by several other mechanisms, collectively referred to as small-scale duplications [34] or single-gene duplications [35], [36]. Tandem duplicates are consecutive in the genome, while proximal duplicates are near one another but separated by a few genes. These two modes of gene duplication are presumed to arise through unequal crossing over [36] or localized transposon activities [37]. Dispersed duplicates are neither adjacent to each other in the genome nor within homeologous chromosome segments [38]. Distant single-gene transposition may explain the widespread existence of dispersed duplicates within and among genomes [36]. Distant single-gene transposition duplication (referred to as distantly transposed duplication) may occur by DNA-based or RNA-based mechanisms [35]. DNA transposons such as Pack-MULEs (rice) [39], helitrons (maize) [40], and CACTA elements (sorghum) [27] may relocate duplicated genes or gene segments to new chromosomal positions (referred to as DNA-based transposed duplication). RNA-based transposed duplication, often referred to as retrotransposition, typically creates a single-exon retrocopy from a multi-exon parental gene by reverse transcription of a spliced messenger RNA. It is presumed that the retrocopy duplicates only the transcribed sequence of the parental gene, detached from the parental promoter. The new retrogene is often deposited in a novel chromosomal environment with new (i.e. non-ancestral) neighboring genes and, having lost its native promoter, is likely to survive as a functional gene only if it acquires a new promoter [41], [42].
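The positional definitions above (tandem, proximal, dispersed) can be illustrated with a minimal sketch. It assumes each gene is described by its chromosome and its rank in gene order, and that membership in homeologous (WGD-derived) segments has already been determined by a separate synteny analysis. The `Gene` class, the function name, and the cutoff `max_intervening` for "a few genes" are hypothetical choices made for illustration; the text does not specify a numeric threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    index: int  # rank of the gene along its chromosome (gene order, not bp)


def classify_pair(a, b, in_homeologous_segments=False, max_intervening=5):
    """Assign a duplicate gene pair to a positional duplication mode.

    in_homeologous_segments: assumed to come from a prior synteny analysis.
    max_intervening: hypothetical cutoff for "separated by a few genes".
    """
    if a.chrom == b.chrom and a.index == b.index:
        raise ValueError("a pair must consist of two distinct genes")
    if in_homeologous_segments:
        return "WGD/segmental"
    if a.chrom == b.chrom:
        intervening = abs(a.index - b.index) - 1  # genes lying between the pair
        if intervening == 0:
            return "tandem"
        if intervening <= max_intervening:
            return "proximal"
    # Neither adjacent/nearby nor within homeologous segments.
    return "dispersed"
```

For example, `classify_pair(Gene("g1", "Chr1", 5), Gene("g2", "Chr1", 6))` is labeled tandem because no gene lies between the two copies. The sketch says nothing about whether a dispersed pair arose by DNA-based or RNA-based transposition; distinguishing those would need additional evidence such as exon structure.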
Classical population genetic theory suggests that a likely consequence of gene duplication is reversion to single copy (singleton), unless at least one gene copy evolves a new function [8]. More recently, the subfunctionalization model, which proposes that duplicated gene copies might both be retained if they partition the functions of the ancestral gene between them, has provided an important modification of the classical model [9], [43]. Some studies also show evidence supporting the value of genetic redundancy per se [10], [12], [13], [14], [15], [16], [17], [44], [45] or of dosage balance [34], [46], [47], [48].
The angiosperms (flowering plants) are an outstanding model in which to elucidate the consequences of gene duplication. All angiosperms are now thought to be paleopolyploids [49], many of which underwent multiple WGDs [50], [51]. Traces of past WGDs can often be detected from pairwise syntenic alignments through software such as ColinearScan [52] and from multiple alignments using MCScan [53]. Arabidopsis, selected as the first angiosperm to have its genome sequenced due to its small genome size and minimal DNA sequence duplication, has experienced two ‘recent’ WGDs (α and β), i.e. since its divergence from other members of the Brassicales clade, and a more ancient triplication (γ) shared with most if not all eudicots [49], [51], [53]. Likewise, rice appears to have experienced at least two WGDs, one shared with most if not all cereals (ρ), and another more ancient event (σ) [54]. Single-gene duplications in angiosperms are also widespread [36], [55], [56].
One avenue for systematic investigation of functional divergence between duplicate genes is to compare their spatiotemporal expression profiles, relating the degree of divergence to proxies of duplication age such as synonymous substitution rates (Ks) between duplicate genes. In Arabidopsis, the rate of protein sequence evolution is asymmetric in >20% of duplicate pairs, and functional diversification of surviving duplicate genes has been proposed to be a major feature of the long-term evolution of polyploids [57]. Arabidopsis genes created by large-scale duplication events are more evolutionarily conserved in gene expression than those created by small-scale duplication or those that do not lie in duplicate segments, and the time since duplication is correlated with functional divergence of genes [58]. Further, there may also be a strong positive correlation between expression divergence and the non-synonymous substitution rate (Ka) in Arabidopsis, and the different modes of duplication (segmental, tandem and dispersed) may affect patterns of expression divergence [38]. Arabidopsis duplicated genes show greater expression diversity than singleton genes across closely related species and allopolyploids [59]. In rice, expression correlation is significantly higher for gene pairs from WGDs or tandem duplications than for those from dispersed duplications, and expression divergence is closely related to divergence time [60].
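As an illustration of the comparison described above, the sketch below scores expression divergence for one duplicate pair and then relates divergence to Ks across pairs. It assumes divergence is measured as 1 minus the Pearson correlation of the two genes' expression profiles across the same ordered set of samples; this is one common choice, not necessarily the measure used in the cited studies. The function names and all numbers are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def expression_divergence(profile_a, profile_b):
    """Assumed measure: 1 - r, so 0 means identical pattern, 2 means opposite."""
    return 1.0 - pearson(profile_a, profile_b)


# Toy data (invented): (Ks, profile of copy 1, profile of copy 2) per pair.
toy_pairs = [
    (0.2, [5.0, 7.0, 9.0, 4.0], [5.5, 7.5, 8.0, 4.5]),
    (0.9, [5.0, 7.0, 9.0, 4.0], [6.0, 5.0, 8.0, 7.0]),
    (1.8, [5.0, 7.0, 9.0, 4.0], [8.0, 4.0, 5.0, 9.0]),
]
ks_values = [ks for ks, _, _ in toy_pairs]
divergences = [expression_divergence(p1, p2) for _, p1, p2 in toy_pairs]
# Correlation between duplication-age proxy (Ks) and expression divergence.
ks_vs_divergence = pearson(ks_values, divergences)
```

A rank-based correlation may be preferable in practice because Ks saturates for old duplicates; the Pearson version is used here only to keep the sketch dependency-free.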
Though many studies have investigated the functional divergence and retention of duplicate genes, their conclusions are often contradictory. For example, gene retention has been attributed to either neofunctionalization [6], [7] or genetic redundancy [12], [13], [14], [15], [16], [17], and expression divergence between duplicate genes has been suggested to be either time dependent [58], [60] or selection dependent [38]. The fates of duplicate genes may be influenced by the mode of gene duplication, as different modes have been suggested to retain genes in a biased manner [36]. With much richer expression and annotation data available now than for most prior studies, and improved ability to discern various mechanisms of gene duplication, we find merit in re-examining some existing hypotheses and exploring some new hypotheses regarding the consequences of gene duplication. Here, we relate multiple types of genomic data to gene expression divergence in two angiosperm species, Arabidopsis and Oryza (rice), to formally test possible evolutionary patterns (hypotheses). A far richer volume of analyzed microarray data than was available in prior studies improves the robustness of the statistical analyses.