Early studies with the first generation of molecular markers indicated the presence of duplicated loci on the genetic maps of various cereals, suggesting that ancestral genome duplications and polyploidization events occurred in the history of species that are now considered diploid (1). In rice, (i) restriction fragment length polymorphism mapping performed in the 1990s suggested that chromosomes 1 and 5 (2) as well as chromosomes 11 and 12 (3) were ancient duplicates, and (ii) comparative genomics studies at the sequence level also suggested ancient polyploidy (4–6). The release of draft genome sequences of the japonica rice subspecies allowed whole-genome sequence comparisons and further characterization of duplications in rice (7–11). The most recent analysis (11) concluded that a whole-genome duplication event (involving 10 chromosome-to-chromosome duplication relationships) predated the divergence of cereal genomes 53–94 million years ago, while a more recent, independent duplication event between rice chromosomes 11 and 12 occurred 21 million years ago. Together, these duplications cover 65.7% of the genome. More recently, Lin et al. (12) and Wang et al. (13) reported 163 and 319 duplicated blocks in the rice genome, respectively. Unfortunately, many of these studies relied on low-stringency sequence alignment criteria, such as the direct use of pairwise BLAST expect or score values, and did not take into account the density and location of genes, which are needed to determine precisely the structure and evolution of paralogous regions. Because paralogous relationships are difficult to infer from sequence comparisons alone, more stringent alignment criteria and statistical validation are required to (i) evaluate objectively and accurately whether two or more genes found in the same order on two chromosome segments are associated by chance or truly reflect a duplication and (ii) eliminate the massive background noise caused by artefactual paralogs. Both steps are necessary to reconcile the current views of the rice genome as containing 10 (11), 163 (12) or 319 (13) duplicated regions into a single, consistent picture.
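The statistical question raised above, whether genes found in the same order on two segments could be collinear by chance, can be illustrated with a simple permutation test. The sketch below is not the test used in (14); it assumes that each paralogous gene pair is given as a tuple of hypothetical gene indices along segments A and B (each gene in at most one pair), measures order conservation as the longest collinear chain, and compares it with the chains obtained after randomly shuffling the B positions.

```python
import random
from bisect import bisect_left


def longest_collinear_chain(pairs):
    """Length of the longest chain of gene pairs kept in the same order on
    both segments. `pairs` holds (position_on_A, position_on_B) tuples, each
    gene assumed to occur in at most one pair. After sorting by A position,
    this is the longest strictly increasing subsequence of B positions."""
    tails = []  # tails[k] = smallest B position ending a chain of length k + 1
    for _, b in sorted(pairs):
        k = bisect_left(tails, b)
        if k == len(tails):
            tails.append(b)
        else:
            tails[k] = b
    return len(tails)


def collinearity_p_value(pairs, n_permutations=10_000, seed=0):
    """Permutation test: how often does randomly reassigning the B positions
    produce a collinear chain at least as long as the observed one?"""
    rng = random.Random(seed)
    observed = longest_collinear_chain(pairs)
    a_positions = [a for a, _ in pairs]
    b_positions = [b for _, b in pairs]
    at_least_as_long = 0
    for _ in range(n_permutations):
        rng.shuffle(b_positions)  # destroys any real order conservation
        if longest_collinear_chain(zip(a_positions, b_positions)) >= observed:
            at_least_as_long += 1
    # add-one correction so the estimated p-value is never exactly zero
    return observed, (at_least_as_long + 1) / (n_permutations + 1)


# Hypothetical gene indices along two segments (not real rice data).
example_pairs = [(1, 3), (2, 4), (4, 6), (5, 9), (7, 10), (8, 2)]
chain, p = collinearity_p_value(example_pairs)
print(chain, p)
```

A small p-value indicates that the observed chain is longer than expected if the same paralogs were placed at random; a real analysis would also have to account for gene density and for the number of segment pairs tested genome-wide, which this sketch ignores.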
Recently, we reassessed the duplicated nature of the rice genome using a combination of (i) new alignment criteria that increase analysis stringency and (ii) statistical tests to redefine interchromosomal duplications (14). We identified 29 duplications covering 72% (267 Mb) of the rice genome, with, on average, one gene involved in the duplications every 0.8 Mb. Ten of the 29 duplications, covering 47.8% of the rice genome, had been reported previously (11). The remaining 19 duplicated blocks, associated with 539 paralogous gene pairs, were newly identified in that study. Moreover, the identification of seven paleo-duplicated blocks (among the 29) shared with the wheat, maize and sorghum genomes allowed us to propose a model in which grass genomes evolved from a common ancestor with a basic number of five chromosomes through whole-genome and segmental duplications, chromosome fusions and translocations.
Gene duplication generates functional redundancy, which is followed during genome evolution by pseudogenization (i.e. one paralog becomes unexpressed or functionless), concerted evolution (i.e. the paralogs retain the same function), subfunctionalization (i.e. the paralogs acquire complementary functions) or neofunctionalization (i.e. one paralog acquires a novel function). Functional divergence through subfunctionalization or neofunctionalization is one of the most important sources of evolutionary innovation in complex organisms. Recent studies suggested that a majority of the duplicated genes that are structurally retained during evolution have at least partially diverged in function (15). These studies were based either on (i) systematic analyses of changes in protein sequences, through estimates of the number of synonymous (Ks) and non-synonymous (Ka) substitutions per site between paralogs, or on (ii) analyses of the timing, location and relative abundance of gene transcripts in public expressed sequence tag (EST) databases. However, these approaches are only indirectly related to gene expression because (i) variation in substitution rate is generally assumed to be unrelated to variation in the rate of expression divergence (17) and (ii) estimates of the level, location and timing of gene expression based on available ESTs are limited by the types of cDNA libraries available, which do not represent all the spatial and temporal conditions of plant development. Recent micro-array studies in Arabidopsis clearly demonstrated that the vast majority of duplicated genes have diverged in their expression profiles (18–20). Ganko et al. concluded from micro-array data that ~70% of gene pairs show asymmetric divergence, and Blanc et al. had previously concluded that 57% and 73% of gene pairs acquired divergent expression patterns after recent and old duplication events in Arabidopsis, respectively.
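For readers unfamiliar with Ks and Ka, the sketch below estimates both with the Nei–Gojobori counting method followed by a Jukes–Cantor correction. This is one of several standard estimators, and the studies cited above may have used others. It assumes two aligned, in-frame coding sequences from a paralog pair and the standard genetic code; all function names are illustrative.

```python
import math
from itertools import permutations

# Standard genetic code, built from the usual TCAG-ordered amino-acid string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon):
    """Synonymous sites of one codon: fraction of single-base changes at each
    position that keep the amino acid, summed over the 3 positions."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_differences(c1, c2):
    """Synonymous and non-synonymous differences between two codons, averaged
    over every mutational pathway that avoids stop codons."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return 0.0, 0.0
    sd_sum = nd_sum = 0.0
    n_paths = 0
    for order in permutations(diff):
        current, sd, nd, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = nxt
        if valid:
            sd_sum, nd_sum, n_paths = sd_sum + sd, nd_sum + nd, n_paths + 1
    if n_paths == 0:  # simplification: count every change as non-synonymous
        return 0.0, float(len(diff))
    return sd_sum / n_paths, nd_sum / n_paths


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits (needs p < 0.75)."""
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks) for two aligned, in-frame coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # skip gapped or ambiguous codons and stop codons
        if CODE.get(c1, "*") == "*" or CODE.get(c2, "*") == "*":
            continue
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s
        N += 3 - s
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)
```

The ratio Ka/Ks is then commonly read as a rough indicator of selective constraint, with values well below 1 suggesting purifying selection. Because the Jukes–Cantor correction is undefined once a proportion of differences reaches 0.75, highly diverged pairs raise an error here instead of returning a misleading value.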
To analyze the impact of rice genome duplications on gene structure and expression, we produced a rice expression map (eMAP) based on expert-curated micro-array data collected on a single platform, and compared the expression profiles of all paralogous gene pairs identified in the rice genome. This allowed us to provide new insights into the structural and functional evolution of genes after a whole-genome duplication event.
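Comparing the expression profiles of paralogous gene pairs can be sketched with a correlation test: two paralogs whose profiles across the same set of micro-array conditions are weakly correlated are labelled as diverged. The Pearson metric, the 0.5 threshold, the gene identifiers and the expression values below are all hypothetical, and this is not necessarily the comparison used for the eMAP.

```python
import math


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(vx * vy)


def classify_pairs(profiles, paralog_pairs, threshold=0.5):
    """Label each paralog pair 'conserved' or 'diverged'.
    `profiles` maps gene ID -> expression values across the same conditions;
    `threshold` is an illustrative cut-off, not a value from the study."""
    labels = {}
    for g1, g2 in paralog_pairs:
        r = pearson(profiles[g1], profiles[g2])
        labels[(g1, g2)] = "conserved" if r >= threshold else "diverged"
    return labels


def fraction_diverged(labels):
    """Share of paralog pairs labelled as diverged."""
    return sum(v == "diverged" for v in labels.values()) / len(labels)


# Hypothetical log2 expression values for two paralog pairs in four conditions.
profiles = {
    "geneA_1": [5.1, 6.0, 7.2, 8.1], "geneA_2": [4.8, 6.1, 7.0, 8.3],
    "geneB_1": [6.5, 6.4, 9.0, 5.0], "geneB_2": [9.1, 5.2, 5.0, 8.8],
}
labels = classify_pairs(profiles, [("geneA_1", "geneA_2"), ("geneB_1", "geneB_2")])
print(labels, fraction_diverged(labels))
```

Correlation captures differences in the shape of the profiles but not in overall expression level, so a fuller comparison might combine it with a measure of level difference.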