Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome.
Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella.
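As a rough illustration of what a negative correlation between retrotransposon density and gene density means, the sketch below computes a Pearson correlation coefficient across contigs. The abstract does not say which statistic was used, so Pearson's r is an assumption here, and the function name and toy densities are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("each sequence must vary")
    return sxy / sqrt(sxx * syy)


# Hypothetical per-contig densities (hits per 100 BAC ends); not Amborella data.
ltr_density = [1, 2, 3, 4]
gene_density = [8, 6, 4, 2]
r = pearson_r(ltr_density, gene_density)  # negative: genes thin out as LTRs increase
```

A value of r close to -1 would indicate that contigs rich in retrotransposons tend to be poor in genes; real data would be noisier than this toy input.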
When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution.
Shoot branching is an important determinant of plant architecture and influences various aspects of growth and development. Selection on branching has also played an important role in the domestication of crop plants, including sunflower (Helianthus annuus L.). Here, we describe an investigation of the genetic basis of variation in branching in sunflower via association mapping in a diverse collection of cultivated sunflower lines.
Detailed phenotypic analyses revealed extensive variation in the extent and type of branching within the focal population. After correcting for population structure and kinship, association analyses were performed using a genome-wide collection of SNPs to identify genomic regions that influence a variety of branching-related traits. This work resulted in the identification of multiple previously unidentified genomic regions that contribute to variation in branching. Genomic regions that were associated with apical and mid-apical branching were generally distinct from those associated with basal and mid-basal branching. Homologs of known branching genes from other study systems (i.e., Arabidopsis, rice, pea, and petunia) were also identified from the draft assembly of the sunflower genome and their map positions were compared to those of associations identified herein. Numerous candidate branching genes were found to map in close proximity to significant branching associations.
In sunflower, variation in branching is genetically complex and overall branching patterns (i.e., apical vs. basal) were found to be influenced by distinct genomic regions. Moreover, numerous candidate branching genes mapped in close proximity to significant branching associations. Although the sunflower genome exhibits localized islands of elevated linkage disequilibrium (LD), these non-random associations are known to decay rapidly elsewhere. The subset of candidate genes that co-localized with significant associations in regions of low LD represents the most promising target for future functional analyses.
Electronic supplementary material
The online version of this article (doi:10.1186/s12870-015-0458-9) contains supplementary material, which is available to authorized users.
Apical dominance; Association mapping; Branching; Helianthus annuus; Linkage disequilibrium; Plant architecture; Sunflower
Lotus is a diploid plant with agricultural, medicinal, and ecological significance. Genetic linkage maps are fundamental resources for genome and genetic studies, and also provide molecular markers for breeding in agriculturally important species. Genotyping by sequencing has revolutionized genetic mapping, and restriction-site associated DNA sequencing (RADseq) allows rapid discovery of thousands of SNP markers. A crucial aspect of this sequence-based mapping strategy is the choice of reference sequences used for marker identification.
We assessed the effectiveness of linkage mapping using three types of references for scoring markers: the unmasked genome, repeat masked genome, and gene models. Overall, the repeat masked genome produced the optimal genetic maps. A high-density genetic map of American lotus was constructed using an F1 population derived from a cross between Nelumbo nucifera ‘China Antique’ and N. lutea ‘AL1’. A total of 4,098 RADseq markers were used to construct the American lotus ‘AL1’ genetic map, and 147 markers were used to construct the Chinese lotus ‘China Antique’ genetic map. The American lotus map has 9 linkage groups, and spans 494.3 cM, with an average distance of 0.7 cM between adjacent markers. The American lotus map was used to anchor scaffold sequences in the N. nucifera ‘China Antique’ draft genome. 3,603 RADseq markers anchored 234 individual scaffold sequences into 9 megascaffolds spanning 67% of the 804 Mb draft genome.
Among the unmasked genome, repeat masked genome and gene models, the repeat masked genome is the optimal reference for calling RADseq markers for map construction. This high density genetic map is a valuable resource for genomic research and crop improvement in lotus.
Chinese lotus; Genome assembly; Genotyping by sequencing; Restriction associated sequencing; Megascaffold
Safflower (Carthamus tinctorius L.) is an oilseed crop in the Compositae (a.k.a. Asteraceae) that is valued for its oils rich in unsaturated fatty acids. Here, we present an analysis of the genetic architecture of safflower domestication and compare our findings to those from sunflower (Helianthus annuus L.), an independently domesticated oilseed crop within the same family.
We mapped quantitative trait loci (QTL) underlying 24 domestication-related traits in progeny from a cross between safflower and its wild progenitor, Carthamus palaestinus Eig. Also, we compared QTL positions in safflower against those that have been previously identified in cultivated × wild sunflower crosses to identify instances of colocalization.
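One simple way to think about QTL colocalization is as interval overlap on a shared comparative map. The sketch below is a minimal illustration under that assumption; the `QTL` record, the shared linkage-group labels, and the toy intervals are all hypothetical and do not reproduce the criteria used in the study.

```python
from typing import NamedTuple


class QTL(NamedTuple):
    trait: str
    group: str    # linkage group on an assumed shared comparative map
    start: float  # interval start (cM)
    end: float    # interval end (cM)


def colocalized(a, b):
    """True if two QTL affect the same trait and their intervals overlap."""
    return (
        a.trait == b.trait
        and a.group == b.group
        and a.start <= b.end
        and b.start <= a.end
    )


def count_colocalized(query, reference):
    """Count query QTL that overlap at least one reference QTL."""
    return sum(1 for q in query if any(colocalized(q, r) for r in reference))


# Hypothetical toy intervals; not results from the safflower or sunflower maps.
safflower = [
    QTL("days to flower", "LG1", 10, 20),
    QTL("achene length", "LG2", 5, 8),
    QTL("flower color", "LG3", 0, 4),
]
sunflower = [
    QTL("days to flower", "LG1", 18, 30),
    QTL("achene length", "LG2", 9, 12),
    QTL("flower color", "LG4", 0, 4),
]
n_shared = count_colocalized(safflower, sunflower)
```

In practice the comparison depends on markers shared between the two maps to translate positions, which this sketch takes as given.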
We mapped 61 QTL, the vast majority of which (59) exhibited minor or moderate phenotypic effects. The two large-effect QTL corresponded to one each for flower color and leaf spininess. A total of 14 safflower QTL colocalized with previously reported sunflower QTL for the same traits. Of these, QTL for three traits (days to flower, achene length, and number of selfed seed) had cultivar alleles that conferred effects in the same direction in both species.
As has been observed in sunflower, and unlike many other crops, our results suggest that the genetics of safflower domestication is quite complex. Moreover, our comparative mapping results indicate that safflower and sunflower exhibit numerous instances of QTL colocalization, suggesting that parallel trait transitions during domestication may have been driven, at least in part, by parallel genotypic evolution at some of the same underlying genes.
Carthamus; Domestication; Comparative genetic mapping; Helianthus; Parallel evolution; QTL analysis; Safflower; Sunflower
Sacred lotus is a basal eudicot with agricultural, medicinal, cultural and religious importance. It was domesticated in Asia about 7,000 years ago, and cultivated for its rhizomes and seeds as a food crop. It is particularly noted for its 1,300-year seed longevity and exceptional water repellency, known as the lotus effect. The latter property is due to the nanoscopic closely packed protuberances of its self-cleaning leaf surface, which have been adapted for the manufacture of a self-cleaning industrial paint, Lotusan.
The genome of the China Antique variety of the sacred lotus was sequenced with Illumina and 454 technologies, at respective depths of 101× and 5.2×. The final assembly has a contig N50 of 38.8 kbp and a scaffold N50 of 3.4 Mbp, and covers 86.5% of the estimated 929 Mbp total genome size. The genome notably lacks the paleo-triplication observed in other eudicots, but reveals a lineage-specific duplication. The genome has evidence of slow evolution, with a 30% slower nucleotide mutation rate than observed in grape. Comparisons of the available sequenced genomes suggest a minimum gene set for vascular plants of 4,223 genes. Strikingly, the sacred lotus has 16 COG2132 multi-copper oxidase family proteins with root-specific expression; these are involved in root meristem phosphate starvation, reflecting adaptation to limited nutrient availability in an aquatic environment.
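The contig and scaffold N50 values quoted above follow the standard definition: the length L such that sequences of length at least L together contain at least half of the assembled bases. A minimal sketch of that calculation is given below; the function name and the toy lengths are illustrative and are not taken from the lotus assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest, and the length at which
    the running total first reaches half of all bases is returned.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy assembly (lengths in kbp).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)  # 70: the two longest contigs hold 150 of 300 kbp
```

Because N50 depends only on the length distribution, a scaffold N50 can greatly exceed a contig N50 for the same assembly once contigs are joined.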
The slow nucleotide substitution rate makes the sacred lotus a better resource than the current standard, grape, for reconstructing the pan-eudicot genome, and should therefore accelerate comparative analysis between eudicots and monocots.
The combination of large-scale population genomic analyses and trait-based mapping approaches has the potential to provide novel insights into the evolutionary history and genome organization of crop plants. Here, we describe the detailed genotypic and phenotypic analysis of a sunflower (Helianthus annuus L.) association mapping population that captures nearly 90% of the allelic diversity present within the cultivated sunflower germplasm collection. We used these data to characterize overall patterns of genomic diversity and to perform association analyses on plant architecture (i.e., branching) and flowering time, successfully identifying numerous associations underlying these agronomically and evolutionarily important traits. Overall, we found variable levels of linkage disequilibrium (LD) across the genome. In general, islands of elevated LD correspond to genomic regions underlying traits that are known to have been targeted by selection during the evolution of cultivated sunflower. In many cases, these regions also showed significantly elevated levels of differentiation between the two major sunflower breeding groups, consistent with the occurrence of divergence due to strong selection. One of these regions, which harbors a major branching locus, spans a surprisingly long genetic interval (ca. 25 cM), indicating the occurrence of an extended selective sweep in an otherwise recombinogenic interval.
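Linkage disequilibrium between two biallelic loci is commonly summarized as r², the squared correlation between allele states. The sketch below computes r² from phased haplotypes coded 0/1. It illustrates the general statistic only; the abstract does not specify the estimator used, and the function name and toy haplotypes are hypothetical.

```python
def ld_r2(locus_a, locus_b):
    """r^2 between two biallelic loci from phased haplotypes coded 0/1."""
    if len(locus_a) != len(locus_b) or not locus_a:
        raise ValueError("need two equal-length, non-empty haplotype lists")
    n = len(locus_a)
    p_a = sum(locus_a) / n  # frequency of allele 1 at locus A
    p_b = sum(locus_b) / n  # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in zip(locus_a, locus_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b    # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Hypothetical haplotypes: complete association versus independence.
complete = ld_r2([0, 0, 1, 1], [0, 0, 1, 1])     # 1.0
independent = ld_r2([0, 1, 0, 1], [0, 0, 1, 1])  # 0.0
```

An island of elevated LD would show up as r² remaining high between marker pairs that are far apart, whereas in the rest of the genome r² would fall quickly with distance.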
Selection during the evolution of crop plants has resulted in dramatic phenotypic differentiation, and these same selective pressures are expected to have had a significant impact on underlying genomic diversity. Population genomic analyses, especially when coupled with trait-based mapping approaches, thus have the potential to provide unique insights into the evolution of crop plants and their genomes. In this study, we performed a genome-wide analysis of genetic variation in cultivated sunflower and used the resulting data to genetically dissect variation in plant architecture (i.e., branching) and flowering time. We found substantial variation in levels of linkage disequilibrium (LD) across the genome, with islands of elevated LD generally corresponding to genomic regions underlying traits that have been targeted by selection during the evolution of cultivated sunflower. A number of these same regions also exhibited strong population genetic differentiation across the sunflower gene pool, suggesting that they may harbor genes underlying adaptation following domestication. Our analyses also identified numerous genomic regions underlying variation in both plant architecture and flowering time, many of which fall in genomic regions that have not previously been shown to influence these traits using more traditional quantitative genetic approaches.
The development of ultra-dense genetic maps has the potential to facilitate detailed comparative genomic analyses and whole genome sequence assemblies. Here we describe the use of a custom Affymetrix GeneChip containing nearly 2.4 million features (25 bp sequences) targeting 86,023 unigenes from sunflower (Helianthus annuus L.) and related species to test for single-feature polymorphisms (SFPs) in a recombinant inbred line (RIL) mapping population derived from a cross between confectionery and oilseed sunflower lines (RHA280×RHA801). We then employed an existing genetic map derived from this same population to rigorously filter out low quality data and place 67,486 features corresponding to 22,481 unigenes on the sunflower genetic map. The resulting map contains a substantial fraction of all sunflower genes and will thus facilitate a number of downstream applications, including genome assembly and the identification of candidate genes underlying QTL or traits of interest.
Cultivated peanut or groundnut (Arachis hypogaea L.) is an important oilseed crop with an allotetraploid genome (AABB, 2n = 4x = 40). Both the low level of genetic variation within the cultivated gene pool and its polyploid nature limit the utilization of molecular markers to explore genome structure and facilitate genetic improvement. Nevertheless, a wealth of genetic diversity exists in diploid Arachis species (2n = 2x = 20), which represent a valuable gene pool for cultivated peanut improvement. Interspecific populations have been used widely for genetic mapping in diploid species of Arachis. However, an intraspecific mapping strategy was essential to detect chromosomal rearrangements among species that could be obscured by mapping in interspecific populations. To develop intraspecific reference linkage maps and gain insights into karyotypic evolution within the genus, we comparatively mapped the A- and B-genome diploid species using intraspecific F2 populations. Exploring genome organization among diploid peanut species by comparative mapping will enhance our understanding of the cultivated tetraploid peanut genome. Moreover, new sources of molecular markers that are highly transferable between species and developed from expressed genes will be required to construct saturated genetic maps for peanut.
A total of 2,138 EST-SSR (expressed sequence tag-simple sequence repeat) markers were developed by mining a tetraploid peanut EST assembly including 101,132 unigenes (37,916 contigs and 63,216 singletons) derived from 70,771 long-read (Sanger) and 270,957 short-read (454) sequences. A set of 97 SSR markers were also developed by mining 9,517 genomic survey sequences of Arachis. An SSR-based intraspecific linkage map was constructed using an F2 population derived from a cross between K 9484 (PI 298639) and GKBSPSc 30081 (PI 468327) in the B-genome species A. batizocoi. A high degree of macrosynteny was observed when comparing the homoeologous linkage groups between A (A. duranensis) and B (A. batizocoi) genomes. Comparison of the A- and B-genome genetic linkage maps also showed a total of five inversions and one major reciprocal translocation between two pairs of chromosomes under our current mapping resolution.
Our findings will contribute to understanding tetraploid peanut genome origin and evolution and eventually promote its genetic improvement. The newly developed EST-SSR markers will enrich current molecular marker resources in peanut.
Peanut (Arachis hypogaea); SSR; Genetic linkage map; Intraspecific cross; EST
Cultivated peanut (Arachis hypogaea) is an allotetraploid species whose ancestral genomes are most likely derived from the A-genome species, A. duranensis, and the B-genome species, A. ipaensis. The very recent (several millennia) evolutionary origin of A. hypogaea has imposed a bottleneck for allelic and phenotypic diversity within the cultigen. However, wild diploid relatives are a rich source of alleles that could be used for crop improvement and their simpler genomes can be more easily analyzed while providing insight into the structure of the allotetraploid peanut genome. The objective of this research was to establish a high-density genetic map of the diploid species A. duranensis based on de novo generated EST databases. Arachis duranensis was chosen for mapping because it is the A-genome progenitor of cultivated peanut and also in order to circumvent the confounding effects of gene duplication associated with allopolyploidy in A. hypogaea.
More than one million expressed sequence tag (EST) sequences generated from normalized cDNA libraries of A. duranensis were assembled into 81,116 unique transcripts. Mining this dataset, 1236 EST-SNP markers were developed between two A. duranensis accessions, PI 475887 and Grif 15036. An additional 300 SNP markers also were developed from genomic sequences representing conserved legume orthologs. Of the 1536 SNP markers, 1054 were placed on a genetic map. In addition, 598 EST-SSR markers identified in A. hypogaea assemblies were included in the map along with 37 disease resistance gene candidate (RGC) and 35 other previously published markers. In total, 1724 markers spanning 1081.3 cM over 10 linkage groups were mapped. Gene sequences that provided mapped markers were annotated using similarity searches in three different databases, and gene ontology descriptions were determined using the Medicago Gene Atlas and TAIR databases. Synteny analysis between A. duranensis, Medicago and Glycine revealed significant stretches of conserved gene clusters spread across the peanut genome. A higher level of colinearity was detected between A. duranensis and Glycine than with Medicago.
The first high-density, gene-based linkage map for A. duranensis was generated and can serve as a reference map for both wild and cultivated Arachis species. The markers developed here are valuable resources for the peanut research community and, more broadly, for the legume research community. The A-genome map will have utility for fine mapping in other peanut species and has already been applied to mapping a nematode resistance gene that was introgressed into A. hypogaea from A. cardenasii.
Genetic linkage maps have the potential to facilitate the genetic dissection of complex traits and comparative analyses of genome structure, as well as molecular breeding efforts in species of agronomic importance. Until recently, the majority of such maps was based on relatively low-throughput marker technologies, which limited marker density across the genome. The availability of high-throughput genotyping technologies has, however, made possible the efficient development of high-density genetic maps. Here, we describe the analysis and integration of genotypic data from four sunflower (Helianthus annuus L.) mapping populations to produce a consensus linkage map of the sunflower genome. Although the individual maps (which contained 3500–5500 loci each) were highly colinear, we observed localized variation in recombination rates in several genomic regions. We also observed several gaps up to 26 cM in length that completely lacked mappable markers in individual crosses, presumably due to regions of identity by descent in the mapping parents. Because these regions differed by cross, the consensus map of 10,080 loci contained no such gaps, clearly illustrating the value of simultaneously analyzing multiple mapping populations.
Although it is agreed that a major polyploidy event, gamma, occurred within the eudicots, the phylogenetic placement of the event remains unclear.
To determine when this polyploidization occurred relative to speciation events in angiosperm history, we employed a phylogenomic approach to investigate the timing of gene set duplications located on syntenic gamma blocks. We populated 769 putative gene families with large sets of homologs obtained from public transcriptomes of basal angiosperms, magnoliids, asterids, and more than 91.8 gigabases of new next-generation transcriptome sequences of non-grass monocots and basal eudicots. The overwhelming majority (95%) of well-resolved gamma duplications was placed before the separation of rosids and asterids and after the split of monocots and eudicots, providing strong evidence that the gamma polyploidy event occurred early in eudicot evolution. Further, the majority of gene duplications was placed after the divergence of the Ranunculales and core eudicots, indicating that the gamma event is likely restricted to core eudicots. Molecular dating estimates indicate that the duplication events were intensely concentrated around 117 million years ago.
The rapid radiation of core eudicot lineages that gave rise to nearly 75% of angiosperm species appears to have coincided with or shortly followed the gamma triplication event. Reconciliation of gene trees with a species phylogeny can elucidate the timing of major events in genome evolution, even when genome sequences are only available for a subset of species represented in the gene trees. Comprehensive transcriptome datasets are valuable complements to genome sequences for high-resolution phylogenomic analysis.
Recent advances in next-generation DNA sequencing technologies have made possible the development of high-throughput SNP genotyping platforms that allow for the simultaneous interrogation of thousands of single-nucleotide polymorphisms (SNPs). Such resources have the potential to facilitate the rapid development of high-density genetic maps, and to enable genome-wide association studies as well as molecular breeding approaches in a variety of taxa. Herein, we describe the development of a SNP genotyping resource for use in sunflower (Helianthus annuus L.). This work involved the development of a reference transcriptome assembly for sunflower, the discovery of thousands of high quality SNPs based on the generation and analysis of ca. 6 Gb of transcriptome re-sequencing data derived from multiple genotypes, the selection of 10,640 SNPs for inclusion in the genotyping array, and the use of the resulting array to screen a diverse panel of sunflower accessions as well as related wild species. The results of this work revealed a high frequency of polymorphic SNPs and relatively high level of cross-species transferability. Indeed, greater than 95% of successful SNP assays revealed polymorphism, and more than 90% of these assays could be successfully transferred to related wild species. Analysis of the polymorphism data revealed patterns of genetic differentiation that were largely congruent with the evolutionary history of sunflower, though the large number of markers allowed for finer resolution than has previously been possible.
Evolution of the Brassica species has been recursively affected by polyploidy events, and comparison to their relative, Arabidopsis thaliana, provides means to explore their genomic complexity.
A genome-wide physical map of a rapid-cycling strain of B. oleracea was constructed by integrating high-information-content fingerprinting (HICF) of Bacterial Artificial Chromosome (BAC) clones with hybridization to sequence-tagged probes. Using 2907 contigs of two or more BACs, we performed several lines of comparative genomic analysis. Interspecific DNA synteny is much better preserved in euchromatin than heterochromatin, showing the qualitative difference in evolution of these respective genomic domains. About 67% of contigs can be aligned to the Arabidopsis genome, with 96.5% corresponding to euchromatic regions, and 3.5% (shown to contain repetitive sequences) to pericentromeric regions. Overgo probe hybridization data showed that contigs aligned to Arabidopsis euchromatin contain ~80% of low-copy-number genes, while genes with high copy number are much more frequently associated with pericentromeric regions. We identified 39 interchromosomal breakpoints during the diversification of B. oleracea and Arabidopsis thaliana, a relatively high level of genomic change since their divergence. Comparison of the B. oleracea physical map with Arabidopsis and other available eudicot genomes showed appreciable 'shadowing' produced by more ancient polyploidies, resulting in a web of relatedness among contigs which increased genomic complexity.
A high-resolution genetically-anchored physical map sheds light on Brassica genome organization and advances positional cloning of specific genes, and may help to validate genome sequence assembly and alignment to chromosomes.
All the physical mapping data are freely shared at a WebFPC site (http://lulu.pgml.uga.edu/fpc/WebAGCoL/brassica/WebFPC/; temporarily password-protected: account: pgml; password: 123qwe123).
Comparative genomics; polyploidy; Arabidopsis thaliana
In the eight years since phylogenomics was introduced as the intersection of genomics and phylogenetics, the field has provided fundamental insights into gene function, genome history and organismal relationships. The utility of phylogenomics is growing with the increase in the number and diversity of taxa for which whole genome and large transcriptome sequence sets are being generated. We assert that the synergy between genomic and phylogenetic perspectives in comparative biology would be enhanced by the development and refinement of minimal reporting standards for phylogenetic analyses. Encouraged by the development of the Minimum Information About a Microarray Experiment (MIAME) standard, we propose a similar roadmap for the development of a Minimal Information About a Phylogenetic Analysis (MIAPA) standard. Key in the successful development and implementation of such a standard will be broad participation by developers of phylogenetic analysis software, phylogenetic database developers, practitioners of phylogenomics, and journal editors.
Genetically anchored physical maps of large eukaryotic genomes have proven useful both for their intrinsic merit and as an adjunct to genome sequencing. Cultivated tetraploid cottons, Gossypium hirsutum and G. barbadense, share a common ancestor formed by a merger of the A and D genomes about 1-2 million years ago. Toward the long-term goal of characterizing the spectrum of diversity among cotton genomes, the worldwide cotton community has prioritized the D genome progenitor Gossypium raimondii for complete sequencing.
A whole genome physical map of G. raimondii, the putative D genome ancestral species of tetraploid cottons, was assembled, integrating genetically-anchored overgo hybridization probes, agarose based fingerprints and 'high information content fingerprinting' (HICF). A total of 13,662 BAC-end sequences and 2,828 DNA probes were used in genetically anchoring 1,585 contigs to a cotton consensus genetic map, and 370 and 438 contigs, respectively, to Arabidopsis thaliana (AT) and Vitis vinifera (VV) whole genome sequences.
Several lines of evidence suggest that the G. raimondii genome is composed of two qualitatively different components. Much of the gene rich component is aligned to the Arabidopsis and Vitis vinifera genomes and shows promise for utilizing translational genomic approaches in understanding this important genome and its resident genes. The integrated genetic-physical map is of value both in assembling and validating a planned reference sequence.
Papaya, a fruit crop cultivated in tropical and subtropical regions, is known for its nutritional benefits and medicinal applications. Here we report a 3× draft genome sequence of ‘SunUp’ papaya, the first commercial virus-resistant transgenic fruit tree [1] to be sequenced. The papaya genome is three times the size of the Arabidopsis genome, but contains fewer genes, including significantly fewer disease-resistance gene analogues. Comparison of the five sequenced genomes suggests a minimal angiosperm gene set of 13,311. A lack of recent genome duplication, atypical of other angiosperm genomes sequenced so far [2–5], may account for the smaller papaya gene number in most functional groups. Nonetheless, striking amplifications in gene number within particular functional groups suggest roles in the evolution of tree-like habit, deposition and remobilization of starch reserves, attraction of seed dispersal agents, and adaptation to tropical daylengths. Transgenesis at three locations is closely associated with chloroplast insertions into the nuclear genome, and with topoisomerase I recognition sites. Papaya offers numerous advantages as a system for fruit-tree functional genomics, and this draft genome sequence provides the foundation for revealing the basis of Carica's distinguishing morpho-physiological, medicinal and nutritional properties.
Papaya is a major fruit crop in tropical and subtropical regions worldwide and has primitive sex chromosomes controlling sex determination in this trioecious species. The papaya genome was recently sequenced because of its agricultural importance, unique biological features, and successful application of transgenic papaya for resistance to papaya ringspot virus. As a part of the genome sequencing project, we constructed a BAC-based physical map using a high information-content fingerprinting approach to assist whole genome shotgun sequence assembly.
The physical map consists of 963 contigs, representing 9.4× genome equivalents, and was integrated with the genetic map and genome sequence using BAC end sequences and a sequence-tagged high-density genetic map. The estimated genome coverage of the physical map is about 95.8%, while 72.4% of the genome was aligned to the genetic map. A total of 1,181 high quality overgo (overlapping oligonucleotide) probes representing conserved sequences in Arabidopsis and genetically mapped loci in Brassica were anchored on the physical map, which provides a foundation for comparative genomics in the Brassicales. The integrated genetic and physical map aligned with the genome sequence revealed recombination hotspots as well as regions suppressed for recombination across the genome, particularly on the recently evolved sex chromosomes. Suppression of recombination spread to the region adjacent to the male specific region of the Y chromosome (MSY); beyond it, recombination rates recovered gradually and then exceeded the genome average. Recombination hotspots were observed about 10 Mb away on both sides of the MSY, showing a 7-fold increase compared with the genome-wide average and demonstrating the dynamics of recombination on the sex chromosomes.
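Once a genetic map is aligned to physical coordinates, a local recombination rate can be estimated as genetic distance divided by physical distance (cM/Mb) between adjacent markers and compared with the genome-wide average. The sketch below illustrates that idea; the function names and marker positions are hypothetical, and the study's actual windowing and smoothing are not described in the abstract.

```python
def recombination_rates(markers):
    """Local recombination rates between adjacent markers.

    markers: list of (physical_mb, genetic_cm), sorted by physical position.
    Returns a list of (interval_midpoint_mb, cm_per_mb).
    """
    rates = []
    for (mb1, cm1), (mb2, cm2) in zip(markers, markers[1:]):
        span = mb2 - mb1
        if span <= 0:
            raise ValueError("physical positions must be strictly increasing")
        rates.append(((mb1 + mb2) / 2, (cm2 - cm1) / span))
    return rates


def fold_over_average(markers):
    """Express each local rate as a multiple of the average over all markers."""
    if len(markers) < 2:
        raise ValueError("need at least two markers")
    total_cm = markers[-1][1] - markers[0][1]
    total_mb = markers[-1][0] - markers[0][0]
    if total_cm <= 0 or total_mb <= 0:
        raise ValueError("map must span a positive genetic and physical distance")
    average = total_cm / total_mb
    return [(mid, rate / average) for mid, rate in recombination_rates(markers)]


# Hypothetical markers: (Mb, cM). The average is 1 cM/Mb over 20 Mb.
toy_markers = [(0.0, 0.0), (10.0, 5.0), (12.0, 19.0), (20.0, 20.0)]
folds = fold_over_average(toy_markers)
# The middle interval recombines at 7 cM/Mb, i.e. 7-fold the toy average.
```

Intervals with a fold value well below 1 would correspond to suppressed regions such as those around the MSY, and values well above 1 to hotspots.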
A BAC-based physical map of papaya was constructed and integrated with the genetic map and genome sequence. The integrated map facilitated the draft genome assembly, and is a valuable resource for comparative genomics and map-based cloning of agronomically and economically important genes and for sex chromosome research.
Comparison of the sorghum, maize and rice genomes shows that gene duplication and functional innovation is common to evolution of most but not all genes in the C4 photosynthetic pathway
Sorghum is the first C4 plant and the second grass with a full genome sequence available. This makes it possible to perform a whole-genome-level exploration of C4 pathway evolution by comparing key photosynthetic enzyme genes in sorghum, maize (C4) and rice (C3), and to investigate a long-standing hypothesis that a reservoir of duplicated genes is a prerequisite for the evolution of C4 photosynthesis from a C3 progenitor.
We show that both whole-genome and individual gene duplication have contributed to the evolution of C4 photosynthesis. The C4 gene isoforms show differential duplicability, with some C4 genes being recruited from whole genome duplication duplicates by multiple modes of functional innovation. The sorghum and maize carbonic anhydrase genes display a novel mode of new gene formation, with recursive tandem duplication and gene fusion accompanied by adaptive evolution to produce C4 genes with one to three functional units. Other C4 enzymes in sorghum and maize also show evidence of adaptive evolution, though differing in level and mode. Intriguingly, a phosphoenolpyruvate carboxylase gene in the C3 plant rice has also been evolving rapidly and shows evidence of adaptive evolution, although lacking key mutations that are characteristic of C4 metabolism. We also found evidence that both gene redundancy and alternative splicing may have sheltered the evolution of new function.
Gene duplication followed by functional innovation is common to evolution of most but not all C4 genes. The apparently long time-lag between the availability of duplicates for recruitment into C4 and the appearance of C4 grasses, together with the heterogeneity of origins of C4 genes, suggests that there may have been a long transition process before the establishment of C4 photosynthesis.
Maize (Zea mays L.) is one of the most important cereal crops and a model for the study of genetics, evolution, and domestication. To better understand maize genome organization and to build a framework for genome sequencing, we constructed a sequence-ready fingerprinted contig-based physical map that covers 93.5% of the genome, of which 86.1% is aligned to the genetic map. The fingerprinted contig map contains 25,908 genic markers that enabled us to align nearly 73% of the anchored maize genome to the rice genome. The distribution pattern of expressed sequence tags correlates to that of recombination. In collinear regions, 1 kb in rice corresponds to an average of 3.2 kb in maize, yet maize has a 6-fold genome size expansion. This can be explained by the fact that most rice regions correspond to two regions in maize as a result of its recent polyploid origin. Inversions account for the majority of chromosome structural variations during subsequent maize diploidization. We also find clear evidence of ancient genome duplication predating the divergence of the progenitors of maize and rice. Reconstructing the paleoethnobotany of the maize genome indicates that the progenitors of modern maize contained ten chromosomes.
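The arithmetic behind the genome-size argument above can be checked directly. The sketch assumes that the local expansion ratio and the number of maize regions per rice region combine multiplicatively; the function and variable names are illustrative.

```python
def implied_fold_expansion(local_ratio, maize_regions_per_rice_region):
    """Genome-size ratio implied by local expansion and regional duplication."""
    return local_ratio * maize_regions_per_rice_region


# Figures quoted in the abstract: 1 kb of rice matches 3.2 kb of maize in
# collinear regions, and most rice regions correspond to two maize regions.
implied = implied_fold_expansion(3.2, 2)  # about 6.4, close to the 6-fold figure
```

The product of roughly 6.4 is consistent with the reported 6-fold expansion, given that only most, not all, rice regions have two maize counterparts.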
As a cash crop and a model biological system, maize is of great public interest. To facilitate maize molecular breeding and basic biological research, we built a high-resolution physical map with two different fingerprinting methods on the same set of bacterial artificial chromosome clones. The physical map was integrated with a high-density genetic map and further serves as a framework for the maize genome-sequencing project. Comparative genomics showed that the euchromatic regions of rice and maize are highly conserved. We physically delimited these conserved regions and thus detected many genome rearrangements. We extensively defined the duplication blocks within the maize genome. These blocks allowed us to reconstruct the chromosomes of the maize progenitor. We found that the maize genome has experienced two rounds of genome duplication, an ancient one before the maize–rice divergence and a recent one after tetraploidization.