Results 1-22 (22)

author:("Wang, xiin")
1.  Comparative Genomics Analysis of Rice and Pineapple Contributes to Understand the Chromosome Number Reduction and Genomic Changes in Grasses 
Frontiers in Genetics  2016;7:174.
Rice is one of the most researched model plants and has a genome structure most resembling that of the grass common ancestor after a grass-common tetraploidization ∼100 million years ago. There has been a standing controversy over whether there were five or seven basic chromosomes before the tetraploidization; the question has been tackled but could not be resolved for lack of a sequenced and assembled outgroup plant with a conservative genome structure. Recently, the availability of the pineapple genome, which has not been subjected to the grass-common tetraploidization, provides a precious opportunity to resolve this controversy and to investigate genome changes in rice and other grasses. Here, we performed a comparative genomics analysis of pineapple and rice, and found solid evidence that the grass common ancestor had 2n = 2x = 14 basic chromosomes before the tetraploidization, duplicated to 2n = 4x = 28 after the event. Moreover, we proposed that the enormous number of genes missing from duplicated regions in rice should be explained by an allotetraploid produced by prominently divergent parental lines, rather than by gene losses after their divergence. This means that genome fractionation might have occurred before the formation of the allotetraploid grass ancestor.
PMCID: PMC5047885  PMID: 27757123
rice; pineapple; grass; chromosome; genome
2.  Large-Scale Gene Relocations following an Ancient Genome Triplication Associated with the Diversification of Core Eudicots 
PLoS ONE  2016;11(5):e0155637.
Different modes of gene duplication including whole-genome duplication (WGD), and tandem, proximal and dispersed duplications are widespread in angiosperm genomes. Small-scale, stochastic gene relocations and transposed gene duplications are widely accepted to be the primary mechanisms for the creation of dispersed duplicates. However, here we show that most surviving ancient dispersed duplicates in core eudicots originated from large-scale gene relocations within a narrow window of time following a genome triplication (γ) event that occurred in the stem lineage of core eudicots. We name these surviving ancient dispersed duplicates as relocated γ duplicates. In Arabidopsis thaliana, relocated γ, WGD and single-gene duplicates have distinct features with regard to gene functions, essentiality, and protein interactions. Relative to γ duplicates, relocated γ duplicates have higher non-synonymous substitution rates, but comparable levels of expression and regulation divergence. Thus, relocated γ duplicates should be distinguished from WGD and single-gene duplicates for evolutionary investigations. Our results suggest large-scale gene relocations following the γ event were associated with the diversification of core eudicots.
PMCID: PMC4873151  PMID: 27195960
3.  Origination, Expansion, Evolutionary Trajectory, and Expression Bias of AP2/ERF Superfamily in Brassica napus 
The AP2/ERF superfamily, one of the most important transcription factor families, plays crucial roles in response to biotic and abiotic stresses. So far, a comprehensive evolutionary inference of its origination and expansion has not been available. Here, we identified 515 AP2/ERF genes in B. napus, a neo-tetraploid formed ~7500 years ago, and found that 82.14% of them were duplicated in the tetraploidization. A prominent subgenome bias was revealed in gene expression, tissue specificity, and gene conversion. Moreover, a large-scale analysis across plants and algae suggested that this superfamily could have originated from the AP2 family, expanding to form other families (ERF and RAV). This process was accompanied by duplication and/or alternative deletion of the AP2 domain, intragenic domain sequence conversion, and/or acquisition of other domains, resulting in copy number variations and alternatively contributing to functional innovation. We found that significant positive selection occurred at certain critical nodes during the evolution of land plants, possibly in response to a changing environment. In conclusion, the present research revealed the origination, functional innovation, and evolutionary trajectory of the AP2/ERF superfamily, contributing to understanding of their roles in plant stress tolerance.
PMCID: PMC4982375  PMID: 27570529
AP2/ERF superfamily; polyploid; positive selection; stress tolerance; RNA-seq; B. napus
4.  Promoting flowering, lateral shoot outgrowth, leaf development, and flower abscission in tobacco plants overexpressing cotton FLOWERING LOCUS T (FT)-like gene GhFT1 
FLOWERING LOCUS T (FT) encodes a mobile signal protein, recognized as a major component of florigen, which has a central position in regulating flowering and also plays important roles in various physiological aspects. A model is recently emerging for the balance of indeterminate and determinate growth, which is controlled by the ratio of FT-like and TERMINAL FLOWER 1 (TFL1)-like gene activities, and has a strong influence on the floral transition and plant architecture. The FT-like gene GhFT1 was previously isolated and characterized from Gossypium hirsutum. We demonstrated that ectopic overexpression of GhFT1 in tobacco, in addition to promoting flowering, promoted lateral shoot outgrowth at the base, induced more axillary buds at the axillae of rosette leaves, altered leaf morphology, increased chlorophyll content, raised the rate of photosynthesis, and caused flower abscission. Analysis of gene expression suggested that flower identity genes were significantly upregulated in transgenic plants. Further analysis of tobacco FT paralogs indicated that NtFT4, acting as flower inducer, was upregulated, whereas NtFT2 and NtFT3 as flower inhibitors were upregulated in transgenic plants under long-day conditions, but downregulated under short-day conditions. Our data suggest that a sufficient level of transgenic cotton FT might disturb the balance of the endogenous tobacco FT paralogs of inducers and repressors and result in an altered phenotype in transgenic tobacco, emphasizing the expanding roles of FT in regulating shoot architecture by advancing determinate growth. Manipulating the ratio of indeterminate and determinate growth factors through FT-like and TFL1-like gene activity holds promise to improve plant architecture and enhance crop yield.
PMCID: PMC4469826  PMID: 26136765
florigen; FLOWERING LOCUS T (FT); floral transition; lateral shoot; leaf morphology; abscission; tobacco
5.  Comparative and Evolutionary Analysis of Major Peanut Allergen Gene Families 
Genome Biology and Evolution  2014;6(9):2468-2488.
Peanut (Arachis hypogaea L.) causes one of the most serious food allergies. Peanut seed proteins, Arah1, Arah2, and Arah3, are considered to be among the most important peanut allergens. To gain insights into genome organization and evolution of allergen-encoding genes, approximately 617 kb from the genome of cultivated peanut and 215 kb from a wild relative were sequenced including three Arah1, one Arah2, eight Arah3, and two Arah6 gene family members. To assign polarity to differences between homoeologous regions in peanut, we used as outgroups the single orthologous regions in Medicago, Lotus, common bean, chickpea, and pigeonpea, which diverged from peanut about 50 Ma and have not undergone subsequent polyploidy. These regions were also compared with orthologs in many additional dicot plant species to help clarify the timing of evolutionary events. The lack of conservation of allergenic epitopes between species, and the fact that many different proteins can be allergenic, makes the identification of allergens across species by comparative studies difficult. The peanut allergen genes are interspersed with low-copy genes and transposable elements. Phylogenetic analyses revealed lineage-specific expansion and loss of low-copy genes between species and homoeologs. Arah1 syntenic regions are conserved in soybean, pigeonpea, tomato, grape, Lotus, and Arabidopsis, whereas Arah3 syntenic regions show genome rearrangements. We infer that tandem and segmental duplications led to the establishment of the Arah3 gene family. Our analysis indicates differences in conserved motifs in allergen proteins and in the promoter regions of the allergen-encoding genes. Phylogenetic analysis and genomic organization studies provide new insights into the evolution of the major peanut allergen-encoding genes.
PMCID: PMC4202325  PMID: 25193311
Arachis hypogaea L.; allergens; gene synteny; genome organization; homologs; evolution
6.  Transcriptome and methylome profiling reveals relics of genome dominance in the mesopolyploid Brassica oleracea 
Genome Biology  2014;15(6):R77.
Brassica oleracea is a valuable vegetable species that has contributed to human health and nutrition for hundreds of years and comprises multiple distinct cultivar groups with diverse morphological and phytochemical attributes. In addition to this phenotypic wealth, B. oleracea offers unique insights into polyploid evolution, as it results from multiple ancestral polyploidy events and a final Brassiceae-specific triplication event. Further, B. oleracea represents one of the diploid genomes that formed the economically important allopolyploid oilseed, Brassica napus. A deeper understanding of B. oleracea genome architecture provides a foundation for crop improvement strategies throughout the Brassica genus.
We generate an assembly representing 75% of the predicted B. oleracea genome using a hybrid Illumina/Roche 454 approach. Two dense genetic maps are generated to anchor almost 92% of the assembled scaffolds to nine pseudo-chromosomes. Over 50,000 genes are annotated and 40% of the genome predicted to be repetitive, thus contributing to the increased genome size of B. oleracea compared to its close relative B. rapa. A snapshot of both the leaf transcriptome and methylome allows comparisons to be made across the triplicated sub-genomes, which resulted from the most recent Brassiceae-specific polyploidy event.
Differential expression of the triplicated syntelogs and cytosine methylation levels across the sub-genomes suggest residual marks of the genome dominance that led to the current genome architecture. Although cytosine methylation does not correlate with individual gene dominance, the independent methylation patterns of triplicated copies suggest epigenetic mechanisms play a role in the functional diversification of duplicate genes.
PMCID: PMC4097860  PMID: 24916971
7.  SNPhylo: a pipeline to construct a phylogenetic tree from huge SNP data 
BMC Genomics  2014;15:162.
Phylogenetic trees are widely used for genetic and evolutionary studies in various organisms. Advanced sequencing technology has dramatically enriched data available for constructing phylogenetic trees based on single nucleotide polymorphisms (SNPs). However, massive SNP data makes it difficult to perform reliable analysis, and there has been no ready-to-use pipeline to generate phylogenetic trees from these data.
We developed a new pipeline, SNPhylo, to construct phylogenetic trees based on large SNP datasets. The pipeline may enable users to construct a phylogenetic tree from three representative SNP data file formats. In addition, in order to increase reliability of a tree, the pipeline has steps such as removing low quality data and considering linkage disequilibrium. A maximum likelihood method for the inference of phylogeny is also adopted in generation of a tree in our pipeline.
Using SNPhylo, users can easily produce a reliable phylogenetic tree from a large SNP data file. Thus, this pipeline can help a researcher focus more on interpretation of the results of analysis of voluminous data sets, rather than manipulations necessary to accomplish the analysis.
PMCID: PMC3945939  PMID: 24571581
Polymorphisms; Linkage disequilibrium; Maximum likelihood
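The SNPhylo abstract above mentions two cleaning steps before tree building: removing low-quality data and considering linkage disequilibrium. The sketch below illustrates that general idea only; it is not SNPhylo's code. The genotype coding (0/1/2 alternate-allele counts, `None` for missing calls), the function names, the thresholds, and the simple greedy pruning without a positional window are all assumptions made for illustration.

```python
def missing_rate(genotypes):
    """Fraction of samples with no genotype call (None)."""
    return sum(g is None for g in genotypes) / len(genotypes)


def minor_allele_freq(genotypes):
    """Minor allele frequency from 0/1/2 alternate-allele counts."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def r_squared(g1, g2):
    """Squared Pearson correlation over samples called at both SNPs."""
    pairs = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov * cov / (var_a * var_b)


def filter_and_prune(snps, max_missing=0.1, min_maf=0.1, max_r2=0.8):
    """Drop low-quality SNPs, then greedily drop SNPs in high LD with kept ones.

    The thresholds are hypothetical defaults, not values taken from SNPhylo.
    """
    kept = []
    for name, genotypes in snps:
        if missing_rate(genotypes) > max_missing:
            continue
        if minor_allele_freq(genotypes) < min_maf:
            continue
        if any(r_squared(genotypes, other) > max_r2 for _, other in kept):
            continue
        kept.append((name, genotypes))
    return [name for name, _ in kept]


# Toy data: s2 duplicates s1, s3 is monomorphic, s4 has many missing calls.
toy_snps = [
    ("s1", [0, 1, 2, 0, 1, 2]),
    ("s2", [0, 1, 2, 0, 1, 2]),
    ("s3", [0, 0, 0, 0, 0, 0]),
    ("s4", [None, None, 1, 2, 0, 1]),
    ("s5", [2, 0, 1, 1, 0, 2]),
]
selected = filter_and_prune(toy_snps)
print(selected)
```

A production pipeline would normally restrict the LD comparison to SNPs within a physical or index window on the same chromosome; the all-against-kept comparison here is only practical for small inputs.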
8.  Ancient Gene Duplicates in Gossypium (Cotton) Exhibit Near-Complete Expression Divergence 
Genome Biology and Evolution  2014;6(3):559-571.
Whole genome duplication (WGD) is widespread in flowering plants and is a driving force in angiosperm diversification. The redundancy introduced by WGD allows the evolution of novel gene interactions and functions, although the patterns and processes of diversification are poorly understood. We identified ∼2,000 pairs of paralogous genes in Gossypium raimondii (cotton) resulting from an approximately 60 My old 5- to 6-fold ploidy increase. Gene expression analyses revealed that, in G. raimondii, 99.4% of the gene pairs exhibit differential expression in at least one of the three tissues (petal, leaf, and seed), with 93% to 94% exhibiting differential expression on a per-tissue basis. For 1,666 (85%) pairs, differential expression was observed in all tissues. These observations were mirrored in a time series of G. raimondii seed, and separately in leaf, petal, and seed of G. arboreum, indicating expression level diversification before species divergence. A generalized linear model revealed 92.4% of the paralog pairs exhibited expression divergence, with most exhibiting significant gene and tissue interactions indicating complementary expression patterns in different tissues. These data indicate massive, near-complete expression level neo- and/or subfunctionalization among ancient gene duplicates, suggesting these processes are essential in their maintenance over ∼60 Ma.
PMCID: PMC3971588  PMID: 24558256
9.  A Whole-Genome DNA Marker Map for Cotton Based on the D-Genome Sequence of Gossypium raimondii L. 
G3: Genes|Genomes|Genetics  2013;3(10):1759-1767.
We constructed a very-high-density, whole-genome marker map (WGMM) for cotton by using 18,597 DNA markers corresponding to 48,958 loci that were aligned to both a consensus genetic map and a reference genome sequence. The WGMM has a density of one locus per 15.6 kb, or an average of 1.3 loci per gene. The WGMM was anchored by the use of colinear markers to a detailed genetic map, providing recombinational information. Mapped markers occurred at relatively greater physical densities in distal chromosomal regions and lower physical densities in the central regions, with all 1 Mb bins having at least nine markers. Hotspots for quantitative trait loci and resistance gene analog clusters were aligned to the map and DNA markers identified for targeting of these regions of high practical importance. Based on the cotton D genome reference sequence, the locations of chromosome structural rearrangements plotted on the map facilitate its translation to other Gossypium genome types. The WGMM is a versatile genetic map for marker assisted breeding, fine mapping and cloning of genes and quantitative trait loci, developing new genetic markers and maps, genome-wide association mapping, and genome evolution studies.
PMCID: PMC3789800  PMID: 23979945
quantitative trait loci; resistance gene analog; simple sequence repeat; restriction fragment length polymorphism; inversions
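The density figures quoted in the whole-genome marker map abstract above can be cross-checked with simple arithmetic. The short sketch below uses only numbers stated in that abstract; the derived values (loci per marker, implied sequence span, implied gene count) are back-of-envelope estimates and are not reported by the authors.

```python
# Figures quoted in the abstract.
markers = 18_597        # DNA markers
loci = 48_958           # loci those markers correspond to
kb_per_locus = 15.6     # one locus per 15.6 kb
loci_per_gene = 1.3     # average loci per gene

# Derived estimates (assumptions of this sketch, not reported values).
loci_per_marker = loci / markers
implied_span_mb = loci * kb_per_locus / 1000
implied_gene_count = loci / loci_per_gene

print(f"loci per marker:     {loci_per_marker:.2f}")
print(f"implied span (Mb):   {implied_span_mb:.1f}")
print(f"implied gene count:  {implied_gene_count:.0f}")
```

Under these assumptions each marker maps to roughly 2.6 loci on average, and the stated density implies a sequence span of roughly 764 Mb and about 37,660 genes.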
10.  A Novel Application of Furazolidone: Anti-Leukemic Activity in Acute Myeloid Leukemia 
PLoS ONE  2013;8(8):e72335.
Acute myeloid leukemia (AML) is the most common malignant myeloid disorder of progenitor cells in myeloid hematopoiesis and exemplifies a genetically heterogeneous disease. The patients with AML also show a heterogeneous response to therapy. Although all-trans retinoic acid (ATRA) has been successfully introduced to treat acute promyelocytic leukemia (APL), it is rather ineffective in non-APL AML. In our present study, 1200 off-patent marketed drugs and natural compounds that have been approved by the Food and Drug Administration (FDA) were screened for anti-leukemia activity using the retrovirus transduction/transformation assay (RTTA). Furazolidone (FZD) was shown to inhibit bone marrow transformation mediated by several leukemia fusion proteins, including AML1-ETO. Furazolidone has been used in the treatment of certain bacterial and protozoan infections in human and animals for more than sixty years. We investigated the anti-leukemic activity of FZD in a series of AML cells. FZD displayed potent antiproliferative properties at submicromolar concentrations and induced apoptosis in AML cell lines. Importantly, FZD treatment of certain AML cells induced myeloid cell differentiation by morphology and flow cytometry for CD11b expression. Furthermore, FZD treatment resulted in increased stability of tumor suppressor p53 protein in AML cells. Our in vitro results suggest furazolidone as a novel therapeutic strategy in AML patients.
PMCID: PMC3739762  PMID: 23951311
11.  PGDD: a database of gene and genome duplication in plants 
Nucleic Acids Research  2012;41(Database issue):D1152-D1158.
Genome duplication (GD) has permanently shaped the architecture and function of many higher eukaryotic genomes. The angiosperms (flowering plants) are outstanding models in which to elucidate consequences of GD for higher eukaryotes, owing to their propensity for chromosomal duplication or even triplication in a few cases. Duplicated genome structures often require both intra- and inter-genome alignments to unravel their evolutionary history, also providing the means to deduce both obvious and otherwise-cryptic orthology, paralogy and other relationships among genes. The burgeoning sets of angiosperm genome sequences provide the foundation for a host of investigations into the functional and evolutionary consequences of gene and GD. To provide genome alignments from a single resource based on uniform standards that have been validated by empirical studies, we built the Plant Genome Duplication Database (PGDD; freely available at, a web service providing synteny information in terms of colinearity between chromosomes. At present, PGDD contains data for 26 plants including bryophytes and chlorophyta, as well as angiosperms with draft genome sequences. In addition to the inclusion of new genomes as they become available, we are preparing new functions to enhance PGDD.
PMCID: PMC3531184  PMID: 23180799
12.  MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity 
Nucleic Acids Research  2012;40(7):e49.
MCScan is an algorithm able to scan multiple genomes or subgenomes in order to identify putative homologous chromosomal regions, and align these regions using genes as anchors. The MCScanX toolkit implements an adjusted MCScan algorithm for detection of synteny and collinearity that extends the original software by incorporating 14 utility programs for visualization of results and additional downstream analyses. Applications of MCScanX to several sequenced plant genomes and gene families are shown as examples. MCScanX can be used to effectively analyze chromosome structural changes, and reveal the history of gene family expansions that might contribute to the adaptation of lineages and taxa. An integrated view of various modes of gene duplication can supplement the traditional gene tree analysis in specific families. The source code and documentation of MCScanX are freely available at
PMCID: PMC3326336  PMID: 22217600
13.  Modes of Gene Duplication Contribute Differently to Genetic Novelty and Redundancy, but Show Parallels across Divergent Angiosperms 
PLoS ONE  2011;6(12):e28150.
Both single gene and whole genome duplications (WGD) have recurred in angiosperm evolution. However, the evolutionary effects of different modes of gene duplication, especially regarding their contributions to genetic novelty or redundancy, have been inadequately explored.
In Arabidopsis thaliana and Oryza sativa (rice), species that deeply sample botanical diversity and for which expression data are available from a wide range of tissues and physiological conditions, we have compared expression divergence between genes duplicated by six different mechanisms (WGD, tandem, proximal, DNA based transposed, retrotransposed and dispersed), and between positional orthologs. Both neo-functionalization and genetic redundancy appear to contribute to retention of duplicate genes. Genes resulting from WGD and tandem duplications diverge slowest in both coding sequences and gene expression, and contribute most to genetic redundancy, while other duplication modes contribute more to evolutionary novelty. WGD duplicates may more frequently be retained due to dosage amplification, while inferred transposon mediated gene duplications tend to reduce gene expression levels. The extent of expression divergence between duplicates is discernibly related to duplication modes, different WGD events, amino acid divergence, and putatively neutral divergence (time), but the contribution of each factor is heterogeneous among duplication modes. Gene loss may retard inter-species expression divergence. Members of different gene families may have non-random patterns of origin that are similar in Arabidopsis and rice, suggesting the action of pan-taxon principles of molecular evolution.
Gene duplication modes differ in contribution to genetic novelty and redundancy, but show some parallels in taxa separated by hundreds of millions of years of evolution.
PMCID: PMC3229532  PMID: 22164235
14.  Spreading of Alu Methylation to the Promoter of the MLH1 Gene in Gastrointestinal Cancer 
PLoS ONE  2011;6(10):e25913.
The highly repetitive Alu retroelements are regarded as methylation centres in the genome. Methylation in the gene promoters could be spreading from them. Promoter methylation of MLH1 is frequently detected in cancers, but the underlying mechanism is unclear. The aim of this study is to understand whether the methylation in the Alu elements is associated with promoter methylation in the MLH1 gene. Bisulfite genomic sequencing was used to analyse the CpG sites of the 5′ end (promoter, exon 1 and Alu-containing intron 1) of the MLH1 gene in colorectal cancer cells and tissues, and gastric cancer tissues. Hypomethylation in the Alu elements and hypermethylation in the promoters and the regions between the promoters and the Alu elements were detected in two cancer cell lines and seven cancer tissues. However, demethylation or hypomethylation of the MLH1 promoter and regions between promoter and the Alu elements, and hypermethylation in the Alu elements, were identified in the normal tissues. MLH1 promoter methylation may spread from Alu elements that are located in intron 1 of the MLH1 gene. The trans-acting elements binding to the mutation sites could play a role in the methylation spreading.
PMCID: PMC3192117  PMID: 22022465
15.  A physical map of Brassica oleracea shows complexity of chromosomal changes following recursive paleopolyploidizations 
BMC Genomics  2011;12:470.
Evolution of the Brassica species has been recursively affected by polyploidy events, and comparison to their relative, Arabidopsis thaliana, provides means to explore their genomic complexity.
A genome-wide physical map of a rapid-cycling strain of B. oleracea was constructed by integrating high-information-content fingerprinting (HICF) of Bacterial Artificial Chromosome (BAC) clones with hybridization to sequence-tagged probes. Using 2907 contigs of two or more BACs, we performed several lines of comparative genomic analysis. Interspecific DNA synteny is much better preserved in euchromatin than heterochromatin, showing the qualitative difference in evolution of these respective genomic domains. About 67% of contigs can be aligned to the Arabidopsis genome, with 96.5% corresponding to euchromatic regions, and 3.5% (shown to contain repetitive sequences) to pericentromeric regions. Overgo probe hybridization data showed that contigs aligned to Arabidopsis euchromatin contain ~80% of low-copy-number genes, while genes with high copy number are much more frequently associated with pericentromeric regions. We identified 39 interchromosomal breakpoints during the diversification of B. oleracea and Arabidopsis thaliana, a relatively high level of genomic change since their divergence. Comparison of the B. oleracea physical map with Arabidopsis and other available eudicot genomes showed appreciable 'shadowing' produced by more ancient polyploidies, resulting in a web of relatedness among contigs which increased genomic complexity.
A high-resolution genetically-anchored physical map sheds light on Brassica genome organization and advances positional cloning of specific genes, and may help to validate genome sequence assembly and alignment to chromosomes.
All the physical mapping data is freely shared at a WebFPC site (; Temporarily password-protected: account: pgml; password: 123qwe123.
PMCID: PMC3193055  PMID: 21955929
Comparative genomics; polyploidy; Arabidopsis thaliana
16.  A draft physical map of a D-genome cotton species (Gossypium raimondii) 
BMC Genomics  2010;11:395.
Genetically anchored physical maps of large eukaryotic genomes have proven useful both for their intrinsic merit and as an adjunct to genome sequencing. Cultivated tetraploid cottons, Gossypium hirsutum and G. barbadense, share a common ancestor formed by a merger of the A and D genomes about 1-2 million years ago. Toward the long-term goal of characterizing the spectrum of diversity among cotton genomes, the worldwide cotton community has prioritized the D genome progenitor Gossypium raimondii for complete sequencing.
A whole genome physical map of G. raimondii, the putative D genome ancestral species of tetraploid cottons was assembled, integrating genetically-anchored overgo hybridization probes, agarose based fingerprints and 'high information content fingerprinting' (HICF). A total of 13,662 BAC-end sequences and 2,828 DNA probes were used in genetically anchoring 1585 contigs to a cotton consensus genetic map, and 370 and 438 contigs, respectively to Arabidopsis thaliana (AT) and Vitis vinifera (VV) whole genome sequences.
Several lines of evidence suggest that the G. raimondii genome is comprised of two qualitatively different components. Much of the gene rich component is aligned to the Arabidopsis and Vitis vinifera genomes and shows promise for utilizing translational genomic approaches in understanding this important genome and its resident genes. The integrated genetic-physical map is of value both in assembling and validating a planned reference sequence.
PMCID: PMC2996926  PMID: 20569427
17.  The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus) 
Ming, Ray | Hou, Shaobin | Feng, Yun | Yu, Qingyi | Dionne-Laporte, Alexandre | Saw, Jimmy H. | Senin, Pavel | Wang, Wei | Ly, Benjamin V. | Lewis, Kanako L. T. | Salzberg, Steven L. | Feng, Lu | Jones, Meghan R. | Skelton, Rachel L. | Murray, Jan E. | Chen, Cuixia | Qian, Wubin | Shen, Junguo | Du, Peng | Eustice, Moriah | Tong, Eric | Tang, Haibao | Lyons, Eric | Paull, Robert E. | Michael, Todd P. | Wall, Kerr | Rice, Danny W. | Albert, Henrik | Wang, Ming-Li | Zhu, Yun J. | Schatz, Michael | Nagarajan, Niranjan | Acob, Ricelle A. | Guan, Peizhu | Blas, Andrea | Wai, Ching Man | Ackerman, Christine M. | Ren, Yan | Liu, Chao | Wang, Jianmei | Wang, Jianping | Na, Jong-Kuk | Shakirov, Eugene V. | Haas, Brian | Thimmapuram, Jyothi | Nelson, David | Wang, Xiyin | Bowers, John E. | Gschwend, Andrea R. | Delcher, Arthur L. | Singh, Ratnesh | Suzuki, Jon Y. | Tripathi, Savarni | Neupane, Kabi | Wei, Hairong | Irikura, Beth | Paidi, Maya | Jiang, Ning | Zhang, Wenli | Presting, Gernot | Windsor, Aaron | Navajas-Pérez, Rafael | Torres, Manuel J. | Feltus, F. Alex | Porter, Brad | Li, Yingjun | Burroughs, A. Max | Luo, Ming-Cheng | Liu, Lei | Christopher, David A. | Mount, Stephen M. | Moore, Paul H. | Sugimura, Tak | Jiang, Jiming | Schuler, Mary A. | Friedman, Vikki | Mitchell-Olds, Thomas | Shippen, Dorothy E. | dePamphilis, Claude W. | Palmer, Jeffrey D. | Freeling, Michael | Paterson, Andrew H. | Gonsalves, Dennis | Wang, Lei | Alam, Maqsudul
Nature  2008;452(7190):991-996.
Papaya, a fruit crop cultivated in tropical and subtropical regions, is known for its nutritional benefits and medicinal applications. Here we report a 3× draft genome sequence of ‘SunUp’ papaya, the first commercial virus-resistant transgenic fruit tree [1] to be sequenced. The papaya genome is three times the size of the Arabidopsis genome, but contains fewer genes, including significantly fewer disease-resistance gene analogues. Comparison of the five sequenced genomes suggests a minimal angiosperm gene set of 13,311. A lack of recent genome duplication, atypical of other angiosperm genomes sequenced so far [2–5], may account for the smaller papaya gene number in most functional groups. Nonetheless, striking amplifications in gene number within particular functional groups suggest roles in the evolution of tree-like habit, deposition and remobilization of starch reserves, attraction of seed dispersal agents, and adaptation to tropical daylengths. Transgenesis at three locations is closely associated with chloroplast insertions into the nuclear genome, and with topoisomerase I recognition sites. Papaya offers numerous advantages as a system for fruit-tree functional genomics, and this draft genome sequence provides the foundation for revealing the basis of Carica's distinguishing morpho-physiological, medicinal and nutritional properties.
PMCID: PMC2836516  PMID: 18432245
18.  Comparative genomic analysis of C4 photosynthetic pathway evolution in grasses 
Genome Biology  2009;10(6):R68.
Comparison of the sorghum, maize and rice genomes shows that gene duplication and functional innovation is common to evolution of most but not all genes in the C4 photosynthetic pathway
Sorghum is the first C4 plant and the second grass with a full genome sequence available. This makes it possible to perform a whole-genome-level exploration of C4 pathway evolution by comparing key photosynthetic enzyme genes in sorghum, maize (C4) and rice (C3), and to investigate a long-standing hypothesis that a reservoir of duplicated genes is a prerequisite for the evolution of C4 photosynthesis from a C3 progenitor.
We show that both whole-genome and individual gene duplication have contributed to the evolution of C4 photosynthesis. The C4 gene isoforms show differential duplicability, with some C4 genes being recruited from whole genome duplication duplicates by multiple modes of functional innovation. The sorghum and maize carbonic anhydrase genes display a novel mode of new gene formation, with recursive tandem duplication and gene fusion accompanied by adaptive evolution to produce C4 genes with one to three functional units. Other C4 enzymes in sorghum and maize also show evidence of adaptive evolution, though differing in level and mode. Intriguingly, a phosphoenolpyruvate carboxylase gene in the C3 plant rice has also been evolving rapidly and shows evidence of adaptive evolution, although lacking key mutations that are characteristic of C4 metabolism. We also found evidence that both gene redundancy and alternative splicing may have sheltered the evolution of new function.
Gene duplication followed by functional innovation is common to evolution of most but not all C4 genes. The apparently long time-lag between the availability of duplicates for recruitment into C4 and the appearance of C4 grasses, together with the heterogeneity of origins of C4 genes, suggests that there may have been a long transition process before the establishment of C4 photosynthesis.
PMCID: PMC2718502  PMID: 19549309
19.  Statistical inference of chromosomal homology based on gene colinearity and applications to Arabidopsis and rice 
BMC Bioinformatics  2006;7:447.
The identification of chromosomal homology will shed light on such mysteries of genome evolution as DNA duplication, rearrangement and loss. Several approaches have been developed to detect chromosomal homology based on gene synteny or colinearity. However, the previously reported implementations lack statistical inferences which are essential to reveal actual homologies.
In this study, we present a statistical approach to detect homologous chromosomal segments based on gene colinearity. We implement this approach in a software package ColinearScan to detect putative colinear regions using a dynamic programming algorithm. Statistical models are proposed to estimate proper parameter values and evaluate the significance of putative homologous regions. Statistical inference, high computational efficiency and flexibility of input data type are three key features of our approach.
We apply ColinearScan to the Arabidopsis and rice genomes to detect duplicated regions within each species and homologous fragments between these two species. We find many more homologous chromosomal segments in the rice genome than previously reported. We also find many small colinear segments between rice and Arabidopsis genomes.
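The dynamic programming idea behind colinearity detection can be illustrated with a minimal sketch. This is not the ColinearScan implementation: the function name `longest_colinear_chain`, the `max_gap` parameter and the representation of homologous gene pairs as index tuples are all assumptions made for illustration. The sketch only finds the longest same-orientation chain of gene pairs and omits the statistical significance evaluation that the abstract describes.

```python
from typing import List, Tuple

# Hypothetical representation: (gene index on segment A, gene index on segment B)
Anchor = Tuple[int, int]


def longest_colinear_chain(anchors: List[Anchor], max_gap: int = 10) -> List[Anchor]:
    """Return the longest chain of anchors that increases in both coordinates,
    with consecutive anchors at most max_gap genes apart on each segment."""
    pts = sorted(set(anchors))
    if not pts:
        return []
    best = [1] * len(pts)   # best[j]: length of the longest chain ending at pts[j]
    prev = [-1] * len(pts)  # prev[j]: index of the preceding anchor in that chain
    for j, (aj, bj) in enumerate(pts):
        for i in range(j):
            ai, bi = pts[i]
            within_gap = 0 < aj - ai <= max_gap and 0 < bj - bi <= max_gap
            if within_gap and best[i] + 1 > best[j]:
                best[j] = best[i] + 1
                prev[j] = i
    # Trace back from the anchor that ends the longest chain.
    end = max(range(len(pts)), key=best.__getitem__)
    chain = []
    while end != -1:
        chain.append(pts[end])
        end = prev[end]
    return chain[::-1]


pairs = [(1, 1), (2, 2), (3, 50), (4, 4), (6, 5), (30, 6)]
print(longest_colinear_chain(pairs, max_gap=5))
```

In this toy input the pairs (3, 50) and (30, 6) fall outside the gap limit and are excluded from the chain. A real analysis would also need to handle inverted segments and test each chain against a null model.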
PMCID: PMC1626491  PMID: 17038171
20.  The Brassica oleracea genome reveals the asymmetrical evolution of polyploid genomes 
Nature Communications  2014;5:3930.
Polyploidization has provided much genetic variation for plant adaptive evolution, but the mechanisms by which the molecular evolution of polyploid genomes establishes genetic architecture underlying species differentiation are unclear. Brassica is an ideal model to increase knowledge of polyploid evolution. Here we describe a draft genome sequence of Brassica oleracea, comparing it with that of its sister species B. rapa to reveal numerous chromosome rearrangements and asymmetrical gene loss in duplicated genomic blocks, asymmetrical amplification of transposable elements, differential gene co-retention for specific pathways and variation in gene expression, including alternative splicing, among a large number of paralogous and orthologous genes. Genes related to the production of anticancer phytochemicals and morphological variations illustrate consequences of genome duplication and gene divergence, imparting biochemical and morphological variation to B. oleracea. This study provides insights into Brassica genome evolution and will underpin research into the many important crops in this genus.
Brassica oleracea is a plant species comprising economically important vegetable crops. Here, the authors report the draft genome sequence of B. oleracea and, through a comparative analysis with the closely related B. rapa, reveal insights into Brassica evolution and divergence of interspecific genomes and intraspecific subgenomes.
PMCID: PMC4279128  PMID: 24852848
21.  The Genomes of Oryza sativa: A History of Duplications 
Yu, Jun | Wang, Jun | Lin, Wei | Li, Songgang | Li, Heng | Zhou, Jun | Ni, Peixiang | Dong, Wei | Hu, Songnian | Zeng, Changqing | Zhang, Jianguo | Zhang, Yong | Li, Ruiqiang | Xu, Zuyuan | Li, Shengting | Li, Xianran | Zheng, Hongkun | Cong, Lijuan | Lin, Liang | Yin, Jianning | Geng, Jianing | Li, Guangyuan | Shi, Jianping | Liu, Juan | Lv, Hong | Li, Jun | Wang, Jing | Deng, Yajun | Ran, Longhua | Shi, Xiaoli | Wang, Xiyin | Wu, Qingfa | Li, Changfeng | Ren, Xiaoyu | Wang, Jingqiang | Wang, Xiaoling | Li, Dawei | Liu, Dongyuan | Zhang, Xiaowei | Ji, Zhendong | Zhao, Wenming | Sun, Yongqiao | Zhang, Zhenpeng | Bao, Jingyue | Han, Yujun | Dong, Lingli | Ji, Jia | Chen, Peng | Wu, Shuming | Liu, Jinsong | Xiao, Ying | Bu, Dongbo | Tan, Jianlong | Yang, Li | Ye, Chen | Zhang, Jingfen | Xu, Jingyi | Zhou, Yan | Yu, Yingpu | Zhang, Bing | Zhuang, Shulin | Wei, Haibin | Liu, Bin | Lei, Meng | Yu, Hong | Li, Yuanzhe | Xu, Hao | Wei, Shulin | He, Ximiao | Fang, Lijun | Zhang, Zengjin | Zhang, Yunze | Huang, Xiangang | Su, Zhixi | Tong, Wei | Li, Jinhong | Tong, Zongzhong | Li, Shuangli | Ye, Jia | Wang, Lishun | Fang, Lin | Lei, Tingting | Chen, Chen | Chen, Huan | Xu, Zhao | Li, Haihong | Huang, Haiyan | Zhang, Feng | Xu, Huayong | Li, Na | Zhao, Caifeng | Li, Shuting | Dong, Lijun | Huang, Yanqing | Li, Long | Xi, Yan | Qi, Qiuhui | Li, Wenjie | Zhang, Bo | Hu, Wei | Zhang, Yanling | Tian, Xiangjun | Jiao, Yongzhi | Liang, Xiaohu | Jin, Jiao | Gao, Lei | Zheng, Weimou | Hao, Bailin | Liu, Siqi | Wang, Wen | Yuan, Longping | Cao, Mengliang | McDermott, Jason | Samudrala, Ram | Wang, Jian | Wong, Gane Ka-Shu | Yang, Huanming
PLoS Biology  2005;3(2):e38.
We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, or almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped super-scaffolds of one or the other genome. We introduce a gene identification procedure for plants that does not rely on similarity to known genes to remove erroneous predictions resulting from transposable elements. Using the available EST data to adjust for residual errors in the predictions, the estimated gene count is at least 38,000–40,000. Only 2%–3% of the genes are unique to any one subspecies, comparable to the amount of sequence that might still be missing. Despite this lack of variation in gene content, there is enormous variation in the intergenic regions. At least a quarter of the two sequences could not be aligned, and where they could be aligned, single nucleotide polymorphism (SNP) rates varied from as little as 3.0 SNP/kb in the coding regions to 27.6 SNP/kb in the transposable elements. A more inclusive new approach for analyzing duplication history is introduced here. It reveals an ancient whole-genome duplication, a recent segmental duplication on Chromosomes 11 and 12, and massive ongoing individual gene duplications. We find 18 distinct pairs of duplicated segments that cover 65.7% of the genome; 17 of these pairs date back to a common time before the divergence of the grasses. More important, ongoing individual gene duplications provide a never-ending source of raw material for gene genesis and are major contributors to the differences between members of the grass family.
Comparative genome sequencing of indica and japonica rice reveals that duplication of genes and genomic regions has played a major part in the evolution of grass genomes.
PMCID: PMC546038  PMID: 15685292
22.  Draft genome sequence of the mulberry tree Morus notabilis 
Nature Communications  2013;4:2445.
Human utilization of the mulberry–silkworm interaction started at least 5,000 years ago and greatly influenced world history through the Silk Road. Complementing the silkworm genome sequence, here we describe the genome of a mulberry species Morus notabilis. In the 330-Mb genome assembly, we identify 128 Mb of repetitive sequences and 29,338 genes, 60.8% of which are supported by transcriptome sequencing. Mulberry gene sequences appear to evolve ~3 times faster than other Rosales, perhaps facilitating the species’ spread worldwide. The mulberry tree is among a few eudicots but several Rosales that have not preserved genome duplications in more than 100 million years; however, a neopolyploid series found in the mulberry tree and several others suggest that new duplications may confer benefits. Five predicted mulberry miRNAs are found in the haemolymph and silk glands of the silkworm, suggesting interactions at molecular levels in the plant–herbivore relationship. The identification and analyses of mulberry genes involved in diversifying selection, resistance and protease inhibitor expressed in the laticifers will accelerate the improvement of mulberry plants.
Mulberry trees are the primary food source for silkworms, which are reared for the production of silk. In this study, He et al. present the draft genome sequence of Morus notabilis and find that it evolved significantly faster than other plants in the Rosales order.
PMCID: PMC3791463  PMID: 24048436
