Related Articles

1.  The sequence of rice chromosomes 11 and 12, rich in disease resistance genes and recent gene duplications 
BMC Biology  2005;3:20.
Rice is an important staple food and, with the smallest cereal genome, serves as a reference species for studies on the evolution of cereals and other grasses. Therefore, decoding its entire genome will be a prerequisite for applied and basic research on this species and all other cereals.
We have determined and analyzed the complete sequences of two of its chromosomes, 11 and 12, which total 55.9 Mb (14.3% of the entire genome length), based on a set of overlapping clones. A total of 5,993 non-transposable element related genes are present on these chromosomes. Among them are 289 disease resistance-like and 28 defense-response genes, a higher proportion of these categories than on any other rice chromosome. A three-Mb segment on both chromosomes resulted from a duplication 7.7 million years ago (mya), the most recent large-scale duplication in the rice genome. Paralogous gene copies within this segmental duplication can be aligned with genomic assemblies from sorghum and maize. Although these gene copies are preserved on both chromosomes, their expression patterns have diverged. When the gene order of rice chromosomes 11 and 12 was compared to wheat gene loci, significant synteny between these orthologous regions was detected, illustrating the presence of conserved genes alternating with recently evolved genes.
Because the resistance and defense response genes, enriched on these chromosomes relative to the whole genome, also occur in clusters, they provide a preferred target for breeding durable disease resistance in rice and the isolation of their allelic variants. The recent duplication of a large chromosomal segment coupled with the high density of disease resistance gene clusters makes this the most recently evolved part of the rice genome. Based on syntenic alignments of these chromosomes, rice chromosomes 11 and 12 do not appear to have resulted from a single whole-genome duplication event as previously suggested.
PMCID: PMC1261165  PMID: 16188032
2.  Intron gain and loss in segmentally duplicated genes in rice 
Genome Biology  2006;7(5):R41.
Analysis of over 3,000 co-linear paired genes in rice shows more intron loss than intron gain following segmental duplication.
Introns are under less selection pressure than exons, and consequently, intronic sequences have a higher rate of gain and loss than exons. In a number of plant species, a large portion of the genome has been segmentally duplicated, giving rise to a large set of duplicated genes. The recent completion of the rice genome in which segmental duplication has been documented has allowed us to investigate intron evolution within rice, a diploid monocotyledonous species.
Analysis of segmental duplication in rice revealed that 159 Mb of the 371 Mb genome and 21,570 of the 43,719 non-transposable element-related genes were contained within a duplicated region. In these duplicated regions, 3,101 collinear paired genes were present. Using this set of segmentally duplicated genes, we investigated intron evolution from full-length cDNA-supported non-transposable element-related gene models of rice. Using gene pairs that have an ortholog in the dicotyledonous model species Arabidopsis thaliana, we identified more intron loss (49 introns within 35 gene pairs) than intron gain (5 introns within 5 gene pairs) following segmental duplication. We were unable to demonstrate preferential intron loss at the 3' end of genes as previously reported in mammalian genomes. However, we did find that the four nucleotides of exons that flank lost introns had less frequently used 4-mers.
We observed that intron evolution within rice following segmental duplication is largely dominated by intron loss. In two of the five cases of intron gain within segmentally duplicated genes, the gained sequences were similar to transposable elements.
PMCID: PMC1779517  PMID: 16719932
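As a quick check on the figures quoted in the abstract above, the duplicated fractions of the genome and gene set, and the ratio of intron loss to intron gain, can be recomputed directly. The following is a minimal sketch; the constant and function names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract above (entry 2).
GENOME_MB = 371                      # total genome size considered
DUPLICATED_MB = 159                  # sequence inside duplicated regions
TOTAL_GENES = 43_719                 # non-TE-related genes
GENES_IN_DUPLICATED_REGIONS = 21_570
INTRONS_LOST = 49
INTRONS_GAINED = 5


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of the genome and of the gene set that lies in duplicated regions.
duplicated_genome_pct = percent(DUPLICATED_MB, GENOME_MB)
duplicated_gene_pct = percent(GENES_IN_DUPLICATED_REGIONS, TOTAL_GENES)

# How many introns were lost for each intron gained.
loss_to_gain_ratio = INTRONS_LOST / INTRONS_GAINED

print(duplicated_genome_pct, duplicated_gene_pct, loss_to_gain_ratio)
```

On these numbers, roughly 43% of the genome and 49% of the genes fall within duplicated regions, and intron losses outnumber gains by about ten to one.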
3.  Lineage-Specific Biology Revealed by a Finished Genome Assembly of the Mouse 
PLoS Biology  2009;7(5):e1000112.
A finished clone-based assembly of the mouse genome reveals extensive sequence duplication during recent evolution and rodent-specific expansion of certain gene families. Newly assembled duplications contain protein-coding genes that are mostly involved in reproductive function.
The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non–protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not.
Author Summary
The availability of an accurate genome sequence provides the bedrock upon which modern biomedical research is based. Here we describe a high-quality assembly, Build 36, of the mouse genome. This assembly was put together by aligning overlapping individual clones representing parts of the genome, and it provides a more complete picture than previous assemblies, because it adds much rodent-specific sequence that was previously unavailable. The addition of these sequences provides insight into both the genomic architecture and the gene complement of the mouse. In particular, it highlights recent gene duplications and the expansion of certain gene families during rodent evolution. An improved understanding of the mouse genome and thus mouse biology will enhance the utility of the mouse as a model for human disease.
PMCID: PMC2680341  PMID: 19468303
4.  Association of microsatellite pairs with segmental duplications in insect genomes 
BMC Genomics  2013;14:907.
Segmental duplications (SDs), also known as low-copy repeats, are DNA sequences of length greater than 1 kb which are duplicated with a high degree of sequence identity (greater than 90%) causing instability in genomes. SDs are generally found in the genome as mosaic forms of duplicated sequences which are generated by a two-step process: first, multiple duplicated sequences are aggregated at specific genomic regions, and then, these primary duplications undergo multiple secondary duplications. However, the mechanism of how duplicated sequences are aggregated in the first place is not well understood.
By analyzing the distribution of microsatellite sequences among twenty insect species in a genome-wide manner it was found that pairs of microsatellites along with the intervening sequences were duplicated multiple times in each genome. They were found as low copy repeats or segmental duplications when the duplicated loci were greater than 1 kb in length and had greater than 90% sequence similarity. By performing a sliding-window genomic analysis for number of paired microsatellites and number of segmental duplications, it was observed that regions rich in repetitive paired microsatellites tend to get richer in segmental duplication suggesting a “rich-gets-richer” mode of aggregation of the duplicated loci in specific regions of the genome. Results further show that the relationship between number of paired microsatellites and segmental duplications among the species is independent of the known phylogeny suggesting that association of microsatellites with segmental duplications may be a species-specific evolutionary process. It was also observed that the repetitive microsatellite pairs are associated with gene duplications but those sequences are rarely retained in the orthologous genes between species. Although some of the duplicated sequences with microsatellites as termini were found within transposable elements (TEs) of Drosophila, most of the duplications are found in the TE-free and gene-free regions of the genome.
The study clearly suggests that microsatellites are instrumental in extensive sequence duplications that may contribute to species-specific evolution of genome plasticity in insects.
PMCID: PMC3878106  PMID: 24359442
Segmental duplication; Genome dynamics; Microsatellite; Insect genomes; Duplication shadowing; Gene duplication
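The two operational ideas in the abstract above, the threshold definition of a segmental duplication (longer than 1 kb, more than 90% identity) and the sliding-window count of features along a sequence, can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: the function names, the window and step sizes, and the example coordinates are all assumptions.

```python
MIN_LENGTH_BP = 1000   # "greater than 1 kb"
MIN_IDENTITY = 0.90    # "greater than 90%" sequence identity


def is_segmental_duplication(length_bp, identity):
    """Apply the length and identity thresholds quoted in the abstract."""
    return length_bp > MIN_LENGTH_BP and identity > MIN_IDENTITY


def sliding_window_counts(positions, seq_length, window_bp, step_bp):
    """Count features whose start position falls in each sliding window.

    Windows are half-open intervals [start, start + window_bp) and advance
    by step_bp; the last windows may extend past the end of the sequence.
    """
    counts = []
    start = 0
    while start < seq_length:
        end = start + window_bp
        counts.append(sum(start <= p < end for p in positions))
        start += step_bp
    return counts


# Hypothetical feature start positions on a 5 kb sequence.
example_positions = [100, 1500, 1600, 4200]
example_counts = sliding_window_counts(
    example_positions, seq_length=5000, window_bp=2000, step_bp=1000
)
print(example_counts)
```

Running the same windowing over paired-microsatellite positions and over segmental-duplication positions yields two count vectors that can then be compared window by window, which is the kind of comparison the abstract describes.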
5.  A recent duplication revisited: phylogenetic analysis reveals an ancestral duplication highly-conserved throughout the Oryza genus and beyond 
BMC Plant Biology  2009;9:146.
The role of gene duplication in the structural and functional evolution of genomes has been well documented. Analysis of complete rice (Oryza sativa) genome sequences suggested an ancient whole genome duplication, common to all the grasses, some 50-70 million years ago and a more conserved segmental duplication between the distal regions of the short arms of chromosomes 11 and 12, whose evolutionary history is controversial.
We have carried out a comparative analysis of this duplication within the wild species of the genus Oryza, using a phylogenetic approach to specify its origin and evolutionary dynamics. Paralogous pairs were isolated for nine genes selected throughout the region in all Oryza genome types, as well as in two outgroup species, Leersia perrieri and Potamophila parviflora. All Oryza species display the same global evolutionary dynamics but some lineage-specific features appear towards the proximal end of the duplicated region. The same level of conservation is observed between the redundant copies of the tetraploid species Oryza minuta. The presence of orthologous duplicated blocks in the genome of the more distantly-related species, Brachypodium distachyon, strongly suggests that this duplication between chromosomes 11 and 12 was formed as part of the whole genome duplication common to all Poaceae.
Our observations suggest that recurrent but heterogeneous concerted evolution throughout the Oryza genus and in related species has led specifically to the extremely high sequence conservation occurring in this region of more than 2 Mbp.
PMCID: PMC2797015  PMID: 20003305
6.  Evidence and evolutionary analysis of ancient whole-genome duplication in barley predating the divergence from rice 
Well preserved genomic colinearity among agronomically important grass species such as rice, maize, Sorghum, wheat and barley provides access to whole-genome structure information even in species lacking a reference genome sequence. We investigated footprints of whole-genome duplication (WGD) in barley that shaped the cereal ancestor genome by analyzing shared synteny with rice using a ~2000 gene-based barley genetic map and the rice genome reference sequence.
Based on a recent annotation of the rice genome, we reviewed the WGD in rice and identified 24 pairs of duplicated genomic segments involving 70% of the rice genome. Using 968 putative orthologous gene pairs, synteny covered 89% of the barley genetic map and 63% of the rice genome. We found strong evidence for seven shared segmental genome duplications, corresponding to more than 50% of the segmental genome duplications previously determined in rice. Analysis of synonymous substitution rates (Ks) suggested that shared duplications originated before the divergence of these two species. While major genome rearrangements affected the ancestral genome of both species, small paracentric inversions were found to be species specific.
We provide a thorough analysis of comparative genome evolution between barley and rice. A barley genetic map of approximately 2000 non-redundant EST sequences provided sufficient density to allow a detailed view of shared synteny with the rice genome. Using an indirect approach that included the localization of WGD-derived duplicated genome segments in the rice genome, we determined the current extent of shared WGD-derived genome duplications that occurred prior to species divergence.
PMCID: PMC2746218  PMID: 19698139
7.  Detection of copy number variations in rice using array-based comparative genomic hybridization 
BMC Genomics  2011;12:372.
Copy number variations (CNVs) can create new genes, change gene dosage, reshape gene structures, and modify elements regulating gene expression. As with all types of genetic variation, CNVs may influence phenotypic variation and gene expression. CNVs are thus considered major sources of genetic variation. Little is known, however, about their contribution to genetic variation in rice.
To detect CNVs, we used a set of NimbleGen whole-genome comparative genomic hybridization arrays containing 718,256 oligonucleotide probes with a median probe spacing of 500 bp. We compiled a high-resolution map of CNVs in the rice genome, showing 641 CNVs between the genomes of the rice cultivars 'Nipponbare' (from O. sativa ssp. japonica) and 'Guang-lu-ai 4' (from O. sativa ssp. indica). The CNVs identified vary in size from 1.1 kb to 180.7 kb, and encompass approximately 7.6 Mb of the rice genome. The largest regions showing copy gain and loss are of 37.4 kb on chromosome 4, and 180.7 kb on chromosome 8. In addition, 85 DNA segments were identified, including some genic sequences. Contracted genes greatly outnumbered duplicated ones. Many of the contracted genes corresponded to either the same genes or genes involved in the same biological processes; this was also the case for genes involved in disease and defense.
We detected CNVs in rice by array-based comparative genomic hybridization. These CNVs contain known genes. Further discussion of CNVs is important, as they are linked to variation among rice varieties, and are likely to contribute to subspecific characteristics.
PMCID: PMC3156786  PMID: 21771342
8.  Generation of Tandem Direct Duplications by Reversed-Ends Transposition of Maize Ac Elements 
PLoS Genetics  2013;9(8):e1003691.
Tandem direct duplications are a common feature of the genomes of eukaryotes ranging from yeast to human, where they comprise a significant fraction of copy number variations. The prevailing model for the formation of tandem direct duplications is non-allelic homologous recombination (NAHR). Here we report the isolation of a series of duplications and reciprocal deletions isolated de novo from a maize allele containing two Class II Ac/Ds transposons. The duplication/deletion structures suggest that they were generated by alternative transposition reactions involving the termini of two nearby transposable elements. The deletion/duplication breakpoint junctions contain 8 bp target site duplications characteristic of Ac/Ds transposition events, confirming their formation directly by an alternative transposition mechanism. Tandem direct duplications and reciprocal deletions were generated at a relatively high frequency (∼0.5 to 1%) in the materials examined here in which transposons are positioned nearby each other in appropriate orientation; frequencies would likely be much lower in other genotypes. To test whether this mechanism may have contributed to maize genome evolution, we analyzed sequences flanking Ac/Ds and other hAT family transposons and identified three small tandem direct duplications with the structural features predicted by the alternative transposition mechanism. Together these results show that some class II transposons are capable of directly inducing tandem sequence duplications, and that this activity has contributed to the evolution of the maize genome.
Author Summary
The recent explosion of genome sequence data has greatly increased the need to understand the forces that shape eukaryotic genomes. A common feature of higher plant genomes is the presence of large numbers of duplications, often occurring as tandem repeats of thousands of base pairs. Despite the importance of gene duplications in evolution and disease, the precise mechanism(s) that generate tandem duplications are still unclear. In this study we identified nine new spontaneous duplications that arose flanking elements of the Ac transposon system. These duplications range in size from 8 kbp to >5,000 kbp, and all cases exhibit features characteristic of Ac transposition. Using similar criteria in a bioinformatics search, we identified three smaller duplications adjacent to other hAT family transposons in the maize B73 reference genome sequence. Our results show that transposable elements can directly generate tandem duplications via alternative transposition, and that this mechanism is responsible for at least some of the duplications present in the maize B73 genome. This work extends the significance of Barbara McClintock's discovery of transposable elements by demonstrating how they can act as agents of genome expansion.
PMCID: PMC3744419  PMID: 23966872
9.  LTR retrotransposons reveal recent extensive inter-subspecies nonreciprocal recombination in Asian cultivated rice 
BMC Genomics  2008;9:565.
Long terminal repeat retrotransposons (LTR elements) are ubiquitous eukaryotic transposable elements (TEs). They are considered to be one of the major forces underlying plant genome evolution. Because of their relatively high evolutionary speed, active transposition of LTR elements in host genomes provides rich information on their short-term history. As more and more genomes, especially those of closely related organisms, have been sequenced, it has become possible to perform global comparative studies of their LTR retrotransposons to reveal events in that history.
The present research is designed to investigate important evolutionary events in the origin of Asian cultivated rice through the comparison of LTR elements. We have developed LTR_INSERT, a new method for LTR element discovery in two closely related genomes. A distinctive feature of our method is that it can judge whether an insertion occurred before or after the divergence of the genomes. LTR_INSERT identifies 993 full-length LTR elements, annotates 15916 copies related to them, and discovers at least 16 novel LTR families in the whole-genome comparative map of two cultivated rice subspecies. From the full-length LTR elements, we estimate that a significant proportion of the rice genome has experienced inter-subspecies nonreciprocal recombination (ISNR) as recently as 53,000 years ago. Large-scale samplings further support that more than 15% of the rice genome has been involved in such recombination. In addition, LTR elements confirm that the genomes of O. sativa ssp. indica and japonica diverged about 600,000 years ago.
A new LTR retrotransposon identification method integrating both comparative genomics and an ab initio algorithm is introduced and applied to Asian cultivated rice genomes. At the whole-genome level, this work confirms that recent ISNR is an important factor shaping the modern cultivated rice genome.
PMCID: PMC2612701  PMID: 19038031
10.  Global Reorganization of Replication Domains During Embryonic Stem Cell Differentiation 
PLoS Biology  2008;6(10):e245.
DNA replication in mammals is regulated via the coordinate firing of clusters of replicons that duplicate megabase-sized chromosome segments at specific times during S-phase. Cytogenetic studies show that these “replicon clusters” coalesce as subchromosomal units that persist through multiple cell generations, but the molecular boundaries of such units have remained elusive. Moreover, the extent to which changes in replication timing occur during differentiation and their relationship to transcription changes has not been rigorously investigated. We have constructed high-resolution replication-timing profiles in mouse embryonic stem cells (mESCs) before and after differentiation to neural precursor cells. We demonstrate that chromosomes can be segmented into multimegabase domains of coordinate replication, which we call “replication domains,” separated by transition regions whose replication kinetics are consistent with large originless segments. The molecular boundaries of replication domains are remarkably well conserved between distantly related ESC lines and induced pluripotent stem cells. Unexpectedly, ESC differentiation was accompanied by the consolidation of smaller differentially replicating domains into larger coordinately replicated units whose replication time was more aligned to isochore GC content and the density of LINE-1 transposable elements, but not gene density. Replication-timing changes were coordinated with transcription changes for weak promoters more than strong promoters, and were accompanied by rearrangements in subnuclear position. We conclude that replication profiles are cell-type specific, and changes in these profiles reveal chromosome segments that undergo large changes in organization during differentiation. Moreover, smaller replication domains and a higher density of timing transition regions that interrupt isochore replication timing define a novel characteristic of the pluripotent state.
Author Summary
Microscopy studies have suggested that chromosomal DNA is composed of multiple, megabase-sized segments, each replicated at different times during S-phase of the cell cycle. However, a molecular definition of these coordinately replicated sequences and the stability of the boundaries between them has not been established. We constructed genome-wide replication-timing maps in mouse embryonic stem cells, identifying multimegabase coordinately replicated chromosome segments—“replication domains”—separated by remarkably distinct temporal boundaries. These domain boundaries were shared between several unrelated embryonic stem cell lines, including somatic cells reprogrammed to pluripotency (so-called induced pluripotent stem cells). However, upon differentiation to neural precursor cells, domains encompassing approximately 20% of the genome changed their replication timing, temporally consolidating into fewer, larger replication domains that were conserved between different neural precursor cell lines. Domains that changed replication timing showed a unique sequence composition, a strongly biased directionality for changes in resident gene expression, and altered radial positioning within the three-dimensional space in the cell nucleus, suggesting that changes in replication timing are related to the reorganization of higher-order chromosome structure and function during differentiation. Moreover, the property of smaller discordantly replicating domains may define a novel characteristic of pluripotency.
Analyzing the temporal order of DNA replication across the genome during embryonic stem cell differentiation reveals stable boundaries between coordinately replicated regions that consolidate into fewer, larger domains during differentiation.
PMCID: PMC2561079  PMID: 18842067
11.  Different patterns of gene structure divergence following gene duplication in Arabidopsis 
BMC Genomics  2013;14:652.
Divergence in gene structure following gene duplication is not well understood. Gene duplication can occur via whole-genome duplication (WGD) and single-gene duplications including tandem, proximal and transposed duplications. Different modes of gene duplication may be associated with different types, levels, and patterns of structural divergence.
In Arabidopsis thaliana, we denote levels of structural divergence between duplicated genes by differences in coding-region lengths and average exon lengths, and the number of insertions/deletions (indels) and maximum indel length in their protein sequence alignment. Among recent duplicates of different modes, transposed duplicates diverge most dramatically in gene structure. In transposed duplications, parental loci tend to have longer coding-regions and exons, and smaller numbers of indels and maximum indel lengths than transposed loci, reflecting biased structural changes in transposed duplications. Structural divergence increases with evolutionary time for WGDs, but not transposed duplications, possibly because of biased gene losses following transposed duplications. Structural divergence has heterogeneous relationships with nucleotide substitution rates, but is consistently positively correlated with gene expression divergence. The NBS-LRR gene family shows higher-than-average levels of structural divergence.
Our study suggests that structural divergence between duplicated genes is greatly affected by the mechanisms of gene duplication and may not be proportional to evolutionary time, and that certain gene families are under selection for rapid evolution of gene structure.
PMCID: PMC3848917  PMID: 24063813
Gene structure; Divergence; Transposed duplication; Whole-genome duplication; Selection; Arabidopsis
12.  Duplication and independent selection of cell-wall invertase genes GIF1 and OsCIN1 during rice evolution and domestication 
Various evolutionary models have been proposed to interpret the fate of paralogous duplicates, which provide substrates on which selection can act. In particular, domestication, as a special form of selection, has played an important role in crop cultivation, with divergence of many genes controlling important agronomic traits. Recent studies have indicated that pairs of duplicate genes are often sub-functionalized from the ancestral functions held by the parental genes. We previously demonstrated that the rice cell-wall invertase (CWI) gene GIF1, which plays an important role in the grain-filling process, was most likely subjected to domestication selection in the promoter region. Here, we report that GIF1 and another CWI gene, OsCIN1, constitute a pair of duplicate genes with differentiated expression and function through independent selection.
Through synteny analysis, we show that GIF1 and OsCIN1 are paralogues derived from a segmental duplication that originated during the genome duplication of grasses. Results based on population genetic analyses and a gene phylogenetic tree of 25 cultivars and 25 wild rice sequences demonstrated that OsCIN1 was also artificially selected during rice domestication, with a fixed mutation in the coding region, in contrast to GIF1, which was selected in the promoter region. GIF1 and OsCIN1 have evolved different expression patterns and probably different kinetic parameters of enzymatic activity, with the latter displaying less enzymatic activity. Overexpression of GIF1 and OsCIN1 also resulted in different phenotypes, suggesting that OsCIN1 might regulate other unrecognized biological processes.
How gene duplication and divergence contribute to genetic novelty and morphological adaptation has been an interesting issue for geneticists and biologists. Our discovery that the duplicated pair of GIF1 and OsCIN1 has experienced sub-functionalization implies that selection could act independently on each duplicate towards different functional specificity, which provides a vivid example of the evolution of genetic novelties in a model crop. Our results also further support the established hypothesis that gene duplication with sub-functionalization could be one solution for genetic adaptive conflict.
PMCID: PMC2873416  PMID: 20416079
13.  Two Evolutionary Histories in the Genome of Rice: the Roles of Domestication Genes 
PLoS Genetics  2011;7(6):e1002100.
Genealogical patterns in different genomic regions may be different due to the joint influence of gene flow and selection. The existence of two subspecies of cultivated rice provides a unique opportunity for analyzing these effects during domestication. We chose 66 accessions from the three rice taxa (about 22 each from Oryza sativa indica, O. sativa japonica, and O. rufipogon) for whole-genome sequencing. In the search for the signature of selection, we focus on low diversity regions (LDRs) shared by both cultivars. We found that the genealogical histories of these overlapping LDRs are distinct from the genomic background. While indica and japonica genomes generally appear to be of independent origin, many overlapping LDRs may have originated only once, as a result of selection and subsequent introgression. Interestingly, many such LDRs contain only one candidate gene of rice domestication, and several known domestication genes have indeed been “rediscovered” by this approach. In summary, we identified 13 additional candidate genes of domestication.
Author Summary
The origin of the two cultivated rice subspecies, Oryza sativa indica and O. sativa japonica, has been an interesting topic in evolutionary biology. Through whole-genome sequencing, we show that the rice genome embodies two different evolutionary trajectories. The overall genome-wide pattern supports a history of independent origin of the two cultivars from their wild population. However, genomic segments bearing important agronomic traits originated only once in one population and spread across all cultivars through introgression and human selection. Population genetic analysis allows us to pinpoint 13 additional candidate domestication genes.
PMCID: PMC3111475  PMID: 21695282
14.  Genome Duplication and Gene Loss Affect the Evolution of Heat Shock Transcription Factor Genes in Legumes 
PLoS ONE  2014;9(7):e102825.
Whole-genome duplication events (polyploidy events) and gene loss events have played important roles in the evolution of legumes. Here we show that the vast majority of Hsf gene duplications resulted from whole genome duplication events rather than tandem duplication, and significant differences in gene retention exist between species. By searching for intraspecies gene colinearity (microsynteny) and dating the age distributions of duplicated genes, we found that genome duplications accounted for 42 of 46 Hsf-containing segments in Glycine max, while paired segments were rarely identified in Lotus japonicus, Medicago truncatula and Cajanus cajan. However, by comparing interspecies microsynteny, we determined that the great majority of Hsf-containing segments in Lotus japonicus, Medicago truncatula and Cajanus cajan show extensive conservation with the duplicated regions of Glycine max. These segments formed 17 groups of orthologous segments. These results suggest that these regions shared ancient genome duplication with Hsf genes in Glycine max, but more than half of the copies of these genes were lost. On the other hand, the Glycine max Hsf gene family retained approximately 75% and 84% of duplicated genes produced from the ancient genome duplication and recent Glycine-specific genome duplication, respectively. Continuous purifying selection has played a key role in the maintenance of Hsf genes in Glycine max. Expression analysis of the Hsf genes in Lotus japonicus revealed their putative involvement in multiple tissue-/developmental stages and responses to various abiotic stimuli. This study traces the evolution of Hsf genes in legume species and demonstrates that the rates of gene gain and loss are far from equilibrium in different species.
PMCID: PMC4105503  PMID: 25047803
15.  Does the Upstream Region Possessing MULE-Like Sequence in Rice Upregulate PsbS1 Gene Expression? 
PLoS ONE  2014;9(9):e102742.
The genomic nucleotide sequences of japonica rice (Sasanishiki and Nipponbare) contain a unique region of about 2.7 kb located 0.4 kb upstream of the OsPsbS1 gene. In this study, we found that japonica rice, with a few exceptions, possesses such DNA sequences [denoted OsMULE-japonica specific sequence (JSS)] and is distinguished by the presence of a Mutator-like element (MULE). Such a sequence was absent in most indica cultivars and in Oryza glaberrima. In OsMULE-JSS1, we noted the presence of a possible target site duplication (TSD; CTTTTCCAG) and a terminal inverted repeat (TIR) of about 80 bp near the TSD. We also found that intensified light enhanced OsPsbS1 mRNA accumulation, which was not associated with the DNA methylation status in OsMULE/JSS. In addition, O. rufipogon, a possible ancestor of modern rice cultivars, was found to carry a PsbS gene of either the japonica (minor) or the indica (major) type. A transient gene expression assay showed that the japonica-type promoter elevated reporter gene activity more than the indica-type promoter did.
PMCID: PMC4178011  PMID: 25259844
16.  Coding region structural heterogeneity and turnover of transcription start sites contribute to divergence in expression between duplicate genes 
Genome Biology  2009;10(1):R10.
Gene expression data for duplicated gene pairs in humans provides insights into the regulatory factors affecting the expression divergence of these genes and implications for their evolution.
Gene expression divergence is one manifestation of functional differences between duplicate genes. Although rapid accumulation of expression divergence between duplicate gene copies has been observed, the driving mechanisms behind this phenomenon have not been explored in detail.
We examine which factors influence expression divergence between human duplicate genes, utilizing the latest genome-wide data sets. We conclude that the turnover of transcription start sites between duplicate genes occurs rapidly after gene duplication and that gene pairs with shared transcription start sites have significantly higher expression similarity than those without shared transcription start sites. Moreover, we find that most (55%) duplicate gene pairs do not retain the same coding sequence structure between the two duplicate copies and this also contributes to divergence in their expression. Furthermore, the proportion of aligned sequences in cis-regulatory regions between the two copies is positively correlated with expression similarity. Surprisingly, we find no effect of copy-specific transposable element insertions on the divergence of duplicate gene expression.
Our results suggest that turnover of transcription start sites, structural heterogeneity of coding sequences, and divergence of cis-regulatory regions between copies play a pivotal role in determining the expression divergence of duplicate genes.
PMCID: PMC2687787  PMID: 19175934
17.  Locally Duplicated Ohnologs Evolve Faster Than Nonlocally Duplicated Ohnologs in Arabidopsis and Rice 
Genome Biology and Evolution  2013;5(2):362-369.
Whole-genome duplications (WGDs) have recurred in the evolution of angiosperms, resulting in many duplicated chromosomal segments. Local gene duplications are also widespread in angiosperms. WGD-derived duplicates, that is, ohnologs, and local duplicates often show contrasting patterns of gene retention and evolution. However, many genes in angiosperms underwent multiple gene duplication events, possibly by different modes, indicating that different modes of gene duplication are not mutually exclusive. In two representative angiosperm genomes, Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa), we found that 9.6% and 11.3% of unique ohnologs, corresponding to 15.5% and 17.1% of ohnolog pairs, were also involved in local duplications, respectively. Locally duplicated ohnologs are widely distributed in different duplicated chromosomal segments and functionally biased. Coding sequence divergence between duplicated genes is denoted by nonsynonymous (Ka) and synonymous (Ks) substitution rates. Locally duplicated ohnolog pairs tend to have higher Ka, Ka/Ks, and gene expression divergence than nonlocally duplicated ohnolog pairs. Locally duplicated ohnologs also tend to have higher interspecies sequence divergence. These observations indicate that locally duplicated ohnologs evolve faster than nonlocally duplicated ohnologs. This study highlights the necessity to take local duplications into account when analyzing the evolutionary dynamics of ohnologs.
PMCID: PMC3590777  PMID: 23362157
local gene duplication; whole-genome duplication; ohnolog; divergence; colinearity
18.  Following Tetraploidy in Maize, a Short Deletion Mechanism Removed Genes Preferentially from One of the Two Homeologs 
PLoS Biology  2010;8(6):e1000409.
Following genome duplication and selfish DNA expansion, maize used a heretofore unknown mechanism to shed redundant genes and functionless DNA with bias toward one of the parental genomes.
Previous work in Arabidopsis showed that after an ancient tetraploidy event, genes were preferentially removed from one of the two homeologs, a process known as fractionation. The mechanism of fractionation is unknown. We sought to determine whether such preferential, or biased, fractionation exists in maize and, if so, whether a specific mechanism could be implicated in this process. We studied the process of fractionation using two recently sequenced grass species: sorghum and maize. The maize lineage has experienced a tetraploidy since its divergence from sorghum approximately 12 million years ago, and fragments of many knocked-out genes retain enough sequence similarity to be easily identifiable. Using sorghum exons as the query sequence, we studied the fate of both orthologous genes in maize following the maize tetraploidy. We show that genes are predominantly lost, not relocated, and that single-gene loss by deletion is the rule. Based on comparisons with orthologous sorghum and rice genes, we also infer that the sequences present before the deletion events were flanked by short direct repeats, a signature of intra-chromosomal recombination. Evidence of this deletion mechanism is found 2.3 times more frequently on one of the maize homeologs, consistent with earlier observations of biased fractionation. The over-fractionated homeolog is also a greater than 3-fold better target for transposon removal, but does not have an observably higher synonymous base substitution rate, nor could we find differentially placed methylation domains. We conclude that fractionation is indeed biased in maize and that intra-chromosomal or possibly a similar illegitimate recombination is the primary mechanism by which fractionation occurs. The mechanism of intra-chromosomal recombination explains the observed bias in both gene and transposon loss in the maize lineage. The existence of fractionation bias demonstrates that the frequency of deletion is modulated. 
Among the evolutionary benefits of this deletion/fractionation mechanism is bulk DNA removal and the generation of novel combinations of regulatory sequences and coding regions.
Author Summary
All genomes can accumulate dispensable DNA in the form of duplications of individual genes or even partial or whole genome duplications. Genomes also can accumulate selfish DNA elements. Duplication events specifically are often followed by extensive gene loss. The maize genome is particularly extreme, having become tetraploid 10 million years ago and played host to massive transposon amplifications. We compared the genome of sorghum (which is homologous to the pre-tetraploid maize genome) with the two identifiable parental genomes retained in maize. The two maize genomes differ greatly: one of the parental genomes has lost 2.3 times more genes than the other, and the selfish DNA regions between genes were even more frequently lost, suggesting maize can distinguish between the parental genomes present in the original tetraploid. We show that genes are actually lost, not simply relocated. Deletions were rarely longer than a single gene, and occurred between repeated DNA sequences, suggesting mis-recombination as a mechanism of gene removal. We hypothesize an epigenetic mechanism of genome distinction to account for the selective loss. To the extent that the rate of base substitutions tracks time, we neither support nor refute claims of maize allotetraploidy. Finally, we explain why it makes sense that purifying selection in mammals does not operate at all like the gene and genome deletion program we describe here.
PMCID: PMC2893956  PMID: 20613864
19.  Genetic interactions reveal the evolutionary trajectories of duplicate genes 
Duplicate genes show significantly fewer interactions than singleton genes, and functionally similar duplicates can exhibit dissimilar profiles because common interactions are 'hidden' due to buffering. Genetic interaction profiles provide insights into evolutionary mechanisms of duplicate retention by distinguishing duplicates under dosage selection from those retained because of some divergence in function. The genetic interactions of duplicate genes evolve in an extremely asymmetric way and the directionality of this asymmetry correlates well with other evolutionary properties of duplicate genes. Genetic interaction profiles can be used to elucidate the divergent function of specific duplicate pairs.
Gene duplication and divergence serves as a primary source for new genes and new functions, and as such has broad implications on the evolutionary process. Duplicate genes within S. cerevisiae have been shown to retain a high degree of similarity with regard to many of their functional properties (Papp et al, 2004; Guan et al, 2007; Wapinski et al, 2007; Musso et al, 2008), and perturbation of duplicate genes has been shown to result in smaller fitness defects than singleton genes (Gu et al, 2003; DeLuna et al, 2008; Dean et al, 2008; Musso et al, 2008). Individual genetic interactions between pairs of genes and profiles of such interactions across the entire genome provide a new context in which to examine the properties of duplicate compensation.
In this study we use the most recent and comprehensive set of genetic interactions in yeast produced to date (Costanzo et al, 2010) to address questions of duplicate retention and redundancy. We show that the ability for duplicate genes to buffer the deletion of a partner has three main consequences. First it agrees with previous work demonstrating that a high proportion of duplicate pairs are synthetic lethal, a classic indication of the ability to buffer one another functionally (DeLuna et al, 2008; Dean et al, 2008; Musso et al, 2008). Second, it reduces the number of genetic interactions observed between duplicate genes and the rest of the genome by masking interactions relating to common function from experimental detection. Third, this buffering of common interactions serves to reduce profile similarity in spite of common function (Figure 1). The compensatory ability of functionally similar duplicates buffers genetic interactions related to their common function (reducing the number of genetic interactions overall), while allowing the measurement of interactions related to any divergent function. Thus, even functionally similar duplicates may have dissimilar genetic interaction profiles. As previously surmised (Ihmels et al, 2007), duplicate genes under selection for dosage amplification have differing profile characteristics. We show that dosage-mediated duplicates have much higher genetic interaction profile similarity than do other duplicate pairs. Furthermore, we show in a comparison with local neighbors on a protein–protein interaction (PPI) network, that although dosage-mediated duplicates more often have higher similarity to each other than they do to their neighbors, the reverse is true for duplicates in general. 
That is, slightly divergent duplicate genes more often exhibit a higher similarity with a common neighbor on the PPI network than they do with each other, and that observation is consistent with the idea that common interactions are buffered while interactions corresponding to divergent functions are observed.
We then asked whether duplicates' genetic interactions that are not buffered appear in a symmetric or an asymmetric fashion. Previous work has established asymmetric patterns with regard to PPI degree (Wagner, 2002; He and Zhang, 2005), sequence divergence (Conant and Wagner, 2003; Zhang et al, 2003; Kellis et al, 2004; Scannell and Wolfe, 2008) and expression patterns (Gu et al, 2002b; Tirosh and Barkai, 2007). Although genetic interactions are further removed from mechanism than protein–protein interactions, for example, they do offer a more direct measurement of functional consequence and, thus, may give a better indication of the functional differences between a duplicate pair. We found that duplicates exhibit a strikingly asymmetric pattern of genetic interactions, with the ratio of interactions between sisters commonly exceeding 7:1 (Figure 4A). The observations differ significantly from random simulations in which genetic interactions were redistributed between sisters with equal probability (Figure 4A). Moreover, the directionality of this interaction asymmetry agrees with other physiological properties of duplicate pairs. For example, the sister with more genetic interactions also tends to have more protein–protein interactions and also tends to evolve at a slower rate (Figure 4B).
Genetic interaction degree and profiles can be used to understand the functional divergence of particular duplicate pairs. As a case example, we consider the whole-genome-duplication pair CIK1–VIK1. Each of these genes encodes a protein that forms a distinct heterodimeric complex with the microtubule motor protein Kar3 (Manning et al, 1999). Although each of these proteins depends on a direct physical interaction with Kar3, Cik1 has a much higher profile similarity to Kar3 than does Vik1 (r=0.5 and r=0.3, respectively). Consistent with its higher similarity, Δcik1 and Δkar3 exhibit several similar phenotypes, including abnormally short spindles, chromosome loss and delayed cell cycle progression (Page et al, 1994; Manning et al, 1999). In contrast, a Δvik1 mutant strain exhibits no overt phenotype (Manning et al, 1999).
The characterization of functional redundancy and divergence between duplicate genes is an important step in understanding the evolution of genetic systems. Large-scale genetic network analysis in Saccharomyces cerevisiae provides a powerful perspective for addressing these questions through quantitative measurements of genetic interactions between pairs of duplicated genes, and more generally, through the study of genome-wide genetic interaction profiles associated with duplicated genes. We show that duplicate genes exhibit fewer genetic interactions than other genes because they tend to buffer one another functionally, whereas observed interactions are non-overlapping and reflect their divergent roles. We also show that duplicate gene pairs are highly imbalanced in their number of genetic interactions with other genes, a pattern that appears to result from asymmetric evolution, such that one duplicate evolves or degrades faster than the other and often becomes functionally or conditionally specialized. The differences in genetic interactions are predictive of differences in several other evolutionary and physiological properties of duplicate pairs.
PMCID: PMC3010121  PMID: 21081923
duplicate genes; functional divergence; genetic interactions; paralogs; Saccharomyces cerevisiae
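The abstract of entry 19 compares observed interaction asymmetry against random simulations in which genetic interactions are redistributed between sisters with equal probability. A minimal sketch of such a null model follows, assuming each pair is summarised by two interaction counts. The function names, the zero-count handling, the use of 7:1 as a cut-off, and the toy counts are all illustrative assumptions, not the authors' procedure or data.

```python
import random


def asymmetry_ratio(a, b):
    """Ratio of the larger to the smaller interaction count of a pair.

    A zero count is treated as 1 to avoid division by zero (an assumption).
    """
    larger, smaller = max(a, b), min(a, b)
    return larger / max(smaller, 1)


def fraction_asymmetric(pairs, threshold=7.0):
    """Fraction of pairs whose asymmetry ratio is at least `threshold`."""
    hits = sum(asymmetry_ratio(a, b) >= threshold for a, b in pairs)
    return hits / len(pairs)


def null_fractions(pairs, threshold=7.0, n_sims=200, seed=0):
    """Simulate the null model: every interaction of a pair is assigned
    to either sister with probability 0.5, keeping the pair total fixed."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        shuffled = []
        for a, b in pairs:
            total = a + b
            to_first = sum(rng.random() < 0.5 for _ in range(total))
            shuffled.append((to_first, total - to_first))
        results.append(fraction_asymmetric(shuffled, threshold))
    return results


def empirical_p_value(observed, null_values):
    """One-sided empirical p-value with a +1 correction."""
    at_least = sum(v >= observed for v in null_values)
    return (at_least + 1) / (len(null_values) + 1)


# Hypothetical toy counts (sister A, sister B); not data from the study.
toy_pairs = [(70, 10), (42, 5), (30, 30), (64, 8), (12, 11)]
observed = fraction_asymmetric(toy_pairs)
null = null_fractions(toy_pairs)
p_value = empirical_p_value(observed, null)
print(observed, p_value)
```

Holding each pair's total fixed means the simulation varies only how interactions are split between sisters, which is the quantity the asymmetry claim concerns.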
20.  Molecular evolution of the duplicated TFIIAγ genes in Oryzeae and its relatives 
Gene duplication provides raw genetic materials for evolutionary novelty and adaptation. The evolutionary fate of duplicated transcription factor genes is less studied, although transcription factor genes play important roles in many biological processes. TFIIAγ is a small subunit of TFIIA, one of the general transcription factors required by RNA polymerase II. Previous studies identified two TFIIAγ-like genes in the rice genome and found that these genes either conferred resistance to rice bacterial blight or could be induced by pathogen invasion, raising the question as to their functional divergence and evolutionary fates after gene duplication.
We reconstructed the evolutionary history of the TFIIAγ genes from the main lineages of angiosperms and demonstrated that two TFIIAγ genes (TFIIAγ1 and TFIIAγ5) arose from a whole genome duplication that happened in the common ancestor of grasses. Likelihood-based analyses with branch, codon, and branch-site models showed no evidence of positive selection but a signature of relaxed selective constraint after the TFIIAγ duplication. In particular, we found that the nonsynonymous/synonymous rate ratio (ω = dN/dS) of the TFIIAγ1 sequences was two times higher than that of the TFIIAγ5 sequences, indicating highly asymmetric rates of protein evolution in the rice tribe and its relatives, with an accelerated rate for the TFIIAγ1 gene. Our expression data and EST database search further indicated that after whole genome duplication, the expression of the TFIIAγ1 gene was significantly reduced while TFIIAγ5 remained constitutively expressed and maintained the ancestral role as a subunit of the TFIIA complex.
The evolutionary fate of the TFIIAγ duplicates is not consistent with the neofunctionalization model, which predicts that one of the duplicated genes acquires a new function because of positive Darwinian selection. Instead, we suggest that subfunctionalization might be involved in TFIIAγ evolution in grasses. The fact that both TFIIAγ1 and TFIIAγ5 genes were effectively involved in response to biotic or abiotic factors might be explained by either the Dykhuizen-Hartl effect or the buffering hypothesis.
PMCID: PMC2887407  PMID: 20438643
21.  BGI-RIS: an integrated information resource and comparative analysis workbench for rice genomics 
Nucleic Acids Research  2004;32(Database issue):D377-D382.
Rice is a major food staple for the world’s population and serves as a model species in cereal genome research. The Beijing Genomics Institute (BGI) has long been devoting itself to sequencing, information analysis and biological research of the rice and other crop genomes. In order to facilitate the application of the rice genomic information and to provide a foundation for functional and evolutionary studies of other important cereal crops, we implemented our Rice Information System (BGI-RIS), the most up-to-date integrated information resource as well as a workbench for comparative genomic analysis. In addition to comprehensive data from Oryza sativa L. ssp. indica sequenced by BGI, BGI-RIS also hosts carefully curated genome information from Oryza sativa L. ssp. japonica and EST sequences available from other cereal crops. In this resource, sequence contigs of indica (93-11) have been further assembled into Mbp-sized scaffolds and anchored onto the rice chromosomes referenced to physical/genetic markers, cDNAs and BAC-end sequences. We have annotated the rice genomes for gene content, repetitive elements, gene duplications (tandem and segmental) and single nucleotide polymorphisms between rice subspecies. Designed as a basic platform, BGI-RIS presents the sequenced genomes and related information in systematic and graphical ways for the convenience of in-depth comparative studies (
PMCID: PMC308819  PMID: 14681438
22.  The 3,000 rice genomes project 
GigaScience  2014;3:7.
Rice, Oryza sativa L., is the staple food for half the world’s population. By 2030, the production of rice must increase by at least 25% in order to keep up with global population growth and demand. Accelerated genetic gains in rice improvement are needed to mitigate the effects of climate change and loss of arable land, as well as to ensure a stable global food supply.
We resequenced a core collection of 3,000 rice accessions from 89 countries. All 3,000 genomes had an average sequencing depth of 14×, with average genome coverages and mapping rates of 94.0% and 92.5%, respectively. From our sequencing efforts, approximately 18.9 million single nucleotide polymorphisms (SNPs) in rice were discovered when aligned to the reference genome of the temperate japonica variety, Nipponbare. Phylogenetic analyses based on SNP data confirmed differentiation of the O. sativa gene pool into 5 varietal groups – indica, aus/boro, basmati/sadri, tropical japonica and temperate japonica.
Here, we report an international resequencing effort of 3,000 rice genomes. This data serves as a foundation for large-scale discovery of novel alleles for important rice phenotypes using various bioinformatics and/or genetic approaches. It also serves to understand the genomic diversity within O. sativa at a higher level of detail. With the release of the sequencing data, the project calls for the global rice community to take advantage of this data as a foundation for establishing a global, public rice genetic/genomic database and information platform for advancing rice breeding technology for future rice improvement.
PMCID: PMC4035669  PMID: 24872877
Oryza sativa; Genetic resources; Genome diversity; Sequence variants; Next generation sequencing
23.  A transgenic system for generation of transposon Ac/Ds-induced chromosome rearrangements in rice 
The maize Activator (Ac)/Dissociation (Ds) transposable element system has been used in a variety of plants for insertional mutagenesis. Ac/Ds elements can also generate genome rearrangements via alternative transposition reactions which involve the termini of closely linked transposons. Here, we introduced a transgene containing reverse-oriented Ac/Ds termini together with an Ac transposase gene into rice (Oryza sativa ssp. japonica cv. Nipponbare). Among the transgenic progeny, we identified and characterized 25 independent genome rearrangements at three different chromosomal loci. The rearrangements include chromosomal deletions and inversions and one translocation. Most of the deletions occurred within the T-DNA region, but two cases showed the loss of 72 kilobase pairs (kb) and 79 kb of rice genomic DNA flanking the transgene. In addition to deletions, we obtained chromosomal inversions ranging in size from less than 10 kb (within the transgene DNA) to over 1 million base pairs (Mb). For 11 inversions, we cloned and sequenced both inversion breakpoints; in all 11 cases, the inversion junctions contained the typical 8 base pairs (bp) Ac/Ds target site duplications, confirming their origin as transposition products. Together, our results indicate that alternative Ac/Ds transposition can be an efficient tool for functional genomics and chromosomal manipulation in rice.
Electronic supplementary material
The online version of this article (doi:10.1007/s00122-012-1925-4) contains supplementary material, which is available to authorized users.
PMCID: PMC3470690  PMID: 22798058
24.  Genome-Wide Distribution, Organisation and Functional Characterization of Disease Resistance and Defence Response Genes across Rice Species 
PLoS ONE  2015;10(4):e0125964.
The resistance (R) genes and defense response (DR) genes have become very important resources for the development of disease resistant cultivars. In the present investigation, genome-wide identification, expression, phylogenetic and synteny analysis was done for R and DR-genes across three species of rice viz: Oryza sativa ssp indica cv 93-11, Oryza sativa ssp japonica and wild rice species, Oryza brachyantha. We used the in silico approach to identify and map 786 R-genes and 167 DR-genes, 672 R-genes and 142 DR-genes, 251 R-genes and 86 DR-genes in the japonica, indica and O. brachyantha genomes, respectively. Our analysis showed that 60.5% and 55.6% of the R-genes are tandemly repeated within clusters and distributed over all the rice chromosomes in indica and japonica genomes, respectively. The phylogenetic analysis along with motif distribution shows high degree of conservation of R- and DR-genes in clusters. In silico expression analysis of R-genes and DR-genes showed more than 85% were expressed genes showing corresponding EST matches in the databases. This study gave special emphasis on mechanisms of gene evolution and duplication for R and DR genes across species. Analysis of paralogs across rice species indicated 17% and 4.38% R-genes, 29% and 11.63% DR-genes duplication in indica and Oryza brachyantha, as compared to 20% and 26% duplication of R-genes and DR-genes in japonica respectively. We found that during the course of duplication only 9.5% of R- and DR-genes changed their function and rest of the genes have maintained their identity. Syntenic relationship across three genomes inferred that more orthology is shared between indica and japonica genomes as compared to brachyantha genome. Genome wide identification of R-genes and DR-genes in the rice genome will help in allele mining and functional validation of these genes, and to understand molecular mechanism of disease resistance and their evolution in rice and related species.
PMCID: PMC4406684  PMID: 25902056
25.  Dose–Sensitivity, Conserved Non-Coding Sequences, and Duplicate Gene Retention Through Multiple Tetraploidies in the Grasses 
Whole genome duplications, or tetraploidies, are an important source of increased gene content. Following whole genome duplication, duplicate copies of many genes are lost from the genome. This loss of genes is biased both in the classes of genes deleted and the subgenome from which they are lost. Many or all classes of genes preferentially retained as duplicate copies are engaged in dose-sensitive protein–protein interactions, such that deletion of any one duplicate upsets the status quo of subunit concentrations, and presumably lowers fitness as a result. Transcription factors are also preferentially retained following every whole genome duplication studied. This has been explained as a consequence of protein–protein interactions, just as for other highly retained classes of genes. We show that the quantity of conserved noncoding sequences (CNSs) associated with genes predicts the likelihood of their retention as duplicate pairs following whole genome duplication. As many CNSs likely represent binding sites for transcriptional regulators, we propose that the likelihood of gene retention following tetraploidy may also be influenced by dose-sensitive protein–DNA interactions between the regulatory regions of CNS-rich genes – nicknamed bigfoot genes – and the proteins that bind to them. Using grass genomes, we show that differential loss of CNSs from one member of a pair following the pre-grass tetraploidy reduces its chance of retention in the subsequent maize lineage tetraploidy.
PMCID: PMC3355796  PMID: 22645525
conserved non-coding sequence; polyploidy; fractionation; gene dosage; gene regulation
