Results 1-7 (7)

author:("Li, xiangtan")
1.  Evolutionary patterns of DNA base composition and correlation to polymorphisms in DNA repair systems 
Nucleic Acids Research  2015;43(7):3614-3625.
DNA base composition is a fundamental genome feature. However, the evolutionary pattern of base composition and its potential causes have not been well understood. Here, we report findings from comparative analysis of base composition at the whole-genome level across 2210 species, the polymorphic-site level across eight population comparison sets, and the mutation-site level in 12 mutation-tracking experiments. We first demonstrate that base composition follows the individual-strand base equality rule at the genome, chromosome and polymorphic-site levels. More intriguingly, clear separation of base-composition values calculated across polymorphic sites was consistently observed between basal and derived groups, suggesting common underlying mechanisms. Individuals in the derived groups show an A&T-increase/G&C-decrease pattern compared with the basal groups. Spontaneous and induced mutation experiments indicated these patterns of base composition change can emerge across mutation sites. With base-composition across polymorphic sites as a genome phenotype, genome scans with human 1000 Genomes and HapMap3 data identified a set of significant genomic regions enriched with Gene Ontology terms for DNA repair. For three DNA repair genes (BRIP1, PMS2P3 and TTDN), ENCODE data provided evidence for interaction between genomic regions containing these genes and regions containing the significant SNPs. Our findings provide insights into the mechanisms of genome evolution.
PMCID: PMC4402523  PMID: 25765652
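The individual-strand base equality rule mentioned in entry 1 (within one strand, the A fraction is close to the T fraction and the G fraction is close to the C fraction) can be illustrated with a minimal sketch. The function names and the toy sequence are hypothetical and are not taken from the paper; the authors' actual pipeline is not described in this listing.

```python
from collections import Counter


def strand_base_fractions(seq):
    """Return the fraction of A, C, G and T on a single strand.

    Characters other than A/C/G/T (for example N) are ignored.
    """
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: counts[base] / total for base in "ACGT"}


def strand_equality_gaps(seq):
    """Return (|A - T|, |G - C|) as fractions of the strand.

    Values near zero are consistent with the base equality rule.
    """
    fractions = strand_base_fractions(seq)
    return (
        abs(fractions["A"] - fractions["T"]),
        abs(fractions["G"] - fractions["C"]),
    )


if __name__ == "__main__":
    # Toy sequence for illustration only, not real genome data.
    print(strand_equality_gaps("AATTGGCCAT"))
```

On real data the rule is only expected to hold approximately and for long sequences, so a tolerance would have to be chosen by the user.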
2.  Parallel Domestication of the Shattering1 Genes in Cereals 
Nature Genetics  2012;44(6):720-724.
A key step during crop domestication is the loss of seed shattering. Here we show that seed shattering in sorghum is controlled by a single gene, Shattering1 (Sh1), which encodes a YABBY transcription factor. Domesticated sorghums harbor three different mutations at the Sh1 locus. Variants at regulatory sites in the promoter and intronic regions lead to a low level of expression, a 2.2-kb fragment deletion causes a truncated transcript that lacks the second and third exons, and a GT-to-GG splicing variant in intron 4 results in removal of exon 4. The distributions of these non-shattering haplotypes among sorghum landraces suggest three independent origins. The function of the rice ortholog (OsSh1) was subsequently validated with a shattering-resistant mutant, and two maize orthologs (ZmSh1-1 and ZmSh1-5.1+ZmSh1-5.2) were verified with a large mapping population. Our results indicate that Sh1 genes for seed shattering were under parallel selection during sorghum, rice, and maize domestication.
PMCID: PMC3532051  PMID: 22581231
3.  A Fast and Efficient Approach for Genomic Selection with High-Density Markers 
G3: Genes|Genomes|Genetics  2012;2(10):1179-1184.
Recent advances in high-throughput genotyping have motivated genomic selection using high-density markers. However, an increasingly large number of markers brings up both statistical and computational issues and makes it difficult to estimate the breeding values. We propose to apply the penalized orthogonal-components regression (POCRE) method to estimate breeding values. As a supervised dimension reduction method, POCRE sequentially constructs linear combinations of markers, i.e. orthogonal components, such that these components are most closely correlated to the phenotype. Such a dimension reduction is able to group highly correlated predictors and allows for collinear or nearly collinear markers. Different from BayesB, which predetermines hyperparameters, POCRE uses an empirical Bayes thresholding method to obtain data-driven optimal hyperparameters and effectively select important markers when constructing each component. Demonstrated through simulation studies, POCRE greatly reduces the computing time compared with BayesB. On the other hand, unlike fBayesB which slightly sacrifices prediction accuracy for fast computation, POCRE provides similar or even better accuracy of predicting breeding values than BayesB in both simulation studies and real data analyses.
PMCID: PMC3464110  PMID: 23050228
genotypic estimate of breeding values (GEBV); genomic selection; penalized orthogonal-components regression (POCRE); phenotypic estimate of breeding values (PEBV); GenPred; Shared data resources
4.  Integrating Rare-Variant Testing, Function Prediction, and Gene Network in Composite Resequencing-Based Genome-Wide Association Studies (CR-GWAS) 
G3: Genes|Genomes|Genetics  2011;1(3):233-243.
High-density array-based genome-wide association studies (GWAS) are complemented by exome sequencing and whole-genome resequencing-based association studies. Here we present a composite resequencing-based genome-wide association study (CR-GWAS) strategy that systematically exploits collective biological information and analytical tools for a robust analysis. We showcased the utility of this strategy by using Arabidopsis (Arabidopsis thaliana) resequencing data. Bioinformatic predictions of biological function alteration at each locus were integrated into the process of association testing of both common and rare variants for complex traits with a suite of statistics. Significant signals were then filtered with a priori candidate loci generated from genome database and gene network models to obtain a posteriori candidate loci. A probabilistic gene network (AraNet) that interrogates network neighborhoods of genes was then used to expand the filtering power to examine the significant testing signals. Using this strategy, we confirmed the known true positives and identified several new promising associations. Promising genes (AP1, FCA, FRI, FLC, FLM, SPL5, FY, and DCL2) were shown to control for flowering time through either common variants or rare variants within a diverse set of Arabidopsis accessions. Although many of these candidate genes were cloned earlier with mutational studies, identifying their allele variation contribution to overall phenotypic variation among diverse natural accessions is critical. Our rare allele testing established a greater number of connections than previous analyses in which this issue was not addressed. More importantly, our results demonstrated the potential of integrating various biological, statistical, and bioinformatic tools into complex trait dissection.
PMCID: PMC3276137  PMID: 22384334
complex trait dissection; association mapping; rare allele; mixed model
5.  Single-nucleotide polymorphism discovery by high-throughput sequencing in sorghum 
BMC Genomics  2011;12:352.
Eight diverse sorghum (Sorghum bicolor L. Moench) accessions were subjected to short-read genome sequencing to characterize the distribution of single-nucleotide polymorphisms (SNPs). Two strategies were used for DNA library preparation. Missing SNP genotype data were imputed by local haplotype comparison. The effects of library type and genomic diversity on SNP discovery and imputation are evaluated.
Alignment of eight genome equivalents (6 Gb) to the public reference genome revealed 283,000 SNPs at ≥82% confirmation probability. Sequencing from libraries constructed to limit sequencing to start at defined restriction sites led to genotyping 10-fold more SNPs in all 8 accessions, and correctly imputing 11% more missing data, than from semirandom libraries. The SNP yield advantage of the reduced-representation method was less than expected, since up to one fifth of reads started at noncanonical restriction sites and up to one third of restriction sites predicted in silico to yield unique alignments were not sampled at near-saturation. For imputation accuracy, the availability of a genomically similar accession in the germplasm panel was more important than panel size or sequencing coverage.
A sequence quantity of 3 million 50-base reads per accession using a BsrFI library would conservatively provide satisfactory genotyping of 96,000 sorghum SNPs. For most reliable SNP-genotype imputation in shallowly sequenced genomes, germplasm panels should consist of pairs or groups of genomically similar entries. These results may help in designing strategies for economical genotyping-by-sequencing of large numbers of plant accessions.
PMCID: PMC3146956  PMID: 21736744
6.  Chromosome Size in Diploid Eukaryotic Species Centers on the Average Length with a Conserved Boundary 
Molecular Biology and Evolution  2011;28(6):1901-1911.
Understanding genome and chromosome evolution is important for understanding genetic inheritance and evolution. Universal events comprising DNA replication, transcription, repair, mobile genetic element transposition, chromosome rearrangements, mitosis, and meiosis underlie inheritance and variation of living organisms. Although the genome of a species as a whole is important, chromosomes are the basic units subjected to genetic events that coin evolution to a large extent. Now that many complete genome sequences are available, we can address evolution and variation of individual chromosomes across species. For example, “How are the repeat and nonrepeat proportions of genetic codes distributed among different chromosomes in a multichromosome species?” “Is there a general rule behind the intuitive observation that chromosome lengths tend to be similar in a species, and if so, can we generalize any findings in chromosome content and size across different taxonomic groups?” Here, we show that chromosomes within a species do not show dramatic fluctuation in their content of mobile genetic elements as the proliferation of these elements increases from unicellular eukaryotes to vertebrates. Furthermore, we demonstrate that, notwithstanding the remarkable plasticity, there is an upper limit to chromosome-size variation in diploid eukaryotes with linear chromosomes. Strikingly, variation in chromosome size for 886 chromosomes in 68 eukaryotic genomes (including 22 human autosomes) can be viably captured by a single model, which predicts that the vast majority of the chromosomes in a species are expected to have a base pair length between 0.4035 and 1.8626 times the average chromosome length. This conserved boundary of chromosome-size variation, which prevails across a wide taxonomic range with few exceptions, indicates that cellular, molecular, and evolutionary mechanisms, possibly together, confine the chromosome lengths around a species-specific average chromosome length.
PMCID: PMC3098514  PMID: 21239390
chromosome size; genome evolution; evolutionary modeling
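The boundary reported in entry 6 (most chromosomes in a species fall between 0.4035 and 1.8626 times the average chromosome length) can be checked for a given karyotype with a short sketch. Only the two multipliers come from the abstract; the function name and the example lengths below are made up for illustration and do not correspond to any real species.

```python
# Multipliers quoted in the abstract of entry 6.
LOWER_RATIO = 0.4035
UPPER_RATIO = 1.8626


def chromosomes_outside_bounds(lengths):
    """Return {name: length / mean} for chromosomes outside the boundary.

    `lengths` maps chromosome names to lengths in base pairs.
    """
    if not lengths:
        raise ValueError("no chromosome lengths given")
    mean_length = sum(lengths.values()) / len(lengths)
    outside = {}
    for name, length in lengths.items():
        ratio = length / mean_length
        if not (LOWER_RATIO <= ratio <= UPPER_RATIO):
            outside[name] = ratio
    return outside


if __name__ == "__main__":
    # Hypothetical lengths in base pairs, for illustration only.
    example = {"chrA": 10, "chrB": 100, "chrC": 100, "chrD": 100}
    print(chromosomes_outside_bounds(example))
```

Because the abstract says the model covers the vast majority of chromosomes rather than all of them, a non-empty result is an exception worth inspecting, not proof of an error.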
7.  The Genomes of Oryza sativa: A History of Duplications 
Yu, Jun | Wang, Jun | Lin, Wei | Li, Songgang | Li, Heng | Zhou, Jun | Ni, Peixiang | Dong, Wei | Hu, Songnian | Zeng, Changqing | Zhang, Jianguo | Zhang, Yong | Li, Ruiqiang | Xu, Zuyuan | Li, Shengting | Li, Xianran | Zheng, Hongkun | Cong, Lijuan | Lin, Liang | Yin, Jianning | Geng, Jianing | Li, Guangyuan | Shi, Jianping | Liu, Juan | Lv, Hong | Li, Jun | Wang, Jing | Deng, Yajun | Ran, Longhua | Shi, Xiaoli | Wang, Xiyin | Wu, Qingfa | Li, Changfeng | Ren, Xiaoyu | Wang, Jingqiang | Wang, Xiaoling | Li, Dawei | Liu, Dongyuan | Zhang, Xiaowei | Ji, Zhendong | Zhao, Wenming | Sun, Yongqiao | Zhang, Zhenpeng | Bao, Jingyue | Han, Yujun | Dong, Lingli | Ji, Jia | Chen, Peng | Wu, Shuming | Liu, Jinsong | Xiao, Ying | Bu, Dongbo | Tan, Jianlong | Yang, Li | Ye, Chen | Zhang, Jingfen | Xu, Jingyi | Zhou, Yan | Yu, Yingpu | Zhang, Bing | Zhuang, Shulin | Wei, Haibin | Liu, Bin | Lei, Meng | Yu, Hong | Li, Yuanzhe | Xu, Hao | Wei, Shulin | He, Ximiao | Fang, Lijun | Zhang, Zengjin | Zhang, Yunze | Huang, Xiangang | Su, Zhixi | Tong, Wei | Li, Jinhong | Tong, Zongzhong | Li, Shuangli | Ye, Jia | Wang, Lishun | Fang, Lin | Lei, Tingting | Chen, Chen | Chen, Huan | Xu, Zhao | Li, Haihong | Huang, Haiyan | Zhang, Feng | Xu, Huayong | Li, Na | Zhao, Caifeng | Li, Shuting | Dong, Lijun | Huang, Yanqing | Li, Long | Xi, Yan | Qi, Qiuhui | Li, Wenjie | Zhang, Bo | Hu, Wei | Zhang, Yanling | Tian, Xiangjun | Jiao, Yongzhi | Liang, Xiaohu | Jin, Jiao | Gao, Lei | Zheng, Weimou | Hao, Bailin | Liu, Siqi | Wang, Wen | Yuan, Longping | Cao, Mengliang | McDermott, Jason | Samudrala, Ram | Wang, Jian | Wong, Gane Ka-Shu | Yang, Huanming
PLoS Biology  2005;3(2):e38.
We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, or almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped super-scaffolds of one or the other genome. We introduce a gene identification procedure for plants that does not rely on similarity to known genes to remove erroneous predictions resulting from transposable elements. Using the available EST data to adjust for residual errors in the predictions, the estimated gene count is at least 38,000–40,000. Only 2%–3% of the genes are unique to any one subspecies, comparable to the amount of sequence that might still be missing. Despite this lack of variation in gene content, there is enormous variation in the intergenic regions. At least a quarter of the two sequences could not be aligned, and where they could be aligned, single nucleotide polymorphism (SNP) rates varied from as little as 3.0 SNP/kb in the coding regions to 27.6 SNP/kb in the transposable elements. A more inclusive new approach for analyzing duplication history is introduced here. It reveals an ancient whole-genome duplication, a recent segmental duplication on Chromosomes 11 and 12, and massive ongoing individual gene duplications. We find 18 distinct pairs of duplicated segments that cover 65.7% of the genome; 17 of these pairs date back to a common time before the divergence of the grasses. More important, ongoing individual gene duplications provide a never-ending source of raw material for gene genesis and are major contributors to the differences between members of the grass family.
Comparative genome sequencing of indica and japonica rice reveals that duplication of genes and genomic regions has played a major part in the evolution of grass genomes
PMCID: PMC546038  PMID: 15685292
