Results 1-11 (11)

1.  Construction of a high-density genetic map based on large-scale markers developed by specific length amplified fragment sequencing (SLAF-seq) and its application to QTL analysis for isoflavone content in Glycine max 
BMC Genomics  2014;15(1):1086.
Quantitative trait locus (QTL) mapping is an efficient approach for discovering the genetic architecture underlying complex quantitative traits. However, the low density of molecular markers in genetic maps has limited the efficiency and accuracy of QTL mapping. In this study, specific length amplified fragment sequencing (SLAF-seq), a new high-throughput strategy for large-scale SNP discovery and genotyping based on next-generation sequencing (NGS), was employed to construct a high-density soybean genetic map using recombinant inbred lines (RILs, Luheidou2 × Nanhuizao, F5:8). With this map, consistent QTLs for isoflavone content across various environments were identified.
In total, 23 Gb of data containing 87,604,858 paired-end reads were obtained. The average coverage for each SLAF marker was 11.20-fold for the female parent, 12.51-fold for the male parent, and 3.98-fold for individual RILs. Among the 116,216 high-quality SLAFs obtained, 9,948 were polymorphic. The final map consisted of 5,785 SLAFs on 20 linkage groups (LGs) and spanned 2,255.18 cM, with an average distance of 0.43 cM between adjacent markers. Comparative genomic analysis revealed relatively high collinearity between the 20 LGs and the soybean reference genome. Based on this map, 41 QTLs contributing to isoflavone content were identified. The high efficiency and accuracy of this map were evidenced by the discovery of genes encoding isoflavone biosynthetic enzymes within these loci. Moreover, 11 of these 41 QTLs (including six novel loci) were associated with isoflavone content across multiple environments. One of them, qIF20-2, contributed to a majority of isoflavone components across various environments and explained a high proportion of phenotypic variance (8.7%–35.3%). This represents a novel major QTL underlying isoflavone content across various environments in soybean.
Herein, we report a high-density genetic map for soybean. This map exhibits high resolution and accuracy. It will facilitate the identification of genes and QTLs underlying essential agronomic traits in soybean. The novel major QTL for isoflavone content is useful not only for further study of the genetic basis of isoflavone accumulation, but also for marker-assisted selection (MAS) in future soybean breeding.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-1086) contains supplementary material, which is available to authorized users.
PMCID: PMC4320444  PMID: 25494922
High-density genetic map; Isoflavone content; QTL; SLAF-seq; Soybean [Glycine max (L.) Merr.]
2.  A High-Density Genetic Map for Soybean Based on Specific Length Amplified Fragment Sequencing 
PLoS ONE  2014;9(8):e104871.
Soybean is an important oilseed crop, but very few high-density genetic maps have been published for this species. Specific length amplified fragment sequencing (SLAF-seq) is a recently developed high-resolution strategy for large-scale de novo discovery and genotyping of single nucleotide polymorphisms. SLAF-seq was employed in this study to obtain sufficient markers to construct a high-density genetic map for soybean. In total, 33.10 Gb of data containing 171,001,333 paired-end reads were obtained after preprocessing. The average sequencing depth was 42.29 for Dongnong594, 56.63 for Charleston, and 3.92 for each progeny. In total, 164,197 high-quality SLAFs were detected, of which 12,577 were polymorphic, and 5,308 of the polymorphic markers met the requirements for use in constructing a genetic map. The final map included 5,308 markers on 20 linkage groups and was 2,655.68 cM in length, with an average distance of 0.5 cM between adjacent markers. To our knowledge, this map has the shortest average distance between adjacent markers of any soybean map. We report here a high-density genetic map for soybean. The map was constructed using a recombinant inbred line population and the SLAF-seq approach, which allowed the efficient development of a large number of polymorphic markers in a short time. The results of this study will not only provide a platform for fine mapping of genes and quantitative trait loci, but will also serve as a reference for molecular breeding of soybean.
PMCID: PMC4130620  PMID: 25118194
3.  Construction and Analysis of High-Density Linkage Map Using High-Throughput Sequencing Data 
PLoS ONE  2014;9(6):e98855.
Linkage maps enable the study of important biological questions. The construction of high-density linkage maps has become more feasible since the advent of next-generation sequencing (NGS), which eases SNP discovery and high-throughput genotyping of large populations. However, the explosion in marker number and the genotyping errors in NGS data challenge both the computational efficiency of linkage analysis methods and the quality of the resulting maps. Here we report the HighMap method for constructing high-density linkage maps from NGS data. HighMap employs an iterative ordering and error correction strategy based on a k-nearest neighbor algorithm and a Monte Carlo multipoint maximum likelihood algorithm. A simulation study shows that HighMap can create a linkage map with three times as many markers as ordering-only methods while offering more accurate marker orders and stable genetic distances. Using HighMap, we constructed a common carp linkage map with 10,004 markers. The singleton rate was less than one-ninth of that generated by JoinMap4.1. Its total map distance was 5,908 cM, consistent with reports on low-density maps. HighMap is an efficient method for constructing high-density, high-quality linkage maps from high-throughput population NGS data. It will facilitate genome assembly, comparative genomic analysis, and QTL studies. HighMap is available at
PMCID: PMC4048240  PMID: 24905985
4.  SLAF-seq: An Efficient Method of Large-Scale De Novo SNP Discovery and Genotyping Using High-Throughput Sequencing 
PLoS ONE  2013;8(3):e58700.
Large-scale genotyping plays an important role in genetic association studies. It has provided new opportunities for gene discovery, especially when combined with high-throughput sequencing technologies. Here, we report an efficient solution for large-scale genotyping, which we call specific-locus amplified fragment sequencing (SLAF-seq). SLAF-seq technology has several distinguishing characteristics: i) deep sequencing to ensure genotyping accuracy; ii) a reduced representation strategy to reduce sequencing costs; iii) a pre-designed reduced representation scheme to optimize marker efficiency; and iv) a double barcode system for large populations. In this study, we tested the efficiency of SLAF-seq on rice and soybean data. Both sets of results showed strong consistency between predicted and practical SLAFs and considerable genotyping accuracy. We also report the highest density genetic map yet created for any organism without a reference genome sequence, common carp in this case, using SLAF-seq data. We detected 50,530 high-quality SLAFs with 13,291 SNPs genotyped in 211 individual carp. The genetic map contained 5,885 markers with 0.68 cM intervals on average. A comparative genomics study between the common carp genetic map and the zebrafish genome sequence showed high-quality SLAF-seq genotyping results. SLAF-seq provides a high-resolution strategy for large-scale genotyping and is generally applicable to various species and populations.
PMCID: PMC3602454  PMID: 23527008
5.  The diploid genome sequence of an Asian individual 
Nature  2008;456(7218):60-65.
Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics.
PMCID: PMC2716080  PMID: 18987735
6.  Gene conversion in the rice genome 
BMC Genomics  2008;9:93.
Gene conversion causes a non-reciprocal transfer of genetic information between similar sequences. Gene conversion can both homogenize genes and recruit point mutations thereby shaping the evolution of multigene families. In the rice genome, the large number of duplicated genes increases opportunities for gene conversion.
To characterize gene conversion in rice, we defined 626 multigene families, in which 377 gene conversions were detected using the GENECONV program. Over 60% of the conversions we detected were between chromosomes. We found that inter-chromosomal conversions between chromosomes 1 and 5, 2 and 6, and 3 and 5 are more frequent than the genome average (Z-test, P < 0.05). The frequency of gene conversion on the same chromosome decreased with the physical distance between gene conversion partners. Ka/Ks analysis indicates that gene conversion is not tightly linked to natural selection in the rice genome. To assess the contribution of segmental duplication to gene conversion statistics, we determined the locations of conversion partners with respect to inter-chromosomal segmental duplications. Less than ten percent of conversions are associated with segmental duplication. Pseudogenes in the rice genome with low similarity to Arabidopsis genes showed a greater likelihood of gene conversion than those with high similarity to Arabidopsis genes. Functional annotations suggest that at least 14 multigene families related to disease or bacterial resistance were involved in conversion events.
The evolution of gene families in the rice genome may have been accelerated by conversion with pseudogenes. Our analysis suggests a possible role for gene conversion in the evolution of pathogen-response genes.
PMCID: PMC2277409  PMID: 18298833
7.  FGF: A web tool for Fishing Gene Family in a whole genome database 
Nucleic Acids Research  2007;35(Web Server issue):W121-W125.
Gene duplication is an important process in evolution. The availability of genome sequences of a number of organisms has made it possible to conduct comprehensive searches for duplicated genes enabling informative studies of their evolution. We have established the FGF (Fishing Gene Family) program to efficiently search for and identify gene families. The FGF output displays the results as visual phylogenetic trees including information on gene structure, chromosome position, duplication fate and selective pressure. It is particularly useful to identify pseudogenes and detect changes in gene structure. FGF is freely available on a web server at
PMCID: PMC1933194  PMID: 17584790
8.  Identification and characterization of insect-specific proteins by genome data analysis 
BMC Genomics  2007;8:93.
Insects constitute the vast majority of known species and are important for biodiversity, agriculture, and human health. It is likely that the successful adaptation of the Insecta clade depends on specific components in its proteome that give rise to specialized features. However, proteome determination is an intensive undertaking. Here we present results from a computational method that uses genome analysis to characterize insect and eukaryote proteomes as an approximation complementary to experimental approaches.
Homologs common to Drosophila melanogaster, Anopheles gambiae, Bombyx mori, Tribolium castaneum, and Apis mellifera were compared to the complete genomes of three non-insect eukaryotes (opisthokonts): Homo sapiens, Caenorhabditis elegans and Saccharomyces cerevisiae. This comparison identified 154 groups of orthologous proteins in Drosophila as insect-specific homologs; 466 groups were determined to be common to eukaryotes (represented by the three opisthokonts). ESTs from the hemimetabolous insect Locusta migratoria were also considered in order to approximate their corresponding genes in the insect-specific homologs. Stress and stimulus response proteins were found to constitute a higher fraction of the insect-specific homologs than of the homologs common to eukaryotes.
The significant representation of stress response and stimulus response proteins among the proteins determined to be insect-specific, along with specific cuticle and pheromone/odorant binding proteins, suggests that communication and adaptation to environments may distinguish insect evolution from that of other eukaryotes. The tendency toward low Ka/Ks ratios in the insect-specific protein set suggests purifying selection pressure. The generally larger number of paralogs among the insect-specific proteins may indicate adaptation to environmental changes. Some proteins in our insect-specific set have also been identified through experiments reported in the literature, supporting the accuracy of our approach.
PMCID: PMC1852559  PMID: 17407609
9.  WEGO: a web tool for plotting GO annotations 
Nucleic Acids Research  2006;34(Web Server issue):W293-W297.
The unified, structured vocabularies and classifications freely provided by the Gene Ontology (GO) Consortium are widely accepted in most large-scale gene annotation projects. Consequently, many tools have been created for use with the GO ontologies. WEGO (Web Gene Ontology Annotation Plot) is a simple but useful tool for visualizing, comparing and plotting GO annotation results. Unlike commercial charting software, WEGO is designed to deal with the directed acyclic graph structure of GO to facilitate histogram creation of GO annotation results. WEGO has been used widely in many important biological research projects, such as the rice genome project and the silkworm genome project. It has become one of the daily tools for downstream gene annotation analysis, especially when performing comparative genomics tasks. WEGO, along with two other tools, namely External to GO Query and GO Archive Query, is freely available for all users at . There are two available mirror sites at and . Any suggestions are welcome at
PMCID: PMC1538768  PMID: 16845012
10.  Draft genome of the kiwifruit Actinidia chinensis 
Nature Communications  2013;4:2640.
The kiwifruit (Actinidia chinensis) is an economically and nutritionally important fruit crop with remarkably high vitamin C content. Here we report the draft genome sequence of a heterozygous kiwifruit, assembled from ~140-fold next-generation sequencing data. The assembled genome has a total length of 616.1 Mb and contains 39,040 genes. Comparative genomic analysis reveals that the kiwifruit has undergone an ancient hexaploidization event (γ) shared by core eudicots and two more recent whole-genome duplication events. Both recent duplication events occurred after the divergence of kiwifruit from tomato and potato and have contributed to the neofunctionalization of genes involved in regulating important kiwifruit characteristics, such as fruit vitamin C, flavonoid and carotenoid metabolism. As the first sequenced species in the Ericales, the kiwifruit genome sequence provides a valuable resource not only for biological discovery and crop improvement but also for evolutionary and comparative genomics analysis, particularly in the asterid lineage.
The kiwifruit is an economically and nutritionally important fruit crop with high vitamin C content. Here, the authors report the draft genome sequence of a heterozygous kiwifruit and through comparative genomic analysis provide valuable insight into kiwifruit evolution.
PMCID: PMC4089393  PMID: 24136039
11.  The Genomes of Oryza sativa: A History of Duplications 
Yu, Jun | Wang, Jun | Lin, Wei | Li, Songgang | Li, Heng | Zhou, Jun | Ni, Peixiang | Dong, Wei | Hu, Songnian | Zeng, Changqing | Zhang, Jianguo | Zhang, Yong | Li, Ruiqiang | Xu, Zuyuan | Li, Shengting | Li, Xianran | Zheng, Hongkun | Cong, Lijuan | Lin, Liang | Yin, Jianning | Geng, Jianing | Li, Guangyuan | Shi, Jianping | Liu, Juan | Lv, Hong | Li, Jun | Wang, Jing | Deng, Yajun | Ran, Longhua | Shi, Xiaoli | Wang, Xiyin | Wu, Qingfa | Li, Changfeng | Ren, Xiaoyu | Wang, Jingqiang | Wang, Xiaoling | Li, Dawei | Liu, Dongyuan | Zhang, Xiaowei | Ji, Zhendong | Zhao, Wenming | Sun, Yongqiao | Zhang, Zhenpeng | Bao, Jingyue | Han, Yujun | Dong, Lingli | Ji, Jia | Chen, Peng | Wu, Shuming | Liu, Jinsong | Xiao, Ying | Bu, Dongbo | Tan, Jianlong | Yang, Li | Ye, Chen | Zhang, Jingfen | Xu, Jingyi | Zhou, Yan | Yu, Yingpu | Zhang, Bing | Zhuang, Shulin | Wei, Haibin | Liu, Bin | Lei, Meng | Yu, Hong | Li, Yuanzhe | Xu, Hao | Wei, Shulin | He, Ximiao | Fang, Lijun | Zhang, Zengjin | Zhang, Yunze | Huang, Xiangang | Su, Zhixi | Tong, Wei | Li, Jinhong | Tong, Zongzhong | Li, Shuangli | Ye, Jia | Wang, Lishun | Fang, Lin | Lei, Tingting | Chen, Chen | Chen, Huan | Xu, Zhao | Li, Haihong | Huang, Haiyan | Zhang, Feng | Xu, Huayong | Li, Na | Zhao, Caifeng | Li, Shuting | Dong, Lijun | Huang, Yanqing | Li, Long | Xi, Yan | Qi, Qiuhui | Li, Wenjie | Zhang, Bo | Hu, Wei | Zhang, Yanling | Tian, Xiangjun | Jiao, Yongzhi | Liang, Xiaohu | Jin, Jiao | Gao, Lei | Zheng, Weimou | Hao, Bailin | Liu, Siqi | Wang, Wen | Yuan, Longping | Cao, Mengliang | McDermott, Jason | Samudrala, Ram | Wang, Jian | Wong, Gane Ka-Shu | Yang, Huanming
PLoS Biology  2005;3(2):e38.
We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, or almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped super-scaffolds of one or the other genome. We introduce a gene identification procedure for plants that does not rely on similarity to known genes to remove erroneous predictions resulting from transposable elements. Using the available EST data to adjust for residual errors in the predictions, the estimated gene count is at least 38,000–40,000. Only 2%–3% of the genes are unique to any one subspecies, comparable to the amount of sequence that might still be missing. Despite this lack of variation in gene content, there is enormous variation in the intergenic regions. At least a quarter of the two sequences could not be aligned, and where they could be aligned, single nucleotide polymorphism (SNP) rates varied from as little as 3.0 SNP/kb in the coding regions to 27.6 SNP/kb in the transposable elements. A more inclusive new approach for analyzing duplication history is introduced here. It reveals an ancient whole-genome duplication, a recent segmental duplication on Chromosomes 11 and 12, and massive ongoing individual gene duplications. We find 18 distinct pairs of duplicated segments that cover 65.7% of the genome; 17 of these pairs date back to a common time before the divergence of the grasses. More important, ongoing individual gene duplications provide a never-ending source of raw material for gene genesis and are major contributors to the differences between members of the grass family.
Comparative genome sequencing of indica and japonica rice reveals that duplication of genes and genomic regions has played a major part in the evolution of grass genomes.
PMCID: PMC546038  PMID: 15685292
