1.  The sequence and de novo assembly of the giant panda genome 
Li, Ruiqiang | Fan, Wei | Tian, Geng | Zhu, Hongmei | He, Lin | Cai, Jing | Huang, Quanfei | Cai, Qingle | Li, Bo | Bai, Yinqi | Zhang, Zhihe | Zhang, Yaping | Wang, Wen | Li, Jun | Wei, Fuwen | Li, Heng | Jian, Min | Li, Jianwen | Zhang, Zhaolei | Nielsen, Rasmus | Li, Dawei | Gu, Wanjun | Yang, Zhentao | Xuan, Zhaoling | Ryder, Oliver A. | Leung, Frederick Chi-Ching | Zhou, Yan | Cao, Jianjun | Sun, Xiao | Fu, Yonggui | Fang, Xiaodong | Guo, Xiaosen | Wang, Bo | Hou, Rong | Shen, Fujun | Mu, Bo | Ni, Peixiang | Lin, Runmao | Qian, Wubin | Wang, Guodong | Yu, Chang | Nie, Wenhui | Wang, Jinhuan | Wu, Zhigang | Liang, Huiqing | Min, Jiumeng | Wu, Qi | Cheng, Shifeng | Ruan, Jue | Wang, Mingwei | Shi, Zhongbin | Wen, Ming | Liu, Binghang | Ren, Xiaoli | Zheng, Huisong | Dong, Dong | Cook, Kathleen | Shan, Gao | Zhang, Hao | Kosiol, Carolin | Xie, Xueying | Lu, Zuhong | Zheng, Hancheng | Li, Yingrui | Steiner, Cynthia C. | Lam, Tommy Tsan-Yuk | Lin, Siyuan | Zhang, Qinghui | Li, Guoqing | Tian, Jing | Gong, Timing | Liu, Hongde | Zhang, Dejin | Fang, Lin | Ye, Chen | Zhang, Juanbin | Hu, Wenbo | Xu, Anlong | Ren, Yuanyuan | Zhang, Guojie | Bruford, Michael W. | Li, Qibin | Ma, Lijia | Guo, Yiran | An, Na | Hu, Yujie | Zheng, Yang | Shi, Yongyong | Li, Zhiqiang | Liu, Qing | Chen, Yanling | Zhao, Jing | Qu, Ning | Zhao, Shancen | Tian, Feng | Wang, Xiaoling | Wang, Haiyin | Xu, Lizhi | Liu, Xiao | Vinar, Tomas | Wang, Yajun | Lam, Tak-Wah | Yiu, Siu-Ming | Liu, Shiping | Zhang, Hemin | Li, Desheng | Huang, Yan | Wang, Xia | Yang, Guohua | Jiang, Zhi | Wang, Junyi | Qin, Nan | Li, Li | Li, Jingxiang | Bolund, Lars | Kristiansen, Karsten | Wong, Gane Ka-Shu | Olson, Maynard | Zhang, Xiuqing | Li, Songgang | Yang, Huanming | Wang, Jian | Wang, Jun
Nature  2009;463(7279):311-317.
Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility of using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes.
PMCID: PMC3951497  PMID: 20010809
2.  The Genome of the Netherlands: design, and project goals 
Within the Netherlands a national network of biobanks has been established (Biobanking and Biomolecular Research Infrastructure-Netherlands (BBMRI-NL)) as a national node of the European BBMRI. One of the aims of BBMRI-NL is to enrich biobanks with different types of molecular and phenotype data. Here, we describe the Genome of the Netherlands (GoNL), one of the projects within BBMRI-NL. GoNL is a whole-genome-sequencing project in a representative sample consisting of 250 trio-families from all provinces in the Netherlands, which aims to characterize DNA sequence variation in the Dutch population. The parent–offspring trios include adult individuals ranging in age from 19 to 87 years (mean=53 years; SD=16 years) from birth cohorts 1910–1994. Sequencing was done on blood-derived DNA from uncultured cells and accomplished coverage was 14–15x. The family-based design represents a unique resource to assess the frequency of regional variants, accurately reconstruct haplotypes by family-based phasing, characterize short indels and complex structural variants, and establish the rate of de novo mutational events. GoNL will also serve as a reference panel for imputation in the available genome-wide association studies in Dutch and other cohorts to refine association signals and uncover population-specific variants. GoNL will create a catalog of human genetic variation in this sample that is uniquely characterized with respect to micro-geographic location and a wide range of phenotypes. The resource will be made available to the research and medical community to guide the interpretation of sequencing projects. The present paper summarizes the global characteristics of the project.
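The trio design described above lends itself to a simple Mendelian-consistency check. The following is a minimal sketch, not the GoNL pipeline: it assumes hard genotype calls given as allele pairs, and the function names, sites, and genotypes are hypothetical. Real analyses would work from genotype likelihoods and apply quality filters before calling a de novo event.

```python
# Illustrative sketch only: flags sites where a child's genotype cannot be
# explained by inheriting one allele from each parent. All names and data
# here are hypothetical and are not taken from the GoNL project.

def is_mendelian_consistent(child, father, mother):
    """Return True if the child's two alleles can be split so that one
    comes from the father and the other from the mother."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def candidate_de_novo_sites(trio_genotypes):
    """Return the sites whose child genotype violates Mendelian inheritance.

    trio_genotypes maps a site key to a (child, father, mother) tuple,
    where each genotype is a pair of alleles.
    """
    return [
        site
        for site, (child, father, mother) in trio_genotypes.items()
        if not is_mendelian_consistent(child, father, mother)
    ]


# Hypothetical genotypes for one trio at three sites.
example_trio = {
    ("chr1", 1000): (("A", "G"), ("A", "A"), ("G", "G")),  # consistent
    ("chr1", 2000): (("A", "T"), ("A", "A"), ("A", "A")),  # T in neither parent
    ("chr2", 500): (("C", "C"), ("C", "T"), ("C", "T")),   # consistent
}

candidates = candidate_de_novo_sites(example_trio)
print(candidates)
```

A site flagged this way is only a candidate: at 14–15x coverage, an apparent violation can also come from a missed heterozygous call in a parent.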
PMCID: PMC3895638  PMID: 23714750
whole-genome sequence; trio-design; population genetics
3.  Comprehensive evaluation of SNP identification with the Restriction Enzyme-based Reduced Representation Library (RRL) method 
BMC Genomics  2012;13:77.
The Restriction Enzyme-based Reduced Representation Library (RRL) method is a relatively feasible and flexible strategy for Single Nucleotide Polymorphism (SNP) identification in different species. It has the remarkable advantage of reducing the complexity of the genome by orders of magnitude. However, a comprehensive evaluation of the actual efficacy of SNP identification by this method is still unavailable.
To evaluate the efficacy of the Restriction Enzyme-based RRL method, we selected the Tsp 45I enzyme, which covers a 266 Mb flanking region of the enzyme recognition sites according to in silico simulation on the human reference genome. We then sequenced the YH RRL after Tsp 45I treatment and obtained reads of which 80.8% mapped to the target region with a 20-fold average coverage. About 96.8% of the target region was covered by at least one read, and 257 K SNPs were identified in the region using SOAPsnp software.
Compared with whole genome resequencing data, we observed a false discovery rate (FDR) of 13.95% and a false negative rate (FNR) of 25.90%. The concordance rate of homozygous loci was over 99.8%, but that of heterozygous loci was only 92.56%. Repeat sequences and base quality were shown to have a great effect on the accuracy of SNP calling, and SNPs in recognition sites contributed evidently to the high FNR and the low concordance rate of heterozygous loci. Our results indicated that repeat masking and highly stringent filter criteria could significantly decrease both FDR and FNR.
This study demonstrates that the Restriction Enzyme-based RRL method is effective for SNP identification. The results highlight the important role of bias and of the method-derived defects in this method, and emphasize the points that deserve special attention.
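The FDR and FNR reported above can be illustrated with a small worked example. This is a sketch under stated assumptions: it uses the common definitions FDR = FP / (TP + FP) and FNR = FN / (TP + FN), treats a reference call set as the truth, and runs on made-up SNP positions. The abstract does not spell out the exact definitions or filtering the authors used, and the function name is hypothetical.

```python
# Illustrative sketch only: computes FDR and FNR for a set of SNP calls
# against a reference call set. Positions below are invented for the
# example and do not come from the study.

def snp_calling_error_rates(called, reference):
    """Return (fdr, fnr) for SNP sites in `called` versus `reference`.

    Each site is any hashable key, for example (chromosome, position).
    FDR = FP / (TP + FP); FNR = FN / (TP + FN).
    """
    called = set(called)
    reference = set(reference)
    true_pos = len(called & reference)    # called and present in reference
    false_pos = len(called - reference)   # called but absent from reference
    false_neg = len(reference - called)   # in reference but not called
    fdr = false_pos / (true_pos + false_pos) if called else 0.0
    fnr = false_neg / (true_pos + false_neg) if reference else 0.0
    return fdr, fnr


# Hypothetical call sets: 4 RRL calls, 5 reference SNPs, 3 shared.
rrl_calls = {("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr2", 75)}
reference_calls = {("chr1", 100), ("chr1", 200), ("chr2", 50),
                   ("chr3", 10), ("chr3", 20)}

fdr, fnr = snp_calling_error_rates(rrl_calls, reference_calls)
print(f"FDR = {fdr:.2%}, FNR = {fnr:.2%}")
```

With these invented sets, one of four calls is unsupported (FDR 25%) and two of five reference SNPs are missed (FNR 40%).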
PMCID: PMC3305556  PMID: 22340203
4.  MicroRNA expression profiling during the life cycle of the silkworm (Bombyx mori) 
BMC Genomics  2011;12:284.
Retraction article
PMCID: PMC3228560  PMID: 21635743
5.  The DNA Methylome of Human Peripheral Blood Mononuclear Cells 
PLoS Biology  2010;8(11):e1000533.
Analysis across the genome of patterns of DNA methylation reveals a rich landscape of allele-specific epigenetic modification and consequent effects on allele-specific gene expression.
DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies.
Author Summary
Epigenetic modifications such as addition of methyl groups to cytosine in DNA play a role in regulating gene expression. To better understand these processes, knowledge of the methylation status of all cytosine bases in the genome (the methylome) is required. DNA methylation can differ between the two gene copies (alleles) in each cell. Such allele-specific methylation (ASM) can be due to parental origin of the alleles (imprinting), X chromosome inactivation in females, and other as yet unknown mechanisms. This may significantly alter the expression profile arising from different allele combinations in different individuals. Using advanced sequencing technology, we have determined the methylome of human peripheral blood mononuclear cells (PBMC). Importantly, the PBMC were obtained from the same male Han Chinese individual whose complete genome had previously been determined. This allowed us, for the first time, to study genome-wide differences in ASM. Our analysis shows that ASM in PBMC is higher than can be accounted for by regions known to undergo parent-of-origin imprinting and frequently (>80%) correlates with allele-specific expression (ASE) of the corresponding gene. In addition, our data reveal a rich landscape of epigenomic variation for 20 genomic features, including regulatory, coding, and non-coding sequences, and provide a valuable resource for future studies. Our work further establishes whole-genome sequencing as an efficient method for methylome analysis.
PMCID: PMC2976721  PMID: 21085693
6.  MicroRNAs of Bombyx mori identified by Solexa sequencing 
BMC Genomics  2010;11:148.
MicroRNA (miRNA) and other small regulatory RNAs contribute to the modulation of a large number of cellular processes. We sequenced three small RNA libraries prepared from the whole body, and the anterior-middle and posterior silk glands of Bombyx mori, with a view to expanding the repertoire of silkworm miRNAs and exploring transcriptional differences in miRNAs between segments of the silk gland.
With the aid of large-scale Solexa sequencing technology, we validated 257 unique miRNA genes, including 202 novel and 55 previously reported genes, corresponding to 324 loci in the silkworm genome. The sequence composition and length of over 30 known silkworm miRNAs were further corrected. A number of reads originated from the loop regions of the precursors of two previously reported miRNAs (bmo-miR-1920 and miR-1921). Interestingly, the majority of the newly identified miRNAs were silkworm-specific; 23 unique miRNAs were widely conserved from invertebrates to vertebrates, 13 unique miRNAs were limited to invertebrates, and 32 were confined to insects. We identified 24 closely positioned clusters and 45 paralogs of miRNAs in the silkworm genome. However, sequence tags showed that paralogs or clusters were not prerequisites for coordinated transcription and accumulation. The majority of silkworm-specific miRNAs were located in transposable elements, and displayed significant differences in abundance between the anterior-middle and posterior silk gland.
Conservation analysis revealed that miRNAs can serve as phylogenetic markers and function in evolutionary signaling. The newly identified miRNAs greatly enrich the repertoire of insect miRNAs, and provide insights into miRNA evolution, biogenesis, and expression in insects. The differential expression of miRNAs in the anterior-middle and posterior silk glands supports their involvement as a new level of regulation in the silkworm silk gland.
PMCID: PMC2838851  PMID: 20199675
7.  MicroRNA expression profiling during the life cycle of the silkworm (Bombyx mori) 
BMC Genomics  2009;10:455.
MicroRNAs (miRNAs) are expressed by a wide range of eukaryotic organisms, and function in diverse biological processes. Numerous miRNAs have been identified in Bombyx mori, but the temporal expression profiles of miRNAs corresponding to each stage transition over the entire life cycle of the silkworm remain to be established. To obtain a comprehensive overview of the correlation between miRNA expression and stage transitions, we performed a whole-life test and subsequent stage-by-stage examinations on nearly one hundred miRNAs in the silkworm.
Our results show that miRNAs display a wide variety of expression profiles over the whole life of the silkworm, including continuous expression from embryo to adult (miR-184), up-regulation over the entire life cycle (let-7 and miR-100), down-regulation over the entire life cycle (miR-124), expression associated with embryogenesis (miR-29 and miR-92), up-regulation from early 3rd instar to pupa (miR-275), and complementary pulses in expression between miR-34b and miR-275. Stage-by-stage examinations revealed further expression patterns, such as emergence at specific time-points during embryogenesis and up-regulation of miRNA groups in late embryos (miR-1 and bantam), expression associated with stage transition between instar and molt larval stages (miR-34b), expression associated with silk gland growth and spinning activity (miR-274), continuous high expression from the spinning larval to pupal and adult stages (miR-252 and miR-31a), a coordinate expression trough in day 3 pupae of both sexes (miR-10b and miR-281), up-regulation in pupal metamorphosis of both sexes (miR-29b), and down-regulation in pupal metamorphosis of both sexes (miR-275).
We present the full-scale expression profiles of miRNAs throughout the life cycle of Bombyx mori. The whole-life expression profile was further investigated via stage-by-stage analysis. Our data provide an important resource for more detailed functional analysis of miRNAs in this animal.
PMCID: PMC2761947  PMID: 19785751
8.  The diploid genome sequence of an Asian individual 
Nature  2008;456(7218):60-65.
Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics.
PMCID: PMC2716080  PMID: 18987735
9.  Identification and characterization of novel amphioxus microRNAs by Solexa sequencing 
Genome Biology  2009;10(7):R78.
An analysis of amphioxus miRNAs suggests an expansion of miRNAs played a key role in the evolution of chordates to vertebrates
microRNAs (miRNAs) are endogenous small non-coding RNAs that regulate gene expression at the post-transcriptional level. While the number of known human and murine miRNAs is continuously increasing, information regarding miRNAs from other species such as amphioxus remains limited.
We combined Solexa sequencing with computational techniques to identify novel miRNAs in the amphioxus species B. belcheri (Gray). This approach allowed us to identify 113 amphioxus miRNA genes. Among them, 55 were conserved across species and encoded 45 non-redundant mature miRNAs, whereas 58 were amphioxus-specific and encoded 53 mature miRNAs. Validation of our results with microarray and stem-loop quantitative RT-PCR revealed that Solexa sequencing is a powerful tool for miRNA discovery. Analyzing the evolutionary history of amphioxus miRNAs, we found that amphioxus possesses many miRNAs unique to chordates and vertebrates, and these may thus represent key steps in the evolutionary progression from cephalochordates to vertebrates. We also found that amphioxus is more similar to vertebrates than are tunicates with respect to their miRNA phylogenetic histories.
Taken together, our results indicate that Solexa sequencing allows the successful discovery of novel miRNAs from amphioxus with high accuracy and efficiency. More importantly, our study provides an opportunity to decipher how the elaboration of the miRNA repertoire that occurred during chordate evolution contributed to the evolution of the vertebrate body plan.
PMCID: PMC2728532  PMID: 19615057
10.  Complete Genome Sequence of the Mosquitocidal Bacterium Bacillus sphaericus C3-41 and Comparison with Those of Closely Related Bacillus Species
Journal of Bacteriology  2008;190(8):2892-2902.
Bacillus sphaericus strain C3-41 is an aerobic, mesophilic, spore-forming bacterium that has been used with great success in mosquito control programs worldwide. Genome sequencing revealed that the complete genome of this entomopathogenic bacterium is composed of a chromosomal replicon of 4,639,821 bp and a plasmid replicon of 177,642 bp, containing 4,786 and 186 potential protein-coding sequences, respectively. Comparison of the genome with other published sequences indicated that the B. sphaericus C3-41 chromosome is most similar to that of Bacillus sp. strain NRRL B-14905, a marine species that, like B. sphaericus, is unable to metabolize polysaccharides. The lack of key enzymes and sugar transport systems in the two bacteria appears to be the main reason for this inability, and the abundance of proteolytic enzymes and transport systems may endow these bacteria with exclusive metabolic pathways for a wide variety of organic compounds and amino acids. The genes shared between B. sphaericus C3-41 and Bacillus sp. strain NRRL B-14905, including mobile genetic elements, membrane-associated proteins, and transport systems, demonstrated that these two species are a biologically and phylogenetically divergent group. Knowledge of the genome sequence of B. sphaericus C3-41 thus increases our understanding of the bacilli and may also offer prospects for future genetic improvement of this important biological control agent.
PMCID: PMC2293248  PMID: 18296527

