2.  Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology 
GigaScience  2014;3(1):34.
Structural variants (SVs) are less common than single nucleotide polymorphisms and indels in the population, but collectively account for a significant fraction of genetic polymorphism and diseases. Base pair differences arising from SVs are on a much higher order (>100 fold) than point mutations; however, none of the current detection methods are comprehensive, and currently available methodologies are incapable of providing sufficient resolution and unambiguous information across complex regions in the human genome. To address these challenges, we applied a high-throughput, cost-effective genome mapping technology to comprehensively discover genome-wide SVs and characterize complex regions of the YH genome using long single molecules (>150 kb) in a global fashion.
Utilizing nanochannel-based genome mapping technology, we obtained 708 insertions/deletions and 17 inversions larger than 1 kb. Excluding the 59 SVs (54 insertions/deletions, 5 inversions) that overlap with N-base gaps in the reference assembly hg19, 666 non-gap SVs remained, and 396 of them (60%) were verified by paired-end data from whole-genome sequencing-based re-sequencing or de novo assembly sequence from fosmid data. Of the remaining 270 SVs, 260 are insertions and 213 overlap known SVs in the Database of Genomic Variants. Overall, 609 out of 666 (90%) variants were supported by experimental orthogonal methods or historical evidence in public databases. At the same time, genome mapping also provides valuable information for complex regions with haplotypes in a straightforward fashion. In addition, with long single-molecule labeling patterns, exogenous viral sequences were mapped on a whole-genome scale, and sample heterogeneity was analyzed at a new level.
Our study highlights genome mapping technology as a comprehensive and cost-effective method for detecting structural variation and studying complex regions in the human genome, as well as deciphering viral integration into the host genome.
Electronic supplementary material
The online version of this article (doi:10.1186/2047-217X-3-34) contains supplementary material, which is available to authorized users.
PMCID: PMC4322599  PMID: 25671094
Genome mapping; Structural variation; Repeat units; Epstein-Barr virus (EBV) integration
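The counts quoted in this abstract can be cross-checked with simple arithmetic. The following is a minimal sketch of that check; the variable names are illustrative and do not come from the paper, and only the numbers are taken from the abstract.

```python
# Consistency check of the SV counts quoted in the abstract above.
# Variable names are hypothetical; the numbers come from the abstract.
indels = 708
inversions = 17
gap_overlapping = 54 + 5      # insertions/deletions + inversions in N-base gaps

total_svs = indels + inversions
non_gap_svs = total_svs - gap_overlapping

verified_by_sequencing = 396  # paired-end or fosmid assembly support
known_in_dgv = 213            # remaining SVs overlapping known DGV entries

remaining = non_gap_svs - verified_by_sequencing
supported = verified_by_sequencing + known_in_dgv

print("non-gap SVs:", non_gap_svs)
print("remaining after sequencing-based verification:", remaining)
print("supported overall:", supported)
print("verified fraction: %.1f%%" % (100 * verified_by_sequencing / non_gap_svs))
print("supported fraction: %.1f%%" % (100 * supported / non_gap_svs))
```

The totals (666, 270 and 609) match the abstract. The two fractions come out slightly off the rounded 60% and 90% quoted there, so those figures are best read as approximate.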
3.  Two Antarctic penguin genomes reveal insights into their evolutionary history and molecular changes related to the Antarctic environment 
GigaScience  2014;3(1):27.
Penguins are flightless aquatic birds widely distributed in the Southern Hemisphere. The distinctive morphological and physiological features of penguins allow them to live an aquatic life, and some of them have successfully adapted to the hostile environments in Antarctica. To study the phylogenetic and population history of penguins and the molecular basis of their adaptations to Antarctica, we sequenced the genomes of the two Antarctic dwelling penguin species, the Adélie penguin [Pygoscelis adeliae] and emperor penguin [Aptenodytes forsteri].
Phylogenetic dating suggests that early penguins arose ~60 million years ago, coinciding with a period of global warming. Analysis of effective population sizes reveals that the two penguin species experienced population expansions from ~1 million years ago to ~100 thousand years ago, but responded differently to the climatic cooling of the last glacial period. Comparative genomic analyses with other available avian genomes identified molecular changes in genes related to epidermal structure, phototransduction, lipid metabolism, and forelimb morphology.
Our sequencing and initial analyses of the first two penguin genomes provide insights into the timing of penguin origin, fluctuations in effective population sizes of the two penguin species over the past 10 million years, and the potential associations between these biological patterns and global climate change. The molecular changes compared with other avian genomes reflect both shared and diverse adaptations of the two penguin species to the Antarctic environment.
Electronic supplementary material
The online version of this article (doi:10.1186/2047-217X-3-27) contains supplementary material, which is available to authorized users.
PMCID: PMC4322438  PMID: 25671092
Penguins; Avian genomics; Evolution; Adaptation; Antarctica
4.  Clinical outcome of preimplantation genetic diagnosis and screening using next generation sequencing 
GigaScience  2014;3(1):30.
Next generation sequencing (NGS) is now being used for detecting chromosomal abnormalities in blastocyst trophectoderm (TE) cells from in vitro fertilized embryos. However, few data are available regarding the clinical outcome, which provides vital reference for further application of the methodology. Here, we present a clinical evaluation of NGS-based preimplantation genetic diagnosis/screening (PGD/PGS) compared with single nucleotide polymorphism (SNP) array-based PGD/PGS as a control.
A total of 395 couples participated. They were carriers of either translocation or inversion mutations, or were patients with recurrent miscarriage and/or advanced maternal age. A total of 1,512 blastocysts were biopsied on D5 after fertilization, with 1,058 blastocysts set aside for SNP array testing and 454 blastocysts for NGS testing. In the NGS cycles group, the implantation, clinical pregnancy and miscarriage rates were 52.6% (60/114), 61.3% (49/80) and 14.3% (7/49), respectively. In the SNP array cycles group, the implantation, clinical pregnancy and miscarriage rates were 47.6% (139/292), 56.7% (115/203) and 14.8% (17/115), respectively. The outcome measures of the NGS and SNP array cycles did not differ significantly. There were 150 blastocysts that underwent both NGS and SNP array analysis, of which seven blastocysts were found with inconsistent signals. All other signals obtained from NGS analysis were confirmed to be accurate by validation with qPCR. The relative copy number of mitochondrial DNA (mtDNA) for each blastocyst that underwent NGS testing was evaluated, and a significant difference was found between the copy number of mtDNA for the euploid and the chromosomally abnormal blastocysts. So far, out of 42 ongoing pregnancies, 24 babies were born in NGS cycles; all of these babies are healthy and free of any developmental problems.
This study provides the first evaluation of the clinical outcomes of NGS-based pre-implantation genetic diagnosis/screening, and shows the reliability of this method in a clinical and array-based laboratory setting. NGS provides an accurate approach to detect embryonic imbalanced segmental rearrangements, to avoid the potential risks of false signals from SNP array in this study.
Electronic supplementary material
The online version of this article (doi:10.1186/2047-217X-3-30) contains supplementary material, which is available to authorized users.
PMCID: PMC4326468
Preimplantation genetic diagnosis/screening; Next generation sequencing; Blastocyst; Cryopreserved embryo transfer; Clinical outcome
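The outcome rates in this abstract are given as both percentages and raw counts, so they can be recomputed directly. The sketch below does this under two assumptions that are not stated in the source: percentages are rounded to one decimal place with halves rounded up, and the dictionary layout and names are purely illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

# (events, total) pairs as quoted in the abstract; the dictionary layout
# and key names are assumptions made for this illustration.
OUTCOMES = {
    "NGS": {
        "implantation": (60, 114),
        "clinical_pregnancy": (49, 80),
        "miscarriage": (7, 49),
    },
    "SNP array": {
        "implantation": (139, 292),
        "clinical_pregnancy": (115, 203),
        "miscarriage": (17, 115),
    },
}


def percent(events, total):
    """Return events/total as a percentage with one decimal, halves rounded up."""
    value = Decimal(100 * events) / Decimal(total)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


rates = {
    group: {name: percent(*counts) for name, counts in outcomes.items()}
    for group, outcomes in OUTCOMES.items()
}

for group, outcomes in rates.items():
    for name, value in outcomes.items():
        print(f"{group:9s} {name:18s} {value}%")

# The two testing arms should add up to the total number of biopsied blastocysts.
blastocysts_total = 1058 + 454
print("blastocysts biopsied:", blastocysts_total)
```

`Decimal` with `ROUND_HALF_UP` is used because 49/80 is exactly 61.25%, which the built-in `round` would take to 61.2 rather than the 61.3 quoted. The sketch reproduces percentages only and says nothing about statistical significance.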
5.  Comparative Analysis of Evolutionarily Conserved Motifs of Epidermal Growth Factor Receptor 2 (HER2) Predicts Novel Potential Therapeutic Epitopes 
PLoS ONE  2014;9(9):e106448.
Overexpression of human epidermal growth factor receptor 2 (HER2) is associated with tumor aggressiveness and poor prognosis in breast cancer. With the availability of therapeutic antibodies against HER2, great strides have been made in the clinical management of HER2 overexpressing breast cancer. However, de novo and acquired resistance to these antibodies presents a serious limitation to successful HER2 targeting treatment. The identification of novel epitopes of HER2 that can be used for functional/region-specific blockade could represent a central step in the development of new clinically relevant anti-HER2 antibodies. In the present study, we present a novel computational approach as an auxiliary tool for identification of novel HER2 epitopes. We hypothesized that the structurally and linearly evolutionarily conserved motifs of the extracellular domain of HER2 (ECD HER2) contain potential druggable epitopes/targets. We employed the PROSITE Scan to detect structurally conserved motifs and PRINTS to search for linearly conserved motifs of ECD HER2. We found that the epitopes recognized by trastuzumab and pertuzumab are located in the predicted conserved motifs of ECD HER2, supporting our initial hypothesis. Considering that structurally and linearly conserved motifs can provide functional specific configurations, we propose that by comparing the two types of conserved motifs, additional druggable epitopes/targets in the ECD HER2 protein can be identified, which can be further modified for potential therapeutic application. Thus, this novel computational process for predicting or searching for potential epitopes or key target sites may contribute to epitope-based vaccine and function-selected drug design, especially when x-ray crystal structure protein data is not available.
PMCID: PMC4156330  PMID: 25192037
6.  A Survey of Overlooked Viral Infections in Biological Experiment Systems 
PLoS ONE  2014;9(8):e105348.
It is commonly accepted that there are many unknown viruses on the planet. For the known viruses, do we know their prevalence, even in our experimental systems? Here we report a virus survey using recently published small (s)RNA sequencing datasets. The sRNA reads were assembled and contigs were screened for virus homologues against the NCBI nucleotide (nt) database using the BLASTn program. To our surprise, approximately 30% (28 out of 94) of publications had highly scored viral sequences in their datasets. Among them, only two publications reported virus infections. Though viral vectors were used in some of the publications, virus sequences without any identifiable source appeared in more than 20 publications. By determining the distributions of viral reads and the antiviral RNA interference (RNAi) pathways using the sRNA profiles, we showed evidence that many of the viruses identified were indeed infecting and generated host RNAi responses. As virus infections affect many aspects of host molecular biology and metabolism, the presence and impact of viruses needs to be actively investigated in experimental systems.
PMCID: PMC4140767  PMID: 25144530
7.  The co-occurrence of mtDNA mutations on different oxidative phosphorylation subunits, not detected by haplogroup analysis, affects human longevity and is population specific 
Aging Cell  2013;13(3):401-407.
To re-examine the correlation between mtDNA variability and longevity, we examined mtDNAs from samples obtained from over 2200 ultranonagenarians (and an equal number of controls) collected within the framework of the GEHA EU project. The samples were categorized by high-resolution classification, while about 1300 mtDNA molecules (650 ultranonagenarians and an equal number of controls) were completely sequenced. Sequences, unlike standard haplogroup analysis, made it possible to evaluate for the first time the cumulative effects of specific, concomitant mtDNA mutations, including those that per se have a low, or very low, impact. In particular, the analysis of the mutations occurring in different OXPHOS complexes showed a complex scenario with a different mutation burden in 90+ subjects with respect to controls. These findings suggested that mutations in subunits of the OXPHOS complex I had a beneficial effect on longevity, while the simultaneous presence of mutations in complex I and III (which also occurs in J subhaplogroups involved in LHON) and in complex I and V seemed to be detrimental, likely explaining previous contradictory results. On the whole, our study, which goes beyond haplogroup analysis, suggests that mitochondrial DNA variation does affect human longevity, but its effect is heavily influenced by the interaction between mutations concomitantly occurring on different mtDNA genes.
PMCID: PMC4326891  PMID: 24341918
genetics of longevity; longevity; mitochondrial DNA; mtDNA sequencing; oxidative phosphorylation
9.  Integrated detection of both 5-mC and 5-hmC by high-throughput tag sequencing technology highlights methylation reprogramming of bivalent genes during cellular differentiation 
Epigenetics  2013;8(4):421-430.
5-methylcytosine (5-mC) can be oxidized to 5-hydroxymethylcytosine (5-hmC). Genome-wide profiling of 5-hmC thus far indicates 5-hmC may not only be an intermediate form of DNA demethylation but could also constitute an epigenetic mark per se. Here we describe a cost-effective and selective method to detect both the hydroxymethylation and methylation status of a subset of cytosines in the human genome. This method involves the selective glucosylation of 5-hmC residues, short-sequence tag generation and high-throughput sequencing. We tested this method by screening H9 human embryonic stem cells and their differentiated embryoid body cells, and found that differential hydroxymethylation preferentially occurs in bivalent genes during cellular differentiation. In particular, our results support the idea that hydroxymethylation can regulate key transcription regulators with bivalent marks through demethylation and affect the cellular decision between the active and inactive states of these genes upon cellular differentiation. Future application of this technology would enable us to uncover the status of methylation and hydroxymethylation in dynamic biological processes and disease development in multiple biological samples.
PMCID: PMC3674051  PMID: 23502161
HMST-Seq; differentiation; embryonic stem cells; hydroxymethylation; methylation
10.  Complete Resequencing of 40 Genomes Reveals Domestication Events and Genes in Silkworm (Bombyx) 
Science (New York, N.Y.)  2009;326(5951):433-436.
A single–base pair resolution silkworm genetic variation map was constructed from 40 domesticated and wild silkworms, each sequenced to approximately threefold coverage, representing 99.88% of the genome. We identified ∼16 million single-nucleotide polymorphisms, many indels, and structural variations. We find that the domesticated silkworms are clearly genetically differentiated from the wild ones, but they have maintained large levels of genetic variability, suggesting a short domestication event involving a large number of individuals. We also identified signals of selection at 354 candidate genes that may have been important during domestication, some of which have enriched expression in the silk gland, midgut, and testis. These data add to our understanding of the domestication processes and may have applications in devising pest control strategies and advancing the use of silkworms as efficient bioreactors.
PMCID: PMC3951477  PMID: 19713493
11.  The sequence and de novo assembly of the giant panda genome 
Li, Ruiqiang | Fan, Wei | Tian, Geng | Zhu, Hongmei | He, Lin | Cai, Jing | Huang, Quanfei | Cai, Qingle | Li, Bo | Bai, Yinqi | Zhang, Zhihe | Zhang, Yaping | Wang, Wen | Li, Jun | Wei, Fuwen | Li, Heng | Jian, Min | Li, Jianwen | Zhang, Zhaolei | Nielsen, Rasmus | Li, Dawei | Gu, Wanjun | Yang, Zhentao | Xuan, Zhaoling | Ryder, Oliver A. | Leung, Frederick Chi-Ching | Zhou, Yan | Cao, Jianjun | Sun, Xiao | Fu, Yonggui | Fang, Xiaodong | Guo, Xiaosen | Wang, Bo | Hou, Rong | Shen, Fujun | Mu, Bo | Ni, Peixiang | Lin, Runmao | Qian, Wubin | Wang, Guodong | Yu, Chang | Nie, Wenhui | Wang, Jinhuan | Wu, Zhigang | Liang, Huiqing | Min, Jiumeng | Wu, Qi | Cheng, Shifeng | Ruan, Jue | Wang, Mingwei | Shi, Zhongbin | Wen, Ming | Liu, Binghang | Ren, Xiaoli | Zheng, Huisong | Dong, Dong | Cook, Kathleen | Shan, Gao | Zhang, Hao | Kosiol, Carolin | Xie, Xueying | Lu, Zuhong | Zheng, Hancheng | Li, Yingrui | Steiner, Cynthia C. | Lam, Tommy Tsan-Yuk | Lin, Siyuan | Zhang, Qinghui | Li, Guoqing | Tian, Jing | Gong, Timing | Liu, Hongde | Zhang, Dejin | Fang, Lin | Ye, Chen | Zhang, Juanbin | Hu, Wenbo | Xu, Anlong | Ren, Yuanyuan | Zhang, Guojie | Bruford, Michael W. | Li, Qibin | Ma, Lijia | Guo, Yiran | An, Na | Hu, Yujie | Zheng, Yang | Shi, Yongyong | Li, Zhiqiang | Liu, Qing | Chen, Yanling | Zhao, Jing | Qu, Ning | Zhao, Shancen | Tian, Feng | Wang, Xiaoling | Wang, Haiyin | Xu, Lizhi | Liu, Xiao | Vinar, Tomas | Wang, Yajun | Lam, Tak-Wah | Yiu, Siu-Ming | Liu, Shiping | Zhang, Hemin | Li, Desheng | Huang, Yan | Wang, Xia | Yang, Guohua | Jiang, Zhi | Wang, Junyi | Qin, Nan | Li, Li | Li, Jingxiang | Bolund, Lars | Kristiansen, Karsten | Wong, Gane Ka-Shu | Olson, Maynard | Zhang, Xiuqing | Li, Songgang | Yang, Huanming | Wang, Jian | Wang, Jun
Nature  2009;463(7279):311-317.
Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility of using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes.
PMCID: PMC3951497  PMID: 20010809
13.  Integrative analyses of gene expression and DNA methylation profiles in breast cancer cell line models of tamoxifen-resistance indicate a potential role of cells with stem-like properties 
Breast Cancer Research : BCR  2013;15(6):R119.
Development of resistance to tamoxifen is an important clinical issue in the treatment of breast cancer. Tamoxifen resistance may be the result of acquisition of epigenetic regulation within breast cancer cells, such as DNA methylation, resulting in changed mRNA expression of genes pivotal for estrogen-dependent growth. Alternatively, tamoxifen resistance may be due to selection of pre-existing resistant cells, or a combination of the two mechanisms.
To evaluate the contribution of these possible tamoxifen resistance mechanisms, we applied modified DNA methylation-specific digital karyotyping (MMSDK) and digital gene expression (DGE) in combination with massive parallel sequencing to analyze a well-established tamoxifen-resistant cell line model (TAMR), consisting of 4 resistant and one parental cell line. Another tamoxifen-resistant cell line model system (LCC1/LCC2) was used to validate the DNA methylation and gene expression results.
Significant differences were observed in global gene expression and DNA methylation profiles between the parental tamoxifen-sensitive cell line and the 4 tamoxifen-resistant TAMR sublines. The 4 TAMR cell lines exhibited higher methylation levels as well as an inverse relationship between gene expression and DNA methylation in the promoter regions. A panel of genes, including NRIP1, HECA and FIS1, exhibited lower gene expression in resistant vs. parental cells and concurrent increased promoter CGI methylation in resistant vs. parental cell lines. A major part of the methylation, gene expression, and pathway alterations observed in the TAMR model were also present in the LCC1/LCC2 cell line model. More importantly, high expression of SOX2 and alterations of other SOX and E2F gene family members, as well as RB-related pocket protein genes in TAMR highlighted stem cell-associated pathways as being central in the resistant cells and imply that cancer-initiating cells/cancer stem-like cells may be involved in tamoxifen resistance in this model.
Our data highlight the likelihood that resistant cells emerge from cancer-initiating cells/cancer stem-like cells and imply that these cells may gain further advantage in growth via epigenetic mechanisms. Illuminating the expression and DNA methylation features of putative cancer-initiating cells/cancer stem cells may suggest novel strategies to overcome tamoxifen resistance.
PMCID: PMC4057522  PMID: 24355041
14.  High genome heterozygosity and endemic genetic recombination in the wheat stripe rust fungus 
Nature Communications  2013;4:2673.
Stripe rust, caused by Puccinia striiformis f. sp. tritici (Pst), is one of the most destructive diseases of wheat. Here we report a 110-Mb draft sequence of Pst isolate CY32, obtained using a ‘fosmid-to-fosmid’ strategy, to better understand its race evolution and pathogenesis. The Pst genome is highly heterozygous and contains 25,288 protein-coding genes. Compared with non-obligate fungal pathogens, Pst has a more diverse gene composition and more genes encoding secreted proteins. Re-sequencing analysis indicates significant genetic variation among six isolates collected from different continents. Approximately 35% of SNPs are in the coding sequence regions, and half of them are non-synonymous. High genetic diversity in Pst suggests that sexual reproduction has an important role in the origin of different regional races. Our results show the effectiveness of the ‘fosmid-to-fosmid’ strategy for sequencing dikaryotic genomes and the feasibility of genome analysis to understand race evolution in Pst and other obligate pathogens.
Stripe rust is one of the most destructive wheat diseases. Here, Zheng and colleagues report a draft genome sequence of wheat stripe rust fungus, generated using a fosmid-to-fosmid approach, and provide insight into its race evolution and pathogenesis.
PMCID: PMC3826619  PMID: 24150273
15.  Development of Transgenic Minipigs with Expression of Antimorphic Human Cryptochrome 1 
PLoS ONE  2013;8(10):e76098.
Minipigs have become important biomedical models for human ailments due to similarities in organ anatomy, physiology, and circadian rhythms relative to humans. The homeostasis of circadian rhythms in both central and peripheral tissues is pivotal for numerous biological processes. Hence, biological rhythm disorders may contribute to the onset of cancers and metabolic disorders including obesity and type II diabetes, amongst others. A tight regulation of circadian clock effectors ensures a rhythmic expression profile of output genes which, depending on cell type, constitute about 3–20% of the transcribed mammalian genome. Central to this system is the negative regulator protein Cryptochrome 1 (CRY1) of which the dysfunction or absence has been linked to the pathogenesis of rhythm disorders. In this study, we generated transgenic Bama-minipigs featuring expression of the Cys414-Ala antimorphic human Cryptochrome 1 mutant (hCRY1AP). Using transgenic donor fibroblasts as nuclear donors, the method of handmade cloning (HMC) was used to produce reconstructed embryos, subsequently transferred to surrogate sows. A total of 23 viable piglets were delivered. All were transgenic and seemingly healthy. However, two pigs with high transgene expression succumbed during the first two months. Molecular analyses in epidermal fibroblasts demonstrated disturbances to the expression profile of core circadian clock genes and elevated expression of the proinflammatory cytokines IL-6 and TNF-α, known to be risk factors in cancer and metabolic disorders.
PMCID: PMC3797822  PMID: 24146819
16.  A human gut microbial gene catalog established by metagenomic sequencing 
Nature  2010;464(7285):59-65.
To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million nonredundant microbial genes, derived from 576.7 Gb sequence, from faecal samples of 124 European individuals. The gene set, ~150 times larger than the human gene complement, contains an overwhelming majority of the prevalent microbial genes of the cohort and likely includes a large proportion of the prevalent human intestinal microbial genes. The genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, suggesting that the entire cohort harbours between 1000 and 1150 prevalent bacterial species and each individual at least 160 such species, which are also largely shared. We define and describe the minimal gut metagenome and the minimal gut bacterial genome in terms of functions encoded by the gene set.
PMCID: PMC3779803  PMID: 20203603
17.  Sequencing of Fifty Human Exomes Reveals Adaptation to High Altitude 
Science (New York, N.Y.)  2010;329(5987):75-78.
Residents of the Tibetan Plateau show heritable adaptations to extreme altitude. We sequenced 50 exomes of ethnic Tibetans, encompassing coding sequences of 92% of human genes, with an average coverage of 18X per individual. Genes showing population-specific allele frequency changes, which represent strong candidates for altitude adaptation, were identified. The strongest signal of natural selection came from EPAS1, a transcription factor involved in response to hypoxia. One SNP at EPAS1 shows a 78% frequency difference between Tibetan and Han samples, representing the fastest allele frequency change observed at any human gene to date. This SNP’s association with erythrocyte abundance supports the role of EPAS1 in adaptation to hypoxia. Thus, a population genomic survey has revealed a functionally important locus in genetic adaptation to high altitude.
PMCID: PMC3711608  PMID: 20595611
18.  RNA-seq analysis of prostate cancer in the Chinese population identifies recurrent gene fusions, cancer-associated long noncoding RNAs and aberrant alternative splicings 
Cell Research  2012;22(5):806-821.
There are remarkable disparities among patients of different races with prostate cancer; however, the mechanism underlying this difference remains unclear. Here, we present a comprehensive landscape of the transcriptome profiles of 14 primary prostate cancers and their paired normal counterparts from the Chinese population using RNA-seq, revealing tremendous diversity across prostate cancer transcriptomes with respect to gene fusions, long noncoding RNAs (long ncRNA), alternative splicing and somatic mutations. Three of the 14 tumors (21.4%) harbored a TMPRSS2-ERG fusion, and the low prevalence of this fusion in Chinese patients was further confirmed in an additional tumor set (10/54=18.5%). Notably, two novel gene fusions, CTAGE5-KHDRBS3 (20/54=37%) and USP9Y-TTTY15 (19/54=35.2%), occurred frequently in our patient cohort. Further systematic transcriptional profiling identified numerous long ncRNAs that were differentially expressed in the tumors. An analysis of the correlation between expression of long ncRNA and genes suggested that long ncRNAs may have functions beyond transcriptional regulation. This study yielded new insights into the pathogenesis of prostate cancer in the Chinese population.
PMCID: PMC3343650  PMID: 22349460
prostate cancer; RNA sequencing; gene fusions; long ncRNAs; alternative splicing
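The fusion prevalences in this abstract are quoted as count/total pairs, which makes them easy to recompute. The sketch below is a plain arithmetic check; the cohort labels and the dictionary structure are hypothetical names chosen for illustration.

```python
# Gene-fusion counts as quoted in the abstract: (positive tumors, tumors tested).
# The keys are illustrative labels, not identifiers from the paper.
FUSIONS = {
    "TMPRSS2-ERG (discovery set)": (3, 14),
    "TMPRSS2-ERG (additional set)": (10, 54),
    "CTAGE5-KHDRBS3": (20, 54),
    "USP9Y-TTTY15": (19, 54),
}

prevalence = {
    name: round(100 * positive / tested, 1)
    for name, (positive, tested) in FUSIONS.items()
}

for name, value in prevalence.items():
    print(f"{name:30s} {value:5.1f}%")
```

The recomputed values agree with the percentages given in the abstract (21.4%, 18.5%, 37% and 35.2%).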
19.  Low incidence of DNA sequence variation in human induced pluripotent stem cells generated by non-integrating plasmid expression 
Cell Stem Cell  2012;10(3):337-344.
The utility of induced pluripotent stem cells (iPSCs) as models to study diseases and as sources for cell therapy depends on the integrity of their genomes. Despite recent publications of DNA sequence variations in the iPSCs, the true scope of such changes for the entire genome is not clear. Here we report the whole-genome sequencing of three human iPSC lines derived from two cell types of an adult donor by episomal vectors. The vector sequence was undetectable in the deeply sequenced iPSC lines. We identified 1058–1808 heterozygous single nucleotide variants (SNVs), but no copy number variants, in each iPSC line. Six to twelve of these SNVs were within coding regions in each iPSC line, but ~50% of them are synonymous changes and the remaining are not selectively enriched for known genes associated with cancers. Our data thus suggest that episome-mediated reprogramming is not inherently mutagenic during integration-free iPSC induction.
PMCID: PMC3298448  PMID: 22385660
Human iPS cells; Reprogramming; Episomal vectors; Integration-free; Genetic mutations; Whole Genome Sequencing
20.  Haplotype-assisted accurate non-invasive fetal whole genome recovery through maternal plasma sequencing 
Genome Medicine  2013;5(2):18.
The applications of massively parallel sequencing technology to fetal cell-free DNA (cff-DNA) have brought new insight to non-invasive prenatal diagnosis. However, most previous research based on maternal plasma sequencing has been restricted to fetal aneuploidies. To detect specific parentally inherited mutations, invasive approaches to obtain fetal DNA are the current standard in the clinic because of the experimental complexity and resource consumption of previously reported non-invasive approaches.
Here, we present a simple and effective non-invasive method for accurate fetal genome recovery assisted by parental haplotypes. The parental haplotypes were first inferred using a combination strategy of trio and unrelated individuals. Assisted by the parental haplotypes, we then employed a hidden Markov model to non-invasively recover the fetal genome through maternal plasma sequencing.
Using a sequence depth of approximately 44X against an approximate 5.69% cff-DNA concentration, we non-invasively inferred fetal genotype and haplotype under different situations of parental heterozygosity. Our data show that 98.57%, 95.37%, and 98.45% of paternal autosome alleles, maternal autosome alleles, and maternal chromosome X in the fetal haplotypes, respectively, were recovered accurately. Additionally, we obtained efficient coverage or strong linkage of 96.65% of reported Mendelian-disorder genes and 98.90% of complex disease-associated markers.
Our method provides a useful strategy for non-invasive whole fetal genome recovery.
PMCID: PMC3706925  PMID: 23445748
21.  Exome Capture Sequencing of Adenoma Reveals Genetic Alterations in Multiple Cellular Pathways at the Early Stage of Colorectal Tumorigenesis 
PLoS ONE  2013;8(1):e53310.
Most colorectal adenocarcinomas are believed to arise from adenomas, which are premalignant lesions. Sequencing the whole exome of the adenoma will help identify molecular biomarkers that can predict the occurrence of adenocarcinoma more precisely and help in understanding the molecular pathways underlying the initial stage of colorectal tumorigenesis. We performed exome capture sequencing of the normal mucosa, adenoma and adenocarcinoma tissues from the same patient and sequenced the identified mutations in an additional 73 adenomas and 288 adenocarcinomas. Somatic single nucleotide variations (SNVs) were identified in both the adenoma and adenocarcinoma by comparing with the normal control from the same patient. We identified 12 nonsynonymous somatic SNVs in the adenoma and 42 nonsynonymous somatic SNVs in the adenocarcinoma. Most of these mutations, including OR6X1, SLC15A3, KRTHB4, RBFOX1, LAMA3, CDH20, BIRC6, NMBR, GLCCI1, EFR3A, and FTHL17, were newly reported in colorectal adenomas. Functional annotation of these mutated genes showed that multiple cellular pathways, including Wnt, cell adhesion and ubiquitin-mediated proteolysis pathways, were altered genetically in the adenoma and that the genetic alterations in the same pathways persist in the adenocarcinoma. CDH20 and LAMA3 were mutated in the adenoma while NRXN3 and COL4A6 were mutated in the adenocarcinoma from the same patient, suggesting for the first time that genetic alterations in the cell adhesion pathway occur as early as in the adenoma. Thus, the comparison of genomic mutations between adenoma and adenocarcinoma provides new insight into the molecular events governing the early step of colorectal tumorigenesis.
PMCID: PMC3534699  PMID: 23301059
22.  SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler 
GigaScience  2012;1:18.
There is a rapidly increasing amount of de novo genome assembly using next-generation sequencing (NGS) short reads; however, several big challenges remain to be overcome in order for this to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, accuracy and coverage, especially in repeat regions.
To overcome these challenges, we have developed its successor, SOAPdenovo2, which has the advantage of a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is optimized for large genomes.
Benchmarks using the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive with other assemblers in both assembly length and accuracy. We also provide an updated assembly version of the 2008 Asian (YH) genome using SOAPdenovo2. Here, the contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp, respectively, which are 3-fold and 50-fold longer than in the first published version. The genome coverage increased from 81.16% to 93.91%, and memory consumption was ~2/3 lower at the point of peak memory consumption.
PMCID: PMC3626529  PMID: 23587118
Genome; Assembly; Contig; Scaffold; Error correction; Gap-filling
23.  An atlas of DNA methylomes in porcine adipose and muscle tissues 
Nature communications  2012;3:850.
It is evident that epigenetic factors, especially DNA methylation, play essential roles in obesity development. Using the pig as a model, we investigated the systematic association between DNA methylation and obesity. We sampled eight different adipose and two distinct skeletal muscle tissues from three pig breeds living in comparable environments but displaying distinct fat levels. We generated 1,381 gigabases (Gb) of sequence data from 180 methylated DNA immunoprecipitation (MeDIP) libraries, and provided a genome-wide DNA methylation map as well as a gene expression map for adipose and muscle studies. The analysis showed global similarities and differences among breeds, sexes and anatomic locations, and identified the differentially methylated regions (DMRs). The DMRs in promoters are highly associated with obesity development via expression repression of both known obesity-related genes and novel genes. This comprehensive map provides a solid basis for exploring epigenetic mechanisms of adipose deposition and muscle growth.
PMCID: PMC3508711  PMID: 22617290
24.  The sequence and analysis of a Chinese pig genome 
GigaScience  2012;1:16.
The pig is an economically important food source, amounting to approximately 40% of all meat consumed worldwide. Pigs also serve as an important model organism because of their similarity to humans at the anatomical, physiological and genetic levels, making them very useful for studying a variety of human diseases. A pig strain of particular interest is the miniature pig, specifically the Wuzhishan pig (WZSP), as it has been extensively inbred. Its high level of homozygosity makes selective breeding for specific traits easier and allows a more straightforward understanding of the genetic changes that underlie its biological characteristics. WZSP also serves as a promising means for applications in surgery, tissue engineering, and xenotransplantation. Here, we report the sequencing and analysis of an inbred WZSP genome.
Our results reveal some unique genomic features, including a relatively high level of homozygosity in the diploid genome, an unusual distribution of heterozygosity, an over-representation of tRNA-derived transposable elements, a small amount of porcine endogenous retrovirus, and a lack of type C retroviruses. In addition, we carried out systematic research on gene evolution, together with a detailed investigation of the counterparts of human drug target genes.
Our results provide the opportunity to more clearly define the genomic character of pig, which could enhance our ability to create more useful pig models.
PMCID: PMC3626506  PMID: 23587058
Wuzhishan pig; Genome; Homozygosis; Transposable element; Endogenous retrovirus; Animal model
25.  Differential DNA methylation in discrete developmental stages of the parasitic nematode Trichinella spiralis 
Genome Biology  2012;13(10):R100.
DNA methylation plays an essential role in regulating gene expression under a variety of conditions and it has therefore been hypothesized to underlie the transitions between life cycle stages in parasitic nematodes. So far, however, 5'-cytosine methylation has not been detected during any developmental stage of the nematode Caenorhabditis elegans. Given the new availability of high-resolution methylation detection methods, an investigation of life cycle methylation in a parasitic nematode can now be carried out.
Here, using MethylC-seq, we present the first study to confirm the existence of DNA methylation in the parasitic nematode Trichinella spiralis, and we characterize the methylomes of the three life-cycle stages of this food-borne infectious human pathogen. We observe a drastic increase in DNA methylation during the transition from the new born to mature stage, and we further identify parasitism-related genes that show changes in DNA methylation status between life cycle stages.
Our data contribute to the understanding of the developmental changes that occur in an important human parasite, and raise the possibility that targeting DNA methylation processes may be a useful strategy in developing therapeutics to impede infection. In addition, our conclusion that DNA methylation is a mechanism for life cycle transition in T. spiralis prompts the question of whether this may also be the case in other metazoans. Finally, our work constitutes the first report, to our knowledge, of DNA methylation in a nematode, prompting a re-evaluation of phyla in which this epigenetic mark was thought to be absent.
PMCID: PMC4053732  PMID: 23075480
