1.  Performance of genetic risk factors in prediction of trichloroethylene induced hypersensitivity syndrome 
Scientific Reports  2015;5:12169.
Trichloroethylene-induced hypersensitivity syndrome is a dose-independent and potentially life-threatening disease that has become a serious occupational health issue and requires intensive treatment. To discover genetic risk factors and evaluate the performance of a risk prediction model for the disease, we conducted a genome-wide association study and a replication study with a total of 174 cases and 1,761 trichloroethylene-tolerant controls. Fifty-seven SNPs exceeded the threshold for genome-wide significance (P < 5 × 10⁻⁸) for association with the disease; among them, two independent SNPs were identified: rs2857281 at MICA (odds ratio, 11.92; Pmeta = 1.33 × 10⁻³⁷) and rs2523557 between HLA-B and MICA (odds ratio, 7.33; Pmeta = 8.79 × 10⁻³⁵). The genetic risk score based on these two SNPs explains at least 20.9% of the disease variance and up to 32.5-fold variation in inter-individual risk. Using the two SNPs in combination as predictors gave an accuracy of 80.73% and an area under the receiver operating characteristic curve (AUC) of 0.82, with a sensitivity of 74% and a specificity of 85%. This was considered excellent discrimination for the disease, and the model could be considered for translational application in screening employees before exposure.
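The abstract does not state how the two-SNP genetic risk score is weighted. A common convention, assumed here and not confirmed by the source, is to sum each SNP's risk-allele count (0, 1 or 2) multiplied by the natural log of its odds ratio, then call a person "high risk" above some cut-off and summarize performance as sensitivity and specificity. The sketch below uses the two reported odds ratios; the genotypes and the cut-off of 2.0 are hypothetical toy values.

```python
import math

# Per-allele odds ratios as reported in the abstract.
SNP_ODDS_RATIOS = {
    "rs2857281": 11.92,  # at MICA
    "rs2523557": 7.33,   # between HLA-B and MICA
}


def genetic_risk_score(risk_allele_counts):
    """Assumed weighted GRS: sum of (risk-allele count 0/1/2) * ln(OR) per SNP.

    SNPs missing from the input are treated as carrying zero risk alleles.
    """
    score = 0.0
    for snp, odds_ratio in SNP_ODDS_RATIOS.items():
        count = risk_allele_counts.get(snp, 0)
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: risk-allele count must be 0, 1 or 2, got {count}")
        score += count * math.log(odds_ratio)
    return score


def sensitivity_specificity(is_case, predicted_positive):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(is_case, predicted_positive))
    tp = sum(1 for case, pred in pairs if case and pred)
    fn = sum(1 for case, pred in pairs if case and not pred)
    tn = sum(1 for case, pred in pairs if not case and not pred)
    fp = sum(1 for case, pred in pairs if not case and pred)
    return tp / (tp + fn), tn / (tn + fp)


# Toy usage: hypothetical genotypes and a hypothetical cut-off of 2.0.
people = [
    ({"rs2857281": 1, "rs2523557": 1}, True),
    ({"rs2857281": 0, "rs2523557": 1}, True),
    ({"rs2857281": 0, "rs2523557": 0}, False),
    ({"rs2857281": 1, "rs2523557": 0}, False),
]
scores = [genetic_risk_score(genotype) for genotype, _ in people]
predicted = [score >= 2.0 for score in scores]
print(sensitivity_specificity([case for _, case in people], predicted))
```

Log-odds weighting makes the score additive on the logistic scale, so a carrier of one rs2857281 risk allele contributes ln(11.92) regardless of the other locus; an interaction-aware model would need a different scheme.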
PMCID: PMC4507183  PMID: 26190474
3.  Discovery of biclonal origin and a novel oncogene SLC12A5 in colon cancer by single-cell sequencing 
Cell Research  2014;24(6):701-712.
Single-cell sequencing is a powerful tool for delineating clonal relationships and identifying key driver genes for personalized cancer management. Here we performed single-cell sequencing analysis of a case of colon cancer. Population genetics analyses identified two independent clones in the tumor cell population. The major tumor clone harbored APC and TP53 mutations as early oncogenic events, whereas the minor clone contained preponderant CDC27 and PABPC1 mutations. The absence of APC and TP53 mutations in the minor clone supports the conclusion that the two clones were derived from two different cells of origin. Examination of the somatic mutation allele frequency spectra of an additional 21 whole-tissue exome-sequenced cases revealed heterogeneity of clonal origins in colon cancer. Next, we identified a mutated gene, SLC12A5, that showed a high frequency of mutation at the single-cell level but low prevalence at the population level. Functional characterization of mutant SLC12A5 revealed its potential oncogenic effect in colon cancer. Our study provides the first exome-wide evidence at the single-cell level that colon cancer could be of biclonal origin, and suggests that low-prevalence mutations in a cohort may also play important protumorigenic roles at the individual level.
PMCID: PMC4042168  PMID: 24699064
single-cell sequencing; colon cancer; SLC12A5; biclonal; oncogene
4.  Molecular Signatures of Major Depression 
Current Biology  2015;25(9):1146-1156.
Adversity, particularly in early life, can cause illness. Clues to the responsible mechanisms may lie with the discovery of molecular signatures of stress, some of which include alterations to an individual’s somatic genome. Here, using genome sequences from 11,670 women, we observed a highly significant association between a stress-related disease, major depression (MD), and both the amount of mtDNA (p = 9.00 × 10⁻⁴², odds ratio 1.33 [95% confidence interval (CI) = 1.29–1.37]) and telomere length (p = 2.84 × 10⁻¹⁴, odds ratio 0.85 [95% CI = 0.81–0.89]). Although both telomere length and mtDNA amount were associated with adverse life events, conditional regression analyses showed that the molecular changes were contingent on the depressed state. Experiments in mice demonstrated that stress causes both molecular changes, which are partly reversible and can be elicited by administering corticosterone. Together, these results demonstrate that changes in the amount of mtDNA and telomere length are consequences of stress and of entering a depressed state. These findings identify increased amounts of mtDNA as a molecular marker of MD and have important implications for understanding how stress causes the disease.
• Amount of mtDNA is increased, and telomeric DNA is shortened, in major depression
• Both changes can be induced with stress but are contingent on the depressed state
• Changes are tissue specific and in part due to glucocorticoid secretion
• Changes are in part reversible and represent switches in metabolic strategy
Cai et al. found increases in mtDNA and a reduction in telomeric DNA in cases of major depression using whole-genome sequencing. Both changes depend on the depressed state. Experiments in mice exposed to chronic stress or glucocorticoids showed that these changes reflect switches in metabolic strategy and are tissue specific and partially reversible.
PMCID: PMC4425463  PMID: 25913401
5.  Whole-genome analyses resolve early branches in the tree of life of modern birds 
Jarvis, Erich D. | Mirarab, Siavash | Aberer, Andre J. | Li, Bo | Houde, Peter | Li, Cai | Ho, Simon Y. W. | Faircloth, Brant C. | Nabholz, Benoit | Howard, Jason T. | Suh, Alexander | Weber, Claudia C. | da Fonseca, Rute R. | Li, Jianwen | Zhang, Fang | Li, Hui | Zhou, Long | Narula, Nitish | Liu, Liang | Ganapathy, Ganesh | Boussau, Bastien | Bayzid, Md. Shamsuzzoha | Zavidovych, Volodymyr | Subramanian, Sankar | Gabaldón, Toni | Capella-Gutiérrez, Salvador | Huerta-Cepas, Jaime | Rekepalli, Bhanu | Munch, Kasper | Schierup, Mikkel | Lindow, Bent | Warren, Wesley C. | Ray, David | Green, Richard E. | Bruford, Michael W. | Zhan, Xiangjiang | Dixon, Andrew | Li, Shengbin | Li, Ning | Huang, Yinhua | Derryberry, Elizabeth P. | Bertelsen, Mads Frost | Sheldon, Frederick H. | Brumfield, Robb T. | Mello, Claudio V. | Lovell, Peter V. | Wirthlin, Morgan | Schneider, Maria Paula Cruz | Prosdocimi, Francisco | Samaniego, José Alfredo | Velazquez, Amhed Missael Vargas | Alfaro-Núñez, Alonzo | Campos, Paula F. | Petersen, Bent | Sicheritz-Ponten, Thomas | Pas, An | Bailey, Tom | Scofield, Paul | Bunce, Michael | Lambert, David M. | Zhou, Qi | Perelman, Polina | Driskell, Amy C. | Shapiro, Beth | Xiong, Zijun | Zeng, Yongli | Liu, Shiping | Li, Zhenyu | Liu, Binghang | Wu, Kui | Xiao, Jin | Yinqi, Xiong | Zheng, Qiuemei | Zhang, Yong | Yang, Huanming | Wang, Jian | Smeds, Linnea | Rheindt, Frank E. | Braun, Michael | Fjeldsa, Jon | Orlando, Ludovic | Barker, F. Keith | Jønsson, Knud Andreas | Johnson, Warren | Koepfli, Klaus-Peter | O’Brien, Stephen | Haussler, David | Ryder, Oliver A. | Rahbek, Carsten | Willerslev, Eske | Graves, Gary R. | Glenn, Travis C. | McCormack, John | Burt, Dave | Ellegren, Hans | Alström, Per | Edwards, Scott V. | Stamatakis, Alexandros | Mindell, David P. | Cracraft, Joel | Braun, Edward L. | Warnow, Tandy | Jun, Wang | Gilbert, M. Thomas P. | Zhang, Guojie
Science (New York, N.Y.)  2014;346(6215):1320-1331.
To better determine the history of modern birds, we performed a genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves using phylogenomic methods created to handle genome-scale data. We recovered a highly resolved tree that confirms previously controversial sister or close relationships. We identified the first divergence in Neoaves, two groups we named Passerea and Columbea, representing independent lineages of diverse and convergently evolved land and water bird species. Among Passerea, we infer the common ancestor of core landbirds to have been an apex predator and confirm independent gains of vocal learning. Among Columbea, we identify pigeons and flamingoes as belonging to sister clades. Even with whole genomes, some of the earliest branches in Neoaves proved challenging to resolve, which was best explained by massive protein-coding sequence convergence and high levels of incomplete lineage sorting that occurred during a rapid radiation after the Cretaceous-Paleogene mass extinction event about 66 million years ago.
PMCID: PMC4405904  PMID: 25504713
6.  Novel recurrently mutated genes and a prognostic mutation signature in colorectal cancer 
Gut  2014;64(4):636-645.
Characterisation of colorectal cancer (CRC) genomes by next-generation sequencing has led to the discovery of novel recurrently mutated genes. Nevertheless, genomic data have not yet been used for CRC prognostication.
To identify recurrent somatic mutations with prognostic significance in patients with CRC.
Exome sequencing was performed to identify somatic mutations in tumour tissues of 22 patients with CRC, followed by validation of 187 recurrent and pathway-related genes using targeted capture sequencing in an additional 160 cases.
Seven significantly mutated genes, including four reported (APC, TP53, KRAS and SMAD4) and three novel recurrently mutated genes (CDH10, FAT4 and DOCK2), exhibited high mutation prevalence (6–14% for novel cancer genes) and higher-than-expected number of non-silent mutations in our CRC cohort. For prognostication, a five-gene-signature (CDH10, COL6A3, SMAD4, TMEM132D, VCAN) was devised, in which mutation(s) in one or more of these genes was significantly associated with better overall survival independent of tumor-node-metastasis (TNM) staging. The median survival time was 80.4 months in the mutant group versus 42.4 months in the wild type group (p=0.0051). The prognostic significance of this signature was successfully verified using the data set from the Cancer Genome Atlas study.
The application of next-generation sequencing has led to the identification of three novel significantly mutated genes in CRC and a mutation signature that predicts survival outcomes for stratifying patients with CRC independent of TNM staging.
PMCID: PMC4392212  PMID: 24951259
7.  Comparative genomics reveals insights into avian genome evolution and adaptation 
Zhang, Guojie | Li, Cai | Li, Qiye | Li, Bo | Larkin, Denis M. | Lee, Chul | Storz, Jay F. | Antunes, Agostinho | Greenwold, Matthew J. | Meredith, Robert W. | Ödeen, Anders | Cui, Jie | Zhou, Qi | Xu, Luohao | Pan, Hailin | Wang, Zongji | Jin, Lijun | Zhang, Pei | Hu, Haofu | Yang, Wei | Hu, Jiang | Xiao, Jin | Yang, Zhikai | Liu, Yang | Xie, Qiaolin | Yu, Hao | Lian, Jinmin | Wen, Ping | Zhang, Fang | Li, Hui | Zeng, Yongli | Xiong, Zijun | Liu, Shiping | Zhou, Long | Huang, Zhiyong | An, Na | Wang, Jie | Zheng, Qiumei | Xiong, Yingqi | Wang, Guangbiao | Wang, Bo | Wang, Jingjing | Fan, Yu | da Fonseca, Rute R. | Alfaro-Núñez, Alonzo | Schubert, Mikkel | Orlando, Ludovic | Mourier, Tobias | Howard, Jason T. | Ganapathy, Ganeshkumar | Pfenning, Andreas | Whitney, Osceola | Rivas, Miriam V. | Hara, Erina | Smith, Julia | Farré, Marta | Narayan, Jitendra | Slavov, Gancho | Romanov, Michael N | Borges, Rui | Machado, João Paulo | Khan, Imran | Springer, Mark S. | Gatesy, John | Hoffmann, Federico G. | Opazo, Juan C. | Håstad, Olle | Sawyer, Roger H. | Kim, Heebal | Kim, Kyu-Won | Kim, Hyeon Jeong | Cho, Seoae | Li, Ning | Huang, Yinhua | Bruford, Michael W. | Zhan, Xiangjiang | Dixon, Andrew | Bertelsen, Mads F. | Derryberry, Elizabeth | Warren, Wesley | Wilson, Richard K | Li, Shengbin | Ray, David A. | Green, Richard E. | O’Brien, Stephen J. | Griffin, Darren | Johnson, Warren E. | Haussler, David | Ryder, Oliver A. | Willerslev, Eske | Graves, Gary R. | Alström, Per | Fjeldså, Jon | Mindell, David P. | Edwards, Scott V. | Braun, Edward L. | Rahbek, Carsten | Burt, David W. | Houde, Peter | Zhang, Yong | Yang, Huanming | Wang, Jian | Jarvis, Erich D. | Gilbert, M. Thomas P. | Wang, Jun
Science (New York, N.Y.)  2014;346(6215):1311-1320.
Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits.
PMCID: PMC4390078  PMID: 25504712
8.  Excess of Rare Variants in Genes that are Key Epigenetic Regulators of Spermatogenesis in the Patients with Non-Obstructive Azoospermia 
Scientific Reports  2015;5:8785.
Non-obstructive azoospermia (NOA), a severe form of male infertility, is often suspected to be linked to as-yet-undefined genetic abnormalities. To explore the genetic basis of this condition, we successfully sequenced ~650 infertility-related genes in 757 NOA patients and 709 fertile males. We evaluated the contribution of rare variants to the etiology of NOA by identifying individual genes showing nominal associations and by testing the genetic burden of a given biological process as a whole. We found a significant excess of rare, non-silent variants in genes that are key epigenetic regulators of spermatogenesis, such as BRWD1, DNMT1, DNMT3B, RNF17, UBR2, USP1 and USP26, in NOA patients (P = 5.5 × 10⁻⁷), corresponding to carrier frequencies of 22.5% in patients and 13.7% in controls (P = 1.4 × 10⁻⁵). An accumulation of low-frequency variants was also identified in additional epigenetic genes (BRDT and MTHFR). Our study suggests potential associations between genetic defects in epigenetic-regulator genes and spermatogenic failure in humans.
PMCID: PMC4350091  PMID: 25739334
10.  Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology 
GigaScience  2014;3:34.
Structural variants (SVs) are less common than single nucleotide polymorphisms and indels in the population, but collectively account for a significant fraction of genetic polymorphism and disease. The number of base pairs affected by SVs is much larger (>100-fold) than that affected by point mutations; however, none of the current detection methods is comprehensive, and currently available methodologies cannot provide sufficient resolution and unambiguous information across complex regions of the human genome. To address these challenges, we applied a high-throughput, cost-effective genome mapping technology to comprehensively discover genome-wide SVs and characterize complex regions of the YH genome using long single molecules (>150 kb) in a global fashion.
Using nanochannel-based genome mapping technology, we obtained 708 insertions/deletions and 17 inversions larger than 1 kb. After excluding the 59 SVs (54 insertions/deletions and 5 inversions) that overlap N-base gaps in the reference assembly hg19, 666 non-gap SVs remained, and 396 of them (60%) were verified by paired-end data from whole-genome re-sequencing or by de novo assembly sequence from fosmid data. Of the remaining 270 SVs, 260 are insertions and 213 overlap known SVs in the Database of Genomic Variants. Overall, 609 of the 666 variants (91%) were supported by orthogonal experimental methods or by historical evidence in public databases. Genome mapping also provides valuable information on complex regions, resolving haplotypes in a straightforward fashion. In addition, long single-molecule labeling patterns allowed exogenous viral sequences to be mapped on a whole-genome scale and sample heterogeneity to be analyzed at a new level.
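The SV bookkeeping in the results can be re-derived from the stated counts alone. This is only an arithmetic check, not part of the authors' pipeline; the constant names are illustrative. Note that 609 of 666 works out to about 91.4%.

```python
# Counts as reported in the abstract.
INDELS = 708
INVERSIONS = 17
GAP_OVERLAPPING = 54 + 5      # indels + inversions overlapping N-base gaps in hg19
VERIFIED_BY_SEQUENCING = 396  # paired-end WGS re-sequencing or fosmid de novo assembly
OVERLAP_DGV = 213             # remaining SVs overlapping the Database of Genomic Variants


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


total = INDELS + INVERSIONS
non_gap = total - GAP_OVERLAPPING
remaining = non_gap - VERIFIED_BY_SEQUENCING
supported = VERIFIED_BY_SEQUENCING + OVERLAP_DGV

print(f"total={total} non_gap={non_gap} remaining={remaining} supported={supported}")
print(f"verified by sequencing: {percent(VERIFIED_BY_SEQUENCING, non_gap)}%")
print(f"supported overall: {percent(supported, non_gap)}%")
```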
Our study highlights genome mapping technology as a comprehensive and cost-effective method for detecting structural variation and studying complex regions in the human genome, as well as deciphering viral integration into the host genome.
Electronic supplementary material
The online version of this article (doi:10.1186/2047-217X-3-34) contains supplementary material, which is available to authorized users.
PMCID: PMC4322599  PMID: 25671094
Genome mapping; Structural variation; Repeat units; Epstein-Barr virus (EBV) integration
11.  Mudskipper genomes provide insights into the terrestrial adaptation of amphibious fishes 
Nature Communications  2014;5:5594.
Mudskippers are amphibious fishes that have developed morphological and physiological adaptations to match their unique lifestyles. Here we perform whole-genome sequencing of four representative mudskippers to elucidate the molecular mechanisms underlying these adaptations. We discover an expansion of innate immune system genes in the mudskippers that may provide defence against terrestrial pathogens. Several genes of the ammonia excretion pathway in the gills have experienced positive selection, suggesting their important roles in mudskippers’ tolerance to environmental ammonia. Some vision-related genes are differentially lost or mutated, illustrating genomic changes associated with aerial vision. Transcriptomic analyses of mudskippers exposed to air highlight regulatory pathways that are up- or down-regulated in response to hypoxia. The present study provides a valuable resource for understanding the molecular mechanisms underlying water-to-land transition of vertebrates.
Mudskippers are amphibious fishes that have adapted to live on mudflats. Here, the authors sequence the genomes of four different mudskipper species and highlight genetic changes that may have had an evolutionary role in the water-to-land transition of vertebrates.
PMCID: PMC4268706  PMID: 25463417
12.  Two Antarctic penguin genomes reveal insights into their evolutionary history and molecular changes related to the Antarctic environment 
GigaScience  2014;3:27.
Penguins are flightless aquatic birds widely distributed in the Southern Hemisphere. The distinctive morphological and physiological features of penguins allow them to live an aquatic life, and some of them have successfully adapted to the hostile environments in Antarctica. To study the phylogenetic and population history of penguins and the molecular basis of their adaptations to Antarctica, we sequenced the genomes of the two Antarctic-dwelling penguin species, the Adélie penguin [Pygoscelis adeliae] and emperor penguin [Aptenodytes forsteri].
Phylogenetic dating suggests that early penguins arose ~60 million years ago, coinciding with a period of global warming. Analysis of effective population sizes reveals that the two penguin species experienced population expansions from ~1 million years ago to ~100 thousand years ago, but responded differently to the climatic cooling of the last glacial period. Comparative genomic analyses with other available avian genomes identified molecular changes in genes related to epidermal structure, phototransduction, lipid metabolism, and forelimb morphology.
Our sequencing and initial analyses of the first two penguin genomes provide insights into the timing of penguin origin, fluctuations in effective population sizes of the two penguin species over the past 10 million years, and the potential associations between these biological patterns and global climate change. The molecular changes compared with other avian genomes reflect both shared and diverse adaptations of the two penguin species to the Antarctic environment.
Electronic supplementary material
The online version of this article (doi:10.1186/2047-217X-3-27) contains supplementary material, which is available to authorized users.
PMCID: PMC4322438  PMID: 25671092
Penguins; Avian genomics; Evolution; Adaptation; Antarctica
13.  Clinical outcome of preimplantation genetic diagnosis and screening using next generation sequencing 
GigaScience  2014;3:30.
Next generation sequencing (NGS) is now being used to detect chromosomal abnormalities in blastocyst trophectoderm (TE) cells from in vitro fertilized embryos. However, few data are available on clinical outcomes, which provide a vital reference for further application of the methodology. Here, we present a clinical evaluation of NGS-based preimplantation genetic diagnosis/screening (PGD/PGS), with single nucleotide polymorphism (SNP) array-based PGD/PGS as a control.
A total of 395 couples participated. They were carriers of either translocation or inversion mutations, or were patients with recurrent miscarriage and/or advanced maternal age. A total of 1,512 blastocysts were biopsied on day 5 (D5) after fertilization; 1,058 were allocated to SNP array testing and 454 to NGS testing. In the NGS cycles group, the implantation, clinical pregnancy and miscarriage rates were 52.6% (60/114), 61.3% (49/80) and 14.3% (7/49), respectively. In the SNP array cycles group, the implantation, clinical pregnancy and miscarriage rates were 47.6% (139/292), 56.7% (115/203) and 14.8% (17/115), respectively. None of these outcome measures differed significantly between the NGS and SNP array cycles. Of the 150 blastocysts analyzed by both NGS and SNP array, seven gave inconsistent signals; all other signals obtained from NGS analysis were confirmed to be accurate by qPCR validation. The relative copy number of mitochondrial DNA (mtDNA) was evaluated for each blastocyst that underwent NGS testing, and it differed significantly between euploid and chromosomally abnormal blastocysts. So far, of 42 ongoing pregnancies in the NGS cycles, 24 babies have been born; all are healthy and free of any developmental problems.
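The abstract reports the three outcome rates for each group but does not name the statistical test used to compare them. As an illustration only, the sketch below applies a pooled two-proportion z-test (normal approximation), which is an assumption on our part; for small counts such as the miscarriages, Fisher's exact test would usually be the safer choice.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p) via the normal approximation."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# (events, total) for the NGS group and the SNP array group, as reported above.
OUTCOMES = {
    "implantation": ((60, 114), (139, 292)),
    "clinical pregnancy": ((49, 80), (115, 203)),
    "miscarriage": ((7, 49), (17, 115)),
}

for name, ((x1, n1), (x2, n2)) in OUTCOMES.items():
    z, p = two_proportion_z_test(x1, n1, x2, n2)
    print(f"{name}: NGS {100 * x1 / n1:.2f}% vs SNP array {100 * x2 / n2:.2f}%, "
          f"z = {z:.2f}, p = {p:.2f}")
```

Under this assumed test, none of the three differences reaches p < 0.05, which is consistent with the abstract's statement.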
This study provides the first evaluation of the clinical outcomes of NGS-based preimplantation genetic diagnosis/screening, and shows the reliability of this method in a clinical and array-based laboratory setting. NGS provides an accurate approach to detecting imbalanced segmental rearrangements in embryos and avoids the potential risk of false signals from SNP arrays observed in this study.
Electronic supplementary material
The online version of this article (doi:10.1186/2047-217X-3-30) contains supplementary material, which is available to authorized users.
PMCID: PMC4326468  PMID: 25685330
Preimplantation genetic diagnosis/screening; Next generation sequencing; Blastocyst; Cryopreserved embryo transfer; Clinical outcome
14.  Comparative Analysis of Evolutionarily Conserved Motifs of Epidermal Growth Factor Receptor 2 (HER2) Predicts Novel Potential Therapeutic Epitopes 
PLoS ONE  2014;9(9):e106448.
Overexpression of human epidermal growth factor receptor 2 (HER2) is associated with tumor aggressiveness and poor prognosis in breast cancer. With the availability of therapeutic antibodies against HER2, great strides have been made in the clinical management of HER2-overexpressing breast cancer. However, de novo and acquired resistance to these antibodies is a serious limitation to successful HER2-targeted treatment. The identification of novel HER2 epitopes that can be used for functional/region-specific blockade could be a central step in the development of new clinically relevant anti-HER2 antibodies. Here, we present a novel computational approach as an auxiliary tool for identifying novel HER2 epitopes. We hypothesized that the structurally and linearly evolutionarily conserved motifs of the extracellular domain of HER2 (ECD HER2) contain potential druggable epitopes/targets. We employed the PROSITE Scan to detect structurally conserved motifs and PRINTS to search for linearly conserved motifs of ECD HER2. We found that the epitopes recognized by trastuzumab and pertuzumab are located in the predicted conserved motifs of ECD HER2, supporting our initial hypothesis. Considering that structurally and linearly conserved motifs can provide functionally specific configurations, we propose that comparing the two types of conserved motifs can identify additional druggable epitopes/targets in the ECD HER2 protein, which can be further modified for potential therapeutic application. Thus, this novel computational process for predicting or searching for potential epitopes or key target sites may contribute to epitope-based vaccine and function-selected drug design, especially when X-ray crystal structure data for the protein are not available.
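PROSITE motifs are often written as patterns such as `N-{P}-[ST]-{P}.`, the well-known N-glycosylation site pattern. To illustrate what scanning a sequence for such a motif involves, the sketch below translates a simple PROSITE-style pattern into a Python regular expression and reports every, possibly overlapping, match. It is a partial, hypothetical helper rather than the ScanProsite service used in the study, and the toy sequence is made up, not HER2.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regular expression.

    Supports: x (any residue), [..] (allowed residues), {..} (forbidden residues),
    (n) or (n,m) repeat counts, and '<' / '>' terminal anchors on an element.
    Partial sketch only: e.g. '>' inside a bracket class is not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        start_anchor = "^" if element.startswith("<") else ""
        end_anchor = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        repeat = ""
        match = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)
        if match:
            element = match.group(1)
            low, high = match.group(2), match.group(3)
            repeat = "{" + low + ("," + high if high else "") + "}"
        if element == "x":
            core = "."
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"
        else:
            core = element  # single residue or [..] class: same syntax in regex
        parts.append(start_anchor + core + repeat + end_anchor)
    return "".join(parts)


def scan(sequence, pattern):
    """Return (0-based start, matched segment) for every, possibly overlapping, match."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


# N-glycosylation site pattern (PROSITE PS00001) on a made-up toy sequence.
print(scan("ANGSANPSA", "N-{P}-[ST]-{P}."))
```

The lookahead wrapper is what lets adjacent motif occurrences that share residues all be reported, which matters when motifs are short and degenerate.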
PMCID: PMC4156330  PMID: 25192037
15.  A Survey of Overlooked Viral Infections in Biological Experiment Systems 
PLoS ONE  2014;9(8):e105348.
It is commonly accepted that there are many unknown viruses on the planet. But for the known viruses, do we know their prevalence, even in our own experimental systems? Here we report a virus survey using recently published small RNA (sRNA) sequencing datasets. The sRNA reads were assembled, and the contigs were screened for virus homologues against the NCBI nucleotide (nt) database using the BLASTn program. To our surprise, approximately 30% (28 out of 94) of publications had high-scoring viral sequences in their datasets. Among them, only two publications reported virus infections. Although viral vectors were used in some of the publications, virus sequences without any identifiable source appeared in more than 20 publications. By determining the distributions of viral reads and the antiviral RNA interference (RNAi) pathways using the sRNA profiles, we found evidence that many of the viruses identified were actively infecting their hosts and eliciting host RNAi responses. As virus infections affect many aspects of host molecular biology and metabolism, the presence and impact of viruses need to be actively investigated in experimental systems.
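The screening step described above (assembled contigs searched with BLASTn, high-scoring viral hits retained) can be sketched as a filter over BLAST+ tabular output in the default `-outfmt 6` column order. The set of viral subject IDs, the e-value and bit-score thresholds, and all IDs in the toy input are hypothetical; the survey's actual cut-offs are not given in the abstract.

```python
import csv
import io

# Column order of BLAST+ tabular output produced with -outfmt 6 (default fields).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def viral_hits(tabular_text, viral_subject_ids, max_evalue=1e-5, min_bitscore=50.0):
    """Best-scoring viral hit per contig that passes both thresholds.

    viral_subject_ids is a hypothetical, pre-built set of subject IDs known to be viral;
    the threshold defaults are illustrative, not those used in the survey.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(OUTFMT6_FIELDS, row))
        if hit["sseqid"] not in viral_subject_ids:
            continue
        evalue, bitscore = float(hit["evalue"]), float(hit["bitscore"])
        if evalue > max_evalue or bitscore < min_bitscore:
            continue
        contig = hit["qseqid"]
        if contig not in best or bitscore > best[contig][1]:
            best[contig] = (hit["sseqid"], bitscore, evalue)
    return best


# Toy input with hypothetical contig and subject IDs.
toy = "\n".join([
    "contig1\tvirus_A\t98.5\t200\t3\t0\t1\t200\t10\t209\t1e-90\t350",
    "contig1\thost_X\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-20\t120",
    "contig2\tvirus_B\t80.0\t40\t8\t0\t1\t40\t1\t40\t0.5\t30",
])
print(viral_hits(toy, {"virus_A", "virus_B"}))
```

Keeping only the best hit per contig avoids counting one assembled viral fragment several times when it matches many related reference genomes.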
PMCID: PMC4140767  PMID: 25144530
16.  The co-occurrence of mtDNA mutations on different oxidative phosphorylation subunits, not detected by haplogroup analysis, affects human longevity and is population specific 
Aging Cell  2013;13(3):401-407.
To re-examine the correlation between mtDNA variability and longevity, we examined mtDNAs from over 2200 ultranonagenarians (and an equal number of controls) collected within the framework of the GEHA EU project. The samples were categorized by high-resolution classification, and about 1300 mtDNA molecules (650 ultranonagenarians and an equal number of controls) were completely sequenced. Complete sequences, unlike standard haplogroup analysis, made it possible to evaluate for the first time the cumulative effects of specific, concomitant mtDNA mutations, including those that per se have a low, or very low, impact. In particular, analysis of the mutations occurring in different OXPHOS complexes revealed a complex scenario, with a different mutation burden in 90+ subjects compared with controls. These findings suggested that mutations in subunits of OXPHOS complex I had a beneficial effect on longevity, while the simultaneous presence of mutations in complexes I and III (which also occurs in J subhaplogroups involved in LHON) and in complexes I and V seemed to be detrimental, likely explaining previous contradictory results. On the whole, our study, which goes beyond haplogroup analysis, suggests that mitochondrial DNA variation does affect human longevity, but its effect is heavily influenced by the interaction between mutations concomitantly occurring on different mtDNA genes.
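The key idea above is grouping an individual's mtDNA mutations by the OXPHOS complex whose subunit they fall in, then looking at which complexes are hit together. The sketch below uses the standard assignment of the 13 mtDNA protein-coding genes to complexes I, III, IV and V (complex II has no mtDNA-encoded subunits). The flag names and the helper functions are hypothetical illustrations of the combinations the abstract discusses, not the study's statistical model.

```python
# Standard assignment of mtDNA protein-coding genes to OXPHOS complexes.
# Complex II has no mtDNA-encoded subunits, so it never appears here.
GENE_TO_COMPLEX = {
    **{gene: "I" for gene in ("MT-ND1", "MT-ND2", "MT-ND3", "MT-ND4",
                              "MT-ND4L", "MT-ND5", "MT-ND6")},
    "MT-CYB": "III",
    **{gene: "IV" for gene in ("MT-CO1", "MT-CO2", "MT-CO3")},
    "MT-ATP6": "V",
    "MT-ATP8": "V",
}


def complexes_hit(mutated_genes):
    """Set of OXPHOS complexes with at least one mutated mtDNA-encoded subunit.

    Genes outside the mapping (e.g. rRNA or tRNA genes) are ignored.
    """
    return {GENE_TO_COMPLEX[gene] for gene in mutated_genes if gene in GENE_TO_COMPLEX}


def co_occurrence_flags(mutated_genes):
    """Hypothetical flags for the complex combinations discussed in the abstract."""
    hit = complexes_hit(mutated_genes)
    return {
        "complex_I_only": hit == {"I"},
        "complex_I_and_III": {"I", "III"} <= hit,
        "complex_I_and_V": {"I", "V"} <= hit,
    }


print(co_occurrence_flags(["MT-ND1", "MT-CYB"]))
```

Working at the level of complexes rather than haplogroups is what lets low-impact mutations in different genes be counted together per subject.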
PMCID: PMC4326891  PMID: 24341918
genetics of longevity; longevity; mitochondrial DNA; mtDNA sequencing; oxidative phosphorylation
18.  Integrated detection of both 5-mC and 5-hmC by high-throughput tag sequencing technology highlights methylation reprogramming of bivalent genes during cellular differentiation 
Epigenetics  2013;8(4):421-430.
5-methylcytosine (5-mC) can be oxidized to 5-hydroxymethylcytosine (5-hmC). Genome-wide profiling of 5-hmC thus far indicates that 5-hmC may not only be an intermediate in DNA demethylation but could also constitute an epigenetic mark per se. Here we describe a cost-effective and selective method to detect both the hydroxymethylation and the methylation status of a subset of cytosines in the human genome. The method involves selective glucosylation of 5-hmC residues, short-sequence tag generation and high-throughput sequencing. We tested this method by screening H9 human embryonic stem cells and their differentiated embryoid body cells, and found that differential hydroxymethylation preferentially occurs in bivalent genes during cellular differentiation. In particular, our results support the idea that hydroxymethylation can regulate key transcriptional regulators with bivalent marks through demethylation, influencing whether these genes adopt an active or inactive state upon cellular differentiation. Future application of this technology should enable us to uncover the methylation and hydroxymethylation status in dynamic biological processes and disease development across multiple biological samples.
PMCID: PMC3674051  PMID: 23502161
HMST-Seq; differentiation; embryonic stem cells; hydroxymethylation; methylation
19.  Complete Resequencing of 40 Genomes Reveals Domestication Events and Genes in Silkworm (Bombyx) 
Science (New York, N.Y.)  2009;326(5951):433-436.
A single–base pair resolution silkworm genetic variation map was constructed from 40 domesticated and wild silkworms, each sequenced to approximately threefold coverage, representing 99.88% of the genome. We identified ∼16 million single-nucleotide polymorphisms, many indels, and structural variations. We find that the domesticated silkworms are clearly genetically differentiated from the wild ones, but they have maintained large levels of genetic variability, suggesting a short domestication event involving a large number of individuals. We also identified signals of selection at 354 candidate genes that may have been important during domestication, some of which have enriched expression in the silk gland, midgut, and testis. These data add to our understanding of the domestication processes and may have applications in devising pest control strategies and advancing the use of silkworms as efficient bioreactors.
PMCID: PMC3951477  PMID: 19713493
20.  The sequence and de novo assembly of the giant panda genome 
Li, Ruiqiang | Fan, Wei | Tian, Geng | Zhu, Hongmei | He, Lin | Cai, Jing | Huang, Quanfei | Cai, Qingle | Li, Bo | Bai, Yinqi | Zhang, Zhihe | Zhang, Yaping | Wang, Wen | Li, Jun | Wei, Fuwen | Li, Heng | Jian, Min | Li, Jianwen | Zhang, Zhaolei | Nielsen, Rasmus | Li, Dawei | Gu, Wanjun | Yang, Zhentao | Xuan, Zhaoling | Ryder, Oliver A. | Leung, Frederick Chi-Ching | Zhou, Yan | Cao, Jianjun | Sun, Xiao | Fu, Yonggui | Fang, Xiaodong | Guo, Xiaosen | Wang, Bo | Hou, Rong | Shen, Fujun | Mu, Bo | Ni, Peixiang | Lin, Runmao | Qian, Wubin | Wang, Guodong | Yu, Chang | Nie, Wenhui | Wang, Jinhuan | Wu, Zhigang | Liang, Huiqing | Min, Jiumeng | Wu, Qi | Cheng, Shifeng | Ruan, Jue | Wang, Mingwei | Shi, Zhongbin | Wen, Ming | Liu, Binghang | Ren, Xiaoli | Zheng, Huisong | Dong, Dong | Cook, Kathleen | Shan, Gao | Zhang, Hao | Kosiol, Carolin | Xie, Xueying | Lu, Zuhong | Zheng, Hancheng | Li, Yingrui | Steiner, Cynthia C. | Lam, Tommy Tsan-Yuk | Lin, Siyuan | Zhang, Qinghui | Li, Guoqing | Tian, Jing | Gong, Timing | Liu, Hongde | Zhang, Dejin | Fang, Lin | Ye, Chen | Zhang, Juanbin | Hu, Wenbo | Xu, Anlong | Ren, Yuanyuan | Zhang, Guojie | Bruford, Michael W. | Li, Qibin | Ma, Lijia | Guo, Yiran | An, Na | Hu, Yujie | Zheng, Yang | Shi, Yongyong | Li, Zhiqiang | Liu, Qing | Chen, Yanling | Zhao, Jing | Qu, Ning | Zhao, Shancen | Tian, Feng | Wang, Xiaoling | Wang, Haiyin | Xu, Lizhi | Liu, Xiao | Vinar, Tomas | Wang, Yajun | Lam, Tak-Wah | Yiu, Siu-Ming | Liu, Shiping | Zhang, Hemin | Li, Desheng | Huang, Yan | Wang, Xia | Yang, Guohua | Jiang, Zhi | Wang, Junyi | Qin, Nan | Li, Li | Li, Jingxiang | Bolund, Lars | Kristiansen, Karsten | Wong, Gane Ka-Shu | Olson, Maynard | Zhang, Xiuqing | Li, Songgang | Yang, Huanming | Wang, Jian | Wang, Jun
Nature  2009;463(7279):311-317.
Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility of using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes.
PMCID: PMC3951497  PMID: 20010809
22.  Integrative analyses of gene expression and DNA methylation profiles in breast cancer cell line models of tamoxifen-resistance indicate a potential role of cells with stem-like properties 
Breast Cancer Research : BCR  2013;15(6):R119.
Development of resistance to tamoxifen is an important clinical issue in the treatment of breast cancer. Tamoxifen resistance may be the result of acquisition of epigenetic regulation within breast cancer cells, such as DNA methylation, resulting in changed mRNA expression of genes pivotal for estrogen-dependent growth. Alternatively, tamoxifen resistance may be due to selection of pre-existing resistant cells, or a combination of the two mechanisms.
To evaluate the contribution of these possible tamoxifen resistance mechanisms, we applied modified DNA methylation-specific digital karyotyping (MMSDK) and digital gene expression (DGE) in combination with massively parallel sequencing to analyze a well-established tamoxifen-resistant cell line model (TAMR), consisting of four resistant sublines and one parental cell line. Another tamoxifen-resistant cell line model system (LCC1/LCC2) was used to validate the DNA methylation and gene expression results.
Significant differences were observed in global gene expression and DNA methylation profiles between the parental tamoxifen-sensitive cell line and the four tamoxifen-resistant TAMR sublines. The four TAMR cell lines exhibited higher methylation levels as well as an inverse relationship between gene expression and DNA methylation in the promoter regions. A panel of genes, including NRIP1, HECA and FIS1, exhibited lower gene expression together with increased promoter CpG island (CGI) methylation in resistant vs. parental cell lines. A major part of the methylation, gene expression, and pathway alterations observed in the TAMR model were also present in the LCC1/LCC2 cell line model. More importantly, high expression of SOX2 and alterations of other SOX and E2F gene family members, as well as RB-related pocket protein genes, in TAMR highlighted stem cell-associated pathways as being central in the resistant cells and suggested that cancer-initiating cells/cancer stem-like cells may be involved in tamoxifen resistance in this model.
Our data highlight the likelihood that resistant cells emerge from cancer-initiating cells/cancer stem-like cells and imply that these cells may gain further advantage in growth via epigenetic mechanisms. Illuminating the expression and DNA methylation features of putative cancer-initiating cells/cancer stem cells may suggest novel strategies to overcome tamoxifen resistance.
PMCID: PMC4057522  PMID: 24355041
23.  High genome heterozygosity and endemic genetic recombination in the wheat stripe rust fungus 
Nature Communications  2013;4:2673.
Stripe rust, caused by Puccinia striiformis f. sp. tritici (Pst), is one of the most destructive diseases of wheat. Here we report a 110-Mb draft sequence of Pst isolate CY32, obtained using a ‘fosmid-to-fosmid’ strategy, to better understand its race evolution and pathogenesis. The Pst genome is highly heterozygous and contains 25,288 protein-coding genes. Compared with non-obligate fungal pathogens, Pst has a more diverse gene composition and more genes encoding secreted proteins. Re-sequencing analysis indicates significant genetic variation among six isolates collected from different continents. Approximately 35% of SNPs are in the coding sequence regions, and half of them are non-synonymous. High genetic diversity in Pst suggests that sexual reproduction has an important role in the origin of different regional races. Our results show the effectiveness of the ‘fosmid-to-fosmid’ strategy for sequencing dikaryotic genomes and the feasibility of genome analysis to understand race evolution in Pst and other obligate pathogens.
Stripe rust is one of the most destructive wheat diseases. Here, Zheng and colleagues report a draft genome sequence of wheat stripe rust fungus, generated using a fosmid-to-fosmid approach, and provide insight into its race evolution and pathogenesis.
PMCID: PMC3826619  PMID: 24150273
24.  Development of Transgenic Minipigs with Expression of Antimorphic Human Cryptochrome 1 
PLoS ONE  2013;8(10):e76098.
Minipigs have become important biomedical models for human ailments due to similarities in organ anatomy, physiology, and circadian rhythms relative to humans. The homeostasis of circadian rhythms in both central and peripheral tissues is pivotal for numerous biological processes. Hence, biological rhythm disorders may contribute to the onset of cancers and metabolic disorders including obesity and type II diabetes, amongst others. A tight regulation of circadian clock effectors ensures a rhythmic expression profile of output genes which, depending on cell type, constitute about 3–20% of the transcribed mammalian genome. Central to this system is the negative regulator protein Cryptochrome 1 (CRY1), whose dysfunction or absence has been linked to the pathogenesis of rhythm disorders. In this study, we generated transgenic Bama-minipigs featuring expression of the Cys414-Ala antimorphic human Cryptochrome 1 mutant (hCRY1AP). Using transgenic fibroblasts as nuclear donors, the method of handmade cloning (HMC) was used to produce reconstructed embryos, which were subsequently transferred to surrogate sows. A total of 23 viable piglets were delivered. All were transgenic and seemingly healthy. However, two pigs with high transgene expression succumbed during the first two months. Molecular analyses in epidermal fibroblasts demonstrated disturbances to the expression profile of core circadian clock genes and elevated expression of the proinflammatory cytokines IL-6 and TNF-α, known to be risk factors in cancer and metabolic disorders.
PMCID: PMC3797822  PMID: 24146819
25.  A human gut microbial gene catalog established by metagenomic sequencing 
Nature  2010;464(7285):59-65.
To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million nonredundant microbial genes, derived from 576.7 Gb sequence, from faecal samples of 124 European individuals. The gene set, ~150 times larger than the human gene complement, contains an overwhelming majority of the prevalent microbial genes of the cohort and likely includes a large proportion of the prevalent human intestinal microbial genes. The genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, suggesting that the entire cohort harbours between 1000 and 1150 prevalent bacterial species and each individual at least 160 such species, which are also largely shared. We define and describe the minimal gut metagenome and the minimal gut bacterial genome in terms of functions encoded by the gene set.
PMCID: PMC3779803  PMID: 20203603

