Single-cell sequencing is a powerful tool for delineating clonal relationships and identifying key driver genes for personalized cancer management. Here we performed single-cell sequencing analysis of a case of colon cancer. Population genetics analyses identified two independent clones in the tumor cell population. The major tumor clone harbored APC and TP53 mutations as early oncogenic events, whereas CDC27 and PABPC1 mutations predominated in the minor clone. The absence of APC and TP53 mutations in the minor clone supports the conclusion that the two clones were derived from two distinct cells of origin. Examination of the somatic mutation allele frequency spectra of an additional 21 whole-tissue exome-sequenced cases revealed heterogeneity of clonal origins in colon cancer. Next, we identified a mutated gene, SLC12A5, that showed a high mutation frequency at the single-cell level but low prevalence at the population level. Functional characterization of mutant SLC12A5 revealed a potential oncogenic effect in colon cancer. Our study provides the first exome-wide evidence at the single-cell level that colon cancer could have a biclonal origin, and suggests that mutations with low prevalence in a cohort may nonetheless play important protumorigenic roles in individual patients.
single-cell sequencing; colon cancer; SLC12A5; biclonal; oncogene
Hair follicles (HF) undergo precisely regulated recurrent cycles of growth, cessation, and rest. The transitions from anagen (growth), to catagen (regression), to telogen (rest) involve a physiological involution of the HF. This process is likely coordinated by a variety of mechanisms including apoptosis and loss of growth factor signaling. However, the precise molecular mechanisms underlying follicle involution after hair keratinocyte differentiation and hair shaft assembly remain poorly understood. Here we demonstrate that a highly conserved microRNA, miR-22, is markedly upregulated during catagen and peaks in telogen. Using gain- and loss-of-function approaches in vivo, we find that miR-22 overexpression leads to hair loss by promoting the anagen-to-catagen transition of the HF, and that deletion of miR-22 delays entry into catagen and accelerates the transition from telogen to anagen. Ectopic activation of miR-22 results in hair loss due to repression of a hair keratinocyte differentiation program and of keratinocyte progenitor expansion, as well as promotion of apoptosis. At the molecular level, we demonstrate that miR-22 directly represses numerous transcription factors upstream of phenotypic keratin genes, including Dlx3, Foxn1, and Hoxc13. We conclude that miR-22 is a critical post-transcriptional regulator of the hair cycle and may represent a novel target for therapeutic modulation of hair growth.
Up to 60% of people suffer from hair loss during their lifetime. Hair growth undergoes recurrent cycles of growth, regression, and rest with a defined periodicity. The main cause of human hair loss is a premature transition from growth to regression. Understanding the molecular basis of hair regression is therefore important for elucidating the mechanisms of hair loss. Here, we demonstrated that miR-22, a highly conserved microRNA, is critical for the transition from growth to regression of the hair follicle. Importantly, miR-22 could be a novel therapeutic target for hair loss disorders.
Adversity, particularly in early life, can cause illness. Clues to the responsible mechanisms may lie with the discovery of molecular signatures of stress, some of which include alterations to an individual’s somatic genome. Here, using genome sequences from 11,670 women, we observed a highly significant association between a stress-related disease, major depression, and both the amount of mtDNA (p = 9.00 × 10−42, odds ratio 1.33 [95% confidence interval (CI) = 1.29–1.37]) and telomere length (p = 2.84 × 10−14, odds ratio 0.85 [95% CI = 0.81–0.89]). While both telomere length and mtDNA amount were associated with adverse life events, conditional regression analyses showed that the molecular changes were contingent on the depressed state. We tested this hypothesis with experiments in mice, demonstrating that stress causes both molecular changes, which are partly reversible and can be elicited by the administration of corticosterone. Together, these results demonstrate that changes in the amount of mtDNA and telomere length are consequences of stress and of entering a depressed state. These findings identify increased amounts of mtDNA as a molecular marker of major depression and have important implications for understanding how stress causes the disease.
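Odds ratios with 95% confidence intervals like those above are typically reported from a logistic regression as exp(β) with a Wald interval exp(β ± 1.96·SE). The sketch below assumes an interval of that kind (the abstract does not state the model or its covariates): it recovers the standard error implied by the reported interval for mtDNA amount and rebuilds the interval from it as a consistency check.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution

def odds_ratio_ci(beta, se, z=Z_95):
    """Odds ratio and Wald confidence interval from a log-odds coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def se_from_ci(lower, upper, z=Z_95):
    """Standard error of log(OR) implied by a reported Wald CI (assumption:
    the interval was built symmetrically on the log scale)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# mtDNA amount vs. major depression, as reported: OR 1.33 (95% CI 1.29-1.37)
se = se_from_ci(1.29, 1.37)
or_, lo, hi = odds_ratio_ci(math.log(1.33), se)
print(f"SE(log OR) = {se:.4f}; OR = {or_:.2f} [{lo:.2f}, {hi:.2f}]")
```

Working on the log scale is the design point here: the interval is symmetric around log(OR), not around the OR itself.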
• Amount of mtDNA is increased, and telomeric DNA is shortened, in major depression
• Both changes can be induced with stress but are contingent on the depressed state
• Changes are tissue specific and in part due to glucocorticoid secretion
• Changes are in part reversible and represent switches in metabolic strategy
Cai et al. found increases in mtDNA and a reduction in telomeric DNA in cases of major depression using whole-genome sequencing. Both changes depend on the depressed state. Experiments in mice exposed to chronic stress or glucocorticoids showed that these changes reflect switches in metabolic strategy and are tissue specific and partially reversible.
Hereditary ataxias are a heterogeneous group of neurodegenerative disorders, where exome sequencing may become an important diagnostic tool to solve clinically or genetically complex cases.
We describe an Italian family in which three sisters were affected by ataxia with postural/intention myoclonus and involuntary movements at onset, which persisted throughout the course of the disease. Oculomotor apraxia was absent. Clinical and genetic data did not allow us to distinguish between autosomal dominant and autosomal recessive inheritance, or to suggest a candidate disease gene.
Exome sequencing identified a homozygous c.6292C > T (p.Arg2098*) mutation in SETX and a heterozygous c.346G > A (p.Gly116Arg) mutation in AFG3L2 shared by all three affected individuals. A fourth sister (II.7) had subclinical myoclonic jerks in the proximal upper limbs and perioral region, confirmed by electrophysiology, and carried the p.Gly116Arg change. Three other siblings were healthy.
Pathogenicity prediction and a yeast functional assay suggested that p.Gly116Arg impairs m-AAA (ATPases associated with various cellular activities) complex function.
Exome sequencing is a powerful tool for identifying disease genes. We identified an atypical form of Ataxia with Oculomotor Apraxia type 2 (AOA2) with myoclonus at onset, associated with the homozygous c.6292C > T (p.Arg2098*) mutation. Because the same genotype was described in six cases from a Tunisian family with typical AOA2 without myoclonus, we speculate that the myoclonus is associated with a second mutated gene, namely AFG3L2 (p.Gly116Arg variant).
We suggest that variant phenotypes may be due to the combined effect of different mutated genes associated with ataxia or related disorders. Such combined effects will become more apparent as the cost of exome sequencing falls and its diagnostic use expands, which will in turn pose significant challenges for data interpretation.
Electronic supplementary material
The online version of this article (doi:10.1186/s12881-015-0159-0) contains supplementary material, which is available to authorized users.
AFG3L2; Exome sequencing; Senataxin; SETX; Modifier genes; SCAR1; Ataxia with Oculomotor Apraxia Type 2; Autosomal recessive ataxia; Myoclonus
Non-obstructive azoospermia (NOA), a severe form of male infertility, is often suspected to be linked to currently undefined genetic abnormalities. To explore the genetic basis of this condition, we successfully sequenced ~650 infertility-related genes in 757 NOA patients and 709 fertile males. We evaluated the contributions of rare variants to the etiology of NOA by identifying individual genes showing nominal associations and by testing the genetic burden of a given biological process as a whole. We found a significant excess of rare, non-silent variants in genes that are key epigenetic regulators of spermatogenesis, such as BRWD1, DNMT1, DNMT3B, RNF17, UBR2, USP1 and USP26, in NOA patients (P = 5.5 × 10−7), corresponding to a carrier frequency of 22.5% in patients and 13.7% in controls (P = 1.4 × 10−5). An accumulation of low-frequency variants was also identified in additional epigenetic genes (BRDT and MTHFR). Our study suggests potential associations between genetic defects in epigenetic regulator genes and spermatogenic failure in humans.
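Carrier-frequency comparisons like the one above (22.5% of patients vs. 13.7% of controls) are commonly tested on a 2×2 table of carriers and non-carriers. The sketch below is a plain two-sided Fisher's exact test written with the standard library. The abstract does not state which test produced the reported P values, and the gene-set burden test may have been different, so treat this as an illustration rather than the study's method.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table
        [[a, b],    # patients: carriers, non-carriers
         [c, d]]    # controls: carriers, non-carriers
    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # P(top-left cell == x) given fixed row and column totals
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Small relative tolerance so tables tied with the observed one are counted.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))

# Toy table (hypothetical counts, not data from the study)
print(fisher_exact_two_sided(3, 0, 0, 3))
```

Only percentages are given above, so no patient or control counts are reconstructed here; with real data one would pass the carrier and non-carrier counts for the 757 patients and 709 controls.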
Next generation sequencing (NGS) is now being used for detecting chromosomal abnormalities in blastocyst trophectoderm (TE) cells from in vitro fertilized embryos. However, few data are available on clinical outcomes, which provide a vital reference for further application of the methodology. Here, we present a clinical evaluation of NGS-based preimplantation genetic diagnosis/screening (PGD/PGS), using single nucleotide polymorphism (SNP) array-based PGD/PGS as a control.
A total of 395 couples participated. They were carriers of either translocations or inversions, or were patients with recurrent miscarriage and/or advanced maternal age. A total of 1,512 blastocysts were biopsied on D5 after fertilization, with 1,058 blastocysts set aside for SNP array testing and 454 blastocysts for NGS testing. In the NGS cycles group, the implantation, clinical pregnancy and miscarriage rates were 52.6% (60/114), 61.3% (49/80) and 14.3% (7/49), respectively. In the SNP array cycles group, the implantation, clinical pregnancy and miscarriage rates were 47.6% (139/292), 56.7% (115/203) and 14.8% (17/115), respectively. None of these outcome measures differed significantly between the NGS and SNP array cycles. A total of 150 blastocysts underwent both NGS and SNP array analysis, of which seven showed inconsistent signals. All other signals obtained from NGS analysis were confirmed to be accurate by qPCR validation. The relative copy number of mitochondrial DNA (mtDNA) was evaluated for each blastocyst that underwent NGS testing, and it differed significantly between euploid and chromosomally abnormal blastocysts. So far, out of 42 ongoing pregnancies, 24 babies have been born in NGS cycles; all of these babies are healthy and free of any developmental problems.
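The abstract does not say which test was used to compare the two groups. As a sketch, a pooled two-proportion z-test (equivalent to a Pearson chi-squared test on the 2×2 table without continuity correction) can be applied directly to the counts reported above; for the small miscarriage counts, Fisher's exact test would be the more cautious choice.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).
    z**2 equals the uncorrected Pearson chi-squared statistic."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return z, p_value

# Counts from the abstract: (NGS events, NGS total, SNP-array events, SNP-array total)
outcomes = {
    "implantation": (60, 114, 139, 292),
    "clinical_pregnancy": (49, 80, 115, 203),
    "miscarriage": (7, 49, 17, 115),
}
for name, counts in outcomes.items():
    z, p = two_proportion_z_test(*counts)
    print(f"{name}: z = {z:.2f}, p = {p:.2f}")
```

Under this test none of the three differences reaches p < 0.05, which agrees with the statement that the outcomes did not differ significantly.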
This study provides the first evaluation of the clinical outcomes of NGS-based preimplantation genetic diagnosis/screening, and shows the reliability of this method in a clinical and array-based laboratory setting. NGS provides an accurate approach for detecting imbalanced segmental rearrangements in embryos and, in this study, avoided the potential risk of false signals from SNP array.
Electronic supplementary material
The online version of this article (doi:10.1186/2047-217X-3-30) contains supplementary material, which is available to authorized users.
Preimplantation genetic diagnosis/screening; Next generation sequencing; Blastocyst; Cryopreserved embryo transfer; Clinical outcome
Differences in the distribution of 5-hydroxymethylcytosine (5hmC) may complicate previous observations of abnormal cytosine methylation status that were used to identify new tumor suppressor gene candidates relevant to human hepatocarcinogenesis. The simultaneous detection of 5-methylcytosine and 5-hydroxymethylcytosine is likely to enable more accurate discovery of aberrantly methylated genes in human hepatocellular carcinoma.
Here, we used ultra-performance liquid chromatography/tandem mass spectrometry and a single-base high-throughput sequencing method, Hydroxymethylation and Methylation Sensitive Tag sequencing (HMST-seq), to measure these two modifications simultaneously in human hepatocellular carcinoma samples. After identifying differentially methylated and hydroxymethylated genes in human hepatocellular carcinoma, we integrated DNA copy-number alterations, determined from array-based comparative genomic hybridization data, with gene expression to identify genes that are potentially silenced by promoter hypermethylation.
We report a high enrichment of genes with epigenetic aberrations in cancer signaling pathways. Six genes were selected as tumor suppressor gene candidates, of which ECM1, ATF5 and EOMES were confirmed by siRNA experiments to have potential anti-cancer functions.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0533-9) contains supplementary material, which is available to authorized users.
5-methylcytosine (5-mC) can be oxidized to 5-hydroxymethylcytosine (5-hmC). Genome-wide profiling of 5-hmC thus far indicates that 5-hmC may be not only an intermediate in DNA demethylation but also an epigenetic mark in its own right. Here we describe a cost-effective and selective method to detect both the hydroxymethylation and the methylation status of a subset of cytosines in the human genome. This method involves selective glucosylation of 5-hmC residues, short sequence tag generation and high-throughput sequencing. We tested this method by screening H9 human embryonic stem cells and their differentiated embryoid body cells, and found that differential hydroxymethylation preferentially occurs in bivalent genes during cellular differentiation. In particular, our results support the idea that hydroxymethylation can regulate key transcriptional regulators with bivalent marks through demethylation, and thereby influence whether these genes adopt an active or inactive state upon cellular differentiation. Future application of this technology would enable us to uncover the status of methylation and hydroxymethylation in dynamic biological processes and disease development across multiple biological samples.
HMST-Seq; differentiation; embryonic stem cells; hydroxymethylation; methylation
A single–base pair resolution silkworm genetic variation map was constructed from 40 domesticated and wild silkworms, each sequenced to approximately threefold coverage, representing 99.88% of the genome. We identified ∼16 million single-nucleotide polymorphisms, many indels, and structural variations. We find that the domesticated silkworms are clearly genetically differentiated from the wild ones, but they have maintained large levels of genetic variability, suggesting a short domestication event involving a large number of individuals. We also identified signals of selection at 354 candidate genes that may have been important during domestication, some of which have enriched expression in the silk gland, midgut, and testis. These data add to our understanding of the domestication processes and may have applications in devising pest control strategies and advancing the use of silkworms as efficient bioreactors.
We report here the genome sequence of an ancient human. Obtained from ∼4,000-year-old permafrost-preserved hair, the genome represents a male individual from the first known culture to settle in Greenland. Sequenced to an average depth of 20×, we recover 79% of the diploid genome, an amount close to the practical limit of current sequencing technologies. We identify 353,151 high-confidence single-nucleotide polymorphisms (SNPs), of which 6.8% have not been reported previously. We estimate raw read contamination to be no higher than 0.8%. We use functional SNP assessment to assign possible phenotypic characteristics of the individual that belonged to a culture whose location has yielded only trace human remains. We compare the high-confidence SNPs to those of contemporary populations to find the populations most closely related to the individual. This provides evidence for a migration from Siberia into the New World some 5,500 years ago, independent of that giving rise to the modern Native Americans and Inuit.
Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than on its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility of using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes.
Despite an increase in the number of molecular epidemiological studies conducted in recent years to evaluate the association between human papillomavirus (HPV) and the risk of breast carcinoma, these studies remain inconclusive. Here we aimed to detect HPV DNA in various tissues from patients with breast carcinoma by combining HPV capture with massively parallel sequencing (MPS). To validate our method, 15 cervical cancer samples were tested both by PCR and by the new method; the results of the two methods were 100% concordant. DNA from peripheral blood, tumor tissue, adjacent lymph nodes and adjacent normal tissue was collected from seven patients with malignant breast cancer, and HPV type 16 (HPV16) was detected in 1/7, 1/7, 1/7 and 1/7 of patients, respectively. Peripheral blood, tumor tissue and adjacent normal tissue were also collected from two patients with benign breast tumors, and HPV16 DNA was detected in 1/2, 2/2 and 2/2 of them, respectively. MPS metrics including mapping ratio, coverage, depth and SNVs are reported to characterize HPV in the samples. The average coverage was 69% and 61.2% for malignant and benign samples, respectively. A total of 126 SNVs were identified across all 9 samples, and across all samples the E2 and E4 genes harbored the largest numbers of SNVs. Our study not only provides an efficient method to capture HPV DNA, but also characterizes SNVs, SNV types, coverage and depth. These findings provide further clues to an association between HPV16 and breast cancer.
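Coverage and depth figures such as those above are usually derived from a per-position depth profile over the reference genome. The sketch below assumes such a profile is already available (one integer per HPV16 reference position, zeros included) and computes breadth of coverage and mean depth; the depth threshold and the toy profile are illustrative assumptions, not values from the study.

```python
def coverage_stats(depths, min_depth=1):
    """Summarise per-base read depths over a reference genome.

    depths: read depth at every reference position (uncovered positions = 0).
    Returns (breadth of coverage, mean depth over all positions,
             mean depth over covered positions)."""
    if not depths:
        raise ValueError("empty depth profile")
    covered = [d for d in depths if d >= min_depth]
    breadth = len(covered) / len(depths)      # fraction of positions covered
    mean_depth = sum(depths) / len(depths)    # averaged over the whole genome
    mean_covered = sum(covered) / len(covered) if covered else 0.0
    return breadth, mean_depth, mean_covered

# Toy 10-position profile (hypothetical)
print(coverage_stats([0, 0, 5, 10, 5, 0, 20, 0, 0, 10]))
```

Reporting depth both over all positions and over covered positions avoids a common ambiguity: the same data give a mean depth of 5 or 10 here depending on the denominator.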
Copy number variations (CNVs) are an important type of genetic variation that profoundly influences phenotypic variation and human disease. The advent of high-throughput sequencing technologies provides an opportunity to revolutionize the discovery of CNVs and to explore their relationship with diseases. However, most existing methods depend on sequencing depth and perform unstably at low sequence coverage. In this study, we developed an effective population-scale CNV calling (PSCC) method based on low-coverage whole-genome sequencing (LCS).
In our method, a two-step correction was used to remove biases caused by local GC content and complex genomic features. We chose a binary segmentation method to locate CNV segments and designed combined statistical tests to ensure stable false-positive control. On simulated data, PSCC achieved 99.7% sensitivity and 100% specificity at LCS (∼2×), and 98.6% sensitivity and 100% specificity at ultra-LCS (∼0.2×), for calling CNVs larger than 300 kb. Finally, we applied the method to 34 clinical samples sequenced at an average of 2× LCS, and all 31 pathogenic CNVs identified by aCGH were successfully detected. In addition, a performance comparison revealed that our method had significant advantages over existing methods when using ultra-LCS data.
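The binary segmentation step can be sketched as follows: find the single split of a vector of per-bin copy ratios that maximizes a two-sample t statistic, accept it if the statistic exceeds a threshold, and recurse on both halves. The abstract does not give PSCC's statistics or thresholds, so the pooled-variance t statistic, `threshold=5.0` and `min_size=3` below are hypothetical choices, and this is a generic illustration rather than the PSCC implementation.

```python
import math

def t_score(left, right):
    """Pooled-variance two-sample t statistic for a shift in mean level."""
    n1, n2 = len(left), len(right)
    m1, m2 = sum(left) / n1, sum(right) / n2
    ss = sum((x - m1) ** 2 for x in left) + sum((x - m2) ** 2 for x in right)
    sd = math.sqrt(ss / (n1 + n2 - 2))
    diff = abs(m1 - m2)
    if sd == 0.0:  # perfectly flat segments: any mean difference is decisive
        return math.inf if diff > 0 else 0.0
    return diff / (sd * math.sqrt(1 / n1 + 1 / n2))

def binary_segmentation(ratios, threshold=5.0, min_size=3, offset=0):
    """Recursively split per-bin copy ratios at the best change point while
    its t statistic exceeds `threshold`. Returns breakpoint indices (first
    bin of each new segment) in ascending order."""
    n = len(ratios)
    if n < 2 * min_size:
        return []
    best_i, best_t = None, 0.0
    for i in range(min_size, n - min_size + 1):
        t = t_score(ratios[:i], ratios[i:])
        if t > best_t:
            best_i, best_t = i, t
    if best_i is None or best_t < threshold:
        return []
    return (binary_segmentation(ratios[:best_i], threshold, min_size, offset)
            + [offset + best_i]
            + binary_segmentation(ratios[best_i:], threshold, min_size, offset + best_i))

# Toy profile: a copy-ratio step from ~1.0 to ~1.5 at bin 15 (hypothetical)
profile = [1.0, 1.02, 0.98] * 5 + [1.5, 1.52, 1.48] * 5
print(binary_segmentation(profile))
```

Plain binary segmentation tests one change point at a time, so a short segment embedded in a long one can be missed; circular binary segmentation was designed to address that case.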
Our study showed that PSCC can sensitively and reliably detect CNVs using low coverage or even ultra-low coverage data through population-scale sequencing.
To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million nonredundant microbial genes, derived from 576.7 Gb sequence, from faecal samples of 124 European individuals. The gene set, ~150 times larger than the human gene complement, contains an overwhelming majority of the prevalent microbial genes of the cohort and likely includes a large proportion of the prevalent human intestinal microbial genes. The genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, suggesting that the entire cohort harbours between 1000 and 1150 prevalent bacterial species and each individual at least 160 such species, which are also largely shared. We define and describe the minimal gut metagenome and the minimal gut bacterial genome in terms of functions encoded by the gene set.
Residents of the Tibetan Plateau show heritable adaptations to extreme altitude. We sequenced 50 exomes of ethnic Tibetans, encompassing coding sequences of 92% of human genes, with an average coverage of 18X per individual. Genes showing population-specific allele frequency changes, which represent strong candidates for altitude adaptation, were identified. The strongest signal of natural selection came from EPAS1, a transcription factor involved in response to hypoxia. One SNP at EPAS1 shows a 78% frequency difference between Tibetan and Han samples, representing the fastest allele frequency change observed at any human gene to date. This SNP’s association with erythrocyte abundance supports the role of EPAS1 in adaptation to hypoxia. Thus, a population genomic survey has revealed a functionally important locus in genetic adaptation to high altitude.
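One widely used statistic for population-specific allele frequency change is the population branch statistic (PBS). It converts pairwise F_ST values among a focal population, a closely related population and an outgroup into branch lengths, and a long focal branch flags a candidate for selection. The abstract does not name the statistic or an outgroup, so the sketch below, including its simple (H_T - H_S) / H_T estimator of F_ST, is an illustration under those assumptions rather than the study's pipeline.

```python
import math

def fst_two_pop(p1, p2):
    """Simple two-population F_ST from allele frequencies, (H_T - H_S) / H_T,
    weighting both populations equally (illustrative estimator)."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                          # pooled heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2      # mean within-population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

def branch_length(p1, p2):
    """Divergence T = -ln(1 - F_ST); F_ST is capped just below 1 to avoid log(0)."""
    return -math.log(1 - min(fst_two_pop(p1, p2), 1 - 1e-9))

def population_branch_statistic(p_focal, p_sister, p_outgroup):
    """Length of the focal population's branch in the three-population tree."""
    t_fs = branch_length(p_focal, p_sister)
    t_fo = branch_length(p_focal, p_outgroup)
    t_so = branch_length(p_sister, p_outgroup)
    return (t_fs + t_fo - t_so) / 2

# Hypothetical frequencies: the allele is common only in the focal population
print(population_branch_statistic(0.8, 0.1, 0.1))
```

The subtraction of the sister-outgroup distance is what makes the statistic population-specific: a frequency change shared by the two related populations does not lengthen the focal branch.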
The applications of massively parallel sequencing technology to fetal cell-free DNA (cff-DNA) have brought new insight to non-invasive prenatal diagnosis. However, most previous research based on maternal plasma sequencing has been restricted to fetal aneuploidies. To detect specific parentally inherited mutations, invasive approaches to obtain fetal DNA are the current standard in the clinic because of the experimental complexity and resource consumption of previously reported non-invasive approaches.
Here, we present a simple and effective non-invasive method for accurate fetal genome recovery assisted by parental haplotypes. The parental haplotypes were first inferred using a combined strategy based on trios and unrelated individuals. Using these parental haplotypes, we then employed a hidden Markov model to recover the fetal genome non-invasively through maternal plasma sequencing.
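The abstract gives no details of the hidden Markov model, so the following is a deliberately minimal two-state sketch under stated assumptions. Only sites where the mother is heterozygous and the father is homozygous for the reference allele are used, the maternal haplotypes are phased, plasma allele counts are binomial, and the fetal fraction is known. The hidden state is the maternal haplotype transmitted to the fetus, a state switch models a maternal recombination, and Viterbi decoding returns the most likely inheritance path. All names and parameters are hypothetical.

```python
import math

def log_binom_pmf(k, n, p):
    """log P(K = k) for K ~ Binomial(n, p)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def infer_maternal_inheritance(alt_counts, depths, fetal_fraction, switch_prob=0.01):
    """Viterbi decoding of which maternal haplotype the fetus inherited.

    Assumed setting: mother heterozygous (haplotype 0 = REF, haplotype 1 = ALT),
    father homozygous REF. Expected ALT fraction in maternal plasma:
      (1 - f) / 2 if the fetus inherited haplotype 0 (fetus REF/REF),
      1 / 2       if it inherited haplotype 1 (fetus REF/ALT).
    Returns one state (0 or 1) per site."""
    expected = [(1 - fetal_fraction) / 2, 0.5]
    stay, switch = math.log(1 - switch_prob), math.log(switch_prob)
    # score[s]: best log-probability of any state path ending in state s
    score = [math.log(0.5) + log_binom_pmf(alt_counts[0], depths[0], expected[s])
             for s in (0, 1)]
    backpointers = []
    for k, n in zip(alt_counts[1:], depths[1:]):
        new_score, pointers = [], []
        for s in (0, 1):
            from_same, from_other = score[s] + stay, score[1 - s] + switch
            pointers.append(s if from_same >= from_other else 1 - s)
            new_score.append(max(from_same, from_other) + log_binom_pmf(k, n, expected[s]))
        score = new_score
        backpointers.append(pointers)
    # Trace the most likely path backwards from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Toy plasma data at fetal fraction 0.2 (hypothetical): a switch after site 5
print(infer_maternal_inheritance([40] * 5 + [50] * 5, [100] * 10, 0.2))
```

Decoding whole haplotype blocks rather than single sites is the design point: each site alone is only weakly informative at low fetal fraction, but the linkage encoded by the low switch probability pools evidence across neighboring sites.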
Using a sequencing depth of approximately 44X and a cff-DNA concentration of approximately 5.69%, we non-invasively inferred fetal genotypes and haplotypes under different situations of parental heterozygosity. Our data show that 98.57%, 95.37%, and 98.45% of paternal autosomal alleles, maternal autosomal alleles, and maternal chromosome X alleles in the fetal haplotypes, respectively, were recovered accurately. Additionally, we obtained efficient coverage or strong linkage for 96.65% of reported Mendelian-disorder genes and 98.90% of complex disease-associated markers.
Our method provides a useful strategy for non-invasive whole fetal genome recovery.
Copy number variations (CNVs), a common type of genomic variation associated with various diseases, are important in research and clinical applications. Whole genome amplification (WGA) and massively parallel sequencing have been applied to single-cell CNV analysis, providing new insight in the fields of biology and medicine. However, WGA-induced bias significantly limits the sensitivity and specificity of CNV detection. To address these limitations, we developed a practical bioinformatic methodology for CNV detection at the single-cell level using low coverage massively parallel sequencing. This method consists of GC correction to remove WGA-induced bias, a binary segmentation algorithm for locating CNV breakpoints, and dynamic threshold determination for final signal filtering. We then evaluated our method with seven test samples using low coverage sequencing (4∼9.5%). Four single-cell samples from peripheral blood, whose karyotypes were confirmed by whole genome sequencing analysis, were acquired. Three other test samples derived from blastocysts, whose karyotypes were confirmed by SNP-array analysis, were also recruited. The detection results for CNVs larger than 1 Mb were highly consistent with the confirmed results, reaching 99.63% sensitivity and 97.71% specificity at the base-pair level. Our study demonstrates the potential to overcome WGA bias and to detect CNVs (>1 Mb) at the single-cell level through low coverage massively parallel sequencing. It highlights the potential for CNV research on single cells or limited DNA samples, and the method may prove to be a promising tool for research and clinical applications such as pre-implantation genetic diagnosis/screening, fetal nucleated red blood cell research and cancer heterogeneity analysis.
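GC correction of binned read counts can be done in several ways, and the abstract does not specify which one was used. A simple, common scheme is sketched below: bins are grouped into GC-content strata, and each count is rescaled so that every stratum has the same median. The number of strata is a hypothetical parameter, and many pipelines instead fit a LOESS curve of count against GC.

```python
from collections import defaultdict
from statistics import median

def gc_correct(counts, gc_fractions, n_strata=20):
    """Median-based GC correction of per-bin read counts (illustrative).

    Bins are grouped into equal-width GC strata, and each count is scaled by
    global_median / stratum_median so that all strata share the same median."""
    keys = [min(int(gc * n_strata), n_strata - 1) for gc in gc_fractions]
    strata = defaultdict(list)
    for count, key in zip(counts, keys):
        strata[key].append(count)
    global_median = median(counts)
    stratum_median = {key: median(values) for key, values in strata.items()}
    # Bins in a stratum with zero median coverage carry no usable signal.
    return [count * global_median / stratum_median[key] if stratum_median[key] > 0 else 0.0
            for count, key in zip(counts, keys)]

# Toy bins (hypothetical): GC-rich bins are over-amplified by 50%
print(gc_correct([100, 100, 150, 150], [0.42, 0.43, 0.62, 0.63]))
```

After correction the GC-rich and GC-poor bins report the same coverage, so a remaining difference between regions is more likely to reflect copy number than amplification bias.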
Reduced representation bisulfite sequencing (RRBS) was developed to measure DNA methylation of high-CG regions at single-base-pair resolution, and has been widely used because of its minimal DNA requirements and cost efficiency; however, its CpG coverage of genomic regions is restricted, and important low-CG regions are ignored in DNA methylation profiling. The method could be improved to generate a more comprehensive representation.
Based on in silico simulation of enzyme digestion of human and mouse genomes, we have optimized the current single-enzyme RRBS by applying double enzyme digestion in the library construction to interrogate more representative regions. CpG coverage of genomic regions was considerably increased in both high-CG and low-CG regions using the double-enzyme RRBS method, leading to more accurate detection of their average methylation levels and identification of differential methylation regions between samples. We also applied this double-enzyme RRBS method to comprehensively analyze the CpG methylation profiles of two colorectal cancer cell lines.
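The in silico digestion step can be sketched as locating recognition sites, cutting at each one, and keeping fragments inside a size-selection window. The abstract does not name the enzymes or the window. MspI (C^CGG, the classic RRBS enzyme) and ApeKI (G^CWGC), and the 40 to 220 bp window below, are assumptions for illustration. Both recognition sequences are palindromic, so scanning the top strand finds every site.

```python
import re

# Assumed enzymes: (site pattern as a zero-width lookahead, cut offset within site)
ENZYMES = {
    "MspI": (re.compile(r"(?=CCGG)"), 1),        # C^CGG
    "ApeKI": (re.compile(r"(?=GC[AT]GC)"), 1),   # G^CWGC, W = A or T
}

def digest(seq, enzyme_names):
    """Fragments produced by cutting `seq` with every named enzyme."""
    cuts = set()
    for name in enzyme_names:
        pattern, offset = ENZYMES[name]
        for m in pattern.finditer(seq):  # lookahead also finds overlapping sites
            cuts.add(m.start() + offset)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

def size_selected_cpgs(seq, enzyme_names, min_len=40, max_len=220):
    """Number of CpG dinucleotides in fragments inside the size window."""
    return sum(frag.count("CG") for frag in digest(seq, enzyme_names)
               if min_len <= len(frag) <= max_len)

toy = "AAACCGGAAAGCAGCAAA"  # hypothetical sequence with one site per enzyme
print(digest(toy, ["MspI"]), digest(toy, ["MspI", "ApeKI"]))
```

Running the same reference through single- and double-enzyme digests and comparing `size_selected_cpgs` is the kind of simulation that shows how many additional CpGs the second enzyme brings into the selected size range.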
The double-enzyme RRBS increases the CpG coverage of genomic regions considerably over the previous single-enzyme RRBS method, leading to more accurate detection of their average methylation levels. It will facilitate genome-wide DNA methylation studies in multiple and complex clinical samples.
Single-enzyme RRBS; Double-enzyme RRBS; DNA methylation; CpG coverage
Conventional prenatal screening tests, such as maternal serum tests and ultrasound scan, have limited resolution and accuracy.
We developed an advanced noninvasive prenatal diagnosis method based on massively parallel sequencing. The Noninvasive Fetal Trisomy (NIFTY) test combines an optimized Student’s t-test with locally weighted polynomial regression and binary hypotheses. We applied the NIFTY test to 903 pregnancies and compared the diagnostic results with those of full karyotyping.
Sixteen of 16 trisomy 21, 12 of 12 trisomy 18, two of two trisomy 13, three of four 45,X, one of one XYY and two of two XXY abnormalities were correctly identified. However, one false-positive case of trisomy 18 and one false-negative case of 45,X were observed. The test performed with 100% sensitivity and 99.9% specificity for autosomal aneuploidies and 85.7% sensitivity and 99.9% specificity for sex chromosomal aneuploidies. Compared with three previously reported z-score approaches (with and without GC-bias removal, and with an internal control), the NIFTY test was more accurate and robust for the detection of both autosomal and sex chromosomal aneuploidies in fetuses.
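The reported accuracy figures can be reproduced from the case counts above with the usual definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch assumes that all 903 - 30 = 873 pregnancies without an autosomal trisomy form the autosomal-negative group. The denominators behind the sex chromosomal specificity are not given, so only sensitivity is computed for that category.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Autosomal: 16 + 12 + 2 = 30 trisomies detected, none missed, one false-positive
# trisomy 18 among the assumed 873 autosomal-negative pregnancies.
auto_sens, auto_spec = sensitivity_specificity(tp=30, fn=0, tn=872, fp=1)
print(f"autosomal: sensitivity {auto_sens:.1%}, specificity {auto_spec:.1%}")

# Sex chromosomal: 3 + 1 + 2 = 6 detected out of 4 + 1 + 2 = 7 (one 45,X missed).
sex_sens = 6 / 7
print(f"sex chromosomal: sensitivity {sex_sens:.1%}")
```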
Our study demonstrates a powerful and reliable methodology for noninvasive prenatal diagnosis.
Noninvasive Fetal Trisomy (NIFTY) test; Massively parallel sequencing; Autosomal aneuploidies; Sex chromosomal aneuploidies
It is evident that epigenetic factors, especially DNA methylation, play essential roles in obesity development. Using pigs as a model, here we investigated the systematic association between DNA methylation and obesity. We sampled eight different adipose tissues and two distinct skeletal muscle tissues from three pig breeds living in comparable environments but displaying distinct fat levels. We generated 1,381 gigabases (Gb) of sequence data from 180 methylated DNA immunoprecipitation (MeDIP) libraries, and provided a genome-wide DNA methylation map as well as a gene expression map for adipose and muscle studies. The analysis showed global similarities and differences among breeds, sexes and anatomic locations, and identified differentially methylated regions (DMRs). DMRs in promoters are highly associated with obesity development through repression of the expression of both known obesity-related genes and novel genes. This comprehensive map provides a solid basis for exploring epigenetic mechanisms of adipose deposition and muscle growth.
DNA methylation plays an essential role in regulating gene expression under a variety of conditions, and it has therefore been hypothesized to underlie the transitions between life cycle stages in parasitic nematodes. So far, however, cytosine methylation (5-methylcytosine) has not been detected during any developmental stage of the nematode Caenorhabditis elegans. Given the new availability of high-resolution methylation detection methods, an investigation of life cycle methylation in a parasitic nematode can now be carried out.
Here, using MethylC-seq, we present the first study to confirm the existence of DNA methylation in the parasitic nematode Trichinella spiralis, and we characterize the methylomes of the three life-cycle stages of this food-borne infectious human pathogen. We observe a drastic increase in DNA methylation during the transition from the newborn to the mature stage, and we further identify parasitism-related genes that show changes in DNA methylation status between life cycle stages.
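In bisulfite-based methods such as MethylC-seq (and RRBS), unmethylated cytosines are read as T while methylated cytosines remain C. The methylation level of a cytosine is therefore usually estimated as C / (C + T) over the reads covering it, and a region's level as the read-weighted mean across its sites. The sketch below shows these two calculations; the minimum-depth cutoff is an illustrative assumption, since the study's filters are not stated here.

```python
def methylation_level(methylated_reads, unmethylated_reads, min_depth=5):
    """Per-cytosine methylation level: C reads / (C + T) reads.
    Returns None when coverage is below `min_depth` (assumed cutoff)."""
    depth = methylated_reads + unmethylated_reads
    if depth < min_depth:
        return None
    return methylated_reads / depth

def weighted_methylation(sites):
    """Read-weighted methylation of a region from (C reads, T reads) pairs:
    total C reads over total C + T reads, so deeper sites count more."""
    c_total = sum(c for c, t in sites)
    total = sum(c + t for c, t in sites)
    return c_total / total if total else None

print(methylation_level(6, 4), weighted_methylation([(10, 0), (0, 10)]))
```

Weighting by reads rather than averaging per-site levels keeps poorly covered cytosines from having the same influence as well-covered ones.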
Our data contribute to the understanding of the developmental changes that occur in an important human parasite, and raise the possibility that targeting DNA methylation processes may be a useful strategy for developing therapeutics to impede infection. In addition, our conclusion that DNA methylation is a mechanism for life cycle transition in T. spiralis prompts the question of whether this may also be the case in other metazoans. Finally, our work constitutes the first report, to our knowledge, of DNA methylation in a nematode, prompting a re-evaluation of phyla in which this epigenetic mark was thought to be absent.
The relatively short read lengths from next generation sequencing (NGS) technologies still pose a challenge for de novo assembly of complex mammalian genomes. One important solution is to use paired-end (PE) sequence information experimentally obtained from long-range DNA fragments (>1 kb). Here, we characterize and extend a long-range PE library construction method based on direct intra-molecule ligation (or molecular linker-free circularization) for NGS.
We found that the method performs stably for PE sequencing of 2–5 kb DNA fragments, and can be extended to 10–20 kb (and, in extreme cases, up to ∼35 kb). We also characterized the impact of low-quality input DNA on the method, and developed a whole-genome amplification (WGA)-based protocol for limited input DNA (<1 µg). Using this PE dataset, we accurately assembled the YanHuang (YH) genome, the first sequenced Asian genome, to a scaffold N50 size of >2 Mb, over 100 times greater than the N50 obtained with only small-insert PE reads (17 kb). In addition, we mapped two 7- to 8-kb insertions in the YH genome using the larger insert sizes of the long-range PE data.
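Scaffold N50, the statistic quoted above, is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch of that standard definition follows; the toy lengths are made up.

```python
def n50(lengths):
    """Smallest length L such that sequences of length >= L together
    cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    raise ValueError("empty length list")

print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # toy scaffold lengths
```

Because N50 is driven by the largest scaffolds, linking contigs with long-insert pairs can raise it by orders of magnitude without adding any new sequence, which is why it is the natural measure for the improvement described above.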
In conclusion, we demonstrate here the effectiveness of this long-range PE sequencing method and its use for the de novo assembly of a large, complex genome using NGS short reads.
Cancers arise through an evolutionary process in which cell populations are subjected to selection; however, the evolution of bladder cancer, one of the most common cancers in the world, has not yet been characterized at the single-cell level.
We carried out single-cell exome sequencing of 66 individual tumor cells from a muscle-invasive bladder transitional cell carcinoma (TCC). Analyses of the somatic mutant allele frequency spectrum and clonal structure revealed that the tumor cells were derived from a single ancestral cell, but that subsequent evolution occurred, leading to two distinct tumor cell subpopulations. By analyzing recurrently mutant genes in an additional cohort of 99 TCC tumors, we identified genes that might play roles in the maintenance of the ancestral clone and in the muscle-invasive capability of subclones of this bladder cancer, respectively.
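A mutant allele frequency spectrum across single cells can be summarized by asking, for each somatic mutation, what fraction of the successfully genotyped cells carry it. Mutations inherited from a common ancestral cell fall near 1, while subclone-specific mutations sit at intermediate frequencies. The sketch below computes that binned spectrum from per-cell calls; the input layout, the treatment of dropouts (`None`) and the bin count are assumptions for illustration.

```python
from collections import Counter

def cell_frequency_spectrum(genotypes, n_bins=10):
    """Binned mutant-allele frequency spectrum across single cells.

    genotypes: dict mapping mutation id -> per-cell calls
               (1 = mutant, 0 = reference, None = no call / dropout).
    Returns a Counter {bin index: number of mutations}."""
    spectrum = Counter()
    for calls in genotypes.values():
        called = [c for c in calls if c is not None]
        if not called:  # mutation never genotyped; nothing to add
            continue
        freq = sum(called) / len(called)
        spectrum[min(int(freq * n_bins), n_bins - 1)] += 1
    return spectrum

# Hypothetical calls for three mutations in four cells
calls = {"m1": [1, 1, 1, 1], "m2": [1, 0, None, 0], "m3": [None, None]}
print(cell_frequency_spectrum(calls))
```

Excluding dropouts from the denominator matters for single-cell data: allelic dropout after whole-genome amplification would otherwise make clonal mutations look subclonal.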
This work provides a new approach for investigating the genetic details of bladder tumor changes at the single-cell level and a new method for assessing bladder cancer evolution at the cell-population level.
Single-cell exome sequencing; Bladder cancer; Tumor evolution; Population genetics
Fetal chromosomal abnormalities are the most common reason for invasive prenatal testing. Currently, G-band karyotyping and several molecular genetic methods have been established for the diagnosis of chromosomal abnormalities. Although these methods are highly reliable, their major limitation is restricted resolution or limited coverage of the human genome in a single test. Massively parallel sequencing (MPS) technologies, which can reach single-base-pair resolution, allow genome-wide detection of intragenic deletions and duplications, challenging karyotyping and microarrays as the tools for prenatal diagnosis. Here we report a novel and robust MPS-based method to detect aneuploidy and unbalanced chromosomal rearrangements in amniotic fluid (AF) samples. We sequenced 62 AF samples on the Illumina GAIIx platform and, with an average of 0.01× whole-genome sequencing data, detected 13 samples with numerical chromosomal abnormalities by z-test. With up to 2× whole-genome sequencing data, we were able to detect microdeletions/microduplications (ranging from 1.4 Mb to 37.3 Mb) in 5 chorionic villus sampling (CVS) samples using the SeqSeq algorithm. Our work demonstrates that MPS is a robust and accurate approach to detect aneuploidy and unbalanced chromosomal rearrangements in prenatal samples.
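The z-test mentioned above is commonly applied to the fraction of reads mapping to each chromosome. A sample's fraction is compared with the mean and standard deviation of euploid reference samples, and a large |z| suggests a gain or loss. The sketch below shows that generic calculation. The reference set, the absence of GC correction and the |z| > 3 threshold are illustrative assumptions, since the abstract does not give the study's parameters.

```python
from statistics import mean, stdev

def chromosome_fraction(read_counts, chrom):
    """Fraction of uniquely mapped reads that fall on `chrom`."""
    return read_counts[chrom] / sum(read_counts.values())

def aneuploidy_z(test_counts, reference_counts_list, chrom):
    """z-score of the test sample's read fraction for `chrom` relative to
    euploid reference samples (at least two references are required)."""
    ref = [chromosome_fraction(r, chrom) for r in reference_counts_list]
    return (chromosome_fraction(test_counts, chrom) - mean(ref)) / stdev(ref)

# Hypothetical read counts: "other" stands for all remaining chromosomes
references = [{"chr21": 100, "other": 9900},
              {"chr21": 102, "other": 9898},
              {"chr21": 98, "other": 9902}]
z = aneuploidy_z({"chr21": 150, "other": 9850}, references, "chr21")
print(f"z = {z:.1f}; call gain: {z > 3}")
```

Because the statistic is a ratio of read counts, it needs only enough reads to estimate each chromosome's share precisely, which is why very low (about 0.01×) whole-genome coverage can suffice for whole-chromosome aneuploidy.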