To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million nonredundant microbial genes, derived from 576.7 Gb sequence, from faecal samples of 124 European individuals. The gene set, ~150 times larger than the human gene complement, contains an overwhelming majority of the prevalent microbial genes of the cohort and likely includes a large proportion of the prevalent human intestinal microbial genes. The genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, suggesting that the entire cohort harbours between 1000 and 1150 prevalent bacterial species and each individual at least 160 such species, which are also largely shared. We define and describe the minimal gut metagenome and the minimal gut bacterial genome in terms of functions encoded by the gene set.
The applications of massively parallel sequencing technology to fetal cell-free DNA (cff-DNA) have brought new insight to non-invasive prenatal diagnosis. However, most previous research based on maternal plasma sequencing has been restricted to fetal aneuploidies. To detect specific parentally inherited mutations, invasive approaches to obtain fetal DNA are the current standard in the clinic because of the experimental complexity and resource consumption of previously reported non-invasive approaches.
Here, we present a simple and effective non-invasive method for accurate fetal genome recovery assisted by parental haplotypes. The parental haplotypes were first inferred using a combined strategy of trio and unrelated individuals. Assisted by the parental haplotypes, we then employed a hidden Markov model to non-invasively recover the fetal genome through maternal plasma sequencing.
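To make the hidden Markov model step concrete, here is a minimal toy sketch. It is not the authors' model: the abstract does not specify the states, emissions, or transitions. The sketch assumes that the hidden state at each maternal heterozygous site is which maternal haplotype (0 or 1) the fetus inherited, that the father is homozygous at every site used, that read counts follow a binomial model, and that `switch_prob` (a hypothetical recombination parameter) is constant between adjacent sites. The default `fetal_fraction` simply reuses the reported cff-DNA concentration of about 5.69%.

```python
import math

def expected_hap0_fraction(state, paternal_on_hap0, fetal_fraction):
    """Expected plasma fraction of reads carrying the maternal hap0 allele.

    state: 0 if the fetus inherited maternal haplotype 0, else 1.
    paternal_on_hap0: True if the (homozygous) paternal allele equals the hap0 allele.
    """
    if state == 0:
        fetal = 1.0 if paternal_on_hap0 else 0.5
    else:
        fetal = 0.5 if paternal_on_hap0 else 0.0
    # The mother is heterozygous, so maternal DNA contributes 0.5.
    return 0.5 * (1.0 - fetal_fraction) + fetal_fraction * fetal

def viterbi_inherited_haplotype(sites, fetal_fraction=0.0569, switch_prob=1e-3):
    """Most likely inherited maternal haplotype per site (Viterbi decoding).

    sites: list of (hap0_reads, total_reads, paternal_on_hap0).
    Assumes 0 < fetal_fraction < 1 so that no emission probability is 0 or 1.
    """
    log_stay = math.log(1.0 - switch_prob)
    log_switch = math.log(switch_prob)

    def log_emit(state, site):
        k, n, paternal_on_hap0 = site
        p = expected_hap0_fraction(state, paternal_on_hap0, fetal_fraction)
        # Binomial log-likelihood without the coefficient (identical for both states).
        return k * math.log(p) + (n - k) * math.log(1.0 - p)

    score = [math.log(0.5) + log_emit(s, sites[0]) for s in (0, 1)]
    back = []
    for site in sites[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            cand = [score[prev] + (log_stay if prev == s else log_switch)
                    for prev in (0, 1)]
            best_prev = 0 if cand[0] >= cand[1] else 1
            pointers.append(best_prev)
            new_score.append(cand[best_prev] + log_emit(s, site))
        back.append(pointers)
        score = new_score

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

In this toy setting, linkage between neighbouring sites lets weak per-site signals accumulate along a haplotype block, which is the general reason an HMM helps at low fetal fractions.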
Using a sequencing depth of approximately 44X with a cff-DNA concentration of approximately 5.69%, we non-invasively inferred fetal genotype and haplotype under different situations of parental heterozygosity. Our data show that 98.57%, 95.37%, and 98.45% of paternal autosome alleles, maternal autosome alleles, and maternal chromosome X in the fetal haplotypes, respectively, were recovered accurately. Additionally, we obtained efficient coverage or strong linkage of 96.65% of reported Mendelian-disorder genes and 98.90% of complex disease-associated markers.
Our method provides a useful strategy for non-invasive whole fetal genome recovery.
Copy number variations (CNVs), a common genomic mutation associated with various diseases, are important in research and clinical applications. Whole genome amplification (WGA) and massively parallel sequencing have been applied to single-cell CNV analysis, which provides new insight for the fields of biology and medicine. However, WGA-induced bias significantly limits sensitivity and specificity for CNV detection. To address these limitations, we developed a practical bioinformatic methodology for CNV detection at the single-cell level using low coverage massively parallel sequencing. This method consists of GC correction for WGA-induced bias removal, a binary segmentation algorithm for locating CNV breakpoints, and dynamic threshold determination for final signal filtering. We then evaluated our method with seven test samples using low coverage sequencing (4∼9.5%). Four single-cell samples from peripheral blood, whose karyotypes were confirmed by whole genome sequencing analysis, were acquired. Three other test samples, derived from blastocysts whose karyotypes were confirmed by SNP-array analysis, were also included. The detection results for CNVs larger than 1 Mb were highly consistent with the confirmed results, reaching 99.63% sensitivity and 97.71% specificity at base-pair level. Our study demonstrates the potential to overcome WGA bias and to detect CNVs (>1 Mb) at the single-cell level through low coverage massively parallel sequencing. It highlights the potential for CNV research on single cells or limited DNA samples and may prove to be a promising tool for research and clinical applications, such as pre-implantation genetic diagnosis/screening, fetal nucleated red blood cell research and cancer heterogeneity analysis.
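The first step named above, GC correction, can be sketched as follows. The abstract does not give the exact correction formula, so this example assumes one common scheme: bins are grouped by GC content, and each bin's read count is rescaled by the ratio of the global median to the median of its GC stratum. The function name and the `gc_step` parameter are hypothetical.

```python
from collections import defaultdict
from statistics import median

def gc_correct(bins, gc_step=0.01):
    """Rescale per-bin read counts so every GC stratum has the same median.

    bins: list of (gc_fraction, read_count) tuples, one per genomic bin.
    Returns the corrected read counts in the same order.
    """
    # Group read counts by GC stratum (GC fraction rounded to gc_step).
    strata = defaultdict(list)
    for gc, count in bins:
        strata[round(gc / gc_step)].append(count)

    global_median = median(count for _, count in bins)
    stratum_median = {key: median(values) for key, values in strata.items()}

    corrected = []
    for gc, count in bins:
        m = stratum_median[round(gc / gc_step)]
        # Bins in an all-zero stratum carry no usable signal.
        corrected.append(count * global_median / m if m > 0 else 0.0)
    return corrected
```

After such a correction, segmentation and thresholding would operate on the corrected counts, so that depth differences reflect copy number rather than base composition.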
Reduced representation bisulfite sequencing (RRBS) was developed to measure DNA methylation of high-CG regions at single base-pair resolution, and has been widely used because of its minimal DNA requirements and cost efficacy; however, the CpG coverage of genomic regions is restricted and important regions with low-CG will be ignored in DNA methylation profiling. This method could be improved to generate a more comprehensive representation.
Based on in silico simulation of enzyme digestion of human and mouse genomes, we have optimized the current single-enzyme RRBS by applying double enzyme digestion in the library construction to interrogate more representative regions. CpG coverage of genomic regions was considerably increased in both high-CG and low-CG regions using the double-enzyme RRBS method, leading to more accurate detection of their average methylation levels and identification of differential methylation regions between samples. We also applied this double-enzyme RRBS method to comprehensively analyze the CpG methylation profiles of two colorectal cancer cell lines.
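The in silico digestion idea can be illustrated with a short sketch. The abstract does not name the enzymes, so the recognition sites used in the usage note (MspI, C^CGG, the enzyme conventionally used in RRBS, and ApeKI, G^CWGC, as a second enzyme) are illustrative assumptions, as are the function names and the size-selection window. CpGs that straddle a cut site are ignored for simplicity.

```python
import re

def digest(sequence, cut_sites):
    """Return (start, end) fragments after cutting at every recognition site.

    cut_sites: dict mapping a regex recognition site to the cut offset within it.
    """
    positions = set()
    for site, offset in cut_sites.items():
        # A lookahead finds overlapping occurrences of the site as well.
        for m in re.finditer("(?=" + site + ")", sequence):
            positions.add(m.start() + offset)
    cuts = [0] + sorted(positions) + [len(sequence)]
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if b > a]

def cpgs_covered(sequence, cut_sites, min_len, max_len):
    """Count CpG sites lying inside fragments that pass size selection."""
    covered = set()
    for start, end in digest(sequence, cut_sites):
        if min_len <= end - start <= max_len:
            for m in re.finditer("CG", sequence[start:end]):
                covered.add(start + m.start())
    return len(covered)
```

For example, with `single = {"CCGG": 1}` and `double = {"CCGG": 1, "GC[AT]GC": 1}`, calling `cpgs_covered(seq, single, 10, 20)` and `cpgs_covered(seq, double, 10, 20)` on the same sequence shows how an extra enzyme can bring additional fragments, and the CpGs they contain, into the selected size range.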
The double-enzyme RRBS increases the CpG coverage of genomic regions considerably over the previous single-enzyme RRBS method, leading to more accurate detection of their average methylation levels. It will facilitate genome-wide DNA methylation studies in multiple and complex clinical samples.
Single-enzyme RRBS; Double-enzyme RRBS; DNA methylation; CpG coverage
Conventional prenatal screening tests, such as maternal serum tests and ultrasound scan, have limited resolution and accuracy.
We developed an advanced noninvasive prenatal diagnosis method based on massively parallel sequencing. The Noninvasive Fetal Trisomy (NIFTY) test combines an optimized Student’s t-test with a locally weighted polynomial regression and binary hypotheses. We applied the NIFTY test to 903 pregnancies and compared the diagnostic results with those of full karyotyping.
16 of 16 trisomy 21, 12 of 12 trisomy 18, two of two trisomy 13, three of four 45, X, one of one XYY and two of two XXY abnormalities were correctly identified. However, one false-positive case of trisomy 18 and one false-negative case of 45, X were observed. The test performed with 100% sensitivity and 99.9% specificity for autosomal aneuploidies and 85.7% sensitivity and 99.9% specificity for sex chromosomal aneuploidies. Compared with three previously reported z-score approaches with/without GC-bias removal and with internal control, the NIFTY test was more accurate and robust for the detection of both autosomal and sex chromosomal aneuploidies in fetuses.
Our study demonstrates a powerful and reliable methodology for noninvasive prenatal diagnosis.
Noninvasive Fetal Trisomy (NIFTY) test; Massively parallel sequencing; Autosomal aneuploidies; Sex chromosomal aneuploidies
It is evident that epigenetic factors, especially DNA methylation, play essential roles in obesity development. Using the pig as a model, we investigated the systematic association between DNA methylation and obesity. We sampled eight different adipose and two distinct skeletal muscle tissues from three pig breeds living within comparable environments but displaying distinct fat levels. We generated 1,381 gigabases (Gb) of sequence data from 180 methylated DNA immunoprecipitation (MeDIP) libraries, and provided a genome-wide DNA methylation map as well as a gene expression map for adipose and muscle studies. The analysis showed global similarities and differences among breeds, sexes and anatomic locations, and identified the differentially methylated regions (DMRs). The DMRs in promoters are highly associated with obesity development via expression repression of both known obesity-related genes and novel genes. This comprehensive map provides a solid basis for exploring epigenetic mechanisms of adipose deposition and muscle growth.
The relatively short read lengths from next generation sequencing (NGS) technologies still pose a challenge for de novo assembly of complex mammal genomes. One important solution is to use paired-end (PE) sequence information experimentally obtained from long-range DNA fragments (>1 kb). Here, we characterize and extend a long-range PE library construction method based on direct intra-molecule ligation (or molecular linker-free circularization) for NGS.
We found that the method performs stably for PE sequencing of 2- to 5-kb DNA fragments, and can be extended to 10–20 kb (and even, in extreme cases, up to ∼35 kb). We also characterized the impact of low-quality input DNA on the method, and developed a whole-genome amplification (WGA) based protocol using limited input DNA (<1 µg). Using this PE dataset, we accurately assembled the YanHuang (YH) genome, the first sequenced Asian genome, into a scaffold N50 size of >2 Mb, which is over 100 times greater than the initial size produced with only small-insert PE reads (17 kb). In addition, we mapped two 7- to 8-kb insertions in the YH genome using the larger insert sizes of the long-range PE data.
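Since the result is reported as a scaffold N50, a short sketch of how N50 is conventionally computed may help: it is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. The function name is hypothetical, and the abstract does not state which tool computed the reported values.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() requires at least one length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length
```

Because N50 is weighted by length, joining many short scaffolds into a few long ones using long-range PE links raises it sharply, which is consistent with the reported jump from 17 kb to more than 2 Mb.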
In conclusion, we demonstrate here the effectiveness of this long-range PE sequencing method and its use for the de novo assembly of a large, complex genome using NGS short reads.
Cancers arise through an evolutionary process in which cell populations are subjected to selection; however, to date, the process of bladder cancer, which is one of the most common cancers in the world, remains unknown at a single-cell level.
We carried out single-cell exome sequencing of 66 individual tumor cells from a muscle-invasive bladder transitional cell carcinoma (TCC). Analyses of the somatic mutant allele frequency spectrum and clonal structure revealed that the tumor cells were derived from a single ancestral cell, but that subsequent evolution occurred, leading to two distinct tumor cell subpopulations. By analyzing recurrently mutant genes in an additional cohort of 99 TCC tumors, we identified genes that might play roles in the maintenance of the ancestral clone and in the muscle-invasive capability of subclones of this bladder cancer, respectively.
This work provides a new approach to investigating the genetic details of bladder tumoral changes at the single-cell level and a new method for assessing bladder cancer evolution at a cell-population level.
Single-cell exome sequencing; Bladder cancer; Tumor evolution; Population genetics
Fetal chromosomal abnormalities are the most common reasons for invasive prenatal testing. Currently, G-band karyotyping and several molecular genetic methods have been established for diagnosis of chromosomal abnormalities. Although these testing methods are highly reliable, their major limitation is that they have restricted resolution or can only achieve limited coverage of the human genome at one time. Massively parallel sequencing (MPS) technologies, which can reach single-base-pair resolution, allow genome-wide detection of intragenic deletions and duplications, challenging karyotyping and microarrays as the tools for prenatal diagnosis. Here we report a novel and robust MPS-based method to detect aneuploidy and imbalanced chromosomal arrangements in amniotic fluid (AF) samples. We sequenced 62 AF samples on the Illumina GAIIx platform, and with an average of 0.01× whole genome sequencing data we detected 13 samples with numerical chromosomal abnormalities by z-test. With up to 2× whole genome sequencing data we were able to detect microdeletions/microduplications (ranging from 1.4 Mb to 37.3 Mb) in 5 samples from chorionic villus sampling (CVS) using the SeqSeq algorithm. Our work demonstrates that MPS is a robust and accurate approach to detect aneuploidy and imbalanced chromosomal arrangements in prenatal samples.
The restriction enzyme-based Reduced Representation Library (RRL) method is a relatively feasible and flexible strategy for Single Nucleotide Polymorphism (SNP) identification in different species. It has the remarkable advantage of reducing the complexity of the genome by orders of magnitude. However, a comprehensive evaluation of the actual efficacy of SNP identification by this method is still unavailable.
To evaluate the efficacy of the restriction enzyme-based RRL method, we selected the Tsp 45I enzyme, which covers a 266 Mb flanking region of the enzyme recognition sites according to in silico simulation on the human reference genome. We then sequenced the YH RRL after Tsp 45I treatment. Of the reads obtained, 80.8% mapped to the target region with a 20-fold average coverage, about 96.8% of the target region was covered by at least one read, and 257 K SNPs were identified in the region using SOAPsnp software.
Compared with whole genome resequencing data, we observed a false discovery rate (FDR) of 13.95% and a false negative rate (FNR) of 25.90%. The concordance rate of homozygous loci was over 99.8%, but that of heterozygous loci was only 92.56%. Repeat sequences and base quality had a great effect on the accuracy of SNP calling, and SNPs in recognition sites contributed evidently to the high FNR and the low concordance rate of heterozygous loci. Our results indicated that repeat masking and highly stringent filter criteria could significantly decrease both FDR and FNR.
This study demonstrates that the restriction enzyme-based RRL method is effective for SNP identification. The results highlight the important role of bias and of the method-derived defects inherent in this method, which deserve special attention.
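As an illustration of how such error rates can be derived from two call sets, the sketch below treats the whole genome resequencing calls as the reference. The abstract does not spell out its exact definitions, so this assumes FDR = (calls absent from the reference) / (all calls) and FNR = (reference SNPs missed) / (all reference SNPs), both restricted to the target region. The function name and the `(chromosome, position)` keying are hypothetical.

```python
def fdr_fnr(called, truth):
    """Return (FDR, FNR) for a SNP call set against a reference call set.

    called, truth: iterables of hashable SNP keys, e.g. (chromosome, position).
    """
    called, truth = set(called), set(truth)
    false_positives = len(called - truth)
    false_negatives = len(truth - called)
    fdr = false_positives / len(called) if called else 0.0
    fnr = false_negatives / len(truth) if truth else 0.0
    return fdr, fnr
```

Comparing the rates before and after dropping calls in repeats or below a quality cutoff is one way to quantify the effect of repeat masking and stringent filtering described above.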
DNA methylation plays important roles in gene regulation during both normal development and disease states. In the past decade, a number of methods have been developed and applied to characterize the genome-wide distribution of DNA methylation. Most of these methods attempted to screen the whole genome and turned out to be enormously costly and time consuming for studies of the complex mammalian genome. Thus, they are not practical for researchers studying multiple clinical samples in biomarker research.
Here, we present a novel strategy that relies on the selective capture of target regions by liquid hybridization followed by bisulfite conversion and deep sequencing, which is referred to as liquid hybridization capture-based bisulfite sequencing (LHC-BS). To evaluate this method, we used about 2 μg of native genomic DNA from YanHuang (YH) whole blood samples and from a mature dendritic cell (mDC) line, respectively, to assess the methylation statuses of exome target regions. The results indicated that the LHC-BS system was able to cover more than 97% of the exome regions and detect their methylation statuses with acceptable allele dropouts. Most of the regions that could not provide accurate methylation information were distributed in chromosomes 6 and Y because of multiple mapping to those regions. The accuracy of this strategy was evaluated by pair-wise comparisons using the results from whole genome bisulfite sequencing and validated by bisulfite specific PCR sequencing.
In the present study, we employed a liquid hybridization capture system to enrich for exon regions and, for the first time, combined it with bisulfite sequencing to examine methylation statuses. This technique is highly sensitive and flexible and can be applied to identify differentially methylated regions (DMRs) at specific genomic locations of interest, such as regulatory elements or promoters.
DNA methylation aberration and microRNA (miRNA) deregulation have been observed in many types of cancers. A systematic study of methylome and transcriptome in bladder urothelial carcinoma has never been reported.
The DNA methylation was profiled by modified methylation-specific digital karyotyping (MMSDK) and the expression of mRNAs and miRNAs was analyzed by digital gene expression (DGE) sequencing in tumors and matched normal adjacent tissues obtained from 9 bladder urothelial carcinoma patients. We found that a set of significantly enriched pathways disrupted in bladder urothelial carcinoma primarily related to “neurogenesis” and “cell differentiation” by integrated analysis of -omics data. Furthermore, we identified an intriguing collection of cancer-related genes that were deregulated at the levels of DNA methylation and mRNA expression, and we validated several of these genes (HIC1, SLIT2, RASAL1, and KRT17) by Bisulfite Sequencing PCR and Reverse Transcription qPCR in a panel of 33 bladder cancer samples.
We characterized the methylome and transcriptome profiles in bladder urothelial carcinoma, identified a set of significantly enriched key pathways, and screened four aberrantly methylated and expressed genes. In conclusion, our findings open a new avenue for basic bladder cancer research.
Animal breeding via Somatic Cell Nuclear Transfer (SCNT) has enormous potential in agriculture and biomedicine. However, concerns about whether SCNT animals are as healthy or epigenetically normal as conventionally bred ones have been raised because the efficiency of cloning by SCNT is much lower than that of natural breeding or in vitro fertilization (IVF). Thus, we conducted genome-wide gene expression and DNA methylation profiling of phenotypically normal cloned pigs and control pigs in two tissues (muscle and liver), using the Affymetrix Porcine expression array as well as modified methylation-specific digital karyotyping (MMSDK) and Solexa sequencing technology. Typical tissue-specific differences with respect to both gene expression and DNA methylation were observed in muscle and liver from cloned as well as control pigs. Gene expression profiles were highly similar between cloned pigs and controls, though a small set of genes showed altered expression. Cloned pigs showed more pronounced differences in DNA methylation patterns in unique sequences in both tissues. In particular, a small set of genomic sites had different DNA methylation status, with a trend towards slightly increased methylation levels in cloned pigs. Molecular network analysis of the genes that contained such differentially methylated loci revealed a significant network related to tissue development. In conclusion, our study showed that phenotypically normal cloned pigs were highly similar to conventionally bred pigs in their gene expression, but moderate alterations in DNA methylation still exist, especially in certain unique genomic regions.
Exome sequencing, which allows the global analysis of protein coding sequences in the human genome, has become an effective and affordable approach to detecting causative genetic mutations in diseases. Currently, there are several commercial human exome capture platforms; however, the relative performances of these have not been characterized sufficiently to know which is best for a particular study.
We comprehensively compared three platforms: NimbleGen's Sequence Capture Array and SeqCap EZ, and Agilent's SureSelect. We assessed their performance in a variety of ways, including number of genes covered and capture efficacy. Differences that may impact on the choice of platform were that Agilent SureSelect covered approximately 1,100 more genes, while NimbleGen provided better flanking sequence capture. Although all three platforms achieved similar capture specificity of targeted regions, the NimbleGen platforms showed better uniformity of coverage and greater genotype sensitivity at 30- to 100-fold sequencing depth. All three platforms showed similar power in exome SNP calling, including medically relevant SNPs. Compared with genotyping and whole-genome sequencing data, the three platforms achieved a similar accuracy of genotype assignment and SNP detection. Importantly, all three platforms showed similar levels of reproducibility, GC bias and reference allele bias.
We demonstrate key differences between the three platforms, particularly advantages of solutions over array capture and the importance of a large gene target set.
Genome-wide gene expression profile using deep sequencing technologies can drive the discovery of cancer biomarkers and therapeutic targets. Such efforts are often limited to profiling the expression signature of either mRNA or microRNA (miRNA) in a single type of cancer.
Here we provided an integrated analysis of the genome-wide mRNA and miRNA expression profiles of three different genitourinary cancers: carcinomas of the bladder, kidney and testis.
Our results highlight the general or cancer-specific roles of several genes and miRNAs that may serve as candidate oncogenes or suppressors of tumor development. Further comparative analyses at the systems level revealed that significant aberrations of the cell adhesion process, p53 signaling, calcium signaling, the ECM-receptor and cell cycle pathways, the DNA repair and replication processes and the immune and inflammatory response processes were the common hallmarks of human cancers. Gene sets showing testicular cancer-specific deregulation patterns were mainly implicated in processes related to male reproductive function, and general disruptions of multiple metabolic pathways and processes related to cell migration were the characteristic molecular events for renal and bladder cancer, respectively. Furthermore, we also demonstrated that tumors with the same histological origins and genes with similar functions tended to group together in a clustering analysis. By assessing the correlation between the expression of each miRNA and its targets, we determined that deregulation of ‘key’ miRNAs may result in the global aberration of one or more pathways or processes as a whole.
This systematic analysis deciphered the molecular phenotypes of three genitourinary cancers and investigated their variations at the miRNA level simultaneously. Our results provided a valuable source for future studies and highlighted some promising genes, miRNAs, pathways and processes that may be useful for diagnostic or therapeutic applications.
Massively parallel sequencing of DNA molecules in the plasma of pregnant women has been shown to allow accurate and noninvasive prenatal detection of fetal trisomy 21. However, whether the sequencing approach is as accurate for the noninvasive prenatal diagnosis of trisomy 13 and 18 is unclear due to the lack of data from a large sample set. We studied 392 pregnancies, among which 25 involved a trisomy 13 fetus and 37 involved a trisomy 18 fetus, by massively parallel sequencing. By using our previously reported standard z-score approach, we demonstrated that this approach could identify 36.0% and 73.0% of trisomy 13 and 18 at specificities of 92.4% and 97.2%, respectively. We aimed to improve the detection of trisomy 13 and 18 by using a non-repeat-masked reference human genome instead of a repeat-masked one to increase the number of aligned sequence reads for each sample. We then applied a bioinformatics approach to correct GC content bias in the sequencing data. With these measures, we detected all (25 out of 25) trisomy 13 fetuses at a specificity of 98.9% (261 out of 264 non-trisomy 13 cases), and 91.9% (34 out of 37) of the trisomy 18 fetuses at 98.0% specificity (247 out of 252 non-trisomy 18 cases). These data indicate that with appropriate bioinformatics analysis, noninvasive prenatal diagnosis of trisomy 13 and trisomy 18 by maternal plasma DNA sequencing is achievable.
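The reported percentages follow directly from the counts given in the text, as the short sketch below shows. Only the counts come from the abstract; the helper names are hypothetical.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of affected cases that the test detects."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Fraction of unaffected cases that the test correctly calls negative."""
    return true_negatives / (true_negatives + false_positives)

# Counts quoted in the abstract for the improved analysis.
t13_sensitivity = sensitivity(25, 0)          # 25 of 25 trisomy 13 fetuses
t13_specificity = specificity(261, 264 - 261) # 261 of 264 non-trisomy 13 cases
t18_sensitivity = sensitivity(34, 37 - 34)    # 34 of 37 trisomy 18 fetuses
t18_specificity = specificity(247, 252 - 247) # 247 of 252 non-trisomy 18 cases

for name, value in [("T13 sensitivity", t13_sensitivity),
                    ("T13 specificity", t13_specificity),
                    ("T18 sensitivity", t18_sensitivity),
                    ("T18 specificity", t18_specificity)]:
    print(f"{name}: {value * 100:.1f}%")
```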
We have investigated the role of human endogenous retroviruses in multiple sclerosis by analyzing the DNA of patients and controls in 4 cohorts for associations between multiple sclerosis and polymorphisms near viral restriction genes or near endogenous retroviral loci with one or more intact or almost-intact genes. We found that SNPs in the gene TRIM5 were inversely correlated with disease. Conversely, SNPs around one retroviral locus, HERV-Fc1, showed a highly significant association with disease. The latter association was limited to a narrow region that contains no other known genes. We conclude that HERV-Fc1 and TRIM5 play a role in the etiology of multiple sclerosis. If these results are confirmed, they point to new modes of treatment for multiple sclerosis.
Objectives To validate the clinical efficacy and practical feasibility of massively parallel maternal plasma DNA sequencing to screen for fetal trisomy 21 among high risk pregnancies clinically indicated for amniocentesis or chorionic villus sampling.
Design Diagnostic accuracy validated against full karyotyping, using prospectively collected or archived maternal plasma samples.
Setting Prenatal diagnostic units in Hong Kong, United Kingdom, and the Netherlands.
Participants 753 pregnant women at high risk for fetal trisomy 21 who underwent definitive diagnosis by full karyotyping, of whom 86 had a fetus with trisomy 21.
Intervention Multiplexed massively parallel sequencing of DNA molecules in maternal plasma according to two protocols with different levels of sample throughput: 2-plex and 8-plex sequencing.
Main outcome measures Proportion of DNA molecules that originated from chromosome 21. A trisomy 21 fetus was diagnosed when the z score for the proportion of chromosome 21 DNA molecules was >3. Diagnostic sensitivity, specificity, positive predictive value, and negative predictive value were calculated for trisomy 21 detection.
Results Results were available from 753 pregnancies with the 8-plex sequencing protocol and from 314 pregnancies with the 2-plex protocol. The performance of the 2-plex protocol was superior to that of the 8-plex protocol. With the 2-plex protocol, trisomy 21 fetuses were detected at 100% sensitivity and 97.9% specificity, which resulted in a positive predictive value of 96.6% and negative predictive value of 100%. The 8-plex protocol detected 79.1% of the trisomy 21 fetuses at 98.9% specificity, giving a positive predictive value of 91.9% and negative predictive value of 96.9%.
Conclusion Multiplexed maternal plasma DNA sequencing analysis could be used to rule out fetal trisomy 21 among high risk pregnancies. If referrals for amniocentesis or chorionic villus sampling were based on the sequencing test results, about 98% of the invasive diagnostic procedures could be avoided.
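The decision rule given under the main outcome measures (a z score above 3 for the proportion of chromosome 21 molecules) can be sketched as follows. The abstract does not say how the reference mean and standard deviation were obtained, so this example assumes they are estimated from a set of euploid control samples. The function names and the numbers in the usage note are hypothetical.

```python
from statistics import mean, stdev

def chr21_z_score(test_fraction, reference_fractions):
    """z score of a sample's chromosome 21 read fraction against euploid controls."""
    mu = mean(reference_fractions)
    sd = stdev(reference_fractions)  # sample standard deviation of the controls
    return (test_fraction - mu) / sd

def is_trisomy21(test_fraction, reference_fractions, threshold=3.0):
    """Apply the z > 3 cutoff described in the text."""
    return chr21_z_score(test_fraction, reference_fractions) > threshold
```

For instance, with made-up control fractions `[0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128]`, a test sample at 0.0136 would be called positive while one at 0.0131 would not. The spread of the controls sets the resolution of the test, which may be one reason the protocols with different multiplexing levels performed differently.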
With the advent of second-generation sequencing, the expression of gene transcripts can be digitally measured with high accuracy. The purpose of this study was to systematically profile the expression of both mRNA and miRNA genes in clear cell renal cell carcinoma (ccRCC) using massively parallel sequencing technology.
The expression of mRNAs and miRNAs was analyzed in tumor tissues and matched normal adjacent tissues obtained from 10 ccRCC patients without distant metastases. In a prevalence screen, some of the most interesting results were validated in a large cohort of ccRCC patients.
A total of 404 miRNAs and 9,799 mRNAs were detected to be differentially expressed in the 10 ccRCC patients. We also identified 56 novel miRNA candidates in at least two samples. In addition to confirming that canonical cancer genes and miRNAs (including VEGFA, DUSP9 and ERBB4; miR-210, miR-184 and miR-206) play pivotal roles in ccRCC development, promising novel candidates (such as PNCK and miR-122) without previous annotation in ccRCC carcinogenesis were also discovered in this study. Pathways controlling cell fates (e.g., cell cycle and apoptosis pathways) and cell communication (e.g., focal adhesion and ECM-receptor interaction) were found to be significantly more likely to be disrupted in ccRCC. Additionally, the results of the prevalence screen revealed that the expression of a miRNA gene cluster located on Xq27.3 was consistently downregulated in at least 76.7% of ∼50 ccRCC patients.
Our study provided a two-dimensional map of the mRNA and miRNA expression profiles of ccRCC using deep sequencing technology. Our results indicate that the phenotypic status of ccRCC is characterized by a loss of normal renal function, downregulation of metabolic genes, and upregulation of many signal transduction genes in key pathways. Furthermore, it can be concluded that downregulation of miRNA genes clustered on Xq27.3 is associated with ccRCC.
We characterize and extend a highly efficient method for constructing shotgun fragment libraries in which transposase catalyzes in vitro DNA fragmentation and adaptor incorporation simultaneously. We apply this method to sequencing a human genome and find that coverage biases are comparable to those of conventional protocols. We also extend its capabilities by developing protocols for sub-nanogram library construction, exome capture from 50 ng of input DNA, PCR-free and colony PCR library construction, and 96-plex sample indexing.
Analysis across the genome of patterns of DNA methylation reveals a rich landscape of allele-specific epigenetic modification and consequent effects on allele-specific gene expression.
DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies.
Epigenetic modifications such as addition of methyl groups to cytosine in DNA play a role in regulating gene expression. To better understand these processes, knowledge of the methylation status of all cytosine bases in the genome (the methylome) is required. DNA methylation can differ between the two gene copies (alleles) in each cell. Such allele-specific methylation (ASM) can be due to parental origin of the alleles (imprinting), X chromosome inactivation in females, and other as yet unknown mechanisms. This may significantly alter the expression profile arising from different allele combinations in different individuals. Using advanced sequencing technology, we have determined the methylome of human peripheral blood mononuclear cells (PBMC). Importantly, the PBMC were obtained from the same male Han Chinese individual whose complete genome had previously been determined. This allowed us, for the first time, to study genome-wide differences in ASM. Our analysis shows that ASM in PBMC is higher than can be accounted for by regions known to undergo parent-of-origin imprinting and frequently (>80%) correlates with allele-specific expression (ASE) of the corresponding gene. In addition, our data reveal a rich landscape of epigenomic variation for 20 genomic features, including regulatory, coding, and non-coding sequences, and provide a valuable resource for future studies. Our work further establishes whole-genome sequencing as an efficient method for methylome analysis.
MicroRNAs (miRNAs) are 18-25 nt small RNAs that play critical roles in many biological processes. The majority of known miRNAs were discovered by conventional cloning followed by Sanger sequencing. Next-generation sequencing (NGS) technologies enable in-depth characterization of the global repertoire of miRNAs, and different protocols for miRNA library construction have been developed. However, the possible biases in relative expression levels and sequences introduced by different library preparation protocols have rarely been explored.
We assessed three different miRNA library preparation protocols, SOLiD and Illumina versions 1 and 1.5, using cloning or SBS sequencing of total RNA samples extracted from skeletal muscles of Hu sheep and Dorper sheep, and then validated 9 miRNAs by qRT-PCR. Our results show that SBS sequencing data correlate highly with Illumina cloning data. Compared with the Illumina data, the SOLiD data show a more dispersed length distribution, greater frequency variation for nucleotides near the 3'- and 5'-ends, a higher frequency of reads containing end secondary structure (ESS), and a higher frequency of reads that do not map to known miRNAs. qRT-PCR results showed the best correlation with SOLiD cloning data. The fold differences between Hu sheep and Dorper sheep measured by qRT-PCR and by SBS sequencing correlated well (r = 0.937), and the fold differences of miR-1 and miR-206 were similar across SOLiD cloning data, qRT-PCR and SBS sequencing data.
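The comparison of fold differences between platforms can be sketched as a Pearson correlation of per-miRNA log2 fold differences. This is only an illustration: the abstract does not say whether the fold differences were log-transformed, and the function names and abundance values below are hypothetical, not measurements from the study.

```python
from math import log2, sqrt


def log2_fold(breed_a, breed_b):
    """Per-miRNA log2 fold difference between two breeds."""
    return [log2(a / b) for a, b in zip(breed_a, breed_b)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Made-up abundances for four miRNAs in two breeds on two platforms.
qpcr_hu, qpcr_dorper = [8.0, 2.0, 5.0, 1.0], [2.0, 4.0, 5.0, 4.0]
seq_hu, seq_dorper = [900.0, 150.0, 410.0, 60.0], [200.0, 350.0, 400.0, 300.0]

r = pearson_r(log2_fold(qpcr_hu, qpcr_dorper), log2_fold(seq_hu, seq_dorper))
```

Correlating fold differences rather than raw abundances removes platform-specific scale, so the comparison only asks whether the two methods agree on the direction and size of the breed difference.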
Sequencing depth can influence the quantitative measurement of miRNA abundance, but the discrepancy it caused was not statistically significant, as a high correlation was observed between Illumina cloning and SBS sequencing data. Biases in length distribution, sequence variation, and ESS were observed between data obtained with the different protocols. SOLiD cloning data differ from Illumina cloning data mainly because of distinct methods of adapter ligation. The good correlation between qRT-PCR results and SOLiD data might be due to the similarities of the hybridization-based methods. The fold difference analysis indicated that methods based on hybridization may be superior for quantitative measurement of miRNA abundance. Because the genome sequence of the sheep is not available, our data may not fully explain the bias affecting natural miRNAs in sheep or other mammals; unbiased, artificially synthesized miRNAs would help in evaluating miRNA library preparation methods.
Important biological and pathological properties are often conserved across species. Although several mouse leukemia models have been well established, the genes deregulated in both human and murine leukemia cells have not been studied systematically. We performed serial analysis of gene expression (SAGE) in both human and murine MLL-ELL or MLL-ENL leukemia cells, and identified 88 genes that appeared to be significantly deregulated in both types of leukemia cells, including 57 genes not reported previously as being deregulated in MLL-associated leukemias. These changes were validated by quantitative PCR. The most up-regulated genes include several HOX genes (e.g., HOXA5, HOXA9 and HOXA10) and MEIS1, which are the typical hallmarks of MLL-rearrangement leukemia. The most down-regulated genes include LTF, LCN2, MMP9, S100A8, S100A9, PADI4, TGFBI and CYBB. Notably, the up-regulated genes are enriched in Gene Ontology terms such as “gene expression” and “transcription”, whereas the down-regulated genes are enriched in “signal transduction” and “apoptosis”. We showed that the CpG islands of the down-regulated genes are hypermethylated. We also showed that seven individual microRNAs from the mir-17-92 cluster, which are known to be overexpressed in human MLL-rearrangement leukemias, are also consistently overexpressed in mouse MLL-rearrangement leukemia cells. Nineteen possible targets of these microRNAs were identified, and two of them (i.e., APP and RASSF2) were confirmed further by luciferase reporter and mutagenesis assays. The identification and validation of consistent changes of gene expression in human and murine MLL-rearrangement leukemias provide important insights into the genetic basis of MLL-associated leukemogenesis.
MLL-rearrangement leukemia; evolutionary conservation; gene expression; gene ontology; DNA methylation
DNA methylation is a widely studied epigenetic mechanism known to correlate with gene repression and genomic stability. Development of sensitive methods for global detection of DNA methylation events is of particular importance.
Here we describe a technique called modified methylation-specific digital karyotyping (MMSDK), which is based on methylation-specific digital karyotyping (MSDK) combined with a novel sequencing approach. Briefly, after a tandem digestion of genomic DNA with a methylation-sensitive mapping enzyme and a fragmenting enzyme, short sequence tags are obtained. These tags are amplified, followed by direct, massively parallel sequencing (Solexa 1G Genome Analyzer). This method allows high-throughput and low-cost genome-wide DNA methylation mapping. We applied this method to investigate global DNA methylation profiles for two widely used breast cancer cell lines, MCF-7 and MDA-MB-231, which are representative of luminal-like and mesenchymal-like cancer types, respectively. The comparison revealed a highly similar overall DNA methylation pattern for the two cell lines. However, a set of individual genomic loci with significantly different DNA methylation status between the two cell lines was identified. Furthermore, we revealed a genome-wide significant correlation between gene expression and the methylation status of gene promoters with CpG islands (CGIs) in the two cancer cell lines, and a correlation between gene expression and the methylation status of promoters without CGIs in MCF-7 cells.
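The tandem-digestion idea can be illustrated with a toy in-silico sketch. It is a simplification under stated assumptions, not the published protocol: the recognition sequences `GCGCGC` and `CATG`, the tag length, and the rule of taking the tag directly downstream of the nearest fragmenting site are placeholders chosen for illustration, and `mapping_tags` is a hypothetical helper. The only property carried over from the description above is that the mapping enzyme cuts unmethylated sites only, so a tag is produced only where the mapping site is unmethylated.

```python
def find_sites(seq: str, site: str) -> list:
    """Start positions of every (possibly overlapping) occurrence of site."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def mapping_tags(seq, methylated_positions,
                 mapping_site="GCGCGC", fragment_site="CATG", tag_len=17):
    """Return (mapping-site position, tag) pairs for unmethylated mapping sites.

    Assumption: the tag is the tag_len bases that immediately follow the
    nearest fragmenting site downstream of the cut mapping site.
    """
    tags = []
    for pos in find_sites(seq, mapping_site):
        if pos in methylated_positions:
            continue  # a methylated site is not cut, so it yields no tag
        frag = seq.find(fragment_site, pos + len(mapping_site))
        if frag == -1:
            continue  # no fragmenting site downstream
        tag_start = frag + len(fragment_site)
        tag = seq[tag_start:tag_start + tag_len]
        if len(tag) == tag_len:
            tags.append((pos, tag))
    return tags


# Toy sequence with mapping sites at positions 2 and 24.
toy = ("AA" + "GCGCGC" + "TT" + "CATG" + "ACGTACGT"
       + "AA" + "GCGCGC" + "GG" + "CATG" + "TTTTCCCC")

unmethylated_sample = mapping_tags(toy, set(), tag_len=4)
partly_methylated_sample = mapping_tags(toy, {24}, tag_len=4)
```

In this toy model, the site at position 24 yields a tag in the first sample but not in the second, which mirrors how a difference in tag counts between two samples points to a locus with different methylation status.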
The MMSDK method will be a valuable tool for increasing current knowledge of genome-wide DNA methylation profiles.