5-methylcytosine (5-mC) can be oxidized to 5-hydroxymethylcytosine (5-hmC). Genome-wide profiling of 5-hmC thus far indicates 5-hmC may not only be an intermediate form of DNA demethylation but could also constitute an epigenetic mark per se. Here we describe a cost-effective and selective method to detect both the hydroxymethylation and methylation status of cytosines in a subset of cytosines in the human genome. This method involves the selective glucosylation of 5-hmC residues, short-Sequence tag generation and high-throughput sequencing. We tested this method by screening H9 human embryonic stem cells and their differentiated embroid body cells, and found that differential hydroxymethylation preferentially occurs in bivalent genes during cellular differentiation. Especially, our results support hydroxymethylation can regulate key transcription regulators with bivalent marks through demethylation and affect cellular decision on choosing active or inactive state of these genes upon cellular differentiation. Future application of this technology would enable us to uncover the status of methylation and hydroxymethylation in dynamic biological processes and disease development in multiple biological samples.
HMST-Seq; differentiation; embryonic stem cells; hydroxymethylation; methylation
A single–base pair resolution silkworm genetic variation map was constructed from 40 domesticated and wild silkworms, each sequenced to approximately threefold coverage, representing 99.88% of the genome. We identified ∼16 million single-nucleotide polymorphisms, many indels, and structural variations. We find that the domesticated silkworms are clearly genetically differentiated from the wild ones, but they have maintained large levels of genetic variability, suggesting a short domestication event involving a large number of individuals. We also identified signals of selection at 354 candidate genes that may have been important during domestication, some of which have enriched expression in the silk gland, midgut, and testis. These data add to our understanding of the domestication processes and may have applications in devising pest control strategies and advancing the use of silkworms as efficient bioreactors.
We report here the genome sequence of an ancient human. Obtained from ∼4,000-year-old permafrost-preserved hair, the genome represents a male individual from the first known culture to settle in Greenland. Sequenced to an average depth of 20×, we recover 79% of the diploid genome, an amount close to the practical limit of current sequencing technologies. We identify 353,151 high-confidence single-nucleotide polymorphisms (SNPs), of which 6.8% have not been reported previously. We estimate raw read contamination to be no higher than 0.8%. We use functional SNP assessment to assign possible phenotypic characteristics of the individual that belonged to a culture whose location has yielded only trace human remains. We compare the high-confidence SNPs to those of contemporary populations to find the populations most closely related to the individual. This provides evidence for a migration from Siberia into the New World some 5,500 years ago, independent of that giving rise to the modern Native Americans and Inuit.
Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes.
Despite an increase in the number of molecular epidemiological studies conducted in recent years to evaluate the association between human papillomavirus (HPV) and the risk of breast carcinoma, these studies remain inconclusive. Here we aim to detect HPV DNA in various tissues from patients with breast carcinoma using the method of HPV capture combined with massive paralleled sequencing (MPS). To validate the confidence of our methods, 15 cervical cancer samples were tested by PCR and the new method. Results showed that there was 100% consistence between the two methods.DNA from peripheral blood, tumor tissue, adjacent lymph nodes and adjacent normal tissue were collected from seven malignant breast cancer patients, and HPV type 16(HPV16) was detected in 1/7, 1/7, 1/7and 1/7 of patients respectively. Peripheral blood, tumor tissue and adjacent normal tissue were also collected from two patients with benign breast tumor, and 1/2, 2/2 and 2/2 was detected to have HPV16 DNA respectively. MPS metrics including mapping ratio, coverage, depth and SNVs were provided to characterize HPV in samples. The average coverage was 69% and 61.2% for malignant and benign samples respectively. 126 SNVs were identified in all 9 samples. The maximum number of SNVs was located in the gene of E2 and E4 among all samples. Our study not only provided an efficient method to capture HPV DNA, but detected the SNVS, coverage, SNV type and depth. The finding has provided further clue of association between HPV16 and breast cancer.
Copy number variations (CNVs) represent an important type of genetic variation that deeply impact phenotypic polymorphisms and human diseases. The advent of high-throughput sequencing technologies provides an opportunity to revolutionize the discovery of CNVs and to explore their relationship with diseases. However, most of the existing methods depend on sequencing depth and show instability with low sequence coverage. In this study, using low coverage whole-genome sequencing (LCS) we have developed an effective population-scale CNV calling (PSCC) method.
In our novel method, two-step correction was used to remove biases caused by local GC content and complex genomic characteristics. We chose a binary segmentation method to locate CNV segments and designed combined statistics tests to ensure the stable performance of the false positive control. The simulation data showed that our PSCC method could achieve 99.7%/100% and 98.6%/100% sensitivity and specificity for over 300 kb CNV calling in the condition of LCS (∼2×) and ultra LCS (∼0.2×), respectively. Finally, we applied this novel method to analyze 34 clinical samples with an average of 2× LCS. In the final results, all the 31 pathogenic CNVs identified by aCGH were successfully detected. In addition, the performance comparison revealed that our method had significant advantages over existing methods using ultra LCS.
Our study showed that PSCC can sensitively and reliably detect CNVs using low coverage or even ultra-low coverage data through population-scale sequencing.
To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million nonredundant microbial genes, derived from 576.7 Gb sequence, from faecal samples of 124 European individuals. The gene set, ~150 times larger than the human gene complement, contains an overwhelming majority of the prevalent microbial genes of the cohort and likely includes a large proportion of the prevalent human intestinal microbial genes. The genes are largely shared among individuals of the cohort. Over 99% of the genes are bacterial, suggesting that the entire cohort harbours between 1000 and 1150 prevalent bacterial species and each individual at least 160 such species, which are also largely shared. We define and describe the minimal gut metagenome and the minimal gut bacterial genome in terms of functions encoded by the gene set.
Residents of the Tibetan Plateau show heritable adaptations to extreme altitude. We sequenced 50 exomes of ethnic Tibetans, encompassing coding sequences of 92% of human genes, with an average coverage of 18X per individual. Genes showing population-specific allele frequency changes, which represent strong candidates for altitude adaptation, were identified. The strongest signal of natural selection came from EPAS1, a transcription factor involved in response to hypoxia. One SNP at EPAS1 shows a 78% frequency difference between Tibetan and Han samples, representing the fastest allele frequency change observed at any human gene to date. This SNP’s association with erythrocyte abundance supports the role of EPAS1 in adaptation to hypoxia. Thus, a population genomic survey has revealed a functionally important locus in genetic adaptation to high altitude.
The applications of massively parallel sequencing technology to fetal cell-free DNA (cff-DNA) have brought new insight to non-invasive prenatal diagnosis. However, most previous research based on maternal plasma sequencing has been restricted to fetal aneuploidies. To detect specific parentally inherited mutations, invasive approaches to obtain fetal DNA are the current standard in the clinic because of the experimental complexity and resource consumption of previously reported non-invasive approaches.
Here, we present a simple and effective non-invasive method for accurate fetal genome recovery-assisted with parental haplotypes. The parental haplotype were firstly inferred using a combination strategy of trio and unrelated individuals. Assisted with the parental haplotype, we then employed a hidden Markov model to non-invasively recover the fetal genome through maternal plasma sequencing.
Using a sequence depth of approximately 44X against a an approximate 5.69% cff-DNA concentration, we non-invasively inferred fetal genotype and haplotype under different situations of parental heterozygosity. Our data show that 98.57%, 95.37%, and 98.45% of paternal autosome alleles, maternal autosome alleles, and maternal chromosome X in the fetal haplotypes, respectively, were recovered accurately. Additionally, we obtained efficient coverage or strong linkage of 96.65% of reported Mendelian-disorder genes and 98.90% of complex disease-associated markers.
Our method provides a useful strategy for non-invasive whole fetal genome recovery.
Copy number variations (CNVs), a common genomic mutation associated with various diseases, are important in research and clinical applications. Whole genome amplification (WGA) and massively parallel sequencing have been applied to single cell CNVs analysis, which provides new insight for the fields of biology and medicine. However, the WGA-induced bias significantly limits sensitivity and specificity for CNVs detection. Addressing these limitations, we developed a practical bioinformatic methodology for CNVs detection at the single cell level using low coverage massively parallel sequencing. This method consists of GC correction for WGA-induced bias removal, binary segmentation algorithm for locating CNVs breakpoints, and dynamic threshold determination for final signals filtering. Afterwards, we evaluated our method with seven test samples using low coverage sequencing (4∼9.5%). Four single-cell samples from peripheral blood, whose karyotypes were confirmed by whole genome sequencing analysis, were acquired. Three other test samples derived from blastocysts whose karyotypes were confirmed by SNP-array analysis were also recruited. The detection results for CNVs of larger than 1 Mb were highly consistent with confirmed results reaching 99.63% sensitivity and 97.71% specificity at base-pair level. Our study demonstrates the potential to overcome WGA-bias and to detect CNVs (>1 Mb) at the single cell level through low coverage massively parallel sequencing. It highlights the potential for CNVs research on single cells or limited DNA samples and may prove as a promising tool for research and clinical applications, such as pre-implantation genetic diagnosis/screening, fetal nucleated red blood cells research and cancer heterogeneity analysis.
Reduced representation bisulfite sequencing (RRBS) was developed to measure DNA methylation of high-CG regions at single base-pair resolution, and has been widely used because of its minimal DNA requirements and cost efficacy; however, the CpG coverage of genomic regions is restricted and important regions with low-CG will be ignored in DNA methylation profiling. This method could be improved to generate a more comprehensive representation.
Based on in silico simulation of enzyme digestion of human and mouse genomes, we have optimized the current single-enzyme RRBS by applying double enzyme digestion in the library construction to interrogate more representative regions. CpG coverage of genomic regions was considerably increased in both high-CG and low-CG regions using the double-enzyme RRBS method, leading to more accurate detection of their average methylation levels and identification of differential methylation regions between samples. We also applied this double-enzyme RRBS method to comprehensively analyze the CpG methylation profiles of two colorectal cancer cell lines.
The double-enzyme RRBS increases the CpG coverage of genomic regions considerably over the previous single-enzyme RRBS method, leading to more accurate detection of their average methylation levels. It will facilitate genome-wide DNA methylation studies in multiple and complex clinical samples.
Single-enzyme RRBS; Double-enzyme RRBS; DNA methylation; CpG coverage
Conventional prenatal screening tests, such as maternal serum tests and ultrasound scan, have limited resolution and accuracy.
We developed an advanced noninvasive prenatal diagnosis method based on massively parallel sequencing. The Noninvasive Fetal Trisomy (NIFTY) test, combines an optimized Student’s t-test with a locally weighted polynomial regression and binary hypotheses. We applied the NIFTY test to 903 pregnancies and compared the diagnostic results with those of full karyotyping.
16 of 16 trisomy 21, 12 of 12 trisomy 18, two of two trisomy 13, three of four 45, X, one of one XYY and two of two XXY abnormalities were correctly identified. But one false positive case of trisomy 18 and one false negative case of 45, X were observed. The test performed with 100% sensitivity and 99.9% specificity for autosomal aneuploidies and 85.7% sensitivity and 99.9% specificity for sex chromosomal aneuploidies. Compared with three previously reported z-score approaches with/without GC-bias removal and with internal control, the NIFTY test was more accurate and robust for the detection of both autosomal and sex chromosomal aneuploidies in fetuses.
Our study demonstrates a powerful and reliable methodology for noninvasive prenatal diagnosis.
Noninvasive Fetal Trisomy (NIFTY) test; Massively parallel sequencing; Autosomal aneuploidies; Sex chromosomal aneuploidies
It is evident that epigenetic factors, especially DNA methylation, play essential roles in obesity development. Using pig as a model, here we investigated the systematic association between DNA methylation and obesity. We sampled eight variant adipose and two distinct skeletal muscle tissues from three pig breeds living within comparable environments but displaying distinct fat level. We generated 1,381 gigabases (Gb) of sequence data from 180 methylated DNA immunoprecipitation (MeDIP) libraries, and provided a genome-wide DNA methylation map as well as a gene expression map for adipose and muscle studies. The analysis showed global similarity and difference among breeds, sexes and anatomic locations, and identified the differentially methylated regions (DMRs). The DMRs in promoters are highly associated with obesity development via expression repression of both known obesity-related genes and novel genes. This comprehensive map provides a solid basis for exploring epigenetic mechanisms of adipose deposition and muscle growth.
DNA methylation plays an essential role in regulating gene expression under a variety of conditions and it has therefore been hypothesized to underlie the transitions between life cycle stages in parasitic nematodes. So far, however, 5'-cytosine methylation has not been detected during any developmental stage of the nematode Caenorhabditis elegans. Given the new availability of high-resolution methylation detection methods, an investigation of life cycle methylation in a parasitic nematode can now be carried out.
Here, using MethylC-seq, we present the first study to confirm the existence of DNA methylation in the parasitic nematode Trichinella spiralis, and we characterize the methylomes of the three life-cycle stages of this food-borne infectious human pathogen. We observe a drastic increase in DNA methylation during the transition from the new born to mature stage, and we further identify parasitism-related genes that show changes in DNA methylation status between life cycle stages.
Our data contribute to the understanding of the developmental changes that occur in an important human parasite, and raises the possibility that targeting DNA methylation processes may be a useful strategy in developing therapeutics to impede infection. In addition, our conclusion that DNA methylation is a mechanism for life cycle transition in T. spiralis prompts the question of whether this may also be the case in any other metazoans. Finally, our work constitutes the first report, to our knowledge, of DNA methylation in a nematode, prompting a re-evaluation of phyla in which this epigenetic mark was thought to be absent.
The relatively short read lengths from next generation sequencing (NGS) technologies still pose a challenge for de novo assembly of complex mammal genomes. One important solution is to use paired-end (PE) sequence information experimentally obtained from long-range DNA fragments (>1 kb). Here, we characterize and extend a long-range PE library construction method based on direct intra-molecule ligation (or molecular linker-free circularization) for NGS.
We found that the method performs stably for PE sequencing of 2- to 5- kb DNA fragments, and can be extended to 10–20 kb (and even in extremes, up to ∼35 kb). We also characterized the impact of low quality input DNA on the method, and develop a whole-genome amplification (WGA) based protocol using limited input DNA (<1 µg). Using this PE dataset, we accurately assembled the YanHuang (YH) genome, the first sequenced Asian genome, into a scaffold N50 size of >2 Mb, which is over100-times greater than the initial size produced with only small insert PE reads(17 kb). In addition, we mapped two 7- to 8- kb insertions in the YH genome using the larger insert sizes of the long-range PE data.
In conclusion, we demonstrate here the effectiveness of this long-range PE sequencing method and its use for the de novo assembly of a large, complex genome using NGS short reads.
Cancers arise through an evolutionary process in which cell populations are subjected to selection; however, to date, the process of bladder cancer, which is one of the most common cancers in the world, remains unknown at a single-cell level.
We carried out single-cell exome sequencing of 66 individual tumor cells from a muscle-invasive bladder transitional cell carcinoma (TCC). Analyses of the somatic mutant allele frequency spectrum and clonal structure revealed that the tumor cells were derived from a single ancestral cell, but that subsequent evolution occurred, leading to two distinct tumor cell subpopulations. By analyzing recurrently mutant genes in an additional cohort of 99 TCC tumors, we identified genes that might play roles in the maintenance of the ancestral clone and in the muscle-invasive capability of subclones of this bladder cancer, respectively.
This work provides a new approach of investigating the genetic details of bladder tumoral changes at the single-cell level and a new method for assessing bladder cancer evolution at a cell-population level.
Single-cell exome sequencing; Bladder cancer; Tumor evolution; Population genetics
Fetal chromosomal abnormalities are the most common reasons for invasive prenatal testing. Currently, G-band karyotyping and several molecular genetic methods have been established for diagnosis of chromosomal abnormalities. Although these testing methods are highly reliable, the major limitation remains restricted resolutions or can only achieve limited coverage on the human genome at one time. The massively parallel sequencing (MPS) technologies which can reach single base pair resolution allows detection of genome-wide intragenic deletions and duplication challenging karyotyping and microarrays as the tool for prenatal diagnosis. Here we reported a novel and robust MPS-based method to detect aneuploidy and imbalanced chromosomal arrangements in amniotic fluid (AF) samples. We sequenced 62 AF samples on Illumina GAIIx platform and with averagely 0.01× whole genome sequencing data we detected 13 samples with numerical chromosomal abnormalities by z-test. With up to 2× whole genome sequencing data we were able to detect microdeletion/microduplication (ranged from 1.4 Mb to 37.3 Mb of 5 samples from chorionic villus sampling (CVS) using SeqSeq algorithm. Our work demonstrated MPS is a robust and accurate approach to detect aneuploidy and imbalanced chromosomal arrangements in prenatal samples.
Restriction Enzyme-based Reduced Representation Library (RRL) method represents a relatively feasible and flexible strategy used for Single Nucleotide Polymorphism (SNP) identification in different species. It has remarkable advantage of reducing the complexity of the genome by orders of magnitude. However, comprehensive evaluation for actual efficacy of SNP identification by this method is still unavailable.
In order to evaluate the efficacy of Restriction Enzyme-based RRL method, we selected Tsp 45I enzyme which covers 266 Mb flanking region of the enzyme recognition site according to in silico simulation on human reference genome, then we sequenced YH RRL after Tsp 45I treatment and obtained reads of which 80.8% were mapped to target region with an 20-fold average coverage, about 96.8% of target region was covered by at least one read and 257 K SNPs were identified in the region using SOAPsnp software.
Compared with whole genome resequencing data, we observed false discovery rate (FDR) of 13.95% and false negative rate (FNR) of 25.90%. The concordance rate of homozygote loci was over 99.8%, but that of heterozygote were only 92.56%. Repeat sequences and bases quality were proved to have a great effect on the accuracy of SNP calling, SNPs in recognition sites contributed evidently to the high FNR and the low concordance rate of heterozygote. Our results indicated that repeat masking and high stringent filter criteria could significantly decrease both FDR and FNR.
This study demonstrates that Restriction Enzyme-based RRL method was effective for SNP identification. The results highlight the important role of bias and the method-derived defects represented in this method and emphasize the special attentions noteworthy.
DNA methylation plays important roles in gene regulation during both normal developmental and disease states. In the past decade, a number of methods have been developed and applied to characterize the genome-wide distribution of DNA methylation. Most of these methods endeavored to screen whole genome and turned to be enormously costly and time consuming for studies of the complex mammalian genome. Thus, they are not practical for researchers to study multiple clinical samples in biomarker research.
Here, we display a novel strategy that relies on the selective capture of target regions by liquid hybridization followed by bisulfite conversion and deep sequencing, which is referred to as liquid hybridization capture-based bisulfite sequencing (LHC-BS). To estimate this method, we utilized about 2 μg of native genomic DNA from YanHuang (YH) whole blood samples and a mature dendritic cell (mDC) line, respectively, to evaluate their methylation statuses of target regions of exome. The results indicated that the LHC-BS system was able to cover more than 97% of the exome regions and detect their methylation statuses with acceptable allele dropouts. Most of the regions that couldn't provide accurate methylation information were distributed in chromosomes 6 and Y because of multiple mapping to those regions. The accuracy of this strategy was evaluated by pair-wise comparisons using the results from whole genome bisulfite sequencing and validated by bisulfite specific PCR sequencing.
In the present study, we employed a liquid hybridisation capture system to enrich for exon regions and then combined with bisulfite sequencing to examine the methylation statuses for the first time. This technique is highly sensitive and flexible and can be applied to identify differentially methylated regions (DMRs) at specific genomic locations of interest, such as regulatory elements or promoters.
Exome sequencing, which allows the global analysis of protein coding sequences in the human genome, has become an effective and affordable approach to detecting causative genetic mutations in diseases. Currently, there are several commercial human exome capture platforms; however, the relative performances of these have not been characterized sufficiently to know which is best for a particular study.
We comprehensively compared three platforms: NimbleGen's Sequence Capture Array and SeqCap EZ, and Agilent's SureSelect. We assessed their performance in a variety of ways, including number of genes covered and capture efficacy. Differences that may impact on the choice of platform were that Agilent SureSelect covered approximately 1,100 more genes, while NimbleGen provided better flanking sequence capture. Although all three platforms achieved similar capture specificity of targeted regions, the NimbleGen platforms showed better uniformity of coverage and greater genotype sensitivity at 30- to 100-fold sequencing depth. All three platforms showed similar power in exome SNP calling, including medically relevant SNPs. Compared with genotyping and whole-genome sequencing data, the three platforms achieved a similar accuracy of genotype assignment and SNP detection. Importantly, all three platforms showed similar levels of reproducibility, GC bias and reference allele bias.
We demonstrate key differences between the three platforms, particularly advantages of solutions over array capture and the importance of a large gene target set.
Important biological and pathological properties are often conserved across species. Although several mouse leukemia models have been well established, the genes deregulated in both human and murine leukemia cells have not been studied systematically. We performed a serial analysis of gene expression (SAGE) analysis on gene expression in both human and murine MLL-ELL or MLL-ENL leukemia cells, and identified 88 genes that appeared to be significantly deregulated in both types of leukemia cells, including 57 genes not reported previously as being deregulated in MLL-associated leukemias. These changes were validated by quantitative PCR. The most up-regulated genes include several HOX genes (e.g., HOX A5, HOXA9 and HOXA10) and MEIS1 that are the typical hallmark of MLL-rearrangement leukemia. The most down-regulated genes include LTF, LCN2, MMP9, S100A8, S100A9, PADI4, TGFBI and CYBB. Notably, the up-regulated genes are enriched in Gene Ontology terms such as “gene expression” and “transcription”, whereas the down-regulated genes are enriched in “signal transduction” and “apoptosis”. We showed that the CpG islands of the down-regulated genes are hypermethylated. We also showed that seven individual microRNAs from the mir-17-92 cluster, which are known to be overexpressed in human MLL-rearrangement leukemias, are also consistently overexpressed in mouse MLL-rearrangement leukemia cells. Nineteen possible targets of these microRNAs were identified and two of them (i.e., APP and RASSF2) were confirmed further by luciferase reporter and mutagenesis assays. The identification and validation of consistent changes of gene expression in human and murine MLL-rearrangement leukemias provides important insights into the genetic base for MLL-associated leukemogenesis.
MLL-rearrangement leukemia; evolutionarily conservation; gene expression; gene ontology; DNA methylation
Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage, and guided by the reference genome, we used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual's genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) inside this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification had high accuracy and consistency, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. These variations were considered for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics.
DNA methylation is a widely studied epigenetic mechanism known to correlate with gene repression and genomic stability. Development of sensitive methods for global detection of DNA methylation events is of particular importance.
We here describe a technique, called modified methylation-specific digital karyotyping (MMSDK) based on methylation-specific digital karyotyping (MSDK) with a novel sequencing approach. Briefly, after a tandem digestion of genomic DNA with a methylation-sensitive mapping enzyme and a fragmenting enzyme, short sequence tags are obtained. These tags are amplified, followed by direct, massively parallel sequencing (Solexa 1G Genome Analyzer). This method allows high-throughput and low-cost genome-wide DNA methylation mapping. We applied this method to investigate global DNA methylation profiles for widely used breast cancer cell lines, MCF-7 and MDA-MB-231, which are representatives for luminal-like and mesenchymal-like cancer types, respectively. By comparison, a highly similar overall DNA methylation pattern was revealed for the two cell lines. However a cohort of individual genomic loci with significantly different DNA methylation status between two cell lines was identified. Furthermore, we revealed a genome-wide significant correlation between gene expression and the methylation status of gene promoters with CpG islands (CGIs) in the two cancer cell lines, and a correlation of gene expression and the methylation status of promoters without CGIs in MCF-7 cells.
The MMSDK method will be a valuable tool to increase the current knowledge of genome wide DNA methylation profiles.
The YH database is a server that allows the user to easily browse and download data from the first Asian diploid genome. The aim of this platform is to facilitate the study of this Asian genome and to enable improved organization and presentation large-scale personal genome data. Powered by GBrowse, we illustrate here the genome sequences, SNPs, and sequencing reads in the MapView. The relationships between phenotype and genotype can be searched by location, dbSNP ID, HGMD ID, gene symbol and disease name. A BLAST web service is also provided for the purpose of aligning query sequence against YH genome consensus. The YH database is currently one of the three personal genome database, organizing the original data and analysis results in a user-friendly interface, which is an endeavor to achieve fundamental goals for establishing personal medicine. The database is available at http://yh.genomics.org.cn.
Previous results have suggested an association of the region of 19q13.3 with several forms of cancer. In the present study, we investigated 27 public markers within a previously identified 69 kb stretch of chromosome 19q for association with breast cancer by using linkage disequilibrium mapping. The study groups included 434 postmenopausal breast cancer cases and an identical number of individually matched controls.
Methods and Results
Studying one marker at a time, we found a region spanning the gene RAI (alias PPP1R13L or iASPP) and the 5' portion of XPD to be associated with this cancer. The region corresponds to a haplotype block, in which there seems to be very limited recombination in the Danish population. Studying combinations of markers, we found that two to four neighboring markers gave the most consistent and strongest result. The haplotypes with strongest association with cancers were located in the gene RAI and just 3' to the gene. Coinciding peaks were seen in the region of RAI in groups of women of different age.
In a follow-up to these results we sequenced 10 cases and 10 controls in a 44 kb region spanning the peaks of association. This revealed 106 polymorphisms, many of which were not in the public databases. We tested an additional 44 of these for association with disease and found a new tandem repeat marker, called RAI-3'd1, located downstream of the transcribed region of RAI, which was more strongly associated with breast cancer than any other marker we have tested (RR = 2.44 (1.41–4.23, p = 0.0008, all cases; RR = 6.29 (1.49–26.6), p = 0.01, cases up to 55 years of age).
We expect the marker RAI-3'd1 to be (part of) the cause for the association of the chromosome 19q13.3 region's association with cancer.