1.  A Genome-Wide Screen for Promoter Methylation in Lung Cancer Identifies Novel Methylation Markers for Multiple Malignancies  
PLoS Medicine  2006;3(12):e486.
Promoter hypermethylation coupled with loss of heterozygosity at the same locus results in loss of gene function in many tumor cells. The “rules” governing which genes are methylated during the pathogenesis of individual cancers, how specific methylation profiles are initially established, or what determines tumor type-specific methylation are unknown. However, DNA methylation markers that are highly specific and sensitive for common tumors would be useful for the early detection of cancer, and those required for the malignant phenotype would identify pathways important as therapeutic targets.
Methods and Findings
In an effort to identify new cancer-specific methylation markers, we employed a high-throughput global expression profiling approach in lung cancer cells. We identified 132 genes that have 5′ CpG islands, are induced from undetectable levels by 5-aza-2′-deoxycytidine in multiple non-small cell lung cancer cell lines, and are expressed in immortalized human bronchial epithelial cells. As expected, these genes were also expressed in normal lung, but often not in companion primary lung cancers. Methylation analysis of a subset (45/132) of these promoter regions in primary lung cancer (n = 20) and adjacent nonmalignant tissue (n = 20) showed that 31 genes had acquired methylation in the tumors, but did not show methylation in normal lung or peripheral blood cells. We studied the eight most frequently and specifically methylated genes from our lung cancer dataset in breast cancer (n = 37), colon cancer (n = 24), and prostate cancer (n = 24) along with counterpart nonmalignant tissues. We found that seven loci were frequently methylated in both breast and lung cancers, with four showing extensive methylation in all four epithelial tumors.
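The three selection criteria described above (a 5′ CpG island, induction from undetectable levels by 5-aza-2′-deoxycytidine in multiple cancer lines, and expression in bronchial epithelial cells) can be written as a simple filter. The following is a minimal Python sketch, not the authors' pipeline: the `GeneProfile` record, the detection threshold, and the reading of "multiple" as at least two cell lines are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

DETECTION = 1.0  # assumed detection threshold, arbitrary expression units
MIN_LINES = 2    # assumed reading of "multiple" cancer cell lines


@dataclass
class GeneProfile:
    """Hypothetical per-gene record; field names are illustrative only."""
    name: str
    has_cpg_island: bool     # gene carries a 5' CpG island
    untreated: List[float]   # expression in each cancer line before treatment
    treated: List[float]     # expression in the same lines after 5-aza-2'-deoxycytidine
    hbec: float              # expression in immortalized bronchial epithelial cells


def is_candidate(gene: GeneProfile) -> bool:
    """Apply the three criteria: CpG island, induction from undetectable, HBEC expression."""
    # Count cell lines where expression goes from below to at-or-above the threshold.
    induced = sum(
        1
        for before, after in zip(gene.untreated, gene.treated)
        if before < DETECTION <= after
    )
    return gene.has_cpg_island and induced >= MIN_LINES and gene.hbec >= DETECTION
```

Under these assumptions, a gene that is silent in two cancer lines, detectable in both after treatment, and expressed in the epithelial cells passes the filter. A gene lacking a CpG island, or induced in only one line, does not.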
By using a systematic biological screen we identified multiple genes that are methylated with high penetrance in primary lung, breast, colon, and prostate cancers. The cross-tumor methylation pattern we observed for these novel markers suggests that we have identified a partial promoter hypermethylation signature for these common malignancies. These data suggest that while tumors in different tissues vary substantially with respect to gene expression, there may be commonalities in their promoter methylation profiles that represent targets for early detection screening or therapeutic intervention.
John Minna and colleagues report that a group of genes are commonly methylated in primary lung, breast, colon, and prostate cancer.
Editors' Summary
Tumors or cancers contain cells that have lost many of the control mechanisms that normally regulate their behavior. Unlike normal cells, which only divide to repair damaged tissues, cancer cells divide uncontrollably. They also gain the ability to move round the body and start metastases in secondary locations. These changes in behavior result from alterations in their genetic material. For example, mutations (permanent changes in the sequence of nucleotides in the cell's DNA) in genes known as oncogenes stimulate cells to divide constantly. Mutations in another group of genes—tumor suppressor genes—disable their ability to restrain cell growth. Key tumor suppressor genes are often completely lost in cancer cells. But not all the genetic changes in cancer cells are mutations. Some are “epigenetic” changes—chemical modifications of genes that affect the amount of protein made from them. In cancer cells, methyl groups are often added to CG-rich regions—this is called hypermethylation. These “CpG islands” lie near gene promoters—sequences that control the transcription of DNA into RNA, the template for protein production—and their methylation switches off the promoter. Methylation of the promoter of one copy of a tumor suppressor gene, which often coincides with the loss of the other copy of the gene, is thought to be involved in cancer development.
Why Was This Study Done?
The rules that govern which genes are hypermethylated during the development of different cancer types are not known, but it would be useful to identify any DNA methylation events that occur regularly in common cancers for two reasons. First, specific DNA methylation markers might be useful for the early detection of cancer. Second, identifying these epigenetic changes might reveal cellular pathways that are changed during cancer development and so identify new therapeutic targets. In this study, the researchers have used a systematic biological screen to identify genes that are methylated in many lung, breast, colon, and prostate cancers—all cancers that form in “epithelial” tissues.
What Did the Researchers Do and Find?
The researchers used microarray expression profiling to examine gene expression patterns in several lung cancer and normal lung cell lines. In this technique, labeled RNA molecules isolated from cells are applied to a “chip” carrying an array of gene fragments. Here, they stick to the fragment that represents the gene from which they were made, which allows the genes that the cells express to be catalogued. By comparing the expression profiles of lung cancer cells and normal lung cells before and after treatment with a chemical that inhibits DNA methylation, the researchers identified genes that were methylated in the cancer cells—that is, genes that were expressed in normal cells but not in cancer cells unless methylation was inhibited. 132 of these genes contained CpG islands. The researchers examined the promoters of 45 of these genes in lung cancer cells taken straight from patients and found that 31 of the promoters were methylated in tumor tissues but not in adjacent normal tissues. Finally, the researchers looked at promoter methylation of the eight genes most frequently and specifically methylated in the lung cancer samples in breast, colon, and prostate cancers. Seven of the genes were frequently methylated in both lung and breast cancers; four were extensively methylated in all the tumor types.
What Do These Findings Mean?
These results identify several new genes that are often methylated in four types of epithelial tumor. The observation that these genes are methylated in multiple independent tumors strongly suggests, but does not prove, that loss of expression of the proteins that they encode helps to convert normal cells into cancer cells. The frequency and diverse patterning of promoter methylation in different tumor types also indicates that methylation is not a random event, although what controls the patterns of methylation is not yet known. The identification of these genes is a step toward building a promoter hypermethylation profile for the early detection of human cancer. Furthermore, although tumors in different tissues vary greatly with respect to gene expression patterns, the similarities seen in this study in promoter methylation profiles might help to identify new therapeutic targets common to several cancer types.
Additional Information.
Please access these Web sites via the online version of this summary at
US National Cancer Institute, information for patients on understanding cancer
CancerQuest, information provided by Emory University about how cancer develops
Cancer Research UK, information for patients on cancer biology
Wikipedia pages on epigenetics (note that Wikipedia is a free online encyclopedia that anyone can edit)
The Epigenome Network of Excellence, background information and latest news about epigenetics
PMCID: PMC1716188  PMID: 17194187
2.  Convergence of Mutation and Epigenetic Alterations Identifies Common Genes in Cancer That Predict for Poor Prognosis  
PLoS Medicine  2008;5(5):e114.
The identification and characterization of tumor suppressor genes has enhanced our understanding of the biology of cancer and enabled the development of new diagnostic and therapeutic modalities. Whereas in past decades, a handful of tumor suppressors have been slowly identified using techniques such as linkage analysis, large-scale sequencing of the cancer genome has enabled the rapid identification of a large number of genes that are mutated in cancer. However, determining which of these many genes play key roles in cancer development has proven challenging. Specifically, recent sequencing of human breast and colon cancers has revealed a large number of somatic gene mutations, but virtually all are heterozygous, occur at low frequency, and are tumor-type specific. We hypothesize that key tumor suppressor genes in cancer may be subject to mutation or hypermethylation.
Methods and Findings
Here, we show that combined genetic and epigenetic analysis of these genes reveals many with a higher putative tumor suppressor status than would otherwise be appreciated. At least 36 of the 189 genes newly recognized to be mutated are targets of promoter CpG island hypermethylation, often in both colon and breast cancer cell lines. Analyses of primary tumors show that 18 of these genes are hypermethylated strictly in primary cancers and often with an incidence that is much higher than for the mutations and which is not restricted to a single tumor-type. In the identical breast cancer cell lines in which the mutations were identified, hypermethylation is usually, but not always, mutually exclusive from genetic changes for a given tumor, and there is a high incidence of concomitant loss of expression. Sixteen out of 18 (89%) of these genes map to loci deleted in human cancers. Lastly, and most importantly, the reduced expression of a subset of these genes strongly correlates with poor clinical outcome.
Using an unbiased genome-wide approach, our analysis has enabled the discovery of a number of clinically significant genes targeted by multiple modes of inactivation in breast and colon cancer. Importantly, we demonstrate that a subset of these genes predict strongly for poor clinical outcome. Our data define a set of genes that are targeted by both genetic and epigenetic events, predict for clinical prognosis, and are likely fundamentally important for cancer initiation or progression.
Stephen Baylin and colleagues show that a combined genetic and epigenetic analysis of breast and colon cancers identifies a number of clinically significant genes targeted by multiple modes of inactivation.
Editors' Summary
Cancer is one of the developed world's biggest killers—over half a million Americans die of cancer each year, for instance. As a result, there is great interest in understanding the genetic and environmental causes of cancer in order to improve cancer prevention, diagnosis, and treatment.
Cancer begins when cells begin to multiply out of control. DNA is the sequence of coded instructions—genes—for how to build and maintain the body. Certain “tumor suppressor” genes, for instance, help to prevent cancer by preventing tumors from developing, but changes that alter the DNA code sequence—mutations—can profoundly affect how a gene works. Modern techniques of genetic analysis have identified genes such as tumor suppressors that, when mutated, are linked to the development of certain cancers.
Why Was This Study Done?
However, in recent years, it has become increasingly apparent that mutations are neither necessary nor sufficient to explain every case of cancer. This has led researchers to look at so-called epigenetic factors, which also alter how a gene works without altering its DNA sequence. An example of this is “methylation,” which prevents a gene from being expressed—deactivates it—by a chemical tag. Methylation of genes is part of the normal functioning of DNA, but abnormal methylation has been linked with cancer, aging, and some rare birth abnormalities.
Previous analysis of DNA from breast and colon cancer cells had revealed 189 “candidate cancer genes”—mutated genes that were linked to the development of breast and colon cancer. However, it was not clear how those mutations gave rise to cancer, and individual mutations were present in only 5% to 15% of specific tumors. The authors of this study wanted to know whether epigenetic factors such as methylation contributed to causing the cancers.
What Did the Researchers Do and Find?
The researchers first identified 56 of the 189 candidate cancer genes as likely tumor suppressors and then determined that 36 of these genes were methylated and deactivated, often in both breast and colon (laboratory-grown) cancer cells. In nearly all cases, the methylated genes were not active but could be reactivated by being demethylated. They further showed that, in normal colon and breast tissue samples, 18 of the 36 genes were unmethylated and functioned normally, but in cells taken from breast and colon cancer tumors they were methylated.
In contrast to the genetic mutations, the 18 genes were frequently methylated across a range of tumor types, and eight genes were methylated in both the breast and colon cancers. The authors found by reviewing the genetics and epigenetics of those 18 genes in breast and colon cancer that they were either mutated, methylated, or both. A literature review showed that at least six of the 18 genes were known to have tumor suppressor properties, and the authors determined that 16 were located in parts of DNA known to be missing from cells taken from a range of cancer tumors.
Finally, the researchers analyzed data on cancer cases to show that methylation of these 18 genes was correlated with reduced function of these genes in tumors and with a greater likelihood that a cancer will be terminal or spread to other parts of the body.
What Do These Findings Mean?
The researchers considered only the 189 candidate cancer genes found in one previous study and not other genes identified elsewhere. They also did not consider the biological effects of the individual mutations found in those genes. Despite this, they have demonstrated that methylation of specific genes is likely to play a role in the development of breast and/or colon cancer cells either together with mutations or independently, most likely by turning off their tumor suppression function.
More broadly, however, the study adds to the evidence that future analysis of the role of genes in cancer should include epigenetic as well as genetic factors. In addition, the authors have also shown that a number of these genes may be useful for predicting clinical outcomes for a range of tumor types.
Additional Information.
Please access these Web sites via the online version of this summary at
A December 2006 PLoS Medicine Perspective article reviews the value of examining methylation as a factor in common cancers and its use for early detection
The Web site of the American Cancer Society has a wealth of information and resources on a variety of cancers, including breast and colon cancer
A nonprofit organization provides information about breast cancer on the Web, including research news
Cancer Research UK provides information on cancer research
The Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins publishes background information on the authors' research on methylation, setting out its potential for earlier diagnosis and better treatment of cancer
PMCID: PMC2429944  PMID: 18507500
3.  An integrative multi-dimensional genetic and epigenetic strategy to identify aberrant genes and pathways in cancer 
BMC Systems Biology  2010;4:67.
Genomics has substantially changed our approach to cancer research. Gene expression profiling, for example, has been utilized to delineate subtypes of cancer, and facilitated derivation of predictive and prognostic signatures. The emergence of technologies for the high resolution and genome-wide description of genetic and epigenetic features has enabled the identification of a multitude of causal DNA events in tumors. This has afforded the potential for large scale integration of genome and transcriptome data generated from a variety of technology platforms to acquire a better understanding of cancer.
Here we show how multi-dimensional genomics data analysis would enable the deciphering of mechanisms that disrupt regulatory/signaling cascades and downstream effects. Since not all gene expression changes observed in a tumor are causal to cancer development, we demonstrate an approach based on multiple concerted disruption (MCD) analysis of genes that facilitates the rational deduction of aberrant genes and pathways, which otherwise would be overlooked in single genomic dimension investigations.
Notably, this is the first comprehensive study of breast cancer cells by parallel integrative genome wide analyses of DNA copy number, LOH, and DNA methylation status to interpret changes in gene expression pattern. Our findings demonstrate the power of a multi-dimensional approach to elucidate events which would escape conventional single dimensional analysis and as such, reduce the cohort sample size for cancer gene discovery.
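The idea behind multiple concerted disruption can be illustrated with a small sketch. The abstract does not give the scoring rule, so everything below is an assumption: each dimension is reduced to a sign (-1, 0, +1), a gene is flagged when at least `min_hits` DNA-level events agree with the direction of its expression change, and the function names are hypothetical.

```python
def count_concordant_hits(expr: int, copy_number: int, loh: bool, methylation: int) -> int:
    """Count DNA-level events consistent with the direction of the expression change.

    expr, copy_number and methylation are signs:
    -1 (loss/decrease), 0 (no change), +1 (gain/increase).
    """
    if expr < 0:
        # Reduced expression is consistent with copy loss, LOH, or gained methylation.
        return int(copy_number < 0) + int(loh) + int(methylation > 0)
    if expr > 0:
        # Increased expression is consistent with copy gain or lost methylation.
        return int(copy_number > 0) + int(methylation < 0)
    return 0


def is_mcd(expr: int, copy_number: int, loh: bool, methylation: int, min_hits: int = 2) -> bool:
    """Flag a gene when several DNA-level dimensions agree with its expression change."""
    return count_concordant_hits(expr, copy_number, loh, methylation) >= min_hits
```

With these assumed rules, an underexpressed gene showing both copy loss and LOH is flagged, whereas the same gene with copy loss alone is not. This is how a multi-dimensional filter can separate likely causal changes from isolated ones.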
PMCID: PMC2880289  PMID: 20478067
4.  Clustering phenotype populations by genome-wide RNAi and multiparametric imaging 
How to predict gene function from phenotypic cues is a longstanding question in biology. Using quantitative multiparametric imaging, RNAi-mediated cell phenotypes were measured on a genome-wide scale. On the basis of phenotypic ‘neighbourhoods', we identified previously uncharacterized human genes as mediators of the DNA damage response pathway and the maintenance of genomic integrity. The phenotypic map is provided as an online resource for discovering further functional relationships for a broad spectrum of biological modules.
Genetic screens for phenotypic similarity have made key contributions to associating genes with biological processes. Aggregating genes by similarity of their loss-of-function phenotype has provided insights into signalling pathways that have a conserved function from Drosophila to human (Nusslein-Volhard and Wieschaus, 1980; Bier, 2005). Complex visual phenotypes, such as defects in pattern formation during development, greatly facilitated the classification of genes into pathways, and phenotypic similarities in many cases predicted molecular relationships. With RNA interference (RNAi), highly parallel phenotyping of loss-of-function effects in cultured cells has become feasible in many organisms whose genomes have been sequenced (Boutros and Ahringer, 2008). One of the current challenges is the computational categorization of visual phenotypes and the prediction of gene function and associated biological processes. With large parts of the genome still being uncharted territory, deriving functional information from large-scale phenotype analysis promises to uncover novel gene–gene relationships and to generate functional maps to explore cellular processes.
In this study, we developed an automated approach using RNAi-mediated cell phenotypes, multiparametric imaging and computational modelling to obtain functional information on previously uncharacterized genes. To generate broad, computer-readable phenotypic signatures, we measured the effect of RNAi-mediated knockdowns on changes of cell morphology in human cells on a genome-wide scale. First, the several million cells were stained for nuclear and cytoskeletal markers and then imaged using automated microscopy. On the basis of fluorescent markers, we established an automated image analysis to classify individual cells (Figure 1A). After cell segmentation for determining nuclei and cell boundaries (Figure 1C), we computed 51 cell descriptors that quantified intensities, shape characteristics and texture (Figure 1F). Individual cells were categorized into 1 of 10 classes, which included cells showing protrusion/elongation, cells in metaphase, large cells, condensed cells, cells with lamellipodia and cellular debris (Figure 1D and E). Each siRNA knockdown was summarized by a phenotypic profile and differences between RNAi knockdowns were quantified by the similarity between phenotypic profiles. We termed the vector of scores a phenoprint (Figure 3C) and defined the phenotypic distance between a pair of perturbations as the distance between their corresponding phenoprints.
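The notion of a phenoprint and a phenotypic distance can be made concrete with a short sketch. The abstract does not state which distance measure was used, so the Euclidean distance below is an assumption, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def phenotypic_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two phenoprints (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("phenoprints must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbours(
    query: str, phenoprints: Dict[str, Sequence[float]], k: int = 2
) -> List[str]:
    """Return the k perturbations whose phenoprints lie closest to the query."""
    ranked = sorted(
        (phenotypic_distance(phenoprints[query], vec), name)
        for name, vec in phenoprints.items()
        if name != query
    )
    return [name for _, name in ranked[:k]]
```

Ranking perturbations by distance in this way is one simple route to the phenotypic 'neighbourhoods' idea: an uncharacterized gene whose nearest neighbours are known pathway members becomes a candidate for that pathway.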
To visualize the distribution of all phenoprints, we plotted them in a genome-wide map as a two-dimensional representation of the phenotypic similarity relationships (Figure 3A). The complete data set and an interactive version of the phenotypic map are available online. The map identified phenotypic ‘neighbourhoods', which are characterized by cells with lamellipodia (WNK3, ANXA4), cells with prominent actin fibres (ODF2, SOD3), abundance of large cells (CA14), many elongated cells (SH2B2, ELMO2), decrease in cell number (TPX2, COPB1, COPA), increase in number of cells in metaphase (BLR1, CIB2) and combinations of phenotypes such as presence of large cells with protrusions and bright nuclei (PTPRZ1, RRM1; Figure 3B).
To test whether phenotypic similarity might serve as a predictor of gene function, we focused our further analysis on two clusters that contained genes associated with the DNA damage response (DDR) and genomic integrity (Figure 3A and C). The first phenotypic cluster included proteins with kinetochore-associated functions such as NUF2 (Figure 3B) and SGOL1. It also contained the centrosomal protein CEP164 that has been described as an important mediator of the DNA damage-activated signalling cascade (Sivasubramaniam et al, 2008) and the largely uncharacterized genes DONSON and SON. A second phenotypically distinct cluster included previously described components of the DDR pathway such as RRM1 (Figure 3A–C), CLSPN, PRIM2 and SETD8. Furthermore, this cluster contained the poorly characterized genes CADM1 and CD3EAP.
Cells activate a signalling cascade in response to DNA damage induced by exogenous and endogenous factors. Central are the kinases ATM and ATR as they serve as sensors of DNA damage and activators of further downstream kinases (Harper and Elledge, 2007; Cimprich and Cortez, 2008). To investigate whether DONSON, SON, CADM1 and CD3EAP, which were found in phenotypic ‘neighbourhoods' to known DDR components, have a role in the DNA damage signalling pathway, we tested the effect of their depletion on the DDR upon γ irradiation. As indicated by reduced CHEK1 phosphorylation, siRNA knockdown of DONSON, SON, CD3EAP or CADM1 resulted in impaired DDR signalling upon γ irradiation. Furthermore, knockdown of DONSON or SON reduced phosphorylation of downstream effectors such as NBS1, CHEK1 and the histone variant H2AX upon UVC irradiation. DONSON depletion also impaired recruitment of RPA2 onto chromatin and SON knockdown reduced RPA2 phosphorylation, indicating that DONSON and SON presumably act downstream of the activation of ATM. In agreement with their phenotypic profile, these results suggest that DONSON, SON, CADM1 and CD3EAP are important mediators of the DDR. Further experiments demonstrated that they are also required for the maintenance of genomic integrity.
In summary, we show that genes with similar phenotypic profiles tend to share similar functions. The power of our computational and experimental approach is demonstrated by the identification of novel signalling regulators whose phenotypic profiles were found in proximity to known biological modules. Therefore, we believe that such phenotypic maps can serve as a resource for functional discovery and characterization of unknown genes. Furthermore, such approaches are also applicable for other perturbation reagents, such as small molecules in drug discovery and development. One could also envision combined maps that contain both siRNAs and small molecules to predict target–small molecule relationships and potential side effects.
Genetic screens for phenotypic similarity have made key contributions to associating genes with biological processes. With RNA interference (RNAi), highly parallel phenotyping of loss-of-function effects in cells has become feasible. One of the current challenges, however, is the computational categorization of visual phenotypes and the prediction of biological function and processes. In this study, we describe a combined computational and experimental approach to discover novel gene functions and explore functional relationships. We performed a genome-wide RNAi screen in human cells and used quantitative descriptors derived from high-throughput imaging to generate multiparametric phenotypic profiles. We show that profiles predicted functions of genes by phenotypic similarity. Specifically, we examined several candidates including the largely uncharacterized gene DONSON, which shared phenotype similarity with known factors of DNA damage response (DDR) and genomic integrity. Experimental evidence supports that DONSON is a novel centrosomal protein required for DDR signalling and genomic integrity. Multiparametric phenotyping by automated imaging and computational annotation is a powerful method for functional discovery and mapping the landscape of phenotypic responses to cellular perturbations.
PMCID: PMC2913390  PMID: 20531400
DNA damage response signalling; massively parallel phenotyping; phenotype networks; RNAi screening
5.  Regulatory and metabolic rewiring during laboratory evolution of ethanol tolerance in E. coli 
We have designed an experimental/computational framework for studying complex phenotypes in bacteria. Our framework relies on whole-genome fitness profiling coupled with a module-level analysis to discover pathways that directly affect fitness. As a proof-of-principle, we studied ethanol tolerance in Escherichia coli and we identified key pathways that contribute to this phenotype. We then validated our findings through genetic manipulations, gene-expression profiling, metabolite-level measurements, and stable-isotope labeling.
Elucidating the genetic basis of complex phenotypes remains a fundamental challenge in biology. We have developed a systematic framework for comprehensive genetic analysis of microbial phenotypes. Our approach combines the power of fitness profiling (Girgis et al, 2007; Amini et al, 2009) with the sensitivity of module-level analysis (Goodarzi et al, 2009a) to identify key genetic modules that directly affect a phenotype under study. We applied our technology to ethanol tolerance, a complex phenotype with broad industrial relevance. Ethanol affects a variety of cellular components and pathways, including but not limited to membrane integrity (Dombek and Ingram, 1984), enzyme activities (Millar et al, 1982), and proton flux (D'Amore et al, 1990). Given the diversity of targets, the emergence of ethanol tolerance requires modifications to multiple pathways (D'Amore and Stewart, 1987).
To reveal the genetic basis of ethanol tolerance in Escherichia coli, we used two high-coverage mutant libraries (a transposon library and an overexpression library) to assess the fitness consequences of single-locus perturbations. Each cell in our transposon library contains a random transposon insertion in its genome (Girgis et al, 2007); whereas the cells in the overexpression library carry 1–3 kb genomic fragments cloned into a cloning vector (Amini et al, 2009). We grew these libraries under mild (4% v/v) and harsh (5.5% v/v) ethanol concentrations. On growth, the abundance of each transposon insertion or overexpression mutant changes as a function of its fitness, a process that can be monitored through parallel genetic footprinting and microarray hybridization (Figure 1A). This results in a global fitness profile, where the contribution of each genetic locus to ethanol tolerance can be quantified in parallel. However, in the context of ethanol tolerance and other complex phenotypes, single-locus perturbations typically result in modest changes in fitness. Although these small differences can be amplified through multiple rounds of selection, the number of generations is limited as spontaneous beneficial mutations emerge in the population and cause strong biases in the resulting fitness profiles. To boost our analytical power without introducing these biases in the data, we used a module-level computational method to discover the pathways and components that are strongly associated with the data as opposed to focusing on the genes individually (Goodarzi et al, 2009a). Genes function in the context of pathways and modules and module-level analyses increase statistical power through combining information from multiple genes functioning as part of a given pathway (Subramanian et al, 2005).
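The two steps described above, per-locus fitness from abundance changes and pooling of loci into modules, can be sketched as follows. This is an illustration only: the log2 abundance ratio with a pseudocount is an assumed fitness score, and the plain mean over module members is a simple stand-in, not the module-level method cited in the text. All names are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List


def fitness_scores(
    before: Dict[str, float], after: Dict[str, float], pseudocount: float = 1.0
) -> Dict[str, float]:
    """Log2 change in mutant abundance over the selection (assumed score)."""
    return {
        locus: math.log2(
            (after.get(locus, 0.0) + pseudocount) / (before[locus] + pseudocount)
        )
        for locus in before
    }


def module_scores(
    scores: Dict[str, float], modules: Dict[str, List[str]]
) -> Dict[str, float]:
    """Average the per-locus scores within each module (stand-in aggregation)."""
    result = {}
    for module, members in modules.items():
        measured = [scores[g] for g in members if g in scores]
        if measured:  # skip modules with no measured members
            result[module] = mean(measured)
    return result
```

Pooling in this way shows why module-level analysis helps: several loci with individually modest scores in the same direction yield a module score that is easier to distinguish from noise than any single locus.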
The module-level analysis of the fitness scores from both libraries revealed a diverse set of pathways that have a direct function in ethanol tolerance. Some of these pathways, including heat-shock stress response and osmoregulation, are known modifiers of ethanol tolerance; whereas others such as acid-stress response and fimbrial structures are novel pathways. Among our findings was the important function of three regulatory proteins: FNR, ArcA, and CafA. Knocking out FNR/ArcA, a perturbation that upregulates aerobic respiration proteins and TCA cycle components, results in a marked increase in ethanol tolerance. Similarly, knocking out CafA, a post-transcriptional regulator of alcohol dehydrogenase, is beneficial for tolerance. Given these observations, we hypothesized that selection for ethanol tolerance can result in higher ethanol degradation.
As a large fraction of discovered pathways belonged to central metabolism, we used metabolomics to evaluate our findings. To directly assess the metabolic consequences of adaptation to ethanol, we evolved ethanol-tolerant strains in minimal media plus glucose for ∼30 and 160 generations. We then compared the steady-state level of metabolites in these strains to that of the wild type (Figure 1B). In agreement with our fitness profiling results, we observed a significant increase in TCA cycle metabolites in one of our ethanol-tolerant strains. Higher concentrations of TCA cycle components along with a high free coenzyme A (CoA) to acetyl-coenzyme A (acetyl-CoA) ratio hinted at the capacity of this strain to metabolize ethanol. To test this hypothesis, we performed stable-isotope labeling on our ethanol-tolerant strain versus wild type. After growth on labeled ethanol, we measured the fraction of metabolites that were labeled at each timepoint (Figure 1B). Our results confirmed that the ethanol-tolerant strain has the capacity to consume ethanol through its conversion into acetyl-CoA and further assimilation in the TCA cycle.
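The labeled fraction measured at each timepoint can be expressed as a simple ratio. The sketch below is a simplification under stated assumptions: it treats each metabolite as a single labeled pool and a single unlabeled pool, and it ignores natural isotope abundance and individual isotopologues, which a real analysis would need to handle. The function names and input layout are hypothetical.

```python
from typing import Dict, List, Tuple


def labeled_fraction(labeled: float, unlabeled: float) -> float:
    """Fraction of a metabolite pool that carries the isotope label."""
    total = labeled + unlabeled
    if total <= 0:
        raise ValueError("no signal for this metabolite")
    return labeled / total


def labeling_time_course(
    measurements: List[Tuple[float, float, float]]
) -> Dict[float, float]:
    """Map each timepoint to its labeled fraction.

    Each input row is (time, labeled signal, unlabeled signal).
    """
    return {t: labeled_fraction(lab, unlab) for t, lab, unlab in measurements}
```

A rising labeled fraction in a TCA cycle intermediate after growth on labeled ethanol is the kind of pattern that would indicate assimilation of ethanol-derived carbon.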
By using a variety of systems-level approaches, we have been able to genetically dissect ethanol tolerance in E. coli. We have shown that fitness profiling, in combination with module-level analysis tools, can serve as a powerful approach for revealing the genetic basis of complex phenotypes. The fact that laboratory evolution ended up using the very modules that we discovered highlights the biological and adaptive relevance of the proposed framework.
Understanding the genetic basis of adaptation is a central problem in biology. However, revealing the underlying molecular mechanisms has been challenging as changes in fitness may result from perturbations to many pathways, any of which may contribute relatively little. We have developed a combined experimental/computational framework to address this problem and used it to understand the genetic basis of ethanol tolerance in Escherichia coli. We used fitness profiling to measure the consequences of single-locus perturbations in the context of ethanol exposure. A module-level computational analysis was then used to reveal the organization of the contributing loci into cellular processes and regulatory pathways (e.g. osmoregulation and cell-wall biogenesis) whose modifications significantly affect ethanol tolerance. Strikingly, we discovered that a dominant component of adaptation involves metabolic rewiring that boosts intracellular ethanol degradation and assimilation. Through phenotypic and metabolomic analysis of laboratory-evolved ethanol-tolerant strains, we investigated naturally accessible pathways of ethanol tolerance. Remarkably, these laboratory-evolved strains, by and large, follow the same adaptive paths as inferred from our coarse-grained search of the fitness landscape.
PMCID: PMC2913397  PMID: 20531407
adaptation; ethanol tolerance; evolution; fitness profiling
6.  The breast cancer somatic 'muta-ome': tackling the complexity 
Acquired somatic mutations are responsible for approximately 90% of breast tumours. However, only one somatic aberration, amplification of the HER2 locus, is currently used to define a clinical subtype, one that accounts for approximately 10% to 15% of breast tumours. In recent years, a number of mutational profiling studies have attempted to further identify clinically relevant mutations. While these studies have confirmed the oncogenic or tumour suppressor role of many known suspects, they have exposed complexity as a main feature of the breast cancer mutational landscape (the 'muta-ome'). The two defining features of this complexity are (a) a surprising richness of low-frequency mutants contrasting with the relative rarity of high-frequency events and (b) the relatively large number of somatic genomic aberrations (approximately 20 to 50) driving an average tumour. Structural features of this complex landscape have begun to emerge from follow-up studies that have tackled the complexity by integrating the spectrum of genomic mutations with a variety of complementary biological knowledge databases. Among these structural features are the growing links between somatic gene disruptions and those conferring breast cancer risk, mutually exclusive coexistence and synergistic mutational patterns, and a clearly non-random distribution of mutations implicating specific molecular pathways in breast tumour initiation and progression. Recognising that a shift from a gene-centric to a pathway-centric approach is necessary, we envisage that further progress in identifying clinically relevant genomic aberration patterns and associated breast cancer subtypes will require not only multi-dimensional integrative analyses that combine mutational and functional profiles, but also larger profiling studies that use second- and third-generation sequencing technologies in order to fill out the important gaps in the current mutational landscape.
PMCID: PMC2688941  PMID: 19344493
7.  Role of DNA Methylation and Epigenetic Silencing of HAND2 in Endometrial Cancer Development 
PLoS Medicine  2013;10(11):e1001551.
Please see later in the article for the Editors' Summary
Endometrial cancer incidence is continuing to rise in the wake of the current ageing and obesity epidemics. Much of the risk for endometrial cancer development is influenced by the environment and lifestyle. Accumulating evidence suggests that the epigenome serves as the interface between the genome and the environment and that hypermethylation of stem cell polycomb group target genes is an epigenetic hallmark of cancer. The objective of this study was to determine the functional role of epigenetic factors in endometrial cancer development.
Methods and Findings
Epigenome-wide methylation analysis of >27,000 CpG sites in endometrial cancer tissue samples (n = 64) and control samples (n = 23) revealed that HAND2 (a gene encoding a transcription factor expressed in the endometrial stroma) is one of the most commonly hypermethylated and silenced genes in endometrial cancer. A novel integrative epigenome-transcriptome-interactome analysis further revealed that HAND2 is the hub of the most highly ranked differential methylation hotspot in endometrial cancer. These findings were validated using candidate gene methylation analysis in multiple clinical sample sets of tissue samples from a total of 272 additional women. Increased HAND2 methylation was a feature of premalignant endometrial lesions and was seen to parallel a decrease in RNA and protein levels. Furthermore, women with high endometrial HAND2 methylation in their premalignant lesions were less likely to respond to progesterone treatment. HAND2 methylation analysis of endometrial secretions collected using high vaginal swabs taken from women with postmenopausal bleeding specifically identified those patients with early stage endometrial cancer with both high sensitivity and high specificity (receiver operating characteristics area under the curve = 0.91 for stage 1A and 0.97 for higher than stage 1A). Finally, mice harbouring a Hand2 knock-out specifically in their endometrium were shown to develop precancerous endometrial lesions with increasing age, and these lesions also demonstrated a lack of PTEN expression.
HAND2 methylation is a common and crucial molecular alteration in endometrial cancer that could potentially be employed as a biomarker for early detection of endometrial cancer and as a predictor of treatment response. The true clinical utility of HAND2 DNA methylation, however, requires further validation in prospective studies.
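The area under the receiver operating characteristic curve quoted above (0.91 and 0.97) can be read as the probability that a randomly chosen cancer sample has a higher HAND2 methylation score than a randomly chosen non-cancer sample. The sketch below illustrates that reading in Python. The function name `roc_auc` and the methylation values are hypothetical and are not data from the study.

```python
def roc_auc(case_scores, control_scores):
    """Fraction of (case, control) pairs in which the case scores higher.

    Ties count as half a win. This pairwise definition is equivalent to the
    area under the ROC curve.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical methylation fractions, for illustration only.
cancer_swabs = [0.62, 0.48, 0.71, 0.35]
benign_swabs = [0.10, 0.22, 0.40, 0.15]

auc = roc_auc(cancer_swabs, benign_swabs)
print(f"AUC = {auc:.4f}")
```

An AUC of 0.5 would mean the score separates the two groups no better than chance, and 1.0 would mean perfect separation.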
Editors' Summary
Cancer, which is responsible for 13% of global deaths, can develop anywhere in the body, but all cancers are characterized by uncontrolled cell growth and reduced cellular differentiation (the process by which unspecialized cells such as “stem” cells become specialized during development, tissue repair, and normal cell turnover). Genetic alterations—changes in the sequence of nucleotides (DNA's building blocks) in specific genes—are required for this cellular transformation and subsequent cancer development (carcinogenesis). However, recent evidence suggests that epigenetic modifications—reversible, heritable changes in gene function that occur in the absence of nucleotide sequence changes—may also be involved in carcinogenesis. For example, the addition of methyl groups to a set of genes called stem cell polycomb group target genes (PCGTs; polycomb genes control the expression of their target genes by modifying their DNA or associated proteins) is one of the earliest molecular changes in human cancer development, and increasing evidence suggests that hypermethylation of PCGTs is an epigenetic hallmark of cancer.
Why Was This Study Done?
The methylation of PCGTs, which is triggered by age and by environmental factors that are associated with cancer development, reduces cellular differentiation and leads to the accumulation of undifferentiated cells that are susceptible to cancer development. It is unclear, however, whether epigenetic modifications have a causal role in carcinogenesis. Here, the researchers investigate the involvement of epigenetic factors in the development of endometrial (womb) cancer. The risk of endometrial cancer (which affects nearly 50,000 women annually in the United States) is largely determined by environmental and lifestyle factors. Specifically, the risk of this cancer is increased in women in whom estrogen (a hormone that drives cell proliferation in the endometrium) is functionally dominant over progesterone (a hormone that inhibits endometrial proliferation and causes cell differentiation); obese women and women who have taken estrogen-only hormone replacement therapies fall into this category. Thus, endometrial cancer is an ideal model in which to study whether epigenetic mechanisms underlie carcinogenesis.
What Did the Researchers Do and Find?
The researchers collected data on genome-wide DNA methylation at cytosine- and guanine-rich sites in endometrial cancers and normal endometrium and integrated this information with the human interactome and transcriptome (all the physical interactions between proteins and all the genes expressed, respectively, in a cell) using an algorithm called Functional Epigenetic Modules (FEM). This analysis identified HAND2 as the hub of the most highly ranked differential methylation hotspot in endometrial cancer. HAND2 is a progesterone-regulated stem cell PCGT. It encodes a transcription factor that is expressed in the endometrial stroma (the connective tissue that lies below the epithelial cells in which most endometrial cancers develop) and that suppresses the production of the growth factors that mediate the growth-inducing effects of estrogen on the endometrial epithelium. The researchers hypothesized, therefore, that epigenetic deregulation of HAND2 could be a key step in endometrial cancer development. In support of this hypothesis, the researchers report that HAND2 methylation was increased in premalignant endometrial lesions (cancer-prone, abnormal-looking tissue) compared to normal endometrium, and was associated with suppression of HAND2 expression. Moreover, a high level of endometrial HAND2 methylation in premalignant lesions predicted a poor response to progesterone treatment (which stops the growth of some endometrial cancers), and analysis of HAND2 methylation in endometrial secretions collected from women with postmenopausal bleeding (a symptom of endometrial cancer) accurately identified individuals with early stage endometrial cancer. Finally, mice in which the Hand2 gene was specifically deleted in the endometrium developed precancerous endometrial lesions with age.
What Do These Findings Mean?
These and other findings identify HAND2 methylation as a common, key molecular alteration in endometrial cancer. These findings need to be confirmed in more women, and studies are needed to determine the immediate molecular and cellular consequences of HAND2 silencing in endometrial stromal cells. Nevertheless, these results suggest that HAND2 methylation could potentially be used as a biomarker for the early detection of endometrial cancer and for predicting treatment response. More generally, these findings support the idea that methylation of HAND2 (and, by extension, the methylation of other PCGTs) is not a passive epigenetic feature of cancer but is functionally involved in cancer development, and provide a framework for identifying other genes that are epigenetically regulated and functionally important in carcinogenesis.
Additional Information
Please access these websites via the online version of this summary at
The US National Cancer Institute provides information on all aspects of cancer and has detailed information about endometrial cancer for patients and professionals (in English and Spanish)
The not-for-profit organization American Cancer Society provides information on cancer and how it develops and specific information on endometrial cancer (in several languages)
The UK National Health Service Choices website includes an introduction to cancer, a page on endometrial cancer, and a personal story about endometrial cancer
The not-for-profit organization Cancer Research UK provides general information about cancer and specific information about endometrial cancer
Wikipedia has a page on cancer epigenetics (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
The Eve Appeal charity that supported this research provides useful information on gynecological cancers
PMCID: PMC3825654  PMID: 24265601
8.  The BARD1 Cys557Ser Variant and Breast Cancer Risk in Iceland 
PLoS Medicine  2006;3(7):e217.
Most, if not all, of the cellular functions of the BRCA1 protein are mediated through heterodimeric complexes composed of BRCA1 and a related protein, BARD1. Some breast-cancer-associated BRCA1 missense mutations disrupt the function of the BRCA1/BARD1 complex. It is therefore pertinent to determine whether variants of BARD1 confer susceptibility to breast cancer. Recently, a missense BARD1 variant, Cys557Ser, was reported to be at increased frequencies in breast cancer families. We investigated the role of the BARD1 Cys557Ser variant in a population-based cohort of 1,090 Icelandic patients with invasive breast cancer and 703 controls. We then used a computerized genealogy of the Icelandic population to study the relationships between the Cys557Ser variant and familial clustering of breast cancer.
Methods and Findings
The Cys557Ser allele was present at a frequency of 0.028 in patients with invasive breast cancer and 0.016 in controls (odds ratio [OR] = 1.82, 95% confidence interval [CI] 1.11–3.01, p = 0.014). The allelic frequency was 0.037 in a high-predisposition group of cases defined by having a family history of breast cancer, early onset of breast cancer, or multiple primary breast cancers (OR = 2.41, 95% CI 1.22–4.75, p = 0.015). Carriers of the common Icelandic BRCA2 999del5 mutation were found to have their risk of breast cancer further increased if they also carried the BARD1 variant: the frequency of the BARD1 variant allele was 0.047 (OR = 3.11, 95% CI 1.16–8.40, p = 0.046) in 999del5 carriers with breast cancer. This suggests that the lifetime probability of a BARD1 Cys557Ser/BRCA2 999del5 double carrier developing breast cancer could approach certainty. Cys557Ser carriers, with or without the BRCA2 mutation, had an increased risk of subsequent primary breast tumors after the first breast cancer diagnosis compared to non-carriers. Lobular and medullary breast carcinomas were overrepresented amongst Cys557Ser carriers. We found that an excess of ancestors of contemporary carriers lived in a single county in the southeast of Iceland and that all carriers shared a SNP haplotype, which is suggestive of a founder event. Cys557Ser was found on the same SNP haplotype background in the HapMap Project CEPH sample of Utah residents.
Our findings suggest that BARD1 Cys557Ser is an ancient variant that confers risk of single and multiple primary breast cancers, and this risk extends to carriers of the BRCA2 999del5 mutation.
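As a rough check on how the reported odds ratios relate to the allele frequencies, the sketch below computes an allelic odds ratio directly from the rounded frequencies quoted in the abstract. The function name `allelic_odds_ratio` is hypothetical, and this simple ratio of odds is an assumption about the calculation, not the authors' stated method. Because the published frequencies are rounded to three decimals, the values it gives (approximately 1.77, 2.36, and 3.03) are close to, but not identical with, the reported 1.82, 2.41, and 3.11.

```python
def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds of the variant allele in cases divided by the odds in controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Rounded allele frequencies as quoted in the abstract.
CONTROL_FREQ = 0.016
case_groups = {
    "all invasive breast cancer": 0.028,
    "high-predisposition cases": 0.037,
    "BRCA2 999del5 carriers with breast cancer": 0.047,
}

for label, freq in case_groups.items():
    print(f"{label}: OR ~ {allelic_odds_ratio(freq, CONTROL_FREQ):.2f}")
```

The confidence intervals and p-values in the abstract depend on the underlying allele counts, which are not given here, so they are not reproduced.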
Editors' Summary
About 13% of women (one in eight women) will develop breast cancer during their lifetime, but many factors affect the likelihood of any individual woman developing this disease, for example, whether she has had children and at what age, when she started and stopped her periods, and her exposure to certain chemicals or radiation. She may also have inherited a defective gene that affects her risk of developing breast cancer. Some 5%–10% of all breast cancers are familial, or inherited. In 20% of these cases, the gene that is defective is BRCA1 or BRCA2. Inheriting a defective copy of one of these genes greatly increases a woman's risk of developing breast cancer, while researchers think that the other inherited genes that predispose to breast cancer—most of which have not been identified yet—have a much weaker effect. These are described as low-penetrance genes. Inheriting one such gene only slightly increases breast cancer risk; a woman has to inherit several to increase her lifetime risk of cancer significantly.
Why Was This Study Done?
It is important to identify these additional predisposing gene variants because they might provide insights into why breast cancer develops, how to prevent it, and how to treat it. To find low-penetrance genes, researchers do case–control association studies. They find a large group of women with breast cancer (cases) and a similar group of women without cancer (controls), and examine how often a specific gene variant occurs in the two groups. If the variant is found more often in the cases than in the controls, it might be a variant that increases a woman's risk of developing breast cancer.
What Did the Researchers Do and Find?
The researchers involved in this study recruited Icelandic women who had had breast cancer and unaffected women, and looked for a specific variant—the Cys557Ser allele—of a gene called BARD1. They chose BARD1 because the protein it encodes interacts with the protein encoded by BRCA1. Because defects in BRCA1 increase the risk of breast cancer, defects in an interacting protein might have a similar effect. In addition, the Cys557Ser allele has been implicated in breast cancer in other studies. The researchers found that the Cys557Ser allele was nearly twice as common in women with breast cancer as in control women. It was also more common (but not by much) in women who had a family history of breast cancer or who had developed breast cancer more than once. And having the Cys557Ser allele seemed to increase the already high risk of breast cancer in women who had a BRCA2 variant (known as BRCA2 999del5) that accounts for 40% of inherited breast cancer risk in Iceland.
What Do These Findings Mean?
These results indicate that inheriting the BARD1 Cys557Ser allele increases a woman's breast cancer risk but that she is unlikely to have a family history of the disease. Because carrying the Cys557Ser allele only slightly increases a woman's risk of breast cancer, for most women there is no clinical reason to test for this variant. Eventually, when all the low-penetrance genes that contribute to breast cancer risk have been identified, it might be helpful to screen women for the full set to determine whether they are at high risk of developing breast cancer. This will not happen for many years, however, since there might be tens or hundreds of these genes. For women who carry BRCA2 999del5, the situation might be different. It might be worth testing these women for the BARD1 Cys557Ser allele, the researchers explain, because the lifetime probability of developing breast cancer in women carrying both variants might approach 100%. This finding has clinical implications in terms of counseling and monitoring, as does the observation that Cys557Ser carriers have an increased risk of a second, independent breast cancer compared to non-carriers. However, all these findings need to be confirmed in other groups of patients before anyone is routinely tested for the BARD1 Cys557Ser allele.
Additional Information
Please access these Web sites via the online version of this summary at
• MedlinePlus pages about breast cancer
• Information on breast cancer from the United States National Cancer Institute
• Information on inherited breast cancer from the United States National Human Genome Research Institute
• United States National Cancer Institute information on genetic testing for BRCA1 and BRCA2 variants
• GeneTests pages on the involvement of BRCA1 and BRCA2 in hereditary breast and ovarian cancer
• Cancer Research UK's page on breast cancer statistics
In a population-based cohort of 1090 Icelandic patients, a Cys557Ser missense variant of the BARD1 gene, which interacts with BRCA1, increased the risk of single and multiple primary breast cancers.
PMCID: PMC1479388  PMID: 16768547
9.  Diverse Roles and Interactions of the SWI/SNF Chromatin Remodeling Complex Revealed Using Global Approaches 
PLoS Genetics  2011;7(3):e1002008.
A systems understanding of nuclear organization and events is critical for determining how cells divide, differentiate, and respond to stimuli and for identifying the causes of diseases. Chromatin remodeling complexes such as SWI/SNF have been implicated in a wide variety of cellular processes including gene expression, nuclear organization, centromere function, and chromosomal stability, and mutations in SWI/SNF components have been linked to several types of cancer. To better understand the biological processes in which chromatin remodeling proteins participate, we globally mapped binding regions for several components of the SWI/SNF complex throughout the human genome using ChIP-Seq. SWI/SNF components were found to lie near regulatory elements integral to transcription (e.g. 5′ ends, RNA Polymerases II and III, and enhancers) as well as regions critical for chromosome organization (e.g. CTCF, lamins, and DNA replication origins). Interestingly we also find that certain configurations of SWI/SNF subunits are associated with transcripts that have higher levels of expression, whereas other configurations of SWI/SNF factors are associated with transcripts that have lower levels of expression. To further elucidate the association of SWI/SNF subunits with each other as well as with other nuclear proteins, we also analyzed SWI/SNF immunoprecipitated complexes by mass spectrometry. Individual SWI/SNF factors are associated with their own family members, as well as with cellular constituents such as nuclear matrix proteins, key transcription factors, and centromere components, implying a ubiquitous role in gene regulation and nuclear function. We find an overrepresentation of both SWI/SNF-associated regions and proteins in cell cycle and chromosome organization. 
Taken together the results from our ChIP and immunoprecipitation experiments suggest that SWI/SNF facilitates gene regulation and genome function more broadly and through a greater diversity of interactions than previously appreciated.
Author Summary
Genetic information and programming are not entirely contained in DNA sequence but are also governed by chromatin structure. Gaining a greater understanding of chromatin remodeling complexes can bridge gaps between processes in the genome and the epigenome and can offer insights into diseases such as cancer. We identified targets of the chromatin remodeling complex, SWI/SNF, on a genome-wide scale using ChIP-Seq. We also identify proteins that co-purify with its various components via immunoprecipitation combined with mass spectrometry. By integrating these newly-identified regions with a combination of novel and published data sources, we identify pathways and cellular compartments in which SWI/SNF plays a major role as well as discern general characteristics of SWI/SNF target sites. Our parallel evaluations of multiple SWI/SNF factors indicate that these subunits are found in highly dynamic and combinatorial assemblies. Our study presents the first genome-wide and unified view of multiple SWI/SNF components and also provides a valuable resource to the scientific community as an important data source to be integrated with future genomic and epigenomic studies.
PMCID: PMC3048368  PMID: 21408204
10.  Programmed fluctuations in sense/antisense transcript ratios drive sexual differentiation in S. pombe 
Strand-specific RNA sequencing of S. pombe reveals a highly structured programme of ncRNA expression at over 600 loci. Functional investigations show that this extensive ncRNA landscape controls the complex programme of sexual differentiation in S. pombe.
The model eukaryote S. pombe features substantial numbers of ncRNAs, many of which are antisense regulatory transcripts (ARTs), ncRNAs expressed on the opposing strand to coding sequences.
Individual ARTs are generated during the mitotic cycle, or at discrete stages of sexual differentiation, to downregulate the levels of proteins that drive and coordinate sexual differentiation.
Antisense transcription occurring from events such as bidirectional transcription is not simply artefactual ‘chatter’; it performs a critical role in regulating gene expression.
Regulation of the RNA profile is a principal control driving sexual differentiation in the fission yeast Schizosaccharomyces pombe. Before transcription, RNAi-mediated formation of heterochromatin is used to suppress expression, while post-transcription, regulation is achieved via the active stabilisation or destruction of transcripts, and through at least two distinct types of splicing control (Mata et al, 2002; Shimoseki and Shimoda, 2001; Averbeck et al, 2005; Mata and Bähler, 2006; Xue-Franzen et al, 2006; Moldon et al, 2008; Djupedal et al, 2009; Amorim et al, 2010; Grewal, 2010; Cremona et al, 2011).
Around 94% of the S. pombe genome is transcribed (Wilhelm et al, 2008). While many of these transcripts encode proteins (Wood et al, 2002; Bitton et al, 2011), the majority have no known function. We used a strand-specific protocol to sequence total RNA extracts taken from vegetatively growing cells, and at different points during a time course of sexual differentiation. The resulting data redefined existing gene coordinates and identified additional transcribed loci. The frequency of reads at each of these was used to monitor transcript abundance.
Transcript levels at 6599 loci changed in at least one sample (G-statistic; False Discovery Rate <5%). 4231 (72.3%), of which 4011 map to protein-coding genes, while 809 loci were antisense to a known gene. Comparisons between haploid and diploid strains identified changes in transcript levels at over 1000 loci.
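The abstract names a G-statistic and a false discovery rate threshold of 5% but does not describe how expected counts or the FDR were computed. The sketch below is one plausible reading, under two assumptions that are not from the source: expected read counts at a locus are taken to be proportional to each sample's library size, and the FDR is controlled with the Benjamini-Hochberg step-up procedure. The function names and example numbers are hypothetical.

```python
import math


def g_statistic(counts, library_sizes):
    """G = 2 * sum(O * ln(O / E)) for read counts at one locus.

    Expected counts E are assumed proportional to library size.
    """
    total_reads = sum(counts)
    total_library = sum(library_sizes)
    g = 0.0
    for observed, size in zip(counts, library_sizes):
        expected = total_reads * size / total_library
        if observed > 0:  # a zero count contributes nothing to the sum
            g += observed * math.log(observed / expected)
    return 2.0 * g


def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per test: True if it passes at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= fdr * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= fdr * rank / m:
            cutoff_rank = rank
    passed = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            passed[idx] = True
    return passed


# Hypothetical locus: 20 and 10 reads in two equally deep libraries.
print(round(g_statistic([20, 10], [100, 100]), 3))
# Hypothetical per-locus p-values.
print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5]))
```

To turn G into a p-value, it would be compared against a chi-square distribution with (number of samples minus 1) degrees of freedom; that step is omitted here to keep the sketch free of third-party dependencies.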
At 354 loci, greater antisense abundance was observed relative to sense, in at least one sample (putative antisense regulatory transcripts—ARTs). Since antisense mechanisms are known to modulate sense transcript expression through a variety of inhibitory mechanisms (Faghihi and Wahlestedt, 2009), we postulated that the waves of antisense expression activated at different stages during meiosis might be regulating protein expression.
To ask whether transcription factors that drive sense-transcript levels influenced ART production, we performed RNA-seq of a pat1.114 diploid meiosis in the absence of the transcription factors Atf21 and Atf31 (responsible for late meiotic transcription; Mata et al, 2002). Transcript levels at 185 ncRNA loci showed significant changes in the knockout backgrounds. Although meiotic progression is largely unaffected by removal of Atf21 and Atf31, viability of the resulting spores was significantly diminished, indicating that Atf21- and Atf31-mediated events are critical to efficient sexual differentiation.
If changes to relative antisense/sense transcript levels during a particular phase of sexual differentiation were to regulate protein expression, then the continued presence of the antisense at points in the differentiation programme where it would normally be absent should abolish protein function during this phase. We tested this hypothesis at four loci representing the three means of antisense production: convergent gene expression, improper termination and nascent transcription from an independent locus. Induction of the natural antisense transcripts that opposed spo4+, spo6+ and dis1+ (Figures 3 and 7) in trans from a heterologous locus phenocopied a loss of function of the target protein. ART overexpression decreased Dis1 protein levels. Antisense transcription opposing spk1+ originated from improper termination of the sense ups1+ transcript on the opposite strand (Figure 3B, left locus). Expression of either the natural full-length ups1+ transcript or a truncated version, restricted to the portion of ups1+ overlapping spk1+ (Figure 3, orange transcripts) in trans from a heterologous locus phenocopied the spk1.Δ differentiation deficiency. Convergent transcription from a neighbouring gene on the opposing strand is, therefore, an effective mechanism to generate RNAi-mediated (below) silencing in fission yeast. Further analysis of the data revealed, for many loci, substantial changes in UTR length over the course of meiosis, suggesting that UTR dynamics may have an active role in regulating gene expression by controlling the transcriptional overlap between convergent adjacent gene pairs.
The RNAi machinery (Grewal, 2010) was required for antisense suppression at each of the dis1, spk1, spo4 and spo6 loci, as antisense to each locus had no impact in ago1.Δ, dcr1.Δ and rdp1.Δ backgrounds. We conclude that RNAi control has a key role in maintaining the fidelity of sexual differentiation in fission yeast. The histone H3 methyl transferase Clr4 was required for antisense control from a heterologous locus.
Thus, a significant portion of the impact of ncRNA upon sexual differentiation arises from antisense gene silencing. Importantly, in contrast to the extensively characterised ability of the RNAi machinery to operate in cis at a target locus in S. pombe (Grewal, 2010), each case of gene silencing generated here could be achieved in trans by expression of the antisense transcript from a single heterologous locus elsewhere in the genome.
Integration of an antibiotic marker gene immediately downstream of the dis1+ locus instigated antisense control in an orientation-dependent manner. PCR-based gene tagging approaches are widely used to fuse the coding sequences of epitope or protein tags to a gene of interest. Not only do these tagging approaches disrupt normal 3′UTR controls, but the insertion of a heterologous marker gene immediately downstream of an ORF can clearly have a significant impact upon transcriptional control of the resulting fusion protein. Thus, PCR tagging approaches can no longer be viewed as benign manipulations of a locus that only result in the production of a tagged protein product.
Repression of Dis1 function by gene deletion or antisense control revealed a key role for this conserved microtubule regulator in driving the horsetail nuclear migrations that promote recombination during meiotic prophase.
Non-coding transcripts have often been viewed as simple ‘chatter', maintained solely because evolutionary pressures have not been strong enough to force their elimination from the system. Our data show that phenomena such as improper termination and bidirectional transcription are not simply interesting artifacts arising from the complexities of transcription or genome history, but have a critical role in regulating gene expression in the current genome. Given the widespread use of RNAi, it is reasonable to anticipate that future analyses will establish ARTs to have equal importance in other organisms, including vertebrates.
These data highlight the need to modify our concept of a gene from that of a spatially distinct locus. This view is becoming increasingly untenable. Not only are the 5′ and 3′ ends of many genes indistinct, but this lack of a hard and fast boundary is actively used by cells to control the transcription of adjacent and overlapping loci, and thus to regulate critical events in the life of a cell.
Strand-specific RNA sequencing of S. pombe revealed a highly structured programme of ncRNA expression at over 600 loci. Waves of antisense transcription accompanied sexual differentiation. A substantial proportion of ncRNA arose from mechanisms previously considered to be largely artefactual, including improper 3′ termination and bidirectional transcription. Constitutive induction of the entire spk1+, spo4+, dis1+ and spo6+ antisense transcripts from an integrated, ectopic locus disrupted their respective meiotic functions. This ability of antisense transcripts to disrupt gene function when expressed in trans suggests that cis production at native loci during sexual differentiation may also control gene function. Consistently, insertion of a marker gene adjacent to the dis1+ antisense start site mimicked ectopic antisense expression in reducing the levels of this microtubule regulator and abolishing the microtubule-dependent ‘horsetail’ stage of meiosis. Antisense production had no impact at any of these loci when the RNA interference (RNAi) machinery was removed. Thus, far from being simply ‘genome chatter’, this extensive ncRNA landscape constitutes a fundamental component in the controls that drive the complex programme of sexual differentiation in S. pombe.
PMCID: PMC3738847  PMID: 22186733
antisense; meiosis; ncRNA; S. pombe; siRNA
11.  Cross-species discovery of syncretic drug combinations that potentiate the antifungal fluconazole 
The authors screen for compounds that show synergistic antifungal activity when combined with the widely-used fungistatic drug fluconazole. Chemogenomic profiling explains the mode of action of synergistic drugs and allows the prediction of additional drug synergies.
Chemical screens with a library enriched for known drugs identified a diverse set of 148 compounds that potentiated the action of the antifungal drug fluconazole against the fungal pathogens Cryptococcus neoformans, Cryptococcus gattii and Candida albicans, and the model yeast Saccharomyces cerevisiae, often in a species-specific manner.
Chemogenomic profiles of six confirmed hits in S. cerevisiae revealed different modes of action and enabled the prediction of additional synergistic combinations; three-way synergistic interactions exhibited even stronger synergies at low doses of fluconazole.
The synergistic combination of fluconazole and the antidepressant sertraline was active against fluconazole-resistant clinical fungal isolates and in an in vivo model of Cryptococcal infection.
Rising fungal infection rates, especially among immune-suppressed individuals, represent a serious clinical challenge (Gullo, 2009). Cancer, organ transplant and HIV patients, for example, often succumb to opportunistic fungal pathogens. The limited repertoire of approved antifungal agents and emerging drug resistance in the clinic further complicate the effective treatment of systemic fungal infections. At the molecular level, the paucity of fungal-specific essential targets arises from the conserved nature of cellular functions from yeast to humans, as well as from the fact that many essential yeast genes can confer viability at a fraction of wild-type dosage (Yan et al, 2009). Although only ∼1100 of the ∼6000 genes in yeast are essential, almost all genes become essential in specific genetic backgrounds in which another non-essential gene has been deleted or otherwise attenuated, an effect termed synthetic lethality (Tong et al, 2001). Genome-scale surveys suggest that over 200 000 binary synthetic lethal gene combinations dominate the yeast genetic landscape (Costanzo et al, 2010). The genetic buffering phenomenon is also manifest as a plethora of differential chemical–genetic interactions in the presence of sublethal doses of bioactive compounds (Hillenmeyer et al, 2008). These observations frame the difficulty of interdicting network functions in eukaryotic pathogens with single agent therapeutics. At the same time, however, this genetic network organization suggests that judicious combinations of small molecule inhibitors of both essential and non-essential targets may elicit additive or synergistic effects on cell growth (Sharom et al, 2004; Lehar et al, 2008). Unbiased screens for drugs that synergistically enhance a specific bioactive effect, but which are not themselves individually active—termed a syncretic combination—are one means to substantially elaborate chemical space (Keith et al, 2005). 
Indeed, compounds that enhance the activity of known agents in model yeast and cancer cell line systems have been identified both by focused small molecule library screens and by computational methods (Borisy et al, 2003; Lehar et al, 2007; Nelander et al, 2008; Jansen et al, 2009; Zinner et al, 2009).
To extend the stratagem of chemical synthetic lethality to clinically relevant fungal pathogens, we screened a bioactive library of known drugs for synergistic enhancers of the widely used fungistatic drug fluconazole against the clinically relevant pathogens C. albicans, C. neoformans and C. gattii, as well as the genetically tractable budding yeast S. cerevisiae. Fluconazole is an azole drug that inhibits lanosterol 14α-demethylase, the gene product of ERG11, an essential cytochrome P450 enzyme in the ergosterol biosynthetic pathway (Groll et al, 1998). We identified 148 drugs that potentiate the antifungal action of fluconazole against the four species. These syncretic compounds had not been previously recognized in the clinic as antifungal agents, and many acted in a species-specific manner, often in a potent fungicidal manner.
To understand the mechanisms of synergism, we interrogated six syncretic drugs—trifluoperazine, tamoxifen, clomiphene, sertraline, suloctidil and L-cycloserine—in genome-wide chemogenomic profiles of the S. cerevisiae deletion strain collection (Giaever et al, 1999). These profiles revealed that membrane, vesicle trafficking and lipid biosynthesis pathways are targeted by five of the synergizers, whereas the sphingolipid biosynthesis pathway is targeted by L-cycloserine. Cell biological assays confirmed the predicted membrane disruption effects of the former group of compounds, which may perturb ergosterol metabolism, impair fluconazole export by drug efflux pumps and/or affect active import of fluconazole (Kuo et al, 2010; Mansfield et al, 2010). Based on the integration of chemical–genetic and genetic interaction space, a signature set of deletion strains that are sensitive to the membrane active synergizers correctly predicted additional drug synergies with fluconazole. Similarly, the L-cycloserine chemogenomic profile correctly predicted a synergistic interaction between fluconazole and myriocin, another inhibitor of sphingolipid biosynthesis. The structure of genetic networks suggests that it should be possible to devise higher order drug combinations with even greater selectivity and potency (Sharom et al, 2004). In an initial test of this concept, we found that the combination of a non-synergistic pair drawn from the membrane active and sphingolipid target classes exhibited potent three-way synergism with a low dose of fluconazole. Finally, the combination of sertraline and fluconazole was active in a G. mellonella model of Cryptococcal infection, and was also efficacious against fluconazole-resistant clinical isolates of C. albicans and C. glabrata.
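As an illustration of how a syncretic interaction can be scored, the sketch below uses the Bliss independence model. This is an assumption made for illustration: the abstract does not say which synergy metric the authors used, and the function names and inhibition values are hypothetical. Under Bliss independence, two drugs that act independently and inhibit growth by fractions f_a and f_b are expected to give a combined inhibition of f_a + f_b - f_a * f_b; an observed inhibition well above that expectation suggests synergy.

```python
def bliss_expected(f_a, f_b):
    """Expected fractional inhibition if the two drugs act independently."""
    return f_a + f_b - f_a * f_b


def bliss_excess(f_a, f_b, f_ab):
    """Observed minus expected inhibition; positive values suggest synergy."""
    return f_ab - bliss_expected(f_a, f_b)


# Hypothetical values: a low dose of fluconazole inhibits growth by 30%,
# the second compound is inactive on its own (0%), and the combination
# inhibits growth by 90%.
excess = bliss_excess(f_a=0.30, f_b=0.0, f_ab=0.90)
```

Because a syncretic compound has little or no activity alone, the expected inhibition collapses to that of fluconazole alone, so any substantial gain in the combination shows up directly as Bliss excess.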
Collectively, these results demonstrate that the combinatorial redeployment of known drugs defines a powerful antifungal strategy and establish a number of potential lead combinations for future clinical assessment.
Resistance to widely used fungistatic drugs, particularly to the ergosterol biosynthesis inhibitor fluconazole, threatens millions of immunocompromised patients susceptible to invasive fungal infections. The dense network structure of synthetic lethal genetic interactions in yeast suggests that combinatorial network inhibition may afford increased drug efficacy and specificity. We carried out systematic screens with a bioactive library enriched for off-patent drugs to identify compounds that potentiate fluconazole action in pathogenic Candida and Cryptococcus strains and the model yeast Saccharomyces. Many compounds exhibited species- or genus-specific synergism, and often improved fluconazole from fungistatic to fungicidal activity. Mode of action studies revealed two classes of synergistic compound, which either perturbed membrane permeability or inhibited sphingolipid biosynthesis. Synergistic drug interactions were rationalized by global genetic interaction networks and, notably, higher order drug combinations further potentiated the activity of fluconazole. Synergistic combinations were active against fluconazole-resistant clinical isolates and an in vivo model of Cryptococcus infection. The systematic repurposing of approved drugs against a spectrum of pathogens thus identifies network vulnerabilities that may be exploited to increase the activity and repertoire of antifungal agents.
PMCID: PMC3159983  PMID: 21694716
antifungal; combination; pathogen; resistance; synergism
12.  The Role of the Toxicologic Pathologist in the Post-Genomic Era 
Journal of Toxicologic Pathology  2013;26(2):105-110.
An era can be defined as a period in time identified by distinctive character, events, or practices. We are now in the genomic era. The pre-genomic era: There was a pre-genomic era. It started many years ago with novel and seminal animal experiments, primarily directed at studying cancer. It is marked by the development of the two-year rodent cancer bioassay and the ultimate realization that alternative approaches and short-term animal models were needed to replace this resource-intensive and time-consuming method for predicting human health risk. Many alternative approaches and short-term animal models were proposed and tried but, to date, none have completely replaced our dependence upon the two-year rodent bioassay. However, the alternative approaches and models themselves have made tangible contributions to basic research, clinical medicine and to our understanding of cancer and they remain useful tools to address hypothesis-driven research questions. The pre-genomic era was a time when toxicologic pathologists played a major role in drug development, evaluating the cancer bioassay and the associated dose-setting toxicity studies, and exploring the utility of proposed alternative animal models. It was a time when there was a shortage of qualified toxicologic pathologists. The genomic era: We are in the genomic era. It is a time when the genetic underpinnings of normal biological and pathologic processes are being discovered and documented. It is a time for sequencing entire genomes and deliberately silencing relevant segments of the mouse genome to see what each segment controls and if that silencing leads to increased susceptibility to disease. What remains to be charted in this genomic era is the complex interaction of genes, gene segments, post-translational modifications of encoded proteins, and environmental factors that affect genomic expression. 
In this current genomic era, the toxicologic pathologist has had to make room for a growing population of molecular biologists. In this present era newly emerging DVM and MD scientists enter the work arena with a PhD in pathology often based on some aspect of molecular biology or molecular pathology research. In molecular biology, the almost daily technological advances require one’s complete dedication to remain at the cutting edge of the science. Similarly, the practice of toxicologic pathology, like other morphological disciplines, is based largely on experience and requires dedicated daily examination of pathology material to maintain a well-trained eye capable of distilling specific information from stained tissue slides - a dedicated effort that cannot be well done as an intermezzo between other tasks. It is a rare individual that has true expertise in both molecular biology and pathology. In this genomic era, the newly emerging DVM-PhD or MD-PhD pathologist enters a marketplace without many job opportunities in contrast to the pre-genomic era. Many face an identity crisis needing to decide to become a competent pathologist or, alternatively, to become a competent molecular biologist. At the same time, more PhD molecular biologists without training in pathology are members of the research teams working in drug development and toxicology. How best can the toxicologic pathologist interact in the contemporary team approach in drug development, toxicology research and safety testing? Based on their biomedical training, toxicologic pathologists are in an ideal position to link data from the emerging technologies with their knowledge of pathobiology and toxicology. To enable this linkage and obtain the synergy it provides, the bench-level, slide-reading expert pathologist will need to have some basic understanding and appreciation of molecular biology methods and tools. 
On the other hand, it is not likely that the typical molecular biologist could competently evaluate and diagnose stained tissue slides from a toxicology study or a cancer bioassay. The post-genomic era: The post-genomic era will likely arrive around 2050, at which time entire genomes from multiple species will exist in massive databases, data from thousands of robotic high throughput chemical screenings will exist in other databases, and genetic toxicity and chemical structure-activity-relationships will reside in yet other databases. All databases will be linked and relevant information will be extracted and analyzed by appropriate algorithms following input of the latest molecular, submolecular, genetic, experimental, pathology and clinical data. Knowledge gained will permit the genetic components of many diseases to be amenable to therapeutic prevention and/or intervention. Much like computerized algorithms are currently used to forecast weather or to predict political elections, sophisticated computerized algorithms based largely on scientific data mining will categorize new drugs and chemicals relative to their health benefits versus their health risks for defined human populations and subpopulations. However, this form of a virtual toxicity study or cancer bioassay will only identify probabilities of adverse consequences from interaction of particular environmental and/or chemical/drug exposure(s) with specific genomic variables. Proof in many situations will require confirmation in intact in vivo mammalian animal models. The toxicologic pathologist in the post-genomic era will be the best suited scientist to confirm the data mining and its probability predictions for safety or adverse consequences with the actual tissue morphological features in test species that define specific test agent pathobiology and human health risk.
PMCID: PMC3695332  PMID: 23914052
genomic era; history of toxicologic pathology; molecular biology
13.  The essential genome of a bacterium 
This study reports the essential Caulobacter genome at 8 bp resolution determined by saturated transposon mutagenesis and high-throughput sequencing. This strategy is applicable to full genome essentiality studies in a broad class of bacterial species.
The essential Caulobacter genome was determined at 8 bp resolution using hyper-saturated transposon mutagenesis coupled with high-throughput sequencing.
Essential protein-coding sequences comprise 90% of the essential genome; the remaining 10% comprises essential non-coding RNA sequences, gene regulatory elements and essential genome replication features.
Of the 3876 annotated open reading frames (ORFs), 480 (12.4%) were essential ORFs, 3240 (83.6%) were non-essential ORFs and 156 (4.0%) were ORFs that severely impacted fitness when mutated.
The essential elements are preferentially positioned near the origin and terminus of the Caulobacter chromosome.
This high-resolution strategy is applicable to high-throughput, full genome essentiality studies and large-scale genetic perturbation experiments in a broad class of bacterial species.
The regulatory events that control polar differentiation and cell-cycle progression in the bacterium Caulobacter crescentus are highly integrated, and they have to occur in the proper order (McAdams and Shapiro, 2011). Components of the core regulatory circuit are largely known. Full discovery of its essential genome, including non-coding, regulatory and coding elements, is a prerequisite for understanding the complete regulatory network of this bacterial cell. We have identified all the essential coding and non-coding elements of the Caulobacter chromosome using a hyper-saturated transposon mutagenesis strategy that is scalable and can be readily extended to obtain rapid and accurate identification of the essential genome elements of any sequenced bacterial species at a resolution of a few base pairs.
We engineered a Tn5 derivative transposon (Tn5Pxyl) that carries at one end an inducible outward pointing Pxyl promoter (Christen et al, 2010). We showed that this transposon construct inserts into the genome randomly where it can activate or disrupt transcription at the site of integration, depending on the insertion orientation. DNA from hundreds of thousands of transposon insertion sites reading outward into flanking genomic regions was parallel PCR amplified and sequenced by Illumina paired-end sequencing to locate the insertion site in each mutant strain (Figure 1). A single sequencing run on DNA from a mutagenized cell population yielded 118 million raw sequencing reads. Of these, >90 million (>80%) read outward from the transposon element into adjacent genomic DNA regions and the insertion site could be mapped with single nucleotide resolution. This yielded the location and orientation of 428 735 independent transposon insertions in the 4-Mbp Caulobacter genome.
Within non-coding sequences of the Caulobacter genome, we detected 130 non-disruptable DNA segments between 90 and 393 bp long in addition to all essential promoter elements. Among 27 previously identified and validated sRNAs (Landt et al, 2008), three were contained within non-disruptable DNA segments and another three were partially disruptable, that is, insertions caused a notable growth defect. Two additional small RNAs found to be essential are the transfer-messenger RNA (tmRNA) and the ribozyme RNAseP (Landt et al, 2008). In addition to the 8 non-disruptable sRNAs, 29 out of the 130 intergenic essential non-coding sequences contained non-redundant tRNA genes; duplicated tRNA genes were non-essential. We also identified two non-disruptable DNA segments within the chromosomal origin of replication. Thus, we resolved essential non-coding RNAs, tRNAs and essential replication elements within the origin region of the chromosome. An additional 90 non-disruptable small genome elements of currently unknown function were identified. Eighteen of these are conserved in at least one closely related species. Only 2 could encode a protein of over 50 amino acids.
For each of the 3876 annotated open reading frames (ORFs), we analyzed the distribution, orientation, and genetic context of transposon insertions. There are 480 essential ORFs and 3240 non-essential ORFs. In addition, there were 156 ORFs that severely impacted fitness when mutated. The 8-bp resolution allowed a dissection of the essential and non-essential regions of the coding sequences. Sixty ORFs had transposon insertions within a significant portion of their 3′ region but lacked insertions in the essential 5′ coding region, allowing the identification of non-essential protein segments. For example, transposon insertions in the essential cell-cycle regulatory gene divL, a tyrosine kinase, showed that the last 204 C-terminal amino acids did not impact viability, confirming previous reports that the C-terminal ATPase domain of DivL is dispensable for viability (Reisinger et al, 2007; Iniesta et al, 2010). In addition, we found that 30 out of 480 (6.3%) of the essential ORFs appear to be shorter than the annotated ORF, suggesting that these are probably mis-annotated.
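The core idea behind reading essentiality from insertion maps can be sketched as a scan for insertion-free stretches. The code below is a toy illustration only, not the authors' pipeline: the function names, coordinates and length threshold are hypothetical, the chromosome is treated as linear for simplicity (the real one is circular), and no statistical test of gap significance is attempted.

```python
from bisect import bisect_left, bisect_right


def insertion_free_gaps(positions, genome_length, min_len):
    """Return (start, end) segments, 1-based inclusive, that contain no
    insertion and are at least min_len bp long."""
    gaps = []
    prev = 0  # sentinel just before the first base
    # A sentinel just past the last base closes the final gap.
    for pos in sorted(set(positions)) + [genome_length + 1]:
        start, end = prev + 1, pos - 1
        if end - start + 1 >= min_len:
            gaps.append((start, end))
        prev = pos
    return gaps


def insertions_in(sorted_positions, start, end):
    """Count insertions falling inside [start, end] (inclusive)."""
    return bisect_right(sorted_positions, end) - bisect_left(sorted_positions, start)


# Hypothetical insertion coordinates on a 60-bp toy genome.
toy_insertions = [5, 10, 11, 50]
toy_gaps = insertion_free_gaps(toy_insertions, genome_length=60, min_len=10)
```

In the same spirit, an ORF whose 3′ end tolerates insertions while its 5′ portion has none would show a nonzero count from `insertions_in` for the 3′ interval and zero for the 5′ interval, which is the pattern described above for divL.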
Among the 480 ORFs essential for growth on rich media, there were 10 essential transcriptional regulatory proteins, including 5 previously identified cell-cycle regulators (McAdams and Shapiro, 2003; Holtzendorff et al, 2004; Collier and Shapiro, 2007; Gora et al, 2010; Tan et al, 2010) and 5 uncharacterized predicted transcription factors. In addition, two RNA polymerase sigma factors RpoH and RpoD, as well as the anti-sigma factor ChrR, which mitigates rpoE-dependent stress response under physiological growth conditions (Lourenco and Gomes, 2009), were also found to be essential. Thus, a set of 10 transcription factors, 2 RNA polymerase sigma factors and 1 anti-sigma factor are the core essential transcriptional regulators for growth on rich media. To further characterize the core components of the Caulobacter cell-cycle control network, we identified all essential regulatory sequences and operon transcripts. Altogether, the 480 essential protein-coding and 37 essential RNA-coding Caulobacter genes are organized into operons such that 402 individual promoter regions are sufficient to regulate their expression. Of these 402 essential promoters, the transcription start sites (TSSs) of 105 were previously identified (McGrath et al, 2007).
The essential genome features are non-uniformly distributed on the Caulobacter genome and enriched near the origin and the terminus regions. In contrast, the chromosomal positions of the published E. coli essential coding sequences (Rocha, 2004) are preferentially located at either side of the origin (Figure 4A). This indicates that there are selective pressures on chromosomal positioning of some essential elements (Figure 4A).
The strategy described in this report could be readily extended to quickly determine the essential genome for a large class of bacterial species.
Caulobacter crescentus is a model organism for the integrated circuitry that runs a bacterial cell cycle. Full discovery of its essential genome, including non-coding, regulatory and coding elements, is a prerequisite for understanding the complete regulatory network of a bacterial cell. Using hyper-saturated transposon mutagenesis coupled with high-throughput sequencing, we determined the essential Caulobacter genome at 8 bp resolution, including 1012 essential genome features: 480 ORFs, 402 regulatory sequences and 130 non-coding elements, including 90 intergenic segments of unknown function. The essential transcriptional circuitry for growth on rich media includes 10 transcription factors, 2 RNA polymerase sigma factors and 1 anti-sigma factor. We identified all essential promoter elements for the cell cycle-regulated genes. The essential elements are preferentially positioned near the origin and terminus of the chromosome. The high-resolution strategy used here is applicable to high-throughput, full genome essentiality studies and large-scale genetic perturbation experiments in a broad class of bacterial species.
PMCID: PMC3202797  PMID: 21878915
functional genomics; next-generation sequencing; systems biology; transposon mutagenesis
14.  Tumor-associated copy number changes in the circulation of patients with prostate cancer identified through whole-genome sequencing 
Genome Medicine  2013;5(4):30.
Patients with prostate cancer may present with metastatic or recurrent disease despite initial curative treatment. The propensity of metastatic prostate cancer to spread to the bone has limited repeated sampling of tumor deposits. Hence, considerably less is understood about this lethal metastatic disease, as it is not commonly studied. Here we explored whole-genome sequencing of plasma DNA to scan the tumor genomes of these patients non-invasively.
We wanted to make whole-genome analysis from plasma DNA amenable to clinical routine applications and developed an approach based on a benchtop high-throughput platform, that is, Illumina's MiSeq instrument. We performed whole-genome sequencing from plasma at a shallow sequencing depth to establish a genome-wide copy number profile of the tumor at low costs within 2 days. In parallel, we sequenced a panel of 55 high-interest genes and 38 introns with frequent fusion breakpoints such as the TMPRSS2-ERG fusion with high coverage. After intensive testing of our approach with samples from 25 individuals without cancer, we analyzed 13 plasma samples derived from five patients with castration resistant (CRPC) and four patients with castration sensitive prostate cancer (CSPC).
The genome-wide profiling in the plasma of our patients revealed multiple copy number aberrations including those previously reported in prostate tumors, such as losses in 8p and gains in 8q. High-level copy number gains in the AR locus were observed in patients with CRPC but not with CSPC disease. We identified the TMPRSS2-ERG rearrangement associated 3-Mbp deletion on chromosome 21 and found corresponding fusion plasma fragments in these cases. In an index case multiregional sequencing of the primary tumor identified different copy number changes in each sector, suggesting multifocal disease. Our plasma analyses of this index case, performed 13 years after resection of the primary tumor, revealed novel chromosomal rearrangements, which were stable in serial plasma analyses over a 9-month period, which is consistent with the presence of one metastatic clone.
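A common way to turn shallow whole-genome read counts into a copy number profile is to compare depth-normalized counts per genomic bin against a control. The sketch below assumes that general approach; the abstract does not describe the authors' exact computation, and the function name, bin counts and pseudocount are hypothetical.

```python
import math


def log2_ratios(sample_counts, control_counts, pseudocount=0.5):
    """Per-bin log2 ratio of sample to control read fractions.

    Counts are divided by each library's total so that differences in
    sequencing depth cancel out; the pseudocount avoids log(0).
    """
    sample_total = sum(sample_counts)
    control_total = sum(control_counts)
    ratios = []
    for s, c in zip(sample_counts, control_counts):
        s_frac = (s + pseudocount) / sample_total
        c_frac = (c + pseudocount) / control_total
        ratios.append(math.log2(s_frac / c_frac))
    return ratios


# Hypothetical read counts in four bins; the third bin has doubled coverage
# in the plasma sample relative to the others.
plasma = [100, 100, 200, 100]
control = [100, 100, 100, 100]
profile = log2_ratios(plasma, control, pseudocount=0.0)
```

Bins with ratios well above the genome-wide baseline would be candidate gains (such as 8q or the AR locus) and bins well below it candidate losses (such as 8p); in practice the signal is diluted by non-tumor DNA in plasma, so thresholds would need to be calibrated on cancer-free controls.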
The genomic landscape of prostate cancer can be established by non-invasive means from plasma DNA. Our approach provides specific genomic signatures within 2 days which may therefore serve as 'liquid biopsy'.
PMCID: PMC3707016  PMID: 23561577
15.  Genomic and epigenomic integration identifies a prognostic signature in colon cancer 
The importance of genetic and epigenetic alterations may lie in their aggregate role in altering core pathways in tumorigenesis.
Experimental Design
Merging genome-wide genomic and epigenomic alterations, we identify key genes and pathways altered in colorectal cancers (CRC). DNA Methylation analysis was tested for predicting survival in CRC patients using Cox proportional hazard model.
We identified 29 low frequency mutated genes that are also inactivated by epigenetic mechanisms in CRC. Pathway analysis showed the extracellular matrix (ECM) remodeling pathway is silenced in CRC. Six ECM pathway genes were tested for their prognostic potential in large CRC cohorts (n=777). DNA Methylation of IGFBP3 and EVL predicted poor survival (IGFBP3: HR=2.58, 95%CI:1.37-4.87, p=0.004; EVL: HR=2.48, 95%CI:1.07-5.74, p=0.034) and simultaneous methylation of multiple genes predicted significantly worse survival (HR=8.61, 95%CI:2.16-34.36, p<0.001 for methylation of IGFBP3, EVL, CD109 and FLNC). DNA Methylation of IGFBP3 and EVL was validated as a prognostic marker in an independent contemporary matched cohort (IGFBP3 HR=2.06, 95% CI:1.04-4.09, p=0.038; EVL HR=2.23, 95%CI:1.00-5.0, p=0.05) and EVL DNA methylation remained significant in a secondary historical validation cohort (HR=1.41, 95%CI:1.05-1.89, p=0.022). Moreover, DNA methylation of selected ECM genes helps to stratify the high-risk Stage 2 colon cancer patients who would benefit from adjuvant chemotherapy (HR: 5.85, 95%CI:2.03-16.83, p=0.001 for simultaneous methylation of IGFBP3, EVL and CD109).
CRCs with silenced ECM pathway components show worse survival, suggesting that our findings provide novel prognostic biomarkers for CRC and reflect the importance of integrative analyses linking genetic and epigenetic abnormalities with pathway disruption in cancer.
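The reported hazard ratios, confidence intervals and p-values can be related to one another with a standard Wald approximation on the log scale. The sketch below is a reader's consistency check, not the authors' analysis; it assumes the intervals are symmetric on the log hazard scale, and the function name is hypothetical.

```python
import math


def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate z statistic and two-sided p-value from a hazard ratio
    and its 95% confidence interval."""
    # On the log scale the CI spans 2 * z_crit standard errors.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided tail probability of a standard normal.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


z_igfbp3, p_igfbp3 = wald_p_from_ci(2.58, 1.37, 4.87)
z_evl, p_evl = wald_p_from_ci(2.48, 1.07, 5.74)
```

With the discovery-cohort values quoted above, this approximation gives p-values of roughly 0.003 for IGFBP3 and 0.034 for EVL, close to the reported 0.004 and 0.034; small differences are expected from rounding of the published figures and from the exact test used.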
PMCID: PMC3077819  PMID: 21278247
DNA Methylation; Extracellular Matrix Pathway; Prognostic Biomarker; Colorectal cancer
16.  A High-Dimensional, Deep-Sequencing Study of Lung Adenocarcinoma in Female Never-Smokers 
PLoS ONE  2013;8(2):e55596.
Deep sequencing techniques provide a remarkable opportunity for comprehensive understanding of tumorigenesis at the molecular level. As omics studies become popular, integrative approaches need to be developed to move from a simple cataloguing of mutations and changes in gene expression to dissecting the molecular nature of carcinogenesis at the systemic level and understanding the complex networks that lead to cancer development.
Here, we describe a high-throughput, multi-dimensional sequencing study of primary lung adenocarcinoma tumors and adjacent normal tissues of six Korean female never-smoker patients. Our data encompass results from exome-seq, RNA-seq, small RNA-seq, and MeDIP-seq. We identified and validated novel genetic aberrations, including 47 somatic mutations and 19 fusion transcripts. One of the fusions involves the c-RET gene, which was recently reported to form fusion genes that may function as drivers of carcinogenesis in lung cancer patients. We also characterized gene expression profiles, which we integrated with genomic aberrations and gene regulations into functional networks. The most prominent gene network module that emerged indicates that disturbances in G2/M transition and mitotic progression are causally linked to tumorigenesis in these patients. Also, results from the analysis strongly suggest that several novel microRNA-target interactions represent key regulatory elements of the gene network.
Our study not only provides an overview of the alterations occurring in lung adenocarcinoma at multiple levels from genome to transcriptome and epigenome, but also offers a model for integrative genomics analysis and proposes potential target pathways for the control of lung adenocarcinoma.
PMCID: PMC3566005  PMID: 23405175
17.  In Vitro Analysis of Integrated Global High-Resolution DNA Methylation Profiling with Genomic Imbalance and Gene Expression in Osteosarcoma 
PLoS ONE  2008;3(7):e2834.
Genetic and epigenetic changes contribute to deregulation of gene expression and development of human cancer. Changes in DNA methylation are key epigenetic factors regulating gene expression and genomic stability. Recent progress in microarray technologies resulted in developments of high resolution platforms for profiling of genetic, epigenetic and gene expression changes. Osteosarcoma (OS) is a pediatric bone tumor with characteristically high level of numerical and structural chromosomal changes. Furthermore, little is known about DNA methylation changes in OS. Our objective was to develop an integrative approach for analysis of high-resolution epigenomic, genomic, and gene expression profiles in order to identify functional epi/genomic differences between OS cell lines and normal human osteoblasts. A combination of Affymetrix Promoter Tiling Arrays for DNA methylation, Agilent array-CGH platform for genomic imbalance and Affymetrix Gene 1.0 platform for gene expression analysis was used. As a result, an integrative high-resolution approach for interrogation of genome-wide tumour-specific changes in DNA methylation was developed. This approach was used to provide the first genomic DNA methylation maps, and to identify and validate genes with aberrant DNA methylation in OS cell lines. This first integrative analysis of global cancer-related changes in DNA methylation, genomic imbalance, and gene expression has provided comprehensive evidence of the cumulative roles of epigenetic and genetic mechanisms in deregulation of gene expression networks.
PMCID: PMC2515339  PMID: 18698372
18.  Identifying the genetic determinants of transcription factor activity 
Genome-wide messenger RNA expression levels are highly heritable. However, the molecular mechanisms underlying this heritability are poorly understood.
The influence of trans-acting polymorphisms is often mediated by changes in the regulatory activity of one or more sequence-specific transcription factors (TFs). We use a method that exploits prior information about the DNA-binding specificity of each TF to estimate its genotype-specific regulatory activity. To this end, we perform linear regression of genotype-specific differential mRNA expression on TF-specific promoter-binding affinity.
Treating inferred TF activity as a quantitative trait and mapping it across a panel of segregants from an experimental genetic cross allows us to identify trans-acting loci (‘aQTLs') whose allelic variation modulates the TF. A few of these aQTL regions contain the gene encoding the TF itself; several others contain a gene whose protein product is known to interact with the TF.
Our method is strictly causal, as it only uses sequence-based features as predictors. Application to budding yeast demonstrates a dramatic increase in statistical power, compared with existing methods, to detect locus-TF associations and trans-acting loci. Our aQTL mapping strategy also succeeds in mouse.
Genetic sequence variation naturally perturbs mRNA expression levels in the cell. In recent years, analysis of parallel genotyping and expression profiling data for segregants from genetic crosses between parental strains has revealed that mRNA expression levels are highly heritable. Expression quantitative trait loci (eQTLs), whose allelic variation regulates the expression level of individual genes, have successfully been identified (Brem et al, 2002; Schadt et al, 2003). The molecular mechanisms underlying the heritability of mRNA expression are poorly understood. However, they are likely to involve mediation by transcription factors (TFs). We present a new transcription-factor-centric method that greatly increases our ability to understand what drives the genetic variation in mRNA expression (Figure 1). Our method identifies genomic loci (‘aQTLs') whose allelic variation modulates the protein-level activity of specific TFs. To map aQTLs, we integrate genotyping and expression profiling data with quantitative prior information about DNA-binding specificity of transcription factors in the form of position-specific affinity matrices (Bussemaker et al, 2007). We applied our method in two different organisms: budding yeast and mouse.
In our approach, the inferred TF activity is explicitly treated as a quantitative trait, and genetically mapped. The decrease of ‘phenotype space' from that of all genes (in the eQTL approach) to that of all TFs (in our aQTL approach) increases the statistical power to detect trans-acting loci in two distinct ways. First, as each inferred TF activity is derived from a large number of genes, it is far less noisy than mRNA levels of individual genes. Second, the number of trait/marker combinations that needs to be tested for statistical significance in parallel is roughly two orders of magnitude smaller than for eQTLs. We identified a total of 103 locus-TF associations, a more than six-fold improvement over the 17 locus-TF associations identified by several existing methods (Brem et al, 2002; Yvert et al, 2003; Lee et al, 2006; Smith and Kruglyak, 2008; Zhu et al, 2008). The total number of distinct genomic loci identified as an aQTL equals 31, which includes 11 of the 13 previously identified eQTL hotspots (Smith and Kruglyak, 2008).
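The two-step logic of aQTL mapping (infer one TF activity per segregant by regression, then treat that activity as a trait) can be sketched as follows. This is a deliberately simplified illustration: a single TF, a single marker, ordinary least squares with one predictor, and a plain difference of group means in place of a proper linkage statistic. All names and numbers are hypothetical and are not taken from the study.

```python
from statistics import mean


def slope(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def tf_activities(affinity, expression_by_segregant):
    """One inferred TF activity per segregant: the slope of that
    segregant's differential expression on promoter-binding affinity."""
    return [slope(affinity, expr) for expr in expression_by_segregant]


def activity_difference(activities, genotypes):
    """Mean inferred activity among 'A' carriers minus 'B' carriers."""
    a = [act for act, g in zip(activities, genotypes) if g == "A"]
    b = [act for act, g in zip(activities, genotypes) if g == "B"]
    return mean(a) - mean(b)


# Hypothetical promoter affinities of one TF for four genes.
affinity = [0.0, 1.0, 2.0, 3.0]
# Hypothetical differential expression of those genes in four segregants.
expression = [
    [0.0, 2.0, 4.0, 6.0],
    [1.0, 3.0, 5.0, 7.0],
    [0.0, -1.0, -2.0, -3.0],
    [0.5, -0.5, -1.5, -2.5],
]
genotypes = ["A", "A", "B", "B"]  # alleles at one hypothetical marker

activities = tf_activities(affinity, expression)
effect = activity_difference(activities, genotypes)
```

Because each activity summarizes many genes at once, it is less noisy than any single transcript level, and only one trait per TF has to be tested against each marker, which is the source of the gain in statistical power described above.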
To better understand the mechanisms underlying the identified genetic linkages, we examined the genes within each aQTL region. First, we found four ‘local' aQTLs, which encompass the gene encoding the TF itself. This includes the known polymorphism in the HAP1 gene (Brem et al, 2002), but also novel predictions of trans-acting polymorphisms in RFX1, STB5, and HAP4. Second, using high-throughput protein–protein interaction data, we identified putative causal genes for several aQTLs. For example, we predict that a polymorphism in the cyclin-dependent kinase CDC28 antagonistically modulates the functionally distinct cell cycle regulators Fkh1 and Fkh2. In this and other cases, our approach naturally accounts for post-translational modulation of TF activity at the protein level.
We validated our ability to predict locus-TF associations in yeast using gene expression profiles of allele replacement strains from a previous study (Smith and Kruglyak, 2008). Chromosome 15 contains an aQTL whose allelic status influences the activity of no fewer than 30 distinct TFs. This locus includes IRA2, which controls intracellular cAMP levels. We used the gene expression profile of IRA2 replacement strains to confirm that the polymorphism within IRA2 indeed modulates a subset of the TFs whose activity was predicted to link to this locus, and no other TFs.
Application of our approach to mouse data identified an aQTL on chromosome 7 that modulates the activity of Zscan4, a transcription factor containing four zinc finger domains and a SCAN domain, in liver cells. Even though we could not detect a candidate causal gene for Zscan4 because of a lack of information about the mouse genome, our result demonstrates that our method also works in higher eukaryotes.
In summary, aQTL mapping has a greatly improved sensitivity to detect molecular mechanisms underlying the heritability of gene expression. The successful application of our approach to yeast and mouse data underscores the value of explicitly treating the inferred TF activity as a quantitative trait for increasing statistical power of detecting trans-acting loci. Furthermore, our method is computationally efficient, and easily applicable to any other organism whenever prior information about the DNA-binding specificity of TFs is available.
Analysis of parallel genotyping and expression profiling data has shown that mRNA expression levels are highly heritable. Currently, only a tiny fraction of this genetic variance can be mechanistically accounted for. The influence of trans-acting polymorphisms on gene expression traits is often mediated by transcription factors (TFs). We present a method that exploits prior knowledge about the in vitro DNA-binding specificity of a TF in order to map the loci (‘aQTLs') whose inheritance modulates its protein-level regulatory activity. Genome-wide regression of differential mRNA expression on predicted promoter affinity is used to estimate segregant-specific TF activity, which is subsequently mapped as a quantitative phenotype. In budding yeast, our method identifies six times as many locus-TF associations and more than twice as many trans-acting loci as all existing methods combined. Application to mouse data from an F2 intercross identified an aQTL on chromosome 7 modulating the activity of Zscan4 in liver cells. Our method has greatly improved statistical power over existing methods, is mechanism based, strictly causal, computationally efficient, and generally applicable.
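The two steps described above (regressing expression on promoter affinity to get one TF activity per segregant, then treating that activity as a trait to be mapped against genotype) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy numbers, and the use of a plain difference in means as the linkage statistic are all assumptions made for illustration. A real analysis would use affinities computed from position-specific affinity matrices and a proper significance test at each marker.

```python
from statistics import mean

def ols_slope(x, y):
    """Slope of the least-squares line y ~ a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def infer_tf_activity(affinity, expression_by_segregant):
    """One activity per segregant: slope of gene expression on promoter affinity."""
    return [ols_slope(affinity, expr) for expr in expression_by_segregant]

def marker_effect(activity, genotypes):
    """Difference in mean TF activity between allele-1 and allele-0 segregants."""
    a1 = [a for a, g in zip(activity, genotypes) if g == 1]
    a0 = [a for a, g in zip(activity, genotypes) if g == 0]
    return mean(a1) - mean(a0)

# Hypothetical toy data: 4 genes (promoter affinities) and 4 segregants.
affinity = [0.0, 1.0, 2.0, 3.0]
expression = [
    [0.0, 1.0, 2.0, 3.0],  # expression tracks affinity: TF active
    [0.5, 1.5, 2.5, 3.5],  # same slope, different baseline
    [1.0, 1.0, 1.0, 1.0],  # flat: TF inactive
    [0.0, 0.0, 0.0, 0.0],
]
activity = infer_tf_activity(affinity, expression)

linked_marker = [1, 1, 0, 0]    # allele status matches the activity pattern
unlinked_marker = [1, 0, 1, 0]  # allele status is unrelated to activity
linked_effect = marker_effect(activity, linked_marker)
unlinked_effect = marker_effect(activity, unlinked_marker)
```

Because each activity value summarizes many genes, the trait being mapped is a single number per segregant per TF, which is what reduces both the noise and the number of trait/marker tests relative to mapping every transcript separately.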
PMCID: PMC2964119  PMID: 20865005
gene expression; gene regulatory networks; genetic variation; quantitative trait loci; transcription factors
19.  Integrative Genomic Analyses Identify BRF2 as a Novel Lineage-Specific Oncogene in Lung Squamous Cell Carcinoma 
PLoS Medicine  2010;7(7):e1000315.
William Lockwood and colleagues show that the focal amplification of a gene, BRF2, on Chromosome 8p12 plays a key role in squamous cell carcinoma of the lung.
Traditionally, non-small cell lung cancer is treated as a single disease entity in terms of systemic therapy. Emerging evidence suggests the major subtypes—adenocarcinoma (AC) and squamous cell carcinoma (SqCC)—respond differently to therapy. Identification of the molecular differences between these tumor types will have a significant impact in designing novel therapies that can improve the treatment outcome.
Methods and Findings
We used an integrative genomics approach, combining high-resolution comparative genomic hybridization and gene expression microarray profiles, to compare AC and SqCC tumors in order to uncover alterations at the DNA level, with corresponding gene transcription changes, which are selected for during development of lung cancer subtypes. Through the analysis of multiple independent cohorts of clinical tumor samples (>330), normal lung tissues and bronchial epithelial cells obtained by bronchial brushing in smokers without lung cancer, we identified the overexpression of BRF2, a gene on Chromosome 8p12, which is specific to the development of lung SqCC. Genetic activation of BRF2, which encodes an RNA polymerase III (Pol III) transcription initiation factor, was found to be associated with increased expression of small nuclear RNAs (snRNAs) that are involved in processes essential for cell growth, such as RNA splicing. Ectopic expression of BRF2 in human bronchial epithelial cells induced a transformed phenotype and demonstrated downstream oncogenic effects, whereas RNA interference (RNAi)-mediated knockdown suppressed growth and colony formation of SqCC cells overexpressing BRF2, but not AC cells. Frequent activation of BRF2 in >35% of preinvasive bronchial carcinoma in situ, as well as in dysplastic lesions, provides evidence that BRF2 expression is an early event in cancer development of this cell lineage.
This is the first study, to our knowledge, to show that the focal amplification of a gene on Chromosome 8p12 plays a key role in squamous cell lineage specificity of the disease. Our data suggest that genetic activation of BRF2 represents a unique mechanism of SqCC lung tumorigenesis through the increase of Pol III-mediated transcription. It can serve as a marker for lung SqCC and may provide a novel target for therapy.
Editors' Summary
Lung cancer is the commonest cause of cancer-related death. Every year, 1.3 million people die from this disease, which is mainly caused by smoking. Most cases of lung cancer are “non-small cell lung cancers” (NSCLCs). Like all cancers, NSCLC starts when cells begin to divide uncontrollably and to move round the body (metastasize) because of changes (mutations) in their genes. These mutations are often in “oncogenes,” genes that, when activated, encourage cell division. Oncogenes can be activated by mutations that alter the properties of the proteins they encode or by mutations that increase the amount of protein made from them, such as gene amplification (an increase in the number of copies of a gene). If NSCLC is diagnosed before it has spread from the lungs (stage I disease), it can be surgically removed and many patients with stage I NSCLC survive for more than 5 years after their diagnosis. Unfortunately, in more than half of patients, NSCLC has metastasized before it is diagnosed. This stage IV NSCLC can be treated with chemotherapy (toxic chemicals that kill fast-growing cancer cells) but only 2% of patients with stage IV lung cancer are alive 5 years after diagnosis.
Why Was This Study Done?
Traditionally, NSCLC has been regarded as a single disease in terms of treatment. However, emerging evidence suggests that the two major subtypes of NSCLC—adenocarcinoma and squamous cell carcinoma (SqCC)—respond differently to chemotherapy. Adenocarcinoma and SqCC start in different types of lung cell and experts think that for each cell type in the body, specific combinations of mutations interact with the cell type's own unique characteristics to provide the growth and survival advantage needed for cancer development. If this is true, then identifying the molecular differences between adenocarcinoma and SqCC could provide targets for more effective therapies for these major subtypes of NSCLC. Amplification of a chromosome region called 8p12 is very common in NSCLC, which suggests that an oncogene that drives lung cancer development is present in this chromosome region. In this study, the researchers investigate this possibility by looking for an amplified gene in the 8p12 chromosome region that makes increased amounts of protein in lung SqCC but not in lung adenocarcinoma.
What Did the Researchers Do and Find?
The researchers used a technique called comparative genomic hybridization to show that focal regions of Chromosome 8p are amplified in about 40% of lung SqCCs, but that DNA loss in this region is the most common alteration in lung adenocarcinomas. Ten genes in the 8p12 chromosome region were expressed at higher levels in the SqCC samples that they examined than in adenocarcinoma samples, they report, and overexpression of five of these genes correlated with amplification of the 8p12 region in the SqCC samples. Only one of the genes—BRF2—was more highly expressed in squamous carcinoma cells than in normal bronchial epithelial cells (the cell type that lines the tubes that take air into the lungs and from which SqCC develops). Artificially induced expression of BRF2 in bronchial epithelial cells made these normal cells behave like tumor cells, whereas reduction of BRF2 expression in squamous carcinoma cells made them behave more like normal bronchial epithelial cells. Finally, BRF2 was frequently activated in two early stages of squamous cell carcinoma—bronchial carcinoma in situ and dysplastic lesions.
What Do These Findings Mean?
Together, these findings show that the focal amplification of chromosome region 8p12 plays a role in the development of lung SqCC but not in the development of lung adenocarcinoma, the other major subtype of NSCLC. These findings identify BRF2 (which encodes an RNA polymerase III transcription initiation factor, a protein that is required for the synthesis of RNA molecules that help to control cell growth) as a lung SqCC-specific oncogene and uncover a unique mechanism for lung SqCC development. Most importantly, these findings suggest that genetic activation of BRF2 could be used as a marker for lung SqCC, which might facilitate the early detection of this type of NSCLC, and that BRF2 might provide a new target for therapy.
Additional Information
Please access these Web sites via the online version of this summary at
The US National Cancer Institute provides detailed information for patients and professionals about all aspects of lung cancer, including information on non-small cell carcinoma (in English and Spanish)
Cancer Research UK also provides information about lung cancer and information on how cancer starts
MedlinePlus has links to other resources about lung cancer (in English and Spanish)
PMCID: PMC2910599  PMID: 20668658
20.  Integrative prescreening in analysis of multiple cancer genomic studies 
BMC Bioinformatics  2012;13:168.
In high throughput cancer genomic studies, results from the analysis of single datasets often suffer from a lack of reproducibility because of small sample sizes. Integrative analysis can effectively pool and analyze multiple datasets and provides a cost effective way to improve reproducibility. In integrative analysis, simultaneously analyzing all genes profiled may incur high computational cost. A computationally affordable remedy is prescreening, which fits marginal models, can be conducted in a parallel manner, and has low computational cost.
An integrative prescreening approach is developed for the analysis of multiple cancer genomic datasets. Simulation shows that the proposed integrative prescreening has better performance than alternatives, particularly including prescreening with individual datasets, an intensity approach and meta-analysis. We also analyze multiple microarray gene profiling studies on liver and pancreatic cancers using the proposed approach.
The proposed integrative prescreening provides an effective way to reduce the dimensionality in cancer genomic studies. It can be coupled with existing analysis methods to identify cancer markers.
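As a rough illustration of what prescreening with marginal models across several datasets could look like, the sketch below scores each gene separately in each dataset and ranks genes by the summed score. The abstract does not specify the marginal model or how the scores are combined, so the standardized mean difference, the simple sum, and all names here are assumptions rather than the proposed method.

```python
from statistics import mean, pstdev

def marginal_score(values, labels):
    """Absolute standardized case/control mean difference for one gene."""
    cases = [v for v, lab in zip(values, labels) if lab == 1]
    controls = [v for v, lab in zip(values, labels) if lab == 0]
    spread = pstdev(values) or 1.0  # avoid dividing by zero for constant genes
    return abs(mean(cases) - mean(controls)) / spread

def integrative_prescreen(datasets, keep):
    """Rank genes by marginal score summed over datasets; return the top `keep`.

    datasets: list of (expr, labels) pairs, where expr maps a gene name to its
    per-sample values and labels marks each sample as case (1) or control (0).
    """
    genes = set.intersection(*(set(expr) for expr, _ in datasets))
    total = {
        g: sum(marginal_score(expr[g], labels) for expr, labels in datasets)
        for g in genes
    }
    # Sort by descending score; break ties by gene name for reproducibility.
    return sorted(total, key=lambda g: (-total[g], g))[:keep]

# Hypothetical toy input: two small studies measuring the same three genes.
study1 = ({"A": [5, 5, 1, 1], "B": [2, 1, 1, 2], "C": [1, 2, 1, 2]}, [1, 1, 0, 0])
study2 = ({"A": [4, 0, 4, 0], "B": [3, 1, 2, 2], "C": [1, 1, 1, 1]}, [1, 0, 1, 0])
selected = integrative_prescreen([study1, study2], keep=2)
```

Each gene is scored independently of the others, so the per-gene work can be distributed across processes, which is the property that keeps the computational cost low.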
PMCID: PMC3436748  PMID: 22799431
21.  Report on EU–USA Workshop: How Systems Biology Can Advance Cancer Research (27 October 2008) 
Molecular oncology  2008;3(1):9-17.
The main conclusion is that systems biology approaches can indeed advance cancer research, having already proved successful in a very wide variety of cancer-related areas, and are likely to prove superior to many current research strategies. Major points include:
Systems biology and computational approaches can make important contributions to research and development in key clinical aspects of cancer and of cancer treatment, and should be developed for understanding and application to diagnosis, biomarkers, cancer progression, drug development and treatment strategies.
Development of new measurement technologies is central to successful systems approaches, and should be strongly encouraged. The systems view of disease combined with these new technologies and novel computational tools will over the next 5–20 years lead to medicine that is predictive, personalized, preventive and participatory (P4 medicine).
Major initiatives are in progress to gather extremely wide ranges of data for both somatic and germ-line genetic variations, as well as gene, transcript, protein and metabolite expression profiles that are cancer-relevant. Electronic databases and repositories play a central role to store and analyze these data. These resources need to be developed and sustained.
Understanding cellular pathways is crucial in cancer research, and these pathways need to be considered in the context of the progression of cancer at various stages. At all stages of cancer progression, major areas require modelling via systems and developmental biology methods including immune system reactions, angiogenesis and tumour progression.
A number of mathematical models of an analytical or computational nature have been developed that can give detailed insights into the dynamics of cancer-relevant systems. These models should be further integrated across multiple levels of biological organization in conjunction with analysis of laboratory and clinical data.
Biomarkers represent major tools in determining the presence of cancer, its progression and the responses to treatments. There is a need for sets of high-quality annotated clinical samples, enabling comparisons across different diseases and the quantitative simulation of major pathways leading to biomarker development and analysis of drug effects.
Education is recognized as a key component in the success of any systems biology programme, especially for applications to cancer research. It is recognized that a balance needs to be found between the need to be interdisciplinary and the necessity of having extensive specialist knowledge in particular areas.
A proposal from this workshop is to explore one or more types of cancer over the full scale of their progression, for example glioblastoma or colon cancer. Such an exemplar project would require all the experimental and computational tools available for the generation and analysis of quantitative data over the entire hierarchy of biological information. These tools and approaches could be mobilized to understand, detect and treat cancerous processes and establish methods applicable across a wide range of cancers.
PMCID: PMC2930781  PMID: 19383362
Systems biology; EU-USA workshop; Cancer
22.  Recurrent Targeted Genes of Hepatitis B Virus in the Liver Cancer Genomes Identified by a Next-Generation Sequencing–Based Approach 
PLoS Genetics  2012;8(12):e1003065.
Integration of the viral DNA into host chromosomes was found in most of the hepatitis B virus (HBV)–related hepatocellular carcinomas (HCCs). Here we devised a massive anchored parallel sequencing (MAPS) method using next-generation sequencing to isolate and sequence HBV integrants. Applying MAPS to 40 pairs of HBV–related HCC tissues (cancer and adjacent tissues), we identified 296 HBV integration events corresponding to 286 unique integration sites (UISs) with precise HBV–Human DNA junctions. HBV integration favored chromosome 17 and preferentially integrated into human transcript units. HBV targeted genes were enriched in GO terms: cAMP metabolic processes, T cell differentiation and activation, TGF beta receptor pathway, ncRNA catabolic process, and dsRNA fragmentation and cellular response to dsRNA. The HBV targeted genes include 7 genes (PTPRJ, CNTN6, IL12B, MYOM1, FNDC3B, LRFN2, FN1) containing IPR003961 (Fibronectin, type III domain), 7 genes (NRG3, MASP2, NELL1, LRP1B, ADAM21, NRXN1, FN1) containing IPR013032 (EGF-like region, conserved site), and three genes (PDE7A, PDE4B, PDE11A) containing IPR002073 (3′, 5′-cyclic-nucleotide phosphodiesterase). Enriched pathways include hsa04512 (ECM-receptor interaction), hsa04510 (Focal adhesion), and hsa04012 (ErbB signaling pathway). Fewer integration events were found in cancers compared to cancer-adjacent tissues, suggesting a clonal expansion model in HCC development. Finally, we identified 8 genes that were recurrent target genes by HBV integration including fibronectin 1 (FN1) and telomerase reverse transcriptase (TERT1), two known recurrent target genes, and additional novel target genes such as SMAD family member 5 (SMAD5), phosphatase and actin regulator 4 (PHACTR4), and RNA binding protein fox-1 homolog (C. elegans) 1 (RBFOX1). 
Integrating our analysis with recently published whole-genome sequencing analyses, we identified 14 additional recurrent HBV target genes, greatly expanding the HBV recurrent target list. This global survey of HBV integration events, together with recently published whole-genome sequencing analyses, furthers our understanding of HBV–related HCC.
Author Summary
Integration of the hepatitis B virus (HBV) into human liver cells was found in most of the related hepatocellular carcinomas (HCCs). Here, taking advantage of recent advances in high-throughput sequencing, we devised an efficient and cost-effective method, which we named massive anchored parallel sequencing (MAPS), to conduct a global survey of HBV integration events in 40 pairs of HBV–related HCC tissues (cancer and adjacent tissues). We identified 286 unique integration sites (UISs) with precise HBV–Human DNA junctions. We identified a higher number of HBV integration events in cancer-adjacent tissues than in HCC tissues, suggesting a clonal expansion process during HCC development. We also found that fibronectin and its related genes (fibronectin type III-like fold domain containing genes) were frequently targeted by HBV. Fibronectin is a protein produced abundantly by liver cells and also serves as a linker in the extracellular matrix. Our findings might suggest a role for the disruption of fibronectin and the associated extracellular matrix in HBV-related liver cancers. We also identified 14 additional recurrent HBV target genes, greatly expanding the HBV recurrent target list. This study adds significantly to our understanding of HCC development.
PMCID: PMC3516541  PMID: 23236287
23.  Divergent Genomic and Epigenomic Landscapes of Lung Cancer Subtypes Underscore the Selection of Different Oncogenic Pathways during Tumor Development 
PLoS ONE  2012;7(5):e37775.
For therapeutic purposes, non-small cell lung cancer (NSCLC) has traditionally been regarded as a single disease. However, recent evidence suggests that the two major subtypes of NSCLC, adenocarcinoma (AC) and squamous cell carcinoma (SqCC), respond differently to both molecular targeted and new generation chemotherapies. Therefore, identifying the molecular differences between these tumor types may inform novel treatment strategies. We performed the first large-scale analysis of 261 primary NSCLC tumors (169 AC and 92 SqCC), integrating genome-wide DNA copy number, methylation and gene expression profiles to identify subtype-specific molecular alterations relevant to new agent design and choice of therapy. Comparison of AC and SqCC genomic and epigenomic landscapes revealed 778 altered genes with corresponding expression changes that are selected during tumor development in a subtype-specific manner. Analysis of >200 additional NSCLCs confirmed that these genes are responsible for driving the differential development and resulting phenotypes of AC and SqCC. Importantly, we identified key oncogenic pathways disrupted in each subtype that likely serve as the basis for their differential tumor biology and clinical outcomes. Downregulation of HNF4α target genes was the most common pathway specific to AC, while SqCC demonstrated disruption of numerous histone modifying enzymes as well as the transcription factor E2F1. In silico screening of candidate therapeutic compounds using subtype-specific pathway components identified HDAC and PI3K inhibitors as potential treatments tailored to lung SqCC. Together, our findings suggest that AC and SqCC develop through distinct pathogenetic pathways that have significant implications for our approach to the clinical management of NSCLC.
PMCID: PMC3357406  PMID: 22629454
24.  Discovering transcription factor regulatory targets using gene expression and binding data 
Bioinformatics  2011;28(2):206-213.
Motivation: Identifying the target genes regulated by transcription factors (TFs) is the most basic step in understanding gene regulation. Recent advances in high-throughput sequencing technology, together with chromatin immunoprecipitation (ChIP), enable mapping TF binding sites genome wide, but it is not possible to infer function from binding alone. This is especially true in mammalian systems, where regulation often occurs through long-range enhancers in gene-rich neighborhoods, rather than proximal promoters, preventing straightforward assignment of a binding site to a target gene.
Results: We present EMBER (Expectation Maximization of Binding and Expression pRofiles), a method that integrates high-throughput binding data (e.g. ChIP-chip or ChIP-seq) with gene expression data (e.g. DNA microarray) via an unsupervised machine learning algorithm for inferring the gene targets of sets of TF binding sites. Genes selected are those that match overrepresented expression patterns, which can be used to provide information about multiple TF regulatory modes. We apply the method to genome-wide human breast cancer data and demonstrate that EMBER confirms a role for the TFs estrogen receptor alpha, retinoic acid receptors alpha and gamma in breast cancer development, whereas the conventional approach of assigning regulatory targets based on proximity does not. Additionally, we compare several predicted target genes from EMBER to interactions inferred previously, examine combinatorial effects of TFs on gene regulation and illustrate the ability of EMBER to discover multiple modes of regulation.
Availability: All code used for this work is available at
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3259433  PMID: 22084256
25.  The DNA Methylome of Human Peripheral Blood Mononuclear Cells 
PLoS Biology  2010;8(11):e1000533.
Analysis across the genome of patterns of DNA methylation reveals a rich landscape of allele-specific epigenetic modification and consequent effects on allele-specific gene expression.
DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies.
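For readers unfamiliar with how a figure such as "68.4% of CpG sites were methylated" is derived from bisulfite sequencing, the sketch below shows the basic arithmetic. Bisulfite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C, so the methylation level at a site is the share of C reads among the C and T reads covering it. The fixed 0.5 threshold for calling a site methylated and the function names are assumptions for illustration; the abstract does not describe the study's actual calling procedure.

```python
def methylation_level(c_reads, t_reads):
    """Share of reads supporting methylation at one cytosine; None if uncovered."""
    depth = c_reads + t_reads
    return c_reads / depth if depth else None

def fraction_methylated(sites, threshold=0.5):
    """Share of covered sites whose methylation level meets the threshold."""
    levels = [methylation_level(c, t) for c, t in sites]
    covered = [lv for lv in levels if lv is not None]
    return sum(lv >= threshold for lv in covered) / len(covered)

# Hypothetical (C reads, T reads) counts at four cytosine positions.
sites = [(9, 1), (0, 10), (6, 6), (0, 0)]
levels = [methylation_level(c, t) for c, t in sites]
share = fraction_methylated(sites)
```

Sites with no coverage are excluded from the denominator, which is why a statement about the fraction of methylated sites is usually reported alongside the proportion of the genome that was actually covered.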
Author Summary
Epigenetic modifications such as addition of methyl groups to cytosine in DNA play a role in regulating gene expression. To better understand these processes, knowledge of the methylation status of all cytosine bases in the genome (the methylome) is required. DNA methylation can differ between the two gene copies (alleles) in each cell. Such allele-specific methylation (ASM) can be due to parental origin of the alleles (imprinting), X chromosome inactivation in females, and other as yet unknown mechanisms. This may significantly alter the expression profile arising from different allele combinations in different individuals. Using advanced sequencing technology, we have determined the methylome of human peripheral blood mononuclear cells (PBMC). Importantly, the PBMC were obtained from the same male Han Chinese individual whose complete genome had previously been determined. This allowed us, for the first time, to study genome-wide differences in ASM. Our analysis shows that ASM in PBMC is higher than can be accounted for by regions known to undergo parent-of-origin imprinting and frequently (>80%) correlates with allele-specific expression (ASE) of the corresponding gene. In addition, our data reveal a rich landscape of epigenomic variation for 20 genomic features, including regulatory, coding, and non-coding sequences, and provide a valuable resource for future studies. Our work further establishes whole-genome sequencing as an efficient method for methylome analysis.
PMCID: PMC2976721  PMID: 21085693

