Allele-specific (AS) assessment of chromatin has the potential to elucidate specific cis-regulatory mechanisms, which are predicted to underlie the majority of the known genetic associations to complex disease. However, development of chromatin landscapes at allelic resolution has been challenging since sites of variable signal strength require substantial read depths not commonly applied in sequencing based approaches. In this study, we addressed this by performing parallel analyses of input DNA and chromatin immunoprecipitates (ChIP) on high-density Illumina genotyping arrays. Allele-specificity for the histone modifications H3K4me1, H3K4me3, H3K27ac, H3K27me3, and H3K36me3 was assessed using ChIP samples generated from 14 lymphoblast and 6 fibroblast cell lines. AS-ChIP SNPs were combined into domains and validated using high-confidence ChIP-seq sites. We observed characteristic patterns of allelic-imbalance for each histone-modification around allele-specifically expressed transcripts. Notably, we found H3K4me1 to be significantly anti-correlated with allelic expression (AE) at transcription start sites, indicating H3K4me1 allelic imbalance as a marker of AE. We also found that allelic chromatin domains exhibit population and cell-type specificity as well as heritability within trios. Finally, we observed that a subset of allelic chromatin domains is regulated by DNase I-sensitive quantitative trait loci and that these domains are significantly enriched for genome-wide association studies hits, with autoimmune disease associated SNPs specifically enriched in lymphoblasts. This study provides the first genome-wide maps of allelic-imbalance for five histone marks. Our results provide new insights into the role of chromatin in cis-regulation and highlight the need for high-depth sequencing in ChIP-seq studies along with the need to improve allele-specificity of ChIP-enrichment.
histone modifications; allele-specific; chromatin immunoprecipitation (ChIP); fibroblasts; lymphoblasts; cis-regulatory; genotyping arrays
Large-scale epigenome mapping by the NIH Roadmap Epigenomics Project, the ENCODE Consortium and the International Human Epigenome Consortium (IHEC) produces genome-wide DNA methylation data at one base-pair resolution. We examine how such data can be made open-access while balancing appropriate interpretation and genomic privacy. We propose guidelines for data release that both reduce ambiguity in the interpretation of open-access data and limit immediate access to genetic variation data that are made available through controlled access.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-015-0723-0) contains supplementary material, which is available to authorized users.
Anaplastic oligodendrogliomas (AO) are rare primary brain tumours that are generally incurable, with heterogeneous prognosis and few treatment targets identified. Most oligodendrogliomas have a chromosome 1p/19q co-deletion and an IDH mutation. Here we analysed 51 AO by whole-exome sequencing, identifying previously reported frequent somatic mutations in CIC and FUBP1. We also identified recurrent mutations in TCF12, both in this series and in an additional series of 83 AO. Overall, 7.5% of AO are mutated for TCF12, which encodes an oligodendrocyte-related transcription factor. Eighty percent of TCF12 mutations identified were in either the bHLH domain, which is important for TCF12 function as a transcription factor, or were frameshift mutations leading to TCF12 truncated for this domain. We show that these mutations compromise TCF12 transcriptional activity and are associated with a more aggressive tumour type. Our analysis provides further insights into the unique and shared pathways driving AO.
Anaplastic oligodendrogliomas are rare and incurable primary brain tumours with few treatment options. Here Labreche et al. perform whole-exome sequencing and identify recurring mutations in transcription factor TCF12, which are associated with aggressive tumours.
We coupled two strategies – trait extremes and genome-wide pooling – to discover a novel BP locus that encodes a previously uncharacterized thiamine transporter.
Hypertension is a heritable trait that remains the most potent and widespread cardiovascular risk factor, though details of its genetic determination are poorly understood.
Representative genomic DNA pools were created from male and female subjects in the highest and lowest 5th percentiles of BP in a primary care population of >50,000 individuals. The peak associated SNPs were typed in individual DNA samples, as well as twins/siblings phenotyped for cardiovascular and autonomic traits. Biochemical properties of the associated transporter were evaluated in cellular assays.
After chip hybridization and calculation of relative allele scores, the peak associations were typed in individual samples, revealing association of hypertension, SBP, and DBP to the previously uncharacterized solute carrier SLC35F3. The BP genetic association at SLC35F3 was validated by meta-analysis in an independent sample from the original source population, as well as the ICBP (across North America and Western Europe). Sequence homology to a putative yeast thiamine (vitamin B1) transporter prompted us to express human SLC35F3 in E. coli, which catalyzed [3H]-thiamine uptake. SLC35F3 risk allele (T/T) homozygotes displayed decreased erythrocyte thiamine content on microbiological assay. In twin pairs, the SLC35F3 risk allele predicted heritable cardiovascular traits previously associated with thiamine deficiency, including elevated cardiac stroke volume with decreased vascular resistance, and elevated pressor responses to environmental (cold) stress. Allelic expression imbalance (AEI) confirmed that cis-variation at the human SLC35F3 locus influenced expression of that gene, and the AEI peak coincided with the hypertension peak.
Novel strategies were coupled to position a new hypertension susceptibility locus, uncovering a previously unsuspected thiamine transporter whose genetic variants predicted several disturbances in cardiac and autonomic function. The results have implications for the pathogenesis and treatment of systemic hypertension.
SLC35F3; thiamine; transporter; hypertension
In this letter to the editor, we respond to the recent publication by Philibert et al., "Methylation array data can simultaneously identify individuals and convey protected health information: an unrecognized ethical concern" (Clinical Epigenetics 2014, 6:28). Further discussion of the issues raised by the risk of re-identification of epigenetic methylation data is needed, and a more nuanced approach should be taken with respect to its implications for data sharing policy than the one provided.
Privacy; Data sharing; Epigenome; Policy
The interplay between genetic and epigenetic variation is only partially understood. One form of epigenetic variation is methylation at CpG sites, which can be measured as methylation quantitative trait loci (meQTL). Here we report that in a panel of lymphocytes from 1,748 individuals, methylation levels at 1,919 CpG sites are correlated with at least one distal (trans) single-nucleotide polymorphism (SNP) (P<3.2 × 10−13; FDR<5%). These trans-meQTLs include 1,657 SNP–CpG pairs from different chromosomes and 262 pairs from the same chromosome that are >1 Mb apart. Over 90% of these pairs are replicated (FDR<5%) in at least one of two independent data sets. Genomic loci harbouring trans-meQTLs are significantly enriched (P<0.001) for long non-coding transcripts (2.2-fold), known epigenetic regulators (2.3-fold), piwi-interacting RNA clusters (3.6-fold) and curated transcription factors (4.1-fold), including zinc-finger proteins (8.75-fold). Long-range epigenetic networks uncovered by this approach may be relevant to normal and disease states.
There is a functional link between SNPs and epigenetic variations when they are in close range, but the long-range effect is unclear. Here, by analysing methylation quantitative trait loci, the authors demonstrate that methylation levels at CpG sites in lymphocytes are correlated with distal SNPs.
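The trans definition used above (SNP and CpG on different chromosomes, or on the same chromosome but more than 1 Mb apart) can be sketched in a few lines. This is only an illustrative sketch: the `Locus` type, the function name and the coordinate convention are assumptions made for the example, not the authors' pipeline.

```python
from typing import NamedTuple

# ">1 Mb apart" threshold quoted in the abstract.
TRANS_MIN_DISTANCE_BP = 1_000_000


class Locus(NamedTuple):
    """Hypothetical container for a genomic position."""
    chrom: str
    pos: int  # base-pair coordinate on the chromosome


def is_trans_pair(snp: Locus, cpg: Locus,
                  min_distance: int = TRANS_MIN_DISTANCE_BP) -> bool:
    """Return True if a SNP-CpG pair counts as distal (trans)."""
    if snp.chrom != cpg.chrom:
        # Pairs on different chromosomes are always trans.
        return True
    # Same chromosome: trans only if strictly more than min_distance apart.
    return abs(snp.pos - cpg.pos) > min_distance


if __name__ == "__main__":
    print(is_trans_pair(Locus("chr1", 5_000), Locus("chr7", 5_000)))
    print(is_trans_pair(Locus("chr1", 5_000), Locus("chr1", 250_000)))
```

Keeping the threshold as a parameter makes it easy to test other distance cut-offs without changing the function.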
Genome-wide demethylation and remethylation of DNA during early embryogenesis is essential for development. Imprinted germline differentially methylated domains (gDMDs) established by sex-specific methylation in either male or female germ cells, must escape these dynamic changes and sustain precise inheritance of both methylated and unmethylated parental alleles. To identify other, gDMD-like sequences with the same epigenetic inheritance properties, we used a modified embryonic stem (ES) cell line that emulates the early embryonic demethylation and remethylation waves. Transient DNMT1 suppression revealed gDMD-like sequences requiring continuous DNMT1 activity to sustain a highly methylated state. Remethylation of these sequences was also compromised in vivo in a mouse model of transient DNMT1 loss in the preimplantation embryo. These novel regions, possessing heritable epigenetic features similar to imprinted-gDMDs are required for normal physiological and developmental processes and when disrupted are associated with disorders such as cancer and autism spectrum disorders. This study presents new perspectives on DNA methylation heritability during early embryo development that extend beyond conventional imprinted-gDMDs.
Most complex disease-associated genetic variants are located in non-coding regions and are therefore thought to be regulatory in nature. Association mapping of differential allelic expression (AE) is a powerful method to identify SNPs with direct cis-regulatory impact (cis-rSNPs). We used AE mapping to identify cis-rSNPs regulating gene expression in 55 and 63 HapMap lymphoblastoid cell lines from a Caucasian and an African population, respectively, 70 fibroblast cell lines, and 188 purified monocyte samples and found 40–60% of these cis-rSNPs to be shared across cell types. We uncover a new class of cis-rSNPs, which disrupt footprint-derived de novo motifs that are predominantly bound by repressive factors and are implicated in disease susceptibility through overlaps with GWAS SNPs. Finally, we provide the proof-of-principle for a new approach for genome-wide functional validation of transcription factor–SNP interactions. By perturbing NFκB action in lymphoblasts, we identified 489 cis-regulated transcripts with altered AE after NFκB perturbation. Altogether, we perform a comprehensive analysis of cis-variation in four cell populations and provide new tools for the identification of functional variants associated to complex diseases.
allelic expression; cis-rSNPs; complex disease; NFκB; repressor
African-American (AA) women have earlier menarche on average than women of European ancestry (EA), and earlier menarche is a risk factor for obesity and type 2 diabetes among other chronic diseases. Identification of common genetic variants associated with age at menarche has a potential value in pointing to the genetic pathways underlying chronic disease risk, yet comprehensive genome-wide studies of age at menarche are lacking for AA women. In this study, we tested the genome-wide association of self-reported age at menarche with common single-nucleotide polymorphisms (SNPs) in a total of 18 089 AA women in 15 studies using an additive genetic linear regression model, adjusting for year of birth and population stratification, followed by inverse-variance weighted meta-analysis (Stage 1). Top meta-analysis results were then tested in an independent sample of 2850 women (Stage 2). First, while no SNP passed the pre-specified P < 5 × 10−8 threshold for significance in Stage 1, suggestive associations were found for variants near FLRT2 and PIK3R1, and conditional analysis identified two independent SNPs (rs339978 and rs980000) in or near RORA, strengthening the support for this suggestive locus identified in EA women. Secondly, an investigation of SNPs in 42 previously identified menarche loci in EA women demonstrated that 25 (60%) of them contained variants significantly associated with menarche in AA women. The findings provide the first evidence of cross-ethnic generalization of menarche loci identified to date, and suggest a number of novel biological links to menarche timing in AA women.
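The inverse-variance weighted meta-analysis step mentioned for Stage 1 can be illustrated with a minimal fixed-effect sketch. The function name and the example numbers are hypothetical, and the normal approximation for the P-value is an assumption of this sketch; the abstract does not describe the software actually used.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis.

    betas: per-study effect estimates for one SNP
    ses:   matching per-study standard errors
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and equal length")
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / (se ** 2) for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided P-value under a standard normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


if __name__ == "__main__":
    # Made-up per-study estimates, for illustration only.
    beta, se, z, p = inverse_variance_meta([0.10, 0.30, 0.15], [0.10, 0.20, 0.12])
    print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.3g}")
```

Studies with smaller standard errors dominate the pooled estimate, which is why large cohorts carry most of the weight in this kind of analysis.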
We applied genome-wide allele-specific expression analysis of monocytes from 188 samples. Monocytes were purified from white blood cells of healthy blood donors to detect cis-acting genetic variation that regulates the expression of long non-coding RNAs. We analysed 8929 regions harboring genes for potential long non-coding RNA that were retrieved from data from the ENCODE project. Of these regions, 60% were annotated as intergenic, which implies that they do not overlap with protein-coding genes. Focusing on the intergenic regions, and using stringent analysis of the allele-specific expression data, we detected robust cis-regulatory SNPs in 258 out of 489 informative intergenic regions included in the analysis. The cis-regulatory SNPs that were significantly associated with allele-specific expression of long non-coding RNAs were enriched to enhancer regions marked for active or bivalent, poised chromatin by histone modifications. Out of the lncRNA regions regulated by cis-acting regulatory SNPs, 20% (n = 52) were co-regulated with the closest protein coding gene. We compared the identified cis-regulatory SNPs with those in the catalog of SNPs identified by genome-wide association studies of human diseases and traits. This comparison identified 32 SNPs in loci from genome-wide association studies that displayed a strong association signal with allele-specific expression of non-coding RNAs in monocytes, with p-values ranging from 6.7×10−7 to 9.5×10−89. The identified cis-regulatory SNPs are associated with diseases of the immune system, like multiple sclerosis and rheumatoid arthritis.
DNA methylation plays an essential role in the regulation of gene expression. While its presence near the transcription start site of a gene has been associated with reduced expression, the variation in methylation levels across individuals, its environmental or genetic causes, and its association with gene expression remain poorly understood.
We report the joint analysis of sequence variants, gene expression and DNA methylation in primary fibroblast samples derived from a set of 62 unrelated individuals. Approximately 2% of the most variable CpG sites are mappable in cis to sequence variation, usually within 5 kb. Via eQTL analysis with microarray data combined with mapping of allelic expression regions, we obtained a set of 2,770 regions mappable in cis to sequence variation. In 9.5% of these expressed regions, an associated SNP was also a methylation QTL. Methylation and gene expression are often correlated without direct discernible involvement of sequence variation, but not always in the expected direction of negative for promoter CpGs and positive for gene-body CpGs. Population-level correlation between methylation and expression is strongest in a subset of developmentally significant genes, including all four HOX clusters. The presence and sign of this correlation are best predicted using specific chromatin marks rather than position of the CpG site with respect to the gene.
Our results indicate a wide variety of relationships between gene expression, DNA methylation and sequence variation in untransformed adult human fibroblasts, with considerable involvement of chromatin features and some discernible involvement of sequence variation.
The search for expression quantitative trait loci (eQTL) has traditionally centered entirely on the process of transcription, whereas variants with effects on mRNA translation have not been systematically studied. Here we present a high throughput approach for measuring translational cis-regulation in the human genome. Using ribosomal association as proxy for translational efficiency of polymorphic mRNAs, we test the ratio of polysomal/nonpolysomal mRNA level as a quantitative trait for association with single-nucleotide polymorphisms on the same mRNA transcript. We identify one important ribosomal-distribution effect, from rs1131017 in the 5′ UTR of RPS26, that is in high linkage disequilibrium (LD) with the 12q13 locus for susceptibility to type 1 diabetes. The effect on translation is confirmed at the protein level by quantitative Western blots, both ex vivo and after in vitro translation. Our results are a proof-of-principle that allelic effects on translation can be detected at a transcriptome-wide scale.
X-chromosome inactivation (XCI) results in the silencing of most genes on one X chromosome, yielding mono-allelic expression in individual cells. However, random XCI results in expression of both alleles in most females. Allelic imbalances have been used genome-wide to detect mono-allelically expressed genes. Analysis of X-linked allelic imbalance in females with skewed XCI offers the opportunity to identify genes that escape XCI with bi-allelic expression in contrast to those with mono-allelic expression and which are therefore subject to XCI.
We determine XCI status for 409 genes, all of which have at least five informative females in our dataset. The majority of genes are subject to XCI and genes that escape from XCI show a continuum of expression from the inactive X. Inactive X expression corresponds to differences in the level of histone modification detected by allelic imbalance after chromatin immunoprecipitation. Differences in XCI between populations and between cell lines derived from different tissues are observed.
We demonstrate that allelic imbalance can be used to determine an inactivation status for X-linked genes, even without completely non-random XCI. There is a range of expression from the inactive X. Genes escaping XCI, including those that do so in only a subset of females, cluster together, demonstrating that XCI and location on the X chromosome are related. In addition to revealing mechanisms involved in cis-gene regulation, determining which genes escape XCI can expand our understanding of the contributions of X-linked genes to sexual dimorphism.
Sexual dimorphism in various bone phenotypes, including bone mineral density (BMD), is widely observed; however the extent to which genes explain these sex differences is unclear. To identify variants with different effects by sex, we examined gene-by-sex autosomal interactions genome-wide, and performed eQTL analysis and bioinformatics network analysis.
We conducted an autosomal genome-wide meta-analysis of gene-by-sex interaction on lumbar spine (LS-) and femoral neck (FN-) BMD, in 25,353 individuals from eight cohorts. In a second stage, we followed up the 12 top SNPs (P<1×10−5) in an additional set of 24,763 individuals. Gene-by-sex interaction and sex-specific effects were examined in these 12 SNPs.
We detected one novel genome-wide significant interaction associated with LS-BMD at the Chr3p26.1-p25.1 locus, near the GRM7 gene (male effect = 0.02 & p-value = 3.0×10−5; female effect = −0.007 & p-value=3.3×10−2) and eleven suggestive loci associated with either FN- or LS-BMD in discovery cohorts. However, there was no evidence for genome-wide significant (P<5×10−8) gene-by-sex interaction in the joint analysis of discovery and replication cohorts.
Despite the large collaborative effort, no genome-wide significant evidence for gene-by-sex interaction was found influencing BMD variation in this screen of autosomal markers. If they exist, gene-by-sex interactions for BMD probably have weak effects, accounting for less than 0.08% of the variation in these traits per implicated SNP.
gene-by-sex; interaction; BMD; association; aging
Although aberrant DNA methylation has been observed previously in acute lymphoblastic leukemia (ALL), the patterns of differential methylation have not been comprehensively determined in all subtypes of ALL on a genome-wide scale. The relationship between DNA methylation, cytogenetic background, drug resistance and relapse in ALL is poorly understood.
We surveyed the DNA methylation levels of 435,941 CpG sites in samples from 764 children at diagnosis of ALL and from 27 children at relapse. This survey uncovered four characteristic methylation signatures. First, compared with control blood cells, the methylomes of ALL cells shared 9,406 predominantly hypermethylated CpG sites, independent of cytogenetic background. Second, each cytogenetic subtype of ALL displayed a unique set of hyper- and hypomethylated CpG sites. The CpG sites that constituted these two signatures differed in their functional genomic enrichment to regions with marks of active or repressed chromatin. Third, we identified subtype-specific differential methylation in promoter and enhancer regions that were strongly correlated with gene expression. Fourth, a set of 6,612 CpG sites was predominantly hypermethylated in ALL cells at relapse, compared with matched samples at diagnosis. Analysis of relapse-free survival identified CpG sites with subtype-specific differential methylation that divided the patients into different risk groups, depending on their methylation status.
Our results suggest an important biological role for DNA methylation in the differences between ALL subtypes and in their clinical outcome after treatment.
While regulatory programs are extensively studied at the level of transcription, elements that are involved in regulation of post-transcriptional processes are largely unknown, and methods for systematic identification of these elements are in early stages. Here, using a novel computational framework, we have integrated sequence information with several functional genomics data sets to characterize conserved regulatory programs of trypanosomatids, a group of eukaryotes that almost entirely rely on post-transcriptional processes for regulation of mRNA abundance. This analysis revealed a complex network of linear and structural RNA elements that potentially govern mRNA abundance across different life stages and environmental conditions. Furthermore, we show that the conserved regulatory network that we have identified is responsive to chemical perturbation of several biological functions in trypanosomatids. We have further characterized one of the most abundant regulatory RNA elements that we discovered, an AU-rich element (ARE) that can be found in 3′ untranslated region of many trypanosomatid genes. Using bioinformatics approaches as well as in vitro and in vivo experiments, we have identified three ELAV-like homologs, including the developmentally critical protein TbRBP6, which regulate abundance of a large number of trypanosomatid ARE-containing transcripts. Together, these studies lay out a roadmap for characterization of mechanisms that modulate development and metabolic pathways in trypanosomatids.
BRCA1-associated breast and ovarian cancer risks can be modified by common genetic variants. To identify further cancer risk-modifying loci, we performed a multi-stage GWAS of 11,705 BRCA1 carriers (of whom 5,920 were diagnosed with breast and 1,839 were diagnosed with ovarian cancer), with a further replication in an additional sample of 2,646 BRCA1 carriers. We identified a novel breast cancer risk modifier locus at 1q32 for BRCA1 carriers (rs2290854, P = 2.7×10−8, HR = 1.14, 95% CI: 1.09–1.20). In addition, we identified two novel ovarian cancer risk modifier loci: 17q21.31 (rs17631303, P = 1.4×10−8, HR = 1.27, 95% CI: 1.17–1.38) and 4q32.3 (rs4691139, P = 3.4×10−8, HR = 1.20, 95% CI: 1.17–1.38). The 4q32.3 locus was not associated with ovarian cancer risk in the general population or BRCA2 carriers, suggesting a BRCA1-specific association. The 17q21.31 locus was also associated with ovarian cancer risk in 8,211 BRCA2 carriers (P = 2×10−4). These loci may lead to an improved understanding of the etiology of breast and ovarian tumors in BRCA1 carriers. Based on the joint distribution of the known BRCA1 breast cancer risk-modifying loci, we estimated that the breast cancer lifetime risks for the 5% of BRCA1 carriers at lowest risk are 28%–50% compared to 81%–100% for the 5% at highest risk. Similarly, based on the known ovarian cancer risk-modifying loci, the 5% of BRCA1 carriers at lowest risk have an estimated lifetime risk of developing ovarian cancer of 28% or lower, whereas the 5% at highest risk will have a risk of 63% or higher. Such differences in risk may have important implications for risk prediction and clinical management for BRCA1 carriers.
BRCA1 mutation carriers have increased and variable risks of breast and ovarian cancer. To identify modifiers of breast and ovarian cancer risk in this population, a multi-stage GWAS of 14,351 BRCA1 mutation carriers was performed. Loci 1q32 and TCF7L2 at 10q25.3 were associated with breast cancer risk, and two loci at 4q32.3 and 17q21.31 were associated with ovarian cancer risk. The 4q32.3 ovarian cancer locus was not associated with ovarian cancer risk in the general population or in BRCA2 carriers and is the first indication of a BRCA1-specific risk locus for either breast or ovarian cancer. Furthermore, modeling the influence of these modifiers on cumulative risk of breast and ovarian cancer in BRCA1 mutation carriers for the first time showed that a wide range of individual absolute risks of each cancer can be estimated. These differences suggest that genetic risk modifiers may be incorporated into the clinical management of BRCA1 mutation carriers.
A large number of genome-wide association studies have been performed during the past five years to identify associations between SNPs and human complex diseases and traits. The assignment of a functional role for the identified disease-associated SNP is not straight-forward. Genome-wide expression quantitative trait locus (eQTL) analysis is frequently used as the initial step to define a function while allele-specific gene expression (ASE) analysis has not yet gained a wide-spread use in disease mapping studies. We compared the power to identify cis-acting regulatory SNPs (cis-rSNPs) by genome-wide allele-specific gene expression (ASE) analysis with that of traditional expression quantitative trait locus (eQTL) mapping. Our study included 395 healthy blood donors for whom global gene expression profiles in circulating monocytes were determined by Illumina BeadArrays. ASE was assessed in a subset of these monocytes from 188 donors by quantitative genotyping of mRNA using a genome-wide panel of SNP markers. The performance of the two methods for detecting cis-rSNPs was evaluated by comparing associations between SNP genotypes and gene expression levels in sample sets of varying size. We found that up to 8-fold more samples are required for eQTL mapping to reach the same statistical power as that obtained by ASE analysis for the same rSNPs. The performance of ASE is insensitive to SNPs with low minor allele frequencies and detects a larger number of significantly associated rSNPs using the same sample size as eQTL mapping. An unequivocal conclusion from our comparison is that ASE analysis is more sensitive for detecting cis-rSNPs than standard eQTL mapping. Our study shows the potential of ASE mapping in tissue samples and primary cells which are difficult to obtain in large numbers.
A report on the 12th International Congress of Human Genetics, joint with the 61st annual American Society of Human Genetics conference, Montreal, Quebec, 11-15 October 2011.
Bone mineral density (BMD) is the most important predictor of fracture risk. We performed the largest meta-analysis to date on lumbar spine and femoral neck BMD, including 17 genome-wide association studies and 32,961 individuals of European and East Asian ancestry. We tested the top-associated BMD markers for replication in 50,933 independent subjects and for risk of low-trauma fracture in 31,016 cases and 102,444 controls. We identified 56 loci (32 novel) associated with BMD at genome-wide significant level (P<5×10−8). Several of these factors cluster within the RANK-RANKL-OPG, mesenchymal-stem-cell differentiation, endochondral ossification and the Wnt signalling pathways. However, we also discovered loci containing genes not known to play a role in bone biology. Fourteen BMD loci were also associated with fracture risk (P<5×10−4, Bonferroni corrected), of which six reached P<5×10−8 including: 18p11.21 (C18orf19), 7q21.3 (SLC25A13), 11q13.2 (LRP5), 4q22.1 (MEPE), 2p16.2 (SPTBN1) and 10q21.1 (DKK1). These findings shed light on the genetic architecture and pathophysiological mechanisms underlying BMD variation and fracture susceptibility.
Identifying and understanding the impact of gene regulatory variation is of considerable importance in evolutionary and medical genetics; such variants are thought to be responsible for human-specific adaptation and to have an important role in genetic disease. Regulatory variation in cis is readily detected in individuals showing uneven expression of a transcript from its two allelic copies, an observation referred to as allelic imbalance (AI). Identifying individuals exhibiting AI allows mapping of regulatory DNA regions and the potential to identify the underlying causal genetic variant(s). However, existing mapping methods require knowledge of the haplotypes, which makes them sensitive to phasing errors. In this study, we introduce a genotype-based mapping test that does not require haplotype-phase inference to locate regulatory regions. The test relies on partitioning genotypes of individuals exhibiting AI and those not exhibiting AI in a 2×3 contingency table. The performance of this test to detect linkage disequilibrium (LD) between a potential regulatory site and a SNP located in this region was examined by analyzing simulated and empirical AI datasets. In simulation experiments, the genotype-based test outperforms the haplotype-based tests with increasing distance separating the regulatory region from its regulated transcript. The genotype-based test performed equally well with the experimental AI datasets, either from genome-wide cDNA hybridization arrays or from RNA sequencing. By avoiding the need for haplotype inference, the genotype-based test will suit AI analyses in population samples of unknown haplotype structure and will additionally facilitate the identification of cis-regulatory variants that are located far away from the regulated transcript.
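To make the 2×3 layout concrete, the sketch below tabulates genotype counts (AA, AB, BB) at a candidate SNP for individuals with and without AI and computes a Pearson chi-square statistic. The abstract does not state which statistic the authors apply to the table, so the Pearson test here is an assumption chosen for illustration, and the function name and counts are hypothetical.

```python
import math


def chi_square_2x3(ai_counts, no_ai_counts):
    """Pearson chi-square test on a 2x3 genotype table.

    ai_counts:    (AA, AB, BB) counts among individuals showing AI
    no_ai_counts: (AA, AB, BB) counts among individuals without AI
    Returns (statistic, p_value) with 2 degrees of freedom.
    """
    table = [list(ai_counts), list(no_ai_counts)]
    if any(len(row) != 3 for row in table):
        raise ValueError("each row needs exactly three genotype counts")
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(3)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain observations")
    statistic = 0.0
    for i in range(2):
        for j in range(3):
            # Expected count under independence of AI status and genotype.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected
    # For 2 degrees of freedom the chi-square survival function is exp(-x/2).
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


if __name__ == "__main__":
    # Made-up counts: heterozygotes are over-represented among AI individuals.
    stat, p = chi_square_2x3((5, 30, 5), (20, 20, 20))
    print(f"chi2={stat:.2f} p={p:.3g}")
```

Because only unphased genotype counts enter the table, no haplotype inference is needed, which is the property the abstract emphasises.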
Phenotypic variation results from variation in gene expression, which is modulated by genetic and/or epigenetic factors. To understand the molecular basis of human disease, interaction between genetic and epigenetic factors needs to be taken into account. The asthma-associated region 17q12-q21 harbors three genes, the zona pellucida binding protein 2 (ZPBP2), gasdermin B (GSDMB) and ORM1-like 3 (ORMDL3), that show allele-specific differences in expression levels in lymphoblastoid cell lines (LCLs) and CD4+ T cells. Here, we report a molecular dissection of allele-specific transcriptional regulation of the genes within the chromosomal region 17q12-q21 combining in vitro transfection, formaldehyde-assisted isolation of regulatory elements, chromatin immunoprecipitation and DNA methylation assays in LCLs. We found that a single nucleotide polymorphism rs4795397 influences the activity of ZPBP2 promoter in vitro in an allele-dependent fashion, and also leads to nucleosome repositioning on the asthma-associated allele. However, variable methylation of exon 1 of ZPBP2 masks the strong genetic effect on ZPBP2 promoter activity in LCLs. In contrast, the ORMDL3 promoter is fully unmethylated, which allows detection of genetic effects on its transcription. We conclude that the cis-regulatory effects on 17q12-q21 gene expression result from interaction between several regulatory polymorphisms and epigenetic factors within the cis-regulatory haplotype region.
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-012-1142-x) contains supplementary material, which is available to authorized users.
Genome-wide association studies of human gene expression promise to identify functional regulatory genetic variation that contributes to phenotypic diversity. However, it is unclear how useful this approach will be for the identification of disease-susceptibility variants. We generated gene expression profiles for 22 184 mRNA transcripts using RNA derived from peripheral blood CD4+ lymphocytes, and genome-wide genotype data for 516 512 autosomal markers in 200 subjects. We screened for cis-acting variants by testing variants mapping within 50 kb of expressed transcripts for association with transcript abundance using generalized linear models. Significant associations were identified for 1585 genes at a false discovery rate of 0.05 (corresponding to P-values ranging from 1 × 10−91 to 7 × 10−4). Importantly, we identified evidence of regulatory variation for 119 previously mapped disease genes, including 24 examples where the variant with the strongest evidence of disease-association demonstrates strong association with specific transcript abundance. The prevalence of cis-acting variants among disease-associated genes was 63% higher than the genome-wide rate in our data set (P = 6.41 × 10−6), and although many of the implicated loci were associated with immune-related diseases (including asthma, connective tissue disorders and inflammatory bowel disease), associations with genes implicated in non-immune-related diseases including lipid profiles, anthropomorphic measurements, cancer and neurologic disease were also observed. Genetic variants that confer inter-individual differences in gene expression represent an important subset of variants that contribute to disease susceptibility. Population-based integrative genetic approaches can help identify such variation and enhance our understanding of the genetic basis of complex traits.
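Selecting associations at a false discovery rate of 0.05 can be sketched with the Benjamini-Hochberg step-up procedure. The abstract does not say which FDR method was used, so Benjamini-Hochberg is an assumption here, and the function name and P-values are made up for the example.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which tests pass FDR control.

    Uses the Benjamini-Hochberg step-up rule: find the largest rank k
    such that p_(k) <= alpha * k / m, then reject the k smallest P-values.
    """
    m = len(pvalues)
    # Indices of the tests ordered from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Illustrative P-values for four hypothetical transcript-SNP tests.
    print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5]))
```

The largest P-value still rejected acts as the effective significance threshold, which is how an FDR level translates into a range of P-values like the one reported above.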
The commercialization of academic research has been promoted by North American policy makers for over 30 years as a means of increasing university financing and to ensure that promising research would eventually find its way to the marketplace. The following issues paper constitutes a reflection on the impact of the Canadian commercialization framework on academic research in the field of genomics. It was written following two workshops and two independent studies organized by academic groups in Quebec (Centre of Genomics and Policy) and Alberta (Health Law Institute). The full sets of recommendations are available upon request to the authors.