Most complex disease-associated genetic variants are located in non-coding regions and are
therefore thought to be regulatory in nature. Association mapping of differential allelic expression
(AE) is a powerful method to identify SNPs with direct cis-regulatory impact
(cis-rSNPs). We used AE mapping to identify cis-rSNPs regulating
gene expression in 55 and 63 HapMap lymphoblastoid cell lines from a Caucasian and an African
population, respectively, 70 fibroblast cell lines, and 188 purified monocyte samples and found
40–60% of these cis-rSNPs to be shared across cell types. We uncover
a new class of cis-rSNPs, which disrupt footprint-derived de novo
motifs that are predominantly bound by repressive factors and are implicated in disease
susceptibility through overlaps with GWAS SNPs. Finally, we provide proof of principle for a new
approach for genome-wide functional validation of transcription factor–SNP interactions. By
perturbing NFκB action in lymphoblasts, we identified 489 cis-regulated
transcripts with altered AE after NFκB perturbation. Altogether, we perform a comprehensive
analysis of cis-variation in four cell populations and provide new tools for the
identification of functional variants associated with complex diseases.
allelic expression; cis-rSNPs; complex disease; NFκB; repressor
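The core idea behind AE mapping can be shown with a deliberately simplified sketch. Individuals who are heterozygous at a true cis-rSNP should show allelic imbalance at a transcribed SNP, and homozygotes should not. The function names, the |log2 ratio| imbalance measure and the Welch t-statistic are all illustrative assumptions. This is not the authors' published pipeline.

```python
import math
from statistics import mean, variance


def allelic_imbalance(ratio):
    """Absolute log2 allelic expression ratio; 0.0 means both alleles are equal."""
    return abs(math.log2(ratio))


def ae_association(ratios, genotypes):
    """Welch t-statistic comparing allelic imbalance of heterozygotes (genotype 1)
    with homozygotes (genotype 0 or 2) at one candidate cis-rSNP.

    ratios: allele A / allele B expression measured at a transcribed SNP,
            one value per individual heterozygous at that transcribed SNP.
    genotypes: candidate-SNP genotype per individual, coded 0, 1 or 2.
    """
    het = [allelic_imbalance(r) for r, g in zip(ratios, genotypes) if g == 1]
    hom = [allelic_imbalance(r) for r, g in zip(ratios, genotypes) if g != 1]
    if len(het) < 2 or len(hom) < 2:
        raise ValueError("need at least two individuals in each genotype group")
    se = math.sqrt(variance(het) / len(het) + variance(hom) / len(hom))
    return (mean(het) - mean(hom)) / se


# Toy data: the first four individuals are heterozygous and show imbalance.
t_stat = ae_association([2.0, 0.5, 1.9, 0.55, 1.05, 0.95, 1.0, 1.1],
                        [1, 1, 1, 1, 0, 2, 0, 2])
```

A large positive statistic means that carriers of the heterozygous genotype show more imbalance than homozygotes. That pattern is the signature of a cis-acting variant.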
African-American (AA) women have earlier menarche on average than women of European ancestry (EA), and earlier menarche is a risk factor for obesity and type 2 diabetes among other chronic diseases. Identification of common genetic variants associated with age at menarche has a potential value in pointing to the genetic pathways underlying chronic disease risk, yet comprehensive genome-wide studies of age at menarche are lacking for AA women. In this study, we tested the genome-wide association of self-reported age at menarche with common single-nucleotide polymorphisms (SNPs) in a total of 18 089 AA women in 15 studies using an additive genetic linear regression model, adjusting for year of birth and population stratification, followed by inverse-variance weighted meta-analysis (Stage 1). Top meta-analysis results were then tested in an independent sample of 2850 women (Stage 2). First, while no SNP passed the pre-specified P < 5 × 10−8 threshold for significance in Stage 1, suggestive associations were found for variants near FLRT2 and PIK3R1, and conditional analysis identified two independent SNPs (rs339978 and rs980000) in or near RORA, strengthening the support for this suggestive locus identified in EA women. Secondly, an investigation of SNPs in 42 previously identified menarche loci in EA women demonstrated that 25 (60%) of them contained variants significantly associated with menarche in AA women. The findings provide the first evidence of cross-ethnic generalization of menarche loci identified to date, and suggest a number of novel biological links to menarche timing in AA women.
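Stage 1 pools the per-study estimates by inverse-variance weighting. The sketch below shows a minimal fixed-effect version. It assumes each study has already produced an additive-model effect estimate (beta) and a standard error for one SNP. The function name `ivw_meta` and the normal approximation for the p-value are illustrative choices.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis for one SNP.

    betas, ses: per-study effect estimates and their standard errors.
    Returns the pooled beta, its standard error, the z-score and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under a standard normal
    return beta, se, z, p


# Hypothetical per-study estimates (change in age at menarche, years per allele).
pooled = ivw_meta([0.08, 0.12, 0.05], [0.04, 0.05, 0.06])
```

The pooled p-value would then be compared with the pre-specified 5 × 10−8 threshold.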
We applied genome-wide allele-specific expression analysis to monocytes from 188 samples. Monocytes were purified from the white blood cells of healthy blood donors, and allele-specific expression was used to detect cis-acting genetic variation that regulates the expression of long non-coding RNAs (lncRNAs). We analysed 8929 regions harboring genes for potential lncRNAs that were retrieved from ENCODE project data. Of these regions, 60% were annotated as intergenic, meaning that they do not overlap with protein-coding genes. Focusing on the intergenic regions, and using stringent analysis of the allele-specific expression data, we detected robust cis-regulatory SNPs in 258 out of 489 informative intergenic regions included in the analysis. The cis-regulatory SNPs that were significantly associated with allele-specific expression of lncRNAs were enriched in enhancer regions marked by histone modifications for active or bivalent (poised) chromatin. Of the lncRNA regions regulated by cis-acting regulatory SNPs, 20% (n = 52) were co-regulated with the closest protein-coding gene. We compared the identified cis-regulatory SNPs with those in the catalog of SNPs identified by genome-wide association studies of human diseases and traits. This comparison identified 32 SNPs in loci from genome-wide association studies that displayed a strong association signal with allele-specific expression of non-coding RNAs in monocytes, with p-values ranging from 6.7×10−7 to 9.5×10−89. The identified cis-regulatory SNPs are associated with diseases of the immune system, such as multiple sclerosis and rheumatoid arthritis.
DNA methylation plays an essential role in the regulation of gene expression. While its presence near the transcription start site of a gene has been associated with reduced expression, the variation in methylation levels across individuals, its environmental or genetic causes, and its association with gene expression remain poorly understood.
We report the joint analysis of sequence variants, gene expression and DNA methylation in primary fibroblast samples derived from a set of 62 unrelated individuals. Approximately 2% of the most variable CpG sites are mappable in cis to sequence variation, usually within 5 kb. Via eQTL analysis with microarray data combined with mapping of allelic expression regions, we obtained a set of 2,770 regions mappable in cis to sequence variation. In 9.5% of these expressed regions, an associated SNP was also a methylation QTL. Methylation and gene expression are often correlated without direct discernible involvement of sequence variation, but not always in the expected direction (negative for promoter CpGs and positive for gene-body CpGs). Population-level correlation between methylation and expression is strongest in a subset of developmentally significant genes, including all four HOX clusters. The presence and sign of this correlation are best predicted using specific chromatin marks rather than position of the CpG site with respect to the gene.
Our results indicate a wide variety of relationships between gene expression, DNA methylation and sequence variation in untransformed adult human fibroblasts, with considerable involvement of chromatin features and some discernible involvement of sequence variation.
The search for expression quantitative trait loci (eQTL) has traditionally centered entirely on the process of transcription, whereas variants with effects on mRNA translation have not been systematically studied. Here we present a high throughput approach for measuring translational cis-regulation in the human genome. Using ribosomal association as proxy for translational efficiency of polymorphic mRNAs, we test the ratio of polysomal/nonpolysomal mRNA level as a quantitative trait for association with single-nucleotide polymorphisms on the same mRNA transcript. We identify one important ribosomal-distribution effect, from rs1131017 in the 5’UTR of RPS26, that is in high linkage disequilibrium (LD) with the 12q13 locus for susceptibility to type 1 diabetes. The effect on translation is confirmed at the protein level by quantitative Western blots, both ex vivo and after in vitro translation. Our results are a proof-of-principle that allelic effects on translation can be detected at a transcriptome-wide scale.
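The translational trait can be sketched as a per-individual log2 ratio of polysomal to nonpolysomal mRNA, regressed on genotype coded 0/1/2 under an additive model. The log2 transform, the helper names and the plain least-squares slope are assumptions made for illustration. A real analysis would also need a test statistic and multiple-testing control.

```python
import math
from statistics import mean


def translational_trait(polysomal, nonpolysomal):
    """Per-individual log2(polysomal / nonpolysomal) mRNA level for one transcript."""
    return [math.log2(p / n) for p, n in zip(polysomal, nonpolysomal)]


def additive_slope(genotypes, trait):
    """Least-squares slope of the trait on genotype coded 0/1/2 (additive model)."""
    g_mean, t_mean = mean(genotypes), mean(trait)
    sxy = sum((g - g_mean) * (t - t_mean) for g, t in zip(genotypes, trait))
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    return sxy / sxx


# Toy data: each extra copy of the allele doubles the polysomal/nonpolysomal ratio.
trait = translational_trait([2, 2, 4, 4, 8, 8], [1, 1, 1, 1, 1, 1])
slope = additive_slope([0, 0, 1, 1, 2, 2], trait)
```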
X-chromosome inactivation (XCI) results in the silencing of most genes on one X chromosome, yielding mono-allelic expression in individual cells. However, random XCI results in expression of both alleles in most females. Allelic imbalances have been used genome-wide to detect mono-allelically expressed genes. Analysis of X-linked allelic imbalance in females with skewed XCI offers the opportunity to distinguish genes that escape XCI, which show bi-allelic expression, from those with mono-allelic expression, which are therefore subject to XCI.
We determine XCI status for 409 genes, all of which have at least five informative females in our dataset. The majority of genes are subject to XCI and genes that escape from XCI show a continuum of expression from the inactive X. Inactive X expression corresponds to differences in the level of histone modification detected by allelic imbalance after chromatin immunoprecipitation. Differences in XCI between populations and between cell lines derived from different tissues are observed.
We demonstrate that allelic imbalance can be used to determine an inactivation status for X-linked genes, even without completely non-random XCI. There is a range of expression from the inactive X. Genes escaping XCI, including those that do so in only a subset of females, cluster together, demonstrating that XCI and location on the X chromosome are related. In addition to revealing mechanisms involved in cis-gene regulation, determining which genes escape XCI can expand our understanding of the contributions of X-linked genes to sexual dimorphism.
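The sketch below is a toy version of the classification step. It assumes that each female contributes one measurement: the fraction of expression coming from the lower-expressed allele. The 0.15 bi-allelic cutoff and the 25%/75% shares that separate "subject", "variable escape" and "escape" genes are hypothetical thresholds, not values from the study. The minimum of five informative females follows the criterion stated above.

```python
def xci_status(minor_allele_fractions, biallelic_cutoff=0.15,
               subject_max_share=0.25, escape_min_share=0.75, min_females=5):
    """Classify an X-linked gene from allelic expression in females with skewed XCI.

    minor_allele_fractions: one value per informative female, the fraction of
    expression from the lower-expressed allele (0.0 = mono-allelic, 0.5 = balanced).
    All thresholds are hypothetical defaults for illustration.
    """
    n = len(minor_allele_fractions)
    if n < min_females:
        return "insufficient data"
    biallelic = sum(f >= biallelic_cutoff for f in minor_allele_fractions)
    share = biallelic / n  # share of females expressing both alleles
    if share <= subject_max_share:
        return "subject"
    if share >= escape_min_share:
        return "escape"
    return "variable escape"  # escapes XCI in only a subset of females


status = xci_status([0.40, 0.02, 0.45, 0.03, 0.50])
```

The intermediate "variable escape" category corresponds to genes that escape XCI in only some females.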
Sexual dimorphism in various bone phenotypes, including bone mineral density (BMD), is widely observed; however the extent to which genes explain these sex differences is unclear. To identify variants with different effects by sex, we examined gene-by-sex autosomal interactions genome-wide, and performed eQTL analysis and bioinformatics network analysis.
We conducted an autosomal genome-wide meta-analysis of gene-by-sex interaction on lumbar spine (LS-) and femoral neck (FN-) BMD, in 25,353 individuals from eight cohorts. In a second stage, we followed up the 12 top SNPs (P<1×10−5) in an additional set of 24,763 individuals. Gene-by-sex interaction and sex-specific effects were examined in these 12 SNPs.
We detected one novel genome-wide significant interaction associated with LS-BMD at the Chr3p26.1-p25.1 locus, near the GRM7 gene (male effect = 0.02, P = 3.0×10−5; female effect = −0.007, P = 3.3×10−2), and eleven suggestive loci associated with either FN- or LS-BMD in discovery cohorts. However, there was no evidence for genome-wide significant (P<5×10−8) gene-by-sex interaction in the joint analysis of discovery and replication cohorts.
Despite the large collaborative effort, no genome-wide significant evidence for gene-by-sex interaction was found influencing BMD variation in this screen of autosomal markers. If they exist, gene-by-sex interactions for BMD probably have weak effects, accounting for less than 0.08% of the variation in these traits per implicated SNP.
gene-by-sex; interaction; BMD; association; aging
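One common way to compare sex-specific effects is a z-test on the difference between the male and female estimates, assuming the two estimates are independent. The sketch below is illustrative only. The study itself meta-analyzed gene-by-sex interaction. The example inputs are hypothetical and are not the GRM7 estimates reported above, because their standard errors are not given.

```python
import math


def sex_difference_test(beta_male, se_male, beta_female, se_female):
    """z-test for a difference between independent male and female effect estimates.

    Returns the z-score and a two-sided p-value under a standard normal.
    """
    z = (beta_male - beta_female) / math.sqrt(se_male ** 2 + se_female ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical inputs with made-up standard errors.
z, p = sex_difference_test(0.5, 0.3, 0.1, 0.4)
```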
Although aberrant DNA methylation has been observed previously in acute lymphoblastic leukemia (ALL), the patterns of differential methylation have not been comprehensively determined in all subtypes of ALL on a genome-wide scale. The relationship between DNA methylation, cytogenetic background, drug resistance and relapse in ALL is poorly understood.
We surveyed the DNA methylation levels of 435,941 CpG sites in samples from 764 children at diagnosis of ALL and from 27 children at relapse. This survey uncovered four characteristic methylation signatures. First, compared with control blood cells, the methylomes of ALL cells shared 9,406 predominantly hypermethylated CpG sites, independent of cytogenetic background. Second, each cytogenetic subtype of ALL displayed a unique set of hyper- and hypomethylated CpG sites. The CpG sites that constituted these two signatures differed in their functional genomic enrichment to regions with marks of active or repressed chromatin. Third, we identified subtype-specific differential methylation in promoter and enhancer regions that were strongly correlated with gene expression. Fourth, a set of 6,612 CpG sites was predominantly hypermethylated in ALL cells at relapse, compared with matched samples at diagnosis. Analysis of relapse-free survival identified CpG sites with subtype-specific differential methylation that divided the patients into different risk groups, depending on their methylation status.
Our results suggest an important biological role for DNA methylation in the differences between ALL subtypes and in their clinical outcome after treatment.
While regulatory programs are extensively studied at the level of transcription, elements that are involved in regulation of post-transcriptional processes are largely unknown, and methods for systematic identification of these elements are in early stages. Here, using a novel computational framework, we have integrated sequence information with several functional genomics data sets to characterize conserved regulatory programs of trypanosomatids, a group of eukaryotes that almost entirely rely on post-transcriptional processes for regulation of mRNA abundance. This analysis revealed a complex network of linear and structural RNA elements that potentially govern mRNA abundance across different life stages and environmental conditions. Furthermore, we show that the conserved regulatory network that we have identified is responsive to chemical perturbation of several biological functions in trypanosomatids. We have further characterized one of the most abundant regulatory RNA elements that we discovered, an AU-rich element (ARE) that can be found in the 3′ untranslated regions of many trypanosomatid genes. Using bioinformatics approaches as well as in vitro and in vivo experiments, we have identified three ELAV-like homologs, including the developmentally critical protein TbRBP6, which regulate abundance of a large number of trypanosomatid ARE-containing transcripts. Together, these studies lay out a roadmap for characterization of mechanisms that modulate development and metabolic pathways in trypanosomatids.
BRCA1-associated breast and ovarian cancer risks can be modified by common genetic variants. To identify further cancer risk-modifying loci, we performed a multi-stage GWAS of 11,705 BRCA1 carriers (of whom 5,920 were diagnosed with breast and 1,839 were diagnosed with ovarian cancer), with a further replication in an additional sample of 2,646 BRCA1 carriers. We identified a novel breast cancer risk modifier locus at 1q32 for BRCA1 carriers (rs2290854, P = 2.7×10−8, HR = 1.14, 95% CI: 1.09–1.20). In addition, we identified two novel ovarian cancer risk modifier loci: 17q21.31 (rs17631303, P = 1.4×10−8, HR = 1.27, 95% CI: 1.17–1.38) and 4q32.3 (rs4691139, P = 3.4×10−8, HR = 1.20, 95% CI: 1.17–1.38). The 4q32.3 locus was not associated with ovarian cancer risk in the general population or BRCA2 carriers, suggesting a BRCA1-specific association. The 17q21.31 locus was also associated with ovarian cancer risk in 8,211 BRCA2 carriers (P = 2×10−4). These loci may lead to an improved understanding of the etiology of breast and ovarian tumors in BRCA1 carriers. Based on the joint distribution of the known BRCA1 breast cancer risk-modifying loci, we estimated that the breast cancer lifetime risks for the 5% of BRCA1 carriers at lowest risk are 28%–50% compared to 81%–100% for the 5% at highest risk. Similarly, based on the known ovarian cancer risk-modifying loci, the 5% of BRCA1 carriers at lowest risk have an estimated lifetime risk of developing ovarian cancer of 28% or lower, whereas the 5% at highest risk will have a risk of 63% or higher. Such differences in risk may have important implications for risk prediction and clinical management for BRCA1 carriers.
BRCA1 mutation carriers have increased and variable risks of breast and ovarian cancer. To identify modifiers of breast and ovarian cancer risk in this population, a multi-stage GWAS of 14,351 BRCA1 mutation carriers was performed. Loci 1q32 and TCF7L2 at 10q25.3 were associated with breast cancer risk, and two loci at 4q32.3 and 17q21.31 were associated with ovarian cancer risk. The 4q32.3 ovarian cancer locus was not associated with ovarian cancer risk in the general population or in BRCA2 carriers and is the first indication of a BRCA1-specific risk locus for either breast or ovarian cancer. Furthermore, modeling the influence of these modifiers on cumulative risk of breast and ovarian cancer in BRCA1 mutation carriers for the first time showed that a wide range of individual absolute risks of each cancer can be estimated. These differences suggest that genetic risk modifiers may be incorporated into the clinical management of BRCA1 mutation carriers.
A large number of genome-wide association studies have been performed during the past five years to identify associations between SNPs and human complex diseases and traits. Assigning a functional role to an identified disease-associated SNP is not straightforward. Genome-wide expression quantitative trait locus (eQTL) analysis is frequently used as the initial step to define a function, while allele-specific gene expression (ASE) analysis has not yet gained widespread use in disease mapping studies. We compared the power to identify cis-acting regulatory SNPs (cis-rSNPs) by genome-wide ASE analysis with that of traditional eQTL mapping. Our study included 395 healthy blood donors for whom global gene expression profiles in circulating monocytes were determined by Illumina BeadArrays. ASE was assessed in monocytes from a subset of 188 donors by quantitative genotyping of mRNA using a genome-wide panel of SNP markers. The performance of the two methods for detecting cis-rSNPs was evaluated by comparing associations between SNP genotypes and gene expression levels in sample sets of varying size. We found that up to 8-fold more samples are required for eQTL mapping to reach the same statistical power as that obtained by ASE analysis for the same rSNPs. The performance of ASE is insensitive to SNPs with low minor allele frequencies, and ASE detects a larger number of significantly associated rSNPs than eQTL mapping using the same sample size. An unequivocal conclusion from our comparison is that ASE analysis is more sensitive for detecting cis-rSNPs than standard eQTL mapping. Our study shows the potential of ASE mapping in tissue samples and primary cells, which are difficult to obtain in large numbers.
A report on the 12th International Congress of Human Genetics, joint with the 61st annual American Society of Human Genetics conference, Montreal, Quebec, 11-15 October 2011.
Bone mineral density (BMD) is the most important predictor of fracture risk. We performed the largest meta-analysis to date on lumbar spine and femoral neck BMD, including 17 genome-wide association studies and 32,961 individuals of European and East Asian ancestry. We tested the top-associated BMD markers for replication in 50,933 independent subjects and for risk of low-trauma fracture in 31,016 cases and 102,444 controls. We identified 56 loci (32 novel) associated with BMD at a genome-wide significant level (P<5×10−8). Several of these factors cluster within the RANK-RANKL-OPG, mesenchymal-stem-cell differentiation, endochondral ossification and the Wnt signalling pathways. However, we also discovered loci containing genes not known to play a role in bone biology. Fourteen BMD loci were also associated with fracture risk (P<5×10−4, Bonferroni corrected), of which six reached P<5×10−8 including: 18p11.21 (C18orf19), 7q21.3 (SLC25A13), 11q13.2 (LRP5), 4q22.1 (MEPE), 2p16.2 (SPTBN1) and 10q21.1 (DKK1). These findings shed light on the genetic architecture and pathophysiological mechanisms underlying BMD variation and fracture susceptibility.
Identifying and understanding the impact of gene regulatory variation is of considerable importance in evolutionary and medical genetics; such variants are thought to be responsible for human-specific adaptation and to have an important role in genetic disease. Regulatory variation in cis is readily detected in individuals showing uneven expression of a transcript from its two allelic copies, an observation referred to as allelic imbalance (AI). Identifying individuals exhibiting AI allows mapping of regulatory DNA regions and the potential to identify the underlying causal genetic variant(s). However, existing mapping methods require knowledge of the haplotypes, which makes them sensitive to phasing errors. In this study, we introduce a genotype-based mapping test that does not require haplotype-phase inference to locate regulatory regions. The test relies on partitioning the genotypes of individuals exhibiting AI and of those not exhibiting AI in a 2×3 contingency table. The ability of this test to detect linkage disequilibrium (LD) between a potential regulatory site and a SNP located in this region was examined by analyzing simulated and empirical AI datasets. In simulation experiments, the genotype-based test increasingly outperformed the haplotype-based tests as the distance separating the regulatory region from its regulated transcript grew. The genotype-based test performed equally well with the experimental AI datasets, whether from genome-wide cDNA hybridization arrays or from RNA sequencing. By avoiding the need for haplotype inference, the genotype-based test will suit AI analyses in population samples of unknown haplotype structure and will additionally facilitate the identification of cis-regulatory variants that are located far away from the regulated transcript.
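The 2×3 test can be sketched with a Pearson chi-square statistic on genotype counts (AA, AB, BB) for individuals with and without AI. The table has 2 degrees of freedom, so the chi-square survival function reduces to exp(-x/2) and no statistics library is needed. The abstract does not say whether the authors used Pearson's chi-square or another statistic on this table, so treat this as one reasonable instance of the idea. The function name is hypothetical, and small expected counts would call for an exact test instead.

```python
import math


def genotype_ai_test(ai_counts, no_ai_counts):
    """Pearson chi-square test on a 2x3 table.

    Rows: individuals exhibiting AI / not exhibiting AI.
    Columns: genotype counts (AA, AB, BB) at a candidate SNP.
    Returns the chi-square statistic and its p-value (2 degrees of freedom).
    """
    table = [list(ai_counts), list(no_ai_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [a + b for a, b in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("a genotype class or row is empty")
            chi2 += (observed - expected) ** 2 / expected
    p = math.exp(-chi2 / 2)  # chi-square survival function for 2 degrees of freedom
    return chi2, p


# Toy counts: individuals with AI are enriched among AB heterozygotes.
chi2, p = genotype_ai_test([2, 20, 2], [20, 2, 20])
```

If a SNP is in LD with the regulatory site, individuals with AI tend to be heterozygous at that SNP. The test does not need to know which alleles share a haplotype to pick up this enrichment.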
Phenotypic variation results from variation in gene expression, which is modulated by genetic and/or epigenetic factors. To understand the molecular basis of human disease, interaction between genetic and epigenetic factors needs to be taken into account. The asthma-associated region 17q12-q21 harbors three genes, the zona pellucida binding protein 2 (ZPBP2), gasdermin B (GSDMB) and ORM1-like 3 (ORMDL3), that show allele-specific differences in expression levels in lymphoblastoid cell lines (LCLs) and CD4+ T cells. Here, we report a molecular dissection of allele-specific transcriptional regulation of the genes within the chromosomal region 17q12-q21 combining in vitro transfection, formaldehyde-assisted isolation of regulatory elements, chromatin immunoprecipitation and DNA methylation assays in LCLs. We found that a single nucleotide polymorphism rs4795397 influences the activity of ZPBP2 promoter in vitro in an allele-dependent fashion, and also leads to nucleosome repositioning on the asthma-associated allele. However, variable methylation of exon 1 of ZPBP2 masks the strong genetic effect on ZPBP2 promoter activity in LCLs. In contrast, the ORMDL3 promoter is fully unmethylated, which allows detection of genetic effects on its transcription. We conclude that the cis-regulatory effects on 17q12-q21 gene expression result from interaction between several regulatory polymorphisms and epigenetic factors within the cis-regulatory haplotype region.
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-012-1142-x) contains supplementary material, which is available to authorized users.
Genome-wide association studies of human gene expression promise to identify functional regulatory genetic variation that contributes to phenotypic diversity. However, it is unclear how useful this approach will be for the identification of disease-susceptibility variants. We generated gene expression profiles for 22 184 mRNA transcripts using RNA derived from peripheral blood CD4+ lymphocytes, and genome-wide genotype data for 516 512 autosomal markers in 200 subjects. We screened for cis-acting variants by testing variants mapping within 50 kb of expressed transcripts for association with transcript abundance using generalized linear models. Significant associations were identified for 1585 genes at a false discovery rate of 0.05 (corresponding to P-values ranging from 1 × 10−91 to 7 × 10−4). Importantly, we identified evidence of regulatory variation for 119 previously mapped disease genes, including 24 examples where the variant with the strongest evidence of disease-association demonstrates strong association with specific transcript abundance. The prevalence of cis-acting variants among disease-associated genes was 63% higher than the genome-wide rate in our data set (P = 6.41 × 10−6), and although many of the implicated loci were associated with immune-related diseases (including asthma, connective tissue disorders and inflammatory bowel disease), associations with genes implicated in non-immune-related diseases including lipid profiles, anthropometric measurements, cancer and neurologic disease were also observed. Genetic variants that confer inter-individual differences in gene expression represent an important subset of variants that contribute to disease susceptibility. Population-based integrative genetic approaches can help identify such variation and enhance our understanding of the genetic basis of complex traits.
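The false discovery rate step can be illustrated with the Benjamini-Hochberg step-up procedure. Here it is applied to p-values from SNP-transcript pairs that fall within the 50 kb window. The abstract does not name the FDR method, so Benjamini-Hochberg is an assumption here, as is the function name.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices of tests rejected at false discovery rate `alpha`
    using the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p-values
    k = 0  # largest rank whose p-value is at or below its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            k = rank
    return sorted(order[:k])


# Toy p-values for four SNP-transcript pairs.
significant = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

Because the procedure is step-up, a test with a p-value above its own rank threshold is still rejected if a test of higher rank passes its threshold.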
The commercialization of academic research has been promoted by North American policy makers for over 30 years as a means of increasing university financing and of ensuring that promising research would eventually find its way to the marketplace. The following issues paper constitutes a reflection on the impact of the Canadian commercialization framework on academic research in the field of genomics. It was written following two workshops and two independent studies organized by academic groups in Quebec (Centre of Genomics and Policy) and Alberta (Health Law Institute). The full sets of recommendations are available upon request to the authors.
Adult height is a classic polygenic trait of high heritability (h2 ∼0.8). More than 180 single nucleotide polymorphisms (SNPs), identified mostly in populations of European descent, are associated with height. These variants convey modest effects and explain ∼10% of the variance in height. Discovery efforts in other populations, while limited, have revealed loci for height not previously implicated in individuals of European ancestry. Here, we performed a meta-analysis of genome-wide association (GWA) results for adult height in 20,427 individuals of African ancestry with replication in up to 16,436 African Americans. We found two novel height loci (Xp22-rs12393627, P = 3.4×10−12 and 2p14-rs4315565, P = 1.2×10−8). As a group, height associations discovered in European-ancestry samples replicate in individuals of African ancestry (P = 1.7×10−4 for overall replication). Fine-mapping of the European height loci in African-ancestry individuals showed an enrichment of SNPs that are associated with expression of nearby genes when compared to the index European height SNPs (P<0.01). Our results highlight the utility of genetic studies in non-European populations to understand the etiology of complex human diseases and traits.
Adult height is an ideal phenotype to improve our understanding of the genetic architecture of complex diseases and traits: it is easily measured and usually available in large cohorts, relatively stable, and mostly influenced by genetics (narrow-sense heritability of height h2∼0.8). Genome-wide association (GWA) studies in individuals of European ancestry have identified >180 single nucleotide polymorphisms (SNPs) associated with height. In the current study, we continued to use height as a model polygenic trait and explored the genetic influence in populations of African ancestry through a meta-analysis of GWA height results from 20,809 individuals of African descent. We identified two novel height loci not previously found in Europeans. We also replicated the European height signals, suggesting that many of the genetic variants that are associated with height are shared between individuals of European and African descent. Finally, in fine-mapping the European height loci in African-ancestry individuals, we found SNPs more likely to be associated with the expression of nearby genes than the SNPs originally found in Europeans. Thus, our results support the utility of performing genetic studies in non-European populations to gain insights into complex human diseases and traits.
Short-acting β2-adrenergic receptor agonists are commonly used bronchodilators for symptom relief in asthmatics. The aim of this study was to test whether genetic variants in the PDE4D gene, a key regulator of β2-adrenoceptor-induced cAMP turnover in airway smooth muscle cells, affect the response to short-acting β2-agonists. Bronchodilator responsiveness was assessed in 133 asthmatic children as the percentage change from baseline in forced expiratory volume in one second (FEV1) after administration of albuterol. The analyses were performed in patients with airway obstruction (FEV1/FVC ratio below 90%, n = 93). FEV1 percentage change adjusted for baseline FEV1 values was significantly different between genotypes of the rs1544791 G/A polymorphism (P = 0.006) and the −1345 C/T (rs1504982) promoter variant (P = 0.03). The associations remained significant after inclusion of age, sex, atopy, and controller medication in a multivariate model (P = 0.004 and P = 0.02, respectively). Our work identifies new genetic variants implicated in modulation of the response to asthma treatment, one of which (rs1544791) was previously associated with an asthma phenotype.
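The response phenotype can be sketched as the percentage change in FEV1 after albuterol, followed by a simple baseline adjustment that takes residuals from a least-squares fit on baseline FEV1. The residual approach and the helper names are assumptions for illustration. The study may have adjusted for baseline in another way, for example by including it as a covariate in the genotype model.

```python
from statistics import mean


def percent_change(pre, post):
    """Percentage change in FEV1 from pre- to post-bronchodilator values."""
    return [100.0 * (b - a) / a for a, b in zip(pre, post)]


def adjust_for_baseline(response, baseline):
    """Residuals of the response after a least-squares fit on baseline FEV1."""
    x_mean, y_mean = mean(baseline), mean(response)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(baseline, response))
             / sum((x - x_mean) ** 2 for x in baseline))
    intercept = y_mean - slope * x_mean
    return [y - (intercept + slope * x) for x, y in zip(baseline, response)]


# Hypothetical FEV1 values in litres before and after albuterol.
pre = [2.0, 1.5, 1.8, 2.4]
post = [2.2, 1.8, 2.07, 2.52]
adjusted = adjust_for_baseline(percent_change(pre, post), pre)
```

The adjusted values could then be compared between genotype groups.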
Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence phenotype. Genome-wide association (GWA) studies have identified >600 variants associated with human traits[1], but these typically explain small fractions of phenotypic variation, raising questions about the utility of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait[2,3]. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P=0.016), and that underlie skeletal growth defects (P<0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants, and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented amongst variants that alter amino acid structure of proteins and expression levels of nearby genes. Our data explain ∼10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to ∼16% of phenotypic variation (∼20% of heritable variation).
Although additional approaches are needed to fully dissect the genetic architecture of polygenic human traits, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.
Parent-of-origin-dependent expression of alleles, imprinting, has been suggested to impact a substantial proportion of mammalian genes. Its discovery requires allele-specific detection of expressed transcripts, but in some cases detected allelic expression bias has been interpreted as imprinting without demonstrating compatible transmission patterns and excluding heritable variation. Therefore, we utilized a genome-wide tool exploiting high density genotyping arrays in parallel measurements of genotypes in RNA and DNA to determine allelic expression across the transcriptome in lymphoblastoid cell lines (LCLs) and skin fibroblasts derived from families.
We were able to validate 43% of imprinted genes with previous demonstration of compatible transmission patterns in LCLs and fibroblasts. In contrast, we only validated 8% of genes suggested to be imprinted in the literature, but without clear evidence of parent-of-origin-determined expression. We also detected five novel imprinted genes and delineated regions of imprinted expression surrounding annotated imprinted genes. More subtle parent-of-origin-dependent expression, or partial imprinting, could be verified in four genes. Despite higher prevalence of monoallelic expression, immortalized LCLs showed consistent imprinting in fewer loci than primary cells. Random monoallelic expression has previously been observed in LCLs and we show that random monoallelic expression in LCLs can be partly explained by aberrant methylation in the genome.
Our results indicate that the widespread parent-of-origin-dependent expression recently observed in rodents is unlikely to be captured in human cells derived from adult tissues, in which genome-wide assessment of both primary and immortalized cells yields few new imprinted loci.
Genetic variants altering cis-regulation of normal gene expression (cis-eQTLs) have been extensively mapped in human cells and tissues, but the extent to which controlled environmental perturbation influences cis-eQTLs is unclear. We carried out large-scale induction experiments using primary human bone cells derived from unrelated donors of Swedish origin treated under 18 different conditions (7 treatments and 2 controls, each assessed at 2 time points). The treatments with the largest impact on the transcriptome, verified on two independent expression arrays, included BMP-2 (t = 2 h), dexamethasone (DEX) (t = 24 h), and PGE2 (t = 24 h). Using these treatments and the control, we performed expression profiling of 18,144 RefSeq transcripts on biological replicates of the complete study cohort of 113 individuals (total n = 782). We combined these data with genome-wide SNP genotyping data to map treatment-specific cis-eQTLs, defined as SNPs located within ±250 kb of the gene. We found that 93% of cis-eQTLs at 1% FDR were also observed in at least one other treatment. On average, only 1.4% of cis-eQTLs were considered treatment-specific at high confidence. Genome-wide allelic expression tests independently confirmed the relative invariability of cis-regulation following perturbation, as only a small proportion of variance could be attributed to treatment. Treatment-specific cis-regulatory effects were, however, 2- to 6-fold more abundant among genes differentially expressed upon treatment. We further followed up and validated the DEX-specific cis-regulation of the MYO6 and TNC loci. The top cis-regulatory variants were located 180 kb and 250 kb upstream of their respective transcription start sites.
Our results suggest that, in contrast to the tissue specificity of cis-eQTLs, interactions between the cellular environment and cis-variants are relatively rare (∼1.5%). Such specific interactions can nevertheless be detected by combining the functional genomic approaches described here.
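Two pieces of bookkeeping in this design are simple to state in code. The first is the cis window: a SNP counts as cis if it lies on the gene's chromosome, between 250 kb before the gene start and 250 kb after the gene end. The second is the naive sharing statistic behind "observed in at least one additional treatment." The Python sketch below makes both concrete. It is an illustration under stated assumptions, not the study's code. The `Gene`/`Snp` types, the function names, and the use of exact (gene, SNP) pair overlap as the sharing criterion are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

CIS_WINDOW_BP = 250_000  # +/-250 kb around the gene, as defined in the text


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # gene start (bp)
    end: int    # gene end (bp), end >= start


@dataclass(frozen=True)
class Snp:
    rsid: str
    chrom: str
    pos: int


def cis_snps(gene: Gene, snps: List[Snp], window: int = CIS_WINDOW_BP) -> List[Snp]:
    """SNPs inside the gene body or within `window` bp of either end."""
    lo = max(0, gene.start - window)
    hi = gene.end + window
    return [s for s in snps if s.chrom == gene.chrom and lo <= s.pos <= hi]


def shared_fraction(eqtls_by_treatment: Dict[str, Set[Tuple[str, str]]],
                    treatment: str) -> float:
    """Fraction of one treatment's (gene, SNP) cis-eQTLs also found in any other."""
    own = eqtls_by_treatment[treatment]
    if not own:
        return 0.0
    others: Set[Tuple[str, str]] = set().union(
        *(pairs for name, pairs in eqtls_by_treatment.items() if name != treatment))
    return len(own & others) / len(own)
```

Exact-pair overlap is the crudest possible criterion. An eQTL can look "specific" simply because it fell below the FDR cut-off in the other treatments, which is presumably why the text counts treatment-specific effects only "at high confidence."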
Population variation in normal gene expression has been convincingly shown to be under strong genetic control, with the main genetic variants located close to the gene itself (so-called cis-acting variants). However, the extent to which controlled environmental stimuli influence cis-regulation of gene expression is unclear. Here, we combine different functional genomic approaches to examine the role of common genetic variants in induced gene expression. We use a population panel of primary human cells derived from ∼100 unrelated donors and treated under multiple conditions. Using these approaches, we find that interactions between the cellular environment and cis-variants are relatively rare, with only a small proportion of the identified genetic variants being specific to a treatment. Although treatment-specific genetic regulation of gene expression seems to be infrequent, we demonstrate its existence through thorough validation of the glucocorticoid-specific regulation of TNC expression. Taken together, these findings indicate that the regulatory landscape within a cell is very stable. Nevertheless, by combining functional genomic tools, gene-environment interactions of clinical importance can be detected and possibly used as biomarkers in future pharmacogenomic studies.
Allelic imbalance (AI) is a phenomenon where the two alleles of a given gene are expressed at different levels in a given cell. It arises either from epigenetic inactivation of one of the two alleles or from genetic variation in regulatory regions. Recently, Bing et al. have described the use of genotyping arrays to assay AI at high resolution (∼750,000 SNPs across the autosomes). In this paper, we investigate computational approaches to analyze these data and identify genomic regions with AI in an unbiased and statistically robust manner. We propose two families of approaches: (i) a statistical approach based on z-score computations, and (ii) a family of machine learning approaches based on Hidden Markov Models. Each method is evaluated using previously published experimental data sets as well as permutation testing. When applied to whole-genome data from 53 HapMap samples, our approaches reveal that allelic imbalance is widespread: most expressed genes show evidence of AI in at least one of our 53 samples, and most AI regions in a given individual are also found in at least a few other individuals. While many AI regions identified in the genome correspond to known protein-coding transcripts, others overlap with recently discovered long non-coding RNAs. We also observe that genomic regions with AI include not only complete transcripts with consistent differential expression levels, but also more complex patterns of allelic expression such as alternative promoters and alternative 3′ ends. The approaches developed not only shed light on the incidence and mechanisms of allelic expression, but will also help to map the genetic causes of allelic expression and to identify cases where this variation may be linked to disease.
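A z-score approach in the spirit of (i) can be sketched as follows. This is not the authors' exact statistic. The sketch assumes the following setup. Each heterozygous SNP has an RNA allele log-ratio, log2(signal_A / signal_B), for the sample of interest. Its DNA log-ratios from heterozygous samples serve as a reference for allelic balance, and the RNA ratio is scored against them. Runs of adjacent SNPs with large same-sign z-scores are then merged into candidate AI regions. The thresholds (|z| > 3, at least two SNPs) and all function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def log_ratio(signal_a: float, signal_b: float) -> float:
    """Allele log-ratio log2(A/B) from two allele-specific intensities."""
    return math.log2(signal_a / signal_b)


def ai_zscore(rna_ratio: float, dna_ratios: Sequence[float]) -> float:
    """z-score of an RNA allele log-ratio against the same SNP's DNA log-ratios,
    which should be centred on allelic balance. Needs >= 2 DNA values."""
    mu = statistics.mean(dna_ratios)
    sd = statistics.stdev(dna_ratios)
    return (rna_ratio - mu) / sd


def call_regions(positions: Sequence[int], z: Sequence[float],
                 z_cut: float = 3.0, min_snps: int = 2) -> List[Tuple[int, int]]:
    """Merge runs of consecutive SNPs (sorted by position) whose |z| exceeds
    z_cut with the same sign into (start, end) regions."""
    regions: List[Tuple[int, int]] = []
    run: List[int] = []
    last_sign = 0
    for pos, score in zip(positions, z):
        sign = 1 if score > z_cut else -1 if score < -z_cut else 0
        if sign != 0 and (not run or sign == last_sign):
            run.append(pos)  # start or extend a run in the same direction
        else:
            if len(run) >= min_snps:
                regions.append((run[0], run[-1]))
            run = [pos] if sign != 0 else []  # direction flip starts a new run
        last_sign = sign
    if len(run) >= min_snps:
        regions.append((run[0], run[-1]))
    return regions
```

Requiring consecutive SNPs to agree in sign keeps a region tied to one over-expressed haplotype. A real analysis would also need phased heterozygous SNPs and the permutation testing the text describes to calibrate the cut-off.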
Measuring gene expression, and searching the genome for the regulatory regions responsible for differences in expression levels, is one of the key lines of research used to identify disease-causing genes and to explain differences between healthy individuals. Typically, experiments have measured and compared gene expression in multiple individuals and used this information to try to map the responsible regulatory regions. Differences in environment between individuals can, however, cause differences in gene expression unrelated to the underlying regulatory sequence. New genotyping technologies enable the expression of both copies of a gene to be measured at loci that are heterozygous in a given individual. This acts as an internal control. Environmental factors should affect both copies of a gene to a presumably similar extent, so differences in expression between them are more likely to be explained by differences in the regulatory regions specific to each copy. Such regulatory differences are expected to lead to differences in expression of the two copies (the two alleles) of a gene, also known as allelic imbalance. We describe a set of signal processing methods for the reliable detection of allelic expression within the genome.
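One common signal processing idea for this problem is to smooth noisy per-SNP allele ratios along the genome with a hidden Markov model. The Python sketch below is a generic two-state Gaussian HMM decoded with the Viterbi algorithm. It is offered as an illustration, not as the authors' method. The states ("balanced", "imbalanced"), the emission means and SDs on |log2 allele ratio|, the stay probability, and the start probabilities are all hand-set, hypothetical values.

```python
import math
from typing import List, Sequence, Tuple

STATES = ("balanced", "imbalanced")


def gaussian_logpdf(x: float, mean: float, sd: float) -> float:
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(obs: Sequence[float],
            means: Tuple[float, float] = (0.0, 1.0),
            sds: Tuple[float, float] = (0.3, 0.3),
            p_stay: float = 0.95,
            p_start: Tuple[float, float] = (0.9, 0.1)) -> List[str]:
    """Most likely state path for |log2 allele ratios| ordered along the genome."""
    if not obs:
        return []
    n = len(STATES)
    # With two states, each row is (p_stay, 1 - p_stay), so rows sum to 1.
    log_trans = [[math.log(p_stay if i == j else 1.0 - p_stay) for j in range(n)]
                 for i in range(n)]
    score = [math.log(p_start[s]) + gaussian_logpdf(obs[0], means[s], sds[s])
             for s in range(n)]
    back: List[List[int]] = []
    for x in obs[1:]:
        new_score, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j]
                             + gaussian_logpdf(x, means[j], sds[j]))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]
```

The transition penalty makes the decoder reluctant to switch state for a single noisy SNP, yet it still calls a run of several strongly skewed SNPs as one imbalanced segment. Working in log space avoids underflow on chromosome-length sequences.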
Osteoporosis is a complex disorder that commonly leads to fractures in elderly persons. Genome-wide association studies (GWAS) have become an unbiased approach to identifying variations in the genome that potentially affect health. However, the genetic variants identified so far explain only a small proportion of the heritability of complex traits. Because of modest genetic effect sizes and inadequate power, true association signals may not be revealed at a stringent genome-wide significance threshold. Here, we take advantage of SNP and transcript arrays. We integrate GWAS with expression signature profiling relevant to the skeletal system in cellular and animal models to prioritize novel candidate genes for osteoporosis-related traits. These traits include bone mineral density (BMD) at the lumbar spine (LS) and femoral neck (FN), as well as geometric indices of the hip (femoral neck-shaft angle, NSA; femoral neck length, NL; and narrow-neck width, NW). A two-stage meta-analysis of GWAS from 7,633 Caucasian women and 3,657 men revealed three novel loci associated with osteoporosis-related traits, at chromosomes 1p13.2 (RAP1A, p = 3.6×10−8), 2q11.2 (TBC1D8), and 18q11.2 (OSBPL1A). It also confirmed a previously reported region near the TNFRSF11B/OPG gene. We also prioritized 16 candidate genes with suggestive genome-wide association signals based on their potential involvement in skeletal metabolism. Among them, 3 candidate genes were associated with BMD in women. Notably, 2 of these 3 genes (GPR177, p = 2.6×10−13; SOX6, p = 6.4×10−10) were successfully replicated in a large-scale meta-analysis of BMD, whereas none of the non-prioritized candidates associated with BMD were. Our results support this prioritization strategy.
In the absence of direct biological support for identified genes, we highlighted the efficiency of subsequent functional characterization using publicly available expression profiling relevant to the skeletal system in cellular or whole animal models to prioritize candidate genes for further functional validation.
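Combining per-study association results, as in the two-stage meta-analysis above, is often done with a fixed-effect inverse-variance weighted (IVW) model. The text does not say which scheme was used, so treat the Python sketch below as an assumption, not the study's method. Each study i contributes an effect estimate β_i with standard error se_i and receives weight w_i = 1/se_i². The pooled effect is Σw_iβ_i / Σw_i, its standard error is √(1/Σw_i), and a two-sided normal p-value follows from z = β/se. The 5×10−8 constant is the conventional genome-wide threshold; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def ivw_meta(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance weighted meta-analysis.

    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need matching, non-empty betas and standard errors")
    weights = [1.0 / (se * se) for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, p


def genome_wide_significant(p: float, alpha: float = GENOME_WIDE_ALPHA) -> bool:
    return p < alpha
```

Pooling two studies with identical effects and standard errors leaves the effect unchanged but shrinks its standard error by √2. This is how signals that are only suggestive in each stage can pass the stringent threshold once combined.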
BMD and hip geometry are two major predictors of osteoporotic fractures, the most severe consequence of osteoporosis in elderly persons. We performed sex-specific genome-wide association studies (GWAS) for BMD at the lumbar spine and femoral neck skeletal sites, as well as for hip geometric indices (NSA, NL, and NW), in the Framingham Osteoporosis Study. We then replicated the top findings in two independent studies. Three novel loci were significant: in women, chromosome 1p13.2 (RAP1A) for NW; in men, 2q11.2 (TBC1D8) for NSA and 18q11.2 (OSBPL1A) for NW. We confirmed a previously reported region on 8q24.12 (TNFRSF11B/OPG) for lumbar spine BMD in women. In addition, we integrated GWAS signals with eQTLs in several tissues and with publicly available expression signature profiling in cellular and whole-animal models. On this basis, we prioritized 16 candidate genes/loci for their potential involvement in skeletal metabolism. Among three prioritized loci (GPR177, SOX6, and CASR) associated with BMD in women, GPR177 and SOX6 were later replicated in a large-scale meta-analysis, whereas none of the non-prioritized candidates associated with BMD were. Our results support the concept of using expression profiling to support the candidacy of suggestive GWAS signals that may contain important genes of interest.