Related Articles
Population-scale genome sequencing allows the characterization of functional effects of a broad spectrum of genetic variants underlying human phenotypic variation. Here, we investigate the influence of rare and common genetic variants on gene expression patterns, using variants identified from sequencing data from the 1000 genomes project in an African and European population sample and gene expression data from lymphoblastoid cell lines. We detect comparable numbers of expression quantitative trait loci (eQTLs) when compared to genotypes obtained from HapMap 3, but as many as 80% of the top expression quantitative trait variants (eQTVs) discovered from 1000 genomes data are novel. The properties of the newly discovered variants suggest that mapping common causal regulatory variants is challenging even with full resequencing data; however, we observe significant enrichment of regulatory effects in splice-site and nonsense variants. Using RNA sequencing data, we show that 46.2% of nonsynonymous variants are differentially expressed in at least one individual in our sample, creating widespread potential for interactions between functional protein-coding and regulatory variants. We also use allele-specific expression to identify putative rare causal regulatory variants. Furthermore, we demonstrate that outlier expression values can be due to rare variant effects, and we approximate the number of such effects harboured in an individual by effect size. Our results demonstrate that integration of genomic and RNA sequencing analyses allows for the joint assessment of genome sequence and genome function.
Author Summary
The recent availability of almost fully sequenced human genomes by the 1000 genomes project allows the direct study of genetic variants that influence levels of gene expression in the cell. In this study, we explore the effect of rare and common variants on levels of gene expression. We show that the availability of a more comprehensive list of variants brings us closer to the likely causal variants, and we discuss their genomic and evolutionary properties. We also demonstrate the effects of variants that change splicing patterns or length of the protein product, the putative joint impacts of variants that affect gene expression, and those that affect protein structure. Finally, we show the impact of rare regulatory variants that cannot be detected by the conventional methodologies of association and require the interrogation of full genome sequencing and full transcriptome sequencing. These approaches bring us closer to the implementation of these data and methodologies to a direct clinical application.
doi:10.1371/journal.pgen.1002144
PMCID: PMC3141000
PMID: 21811411
The goal of personalized medicine is to recommend drug treatment based on an individual’s genetic makeup. Pharmacogenomic studies utilize two main approaches: candidate gene and whole-genome. Both approaches analyze genetic variants such as single nucleotide polymorphisms (SNPs) to identify associations with drug response. In addition to DNA sequence variations, non-genetic but heritable epigenetic systems have also been implicated in regulating gene expression that could influence drug response. The International HapMap Project lymphoblastoid cell lines (LCLs) have been used to study genetic determinants responsible for expression variation and drug response. Recent studies have demonstrated that common genetic variants, including both SNPs and copy number variants (CNVs), account for a substantial fraction of natural variation in gene expression. Given the critical role played by DNA methylation in gene regulation and the fact that DNA methylation is currently the most studied epigenetic system, we suggest that profiling the variation in DNA methylation in the HapMap samples will provide new insights into the regulation of gene expression as well as the mechanisms of individual drug response at a new level of complexity. Epigenomics will substantially add to our knowledge of how genetics explains gene expression and pharmacogenomics.
PMCID: PMC2901114
PMID: 20622972
epigenetics; DNA methylation; gene expression; pharmacogenomics; HapMap; drug response
Aksoy, Pinar | Zhu, Min Jia | Kalari, Krishna R. | Moon, Irene | Pelleymounter, Linda L. | Eckloff, Bruce W. | Wieben, Eric D. | Yee, Vivien C. | Weinshilboum, Richard M. | Wang, Liewei
Background
5’-Nucleotidases play a critical role in nucleotide pool balance and in the metabolism of nucleoside analogs such as gemcitabine and cytosine arabinoside (AraC). We previously performed an expression array association study with gemcitabine and AraC cytotoxicity using 197 human lymphoblastoid cell lines. One gene that was significantly associated with gemcitabine cytotoxicity was a nucleotidase family member, NT5C3. Very little is known with regard to the pharmacogenomics of this family of enzymes.
Methods
We set out to identify common genetic variation in NT5C3 by resequencing the gene and to determine the effect of that variation on NT5C3 protein function and potential effect on response to cytidine analogues. We identified 61 NT5C3 polymorphisms, 48 of which were novel, by resequencing 240 ethnically defined DNA samples. Functional studies were performed with one nonsynonymous (G847C, Asp283His) and 4 synonymous cSNPs (T9C, C276T, T306C, and G759A), as well as three combined variants (T276/His283, T276/C306, T276/C9).
Results
The His283 and T276/His283 constructs showed decreased levels of enzyme activity and protein. Substrate kinetic analysis showed no significant differences in Km values between WT and His283 when CMP, AraCMP and GemMP were used as substrates. An association study between SNPs and NT5C3 expression in the 240 cell lines from which DNA was extracted to resequence NT5C3 identified 4 SNPs that were significantly associated with NT5C3 expression. EMSAs showed that two of those SNPs, I4(-114) and I6(9), altered DNA-protein binding patterns. These findings suggest that genetic variation in NT5C3 might affect protein function and potentially influence drug response.
doi:10.1097/FPC.0b013e32832c14b8
PMCID: PMC2763634
PMID: 19623099
Gemcitabine; cytosine arabinoside; cytosolic 5’-nucleotidase III; pharmacogenomics; single nucleotide polymorphisms (SNPs); functional genomics
Nguyen-Dumont, Tú | Jordheim, Lars P | Michelon, Jocelyne | Forey, Nathalie | McKay-Chopin, Sandrine | Sinilnikova, Olga | Le Calvez-Kelm, Florence | Southey, Melissa C | Tavtigian, Sean V | Lesueur, Fabienne
Background
The gene CHEK2 encodes a checkpoint kinase playing a key role in the DNA damage pathway. Though CHEK2 has been identified as an intermediate breast cancer susceptibility gene, only a small proportion of high-risk families have been explained by genetic variants located in its coding region. Alteration in gene expression regulation provides a potential mechanism for generating disease susceptibility. The detection of differential allelic expression (DAE) represents a sensitive assay to direct the search for a functional sequence variant within the transcriptional regulatory elements of a candidate gene. We aimed to assess whether CHEK2 was subject to DAE in lymphoblastoid cell lines (LCLs) from high-risk breast cancer patients for whom no mutation in BRCA1 or BRCA2 had been identified.
Methods
We implemented an assay based on high-resolution melting (HRM) curve analysis and developed an analysis tool for DAE assessment.
Results
We observed allelic expression imbalance in 4 of the 41 LCLs examined. All four were carriers of the truncating mutation 1100delC. We confirmed previous findings that this mutation induces nonsense-mediated mRNA decay. In our series, we ruled out the possibility of a functional sequence variant located in the promoter region or in a regulatory element of CHEK2 that would lead to DAE in the transcriptional regulatory milieu of freely proliferating LCLs.
Conclusions
Our results indicate that HRM is a sensitive and accurate method for DAE assessment. This approach would be of great interest for high-throughput mutation screening projects aiming to identify genes carrying functional regulatory polymorphisms.
doi:10.1186/1755-8794-4-39
PMCID: PMC3112061
PMID: 21569354
Attar, Homa | Bedard, Karen | Migliavacca, Eugenia | Gagnebin, Maryline | Dupré, Yann | Descombes, Patrick | Borel, Christelle | Deutsch, Samuel | Prokisch, Holger | Meitinger, Thomas | Mehta, Divya | Wichmann, Erich | Delabar, Jean Maurice | Dermitzakis, Emmanouil T. | Krause, Karl-Heinz | Antonarakis, Stylianos E. | Ahuja, Sunil K.
Natural variation in DNA sequence contributes to individual differences in quantitative traits. While multiple studies have shown genetic control over gene expression variation, few additional cellular traits have been investigated. Here, we investigated the natural variation of NADPH oxidase-dependent hydrogen peroxide (H2O2) release, which is the joint effect of reactive oxygen species (ROS) production, superoxide metabolism and degradation, and is related to a number of human disorders. We assessed the normal variation of H2O2 release in lymphoblastoid cell lines (LCLs) in a family-based 3-generation cohort (CEPH-HapMap), and in 3 population-based cohorts (KORA, GenCord, HapMap). Substantial individual variation was observed, 45% of which was associated with heritability in the CEPH-HapMap cohort. We identified 2 genome-wide significant loci on Hsa12 and Hsa15 in genome-wide linkage analysis. Next, we performed a genome-wide association study (GWAS) for the combined KORA-GenCord cohorts (n = 279) using enhanced marker resolution by imputation (>1.4 million SNPs). We found 5 significant associations (p<5.00×10^-8) and 54 suggestive associations (p<1.00×10^-5), one of which confirmed the linked region on Hsa15. To replicate our findings, we performed GWAS using 58 HapMap individuals and ∼2.1 million SNPs. We identified 40 genome-wide significant and 302 suggestive SNPs, and confirmed genome signals on Hsa1, Hsa12, and Hsa15. Genetic loci within 900 kb from the known candidate gene p67phox on Hsa1 were identified in GWAS in both cohorts. We did not find replication of SNPs across all cohorts, but replication within the same genomic region. Finally, a highly significant decrease in H2O2 release was observed in Down Syndrome (DS) individuals (p<2.88×10^-12).
Taken together, our results show strong evidence of genetic control of H2O2 in LCL of healthy and DS cohorts and suggest that cellular phenotypes, which themselves are also complex, may be used as proxies for dissection of complex disorders.
doi:10.1371/journal.pone.0043566
PMCID: PMC3430705
PMID: 22952707
Mycophenolic acid (MPA) is commonly used to treat patients with solid organ transplants during maintenance immunosuppressive therapy. Response to MPA varies widely, both for efficacy and drug-induced toxicity. A portion of this variation can be explained by pharmacokinetic and pharmacodynamic factors, including genetic variation in MPA-metabolizing UDP-glucuronyltransferase isoforms and the MPA targets, inosine monophosphate dehydrogenase 1 and 2. However, much of the variation in MPA response presently remains unexplained. We set out to determine whether there might be additional genes that modify response to MPA. Using a model system of 271 human lymphoblastoid cell lines, we performed a genome-wide association study between basal gene mRNA expression profiles and an MPA cytotoxicity phenotype to identify and functionally validate genes that might contribute to variation in MPA response. Our association study identified 41 gene expression probe sets, corresponding to 35 genes, that were associated with MPA cytotoxicity as a drug response phenotype (p < 1 × 10^-6). Follow-up siRNA-mediated knockdown-based functional validation identified four of these candidate genes, C17orf108, CYBRD1, NASP, and RRM2, whose knockdown shifted the MPA cytotoxicity curves in the direction predicted by the association analysis. These studies have identified novel candidate genes that may contribute to variation in response to MPA therapy and, as a result, may help make it possible to move toward more highly individualized MPA-based immunosuppressive therapy.
doi:10.1016/j.intimp.2011.02.027
PMCID: PMC3138818
PMID: 21396482
Mycophenolic acid; mycophenolate; immunosuppression; pharmacogenomics; cell line model system; C17orf108; CYBRD1; NASP; RRM2
The International HapMap Project provides a key resource of genotypic data on human samples including lymphoblastoid cell lines derived from individuals of four major world populations of African, European, Japanese and Chinese ancestry. Researchers have utilized this resource to identify genetic elements that correlate with various phenotypes such as risks of common diseases, individual drug response and gene expression variation. However, recent comparative studies have suggested that the currently available HapMap genotypic data may not capture a substantial proportion of rare or untyped SNPs in these populations, implying that the HapMap SNPs may not be sufficient for comprehensive association studies. In this paper, three large-scale deep resequencing projects covering the HapMap samples: ENCODE (Encyclopedia of DNA Elements), SeattleSNPs and NIEHS (National Institute of Environmental Health Sciences) Environmental Genome Project are discussed. Prospectively, once integrated with the HapMap resource, these efforts will greatly benefit the next wave of association studies and data mining using these cell lines.
doi:10.2174/157489308785909232
PMCID: PMC2819736
PMID: 20151045
HapMap; lymphoblastoid cell lines; genotype; single nucleotide polymorphism; resequencing
Although statin drugs are generally efficacious for lowering plasma LDL-cholesterol levels, there is considerable variability in response. To identify candidate genes that may contribute to this variation, we used an unbiased genome-wide filter approach that was applied to 10,149 genes expressed in immortalized lymphoblastoid cell lines (LCLs) derived from 480 participants of the Cholesterol and Pharmacogenomics (CAP) clinical trial of simvastatin. The criteria for identification of candidates included genes whose statin-induced changes in expression were correlated with change in expression of HMGCR, a key regulator of cellular cholesterol metabolism and the target of statin inhibition. This analysis yielded 45 genes, from which RHOA was selected for follow-up because it has been found to participate in mediating the pleiotropic but not the lipid-lowering effects of statin treatment. RHOA knock-down in hepatoma cell lines reduced HMGCR, LDLR, and SREBF2 mRNA expression and increased intracellular cholesterol ester content as well as apolipoprotein B (APOB) concentrations in the conditioned media. Furthermore, inter-individual variation in statin-induced RHOA mRNA expression measured in vitro in CAP LCLs was correlated with the changes in plasma total cholesterol, LDL-cholesterol, and APOB induced by simvastatin treatment (40 mg/d for 6 wk) of the individuals from whom these cell lines were derived. Moreover, the minor allele of rs11716445, a SNP located in a novel cryptic RHOA exon, dramatically increased inclusion of the exon in RHOA transcripts during splicing and was associated with a smaller LDL-cholesterol reduction in response to statin treatment in 1,886 participants from the CAP and Pravastatin Inflammation and CRP Evaluation (PRINCE; pravastatin 40 mg/d) statin clinical trials.
Thus, an unbiased filter approach based on transcriptome-wide profiling identified RHOA as a gene contributing to variation in LDL-cholesterol response to statin, illustrating the power of this approach for identifying candidate genes involved in drug response phenotypes.
Author Summary
Statins, or HMG CoA reductase inhibitors, are widely used to lower plasma LDL-cholesterol levels as a means of reducing risk for cardiovascular disease. We performed an unbiased genome-wide survey to identify novel candidate genes that may be involved in statin response using genome-wide mRNA expression analysis in a sequential filtering strategy to identify those most likely to be relevant to cholesterol metabolism based on their gene expression characteristics. Among these, RHOA was selected for further functional study. A role for this gene in the maintenance of intracellular cholesterol homeostasis was confirmed by knock-down in hepatoma cell lines. In addition, statin-induced RHOA transcript levels measured in a panel of lymphoblastoid cell lines were correlated with statin-induced change in plasma LDL-cholesterol measured in individuals from whom the cell lines were derived. Lastly, a cis-acting splicing QTL associated with expression of a rare cryptic RHOA exon was also associated with statin-induced changes in plasma LDL-cholesterol levels. This result exemplifies the power of applying biological information of well understood molecular pathways with genome-wide expression data for the identification of candidate genes that influence drug response.
doi:10.1371/journal.pgen.1003058
PMCID: PMC3499361
PMID: 23166513
Li, Liang | Schaid, Daniel J. | Fridley, Brooke L. | Kalari, Krishna R. | Jenkins, Gregory D. | Abo, Ryan P. | Batzler, Anthony | Moon, Irene | Pelleymounter, Linda | Eckloff, Bruce W. | Wieben, Eric D. | Sun, Zhifu | Yang, Ping | Wang, Liewei
Background and objective
Gemcitabine is widely used to treat non-small cell lung cancer (NSCLC). To assess the pharmacogenomic effects of the entire gemcitabine metabolic pathway, we genotyped SNPs within the 17 pathway genes using DNA samples from NSCLC patients treated with gemcitabine and determined the effect of these genetic variants on overall survival (OS) after gemcitabine treatment.
Methods
Eight of the 17 pathway genes were resequenced with DNA samples from Coriell lymphoblastoid cell lines (LCLs) using Sanger sequencing for all exons, exon-intron junctions and 5′-, 3′-UTRs. A total of 107 tag SNPs were selected based on the resequencing data for the 8 genes and on HapMap data for the remaining 9 genes, followed by successful genotyping of 394 NSCLC patient DNA samples. Association of SNPs/haplotypes with OS was performed using the Cox regression model, followed by functional studies performed with LCLs and NSCLC cell lines.
Results
Five SNPs in 4 genes (CDA, NT5C2, RRM1, and SLC29A1) showed associations with OS of those NSCLC patients, as did 9 haplotypes in 4 genes (RRM1, RRM2, SLC28A3, and SLC29A1), with P < 0.05. Genotype imputation using the LCLs was performed for a region of 200 kb surrounding those SNPs, followed by association studies with gemcitabine cytotoxicity. Functional studies demonstrated that downregulation of SLC29A1, NT5C2, and RRM1 in NSCLC cell lines altered cell susceptibility to gemcitabine.
Conclusion
These studies help identify biomarkers to predict gemcitabine response in NSCLC, a step toward the individualized chemotherapy of lung cancer.
doi:10.1097/FPC.0b013e32834dd7e2
PMCID: PMC3259218
PMID: 22173087
gemcitabine; pharmacogenomics; metabolic pathway; lymphoblastoid cell lines; non-small cell lung cancer (NSCLC)
Duan, Jubao | Shi, Jianxin | Ge, Xijin | Dölken, Lars | Moy, Winton | He, Deli | Shi, Sandra | Sanders, Alan R. | Ross, Jeff | Gejman, Pablo V.
The extent to which RNA stability differs between individuals and its contribution to the interindividual expression variation remain unknown. We conducted a genome-wide analysis of RNA stability in seven human HapMap lymphoblastoid cell lines (LCLs) and analyzed the effect of DNA sequence variation on RNA half-life differences. Twenty-six percent of the expressed genes exhibited RNA half-life differences between LCLs at a false discovery rate (FDR) < 0.05, which accounted for ~37% of the gene expression differences between individuals. Nonsense polymorphisms were associated with reduced RNA half-lives. In genes presenting interindividual RNA half-life differences, higher coding GC3 contents (G and C percentages at the third-codon positions) were correlated with increased RNA half-life. Consistently, G and C alleles of single nucleotide polymorphisms (SNPs) in protein coding sequences were associated with enhanced RNA stability. These results suggest widespread interindividual differences in RNA stability related to DNA sequence and composition variation.
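The GC3 content defined in the abstract above (the share of G and C bases at third-codon positions) can be illustrated with a short calculation. This is a minimal sketch, not the authors' pipeline: the function name `gc3_content` is hypothetical, and it assumes the input is an in-frame coding sequence starting at the first base of a codon, with any trailing incomplete codon ignored.

```python
def gc3_content(cds: str) -> float:
    """Return the fraction of G or C bases at third-codon positions.

    Assumes `cds` is an in-frame coding sequence (hypothetical helper,
    not taken from the study); a trailing partial codon is ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # drop any incomplete final codon
    third_positions = seq[2:usable:3]     # bases at codon positions 3, 6, 9, ...
    if not third_positions:
        raise ValueError("sequence is shorter than one codon")
    gc_count = sum(base in "GC" for base in third_positions)
    return gc_count / len(third_positions)


if __name__ == "__main__":
    # Codons ATG, GCC, TTA have third bases G, C, A.
    print(gc3_content("ATGGCCTTA"))
```

Multiplying the returned fraction by 100 gives the percentage form used in the abstract.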
doi:10.1038/srep01318
PMCID: PMC3576867
PMID: 23422947
Pai, Athma A. | Cain, Carolyn E. | Mizrahi-Man, Orna | De Leon, Sherryl | Lewellen, Noah | Veyrieras, Jean-Baptiste | Degner, Jacob F. | Gaffney, Daniel J. | Pickrell, Joseph K. | Stephens, Matthew | Pritchard, Jonathan K. | Gilad, Yoav | Gibson, Greg
Recent gene expression QTL (eQTL) mapping studies have provided considerable insight into the genetic basis for inter-individual regulatory variation. However, a limitation of all eQTL studies to date, which have used measurements of steady-state gene expression levels, is the inability to directly distinguish between variation in transcription and decay rates. To address this gap, we performed a genome-wide study of variation in gene-specific mRNA decay rates across individuals. Using a time-course study design, we estimated mRNA decay rates for over 16,000 genes in 70 Yoruban HapMap lymphoblastoid cell lines (LCLs), for which extensive genotyping data are available. Considering mRNA decay rates across genes, we found that: (i) as expected, highly expressed genes are generally associated with lower mRNA decay rates, (ii) genes with rapid mRNA decay rates are enriched with putative binding sites for miRNA and RNA binding proteins, and (iii) genes with similar functional roles tend to exhibit correlated rates of mRNA decay. Focusing on variation in mRNA decay across individuals, we estimate that steady-state expression levels are significantly correlated with variation in decay rates in 10% of genes. Somewhat counter-intuitively, for about half of these genes, higher expression is associated with faster decay rates, possibly due to a coupling of mRNA decay with transcriptional processes in genes involved in rapid cellular responses. Finally, we used these data to map genetic variation that is specifically associated with variation in mRNA decay rates across individuals. We found 195 such loci, which we named RNA decay quantitative trait loci (“rdQTLs”). All the observed rdQTLs are located near the regulated genes and therefore are assumed to act in cis. By analyzing our data within the context of known steady-state eQTLs, we estimate that a substantial fraction of eQTLs are associated with inter-individual variation in mRNA decay rates.
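As a rough illustration of how a decay rate can be estimated from a time-course design like the one described above, the sketch below fits a first-order decay model, N(t) = N0·exp(−kt), by least squares on log-transformed levels. The first-order model, the function names `estimate_decay_rate` and `half_life`, and the example numbers are all assumptions made for illustration; the abstract does not specify the estimation procedure the authors used.

```python
import math


def estimate_decay_rate(times, levels):
    """Estimate a first-order decay constant k from time-course mRNA levels.

    Fits log(level) = log(N0) - k * t by ordinary least squares.
    Hypothetical helper: the study's actual model is not given in the abstract.
    """
    if len(times) != len(levels) or len(times) < 2:
        raise ValueError("need at least two matching time points")
    if any(level <= 0 for level in levels):
        raise ValueError("levels must be positive to take logarithms")
    logs = [math.log(level) for level in levels]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    return -sxy / sxx                      # slope is -k under first-order decay


def half_life(k):
    """Convert a first-order decay constant into a half-life."""
    return math.log(2) / k


if __name__ == "__main__":
    # Synthetic, noise-free example with a known rate of 0.5 per hour.
    hours = [0, 1, 2, 4]
    observed = [100 * math.exp(-0.5 * t) for t in hours]
    k = estimate_decay_rate(hours, observed)
    print(k, half_life(k))
```

Under this model a faster decay rate corresponds to a shorter half-life, which is the sense in which the abstract contrasts decay rates with steady-state expression levels.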
Author Summary
Recent studies of functional genetic variation in humans have identified numerous loci that are associated with variation in gene expression levels, called expression quantitative trait loci (eQTLs). The mechanisms by which these loci affect gene expression, however, are still largely unknown. Specifically, since most studies rely on measures of steady-state gene expression levels, they are unable to distinguish between the relative influences of either transcriptional- or decay-related processes. To address this gap, we examined the specific impact of mRNA decay processes on steady-state gene expression levels for over 16,000 genes in human lymphoblastoid cell lines. By characterizing decay rates in 70 individuals, we show that steady-state expression levels are significantly influenced by variation in decay rates for 10% of genes. Yet, for roughly half of these genes, we find that individuals with higher expression levels also have faster decay rates. This pattern points to a non-simple mechanistic interplay between transcriptional and decay processes, especially for genes involved in rapid cellular responses. Finally, we identify 195 genetic variants that are significantly associated with both gene expression variation and variation in mRNA decay rates. Using these data, we estimate that a substantial fraction of eQTLs are associated with inter-individual variation in mRNA decay rates.
doi:10.1371/journal.pgen.1003000
PMCID: PMC3469421
PMID: 23071454
The International HapMap Project provides a key resource of genotypic data on human lymphoblastoid cell lines derived from four major world populations of European, African, Chinese and Japanese ancestry for researchers to associate with various phenotypic data to find genes affecting health, disease and response to drugs. Recently, the HapMap resource has significantly benefited research areas such as gene expression variation studies. Besides some intrinsic limitations, there are a few challenges that should be considered in the next wave of research using this tremendous resource. We suggest that overcoming these challenges or considering the confounding variables in the interpretation of results can provide more insights into the current views of the human genome as well as complex traits such as drug response variation and susceptibility to common diseases.
PMCID: PMC2258425
PMID: 18317571
HapMap; lymphoblastoid cell lines; single nucleotide polymorphism; genotype; gene expression
Lim3 encodes an RNA polymerase II transcription factor with a key role in neuron specification. It was also identified as a candidate gene that affects lifespan. These pleiotropic effects indicate the fundamental significance of the potential interplay between neural development and lifespan control. The goal of this study was to analyze the causal relationships between Lim3 structural variations, and gene expression and lifespan changes, and to provide insights into regulatory pathways controlling lifespan. Fifty substitution lines containing second chromosomes from a Drosophila natural population were used to analyze the association between lifespan and sequence variation in the 5′-regulatory region, and first exon and intron of Lim3A, in which we discovered multiple transcription start sites (TSS). The core and proximal promoter organization for Lim3A and a previously unknown mRNA named Lim3C were described. A haplotype of two markers in the Lim3A regulatory region was significantly associated with variation in lifespan. We propose that polymorphisms in the regulatory region affect gene transcription, and consequently lifespan. Indeed, five polymorphic markers located within 380 to 680 bp of the Lim3A major TSS, including two markers associated with lifespan variation, were significantly associated with the level of Lim3A transcript, as evaluated by real time RT-PCR in embryos, adult heads, and testes. A naturally occurring polymorphism caused a six-fold change in gene transcription and a 25% change in lifespan. Markers associated with long lifespan and intermediate Lim3A transcription were present in the population at high frequencies. We hypothesize that polymorphic markers associated with Lim3A expression are located within the binding sites for proteins that regulate gene function, and provide general rather than tissue-specific regulation of transcription, and that intermediate levels of Lim3A expression confer a selective advantage and longer lifespan.
doi:10.1371/journal.pone.0012621
PMCID: PMC2935391
PMID: 20838645
Pickrell, Joseph K. | Marioni, John C. | Pai, Athma A. | Degner, Jacob F. | Engelhardt, Barbara E. | Nkadori, Everlyne | Veyrieras, Jean-Baptiste | Stephens, Matthew | Gilad, Yoav | Pritchard, Jonathan K.
Nature
2010;464(7289):768-772.
Understanding the genetic mechanisms underlying natural variation in gene expression is a central goal of both medical and evolutionary genetics, and studies of expression quantitative trait loci (eQTLs) have become an important tool for achieving this goal [1]. Although all eQTL studies so far have assayed messenger RNA levels using expression microarrays, recent advances in RNA sequencing enable the analysis of transcript variation at unprecedented resolution. We sequenced RNA from 69 lymphoblastoid cell lines derived from unrelated Nigerian individuals that have been extensively genotyped by the International HapMap Project [2]. By pooling data from all individuals, we generated a map of the transcriptional landscape of these cells, identifying extensive use of unannotated untranslated regions and more than 100 new putative protein-coding exons. Using the genotypes from the HapMap project, we identified more than a thousand genes at which genetic variation influences overall expression levels or splicing. We demonstrate that eQTLs near genes generally act by a mechanism involving allele-specific expression, and that variation that influences the inclusion of an exon is enriched within and near the consensus splice sites. Our results illustrate the power of high-throughput sequencing for the joint analysis of variation in transcription, splicing and allele-specific expression across individuals.
doi:10.1038/nature08872
PMCID: PMC3089435
PMID: 20220758
Background
Reference genes used in gene expression analysis have usually been chosen for their known or suspected housekeeping roles; however, the variation observed in most of them hinders their effective use. The lack of validated reference genes emphasizes the importance of a systematic study for their identification. To select candidate reference genes, we developed a simple in silico method based on the data publicly available in the wheat databases Unigene and TIGR.
Results
The expression stability of 32 genes was assessed by qRT-PCR using a set of cDNAs from 24 different plant samples, which included different tissues, developmental stages and temperature stresses. The selected sequences included 12 well-known HKGs representing different functional classes and 20 genes that are novel with respect to normalization. The expression stability of the 32 candidate genes was tested by the computer programs geNorm and NormFinder using five different data-sets. Some discrepancies were detected in the ranking of the candidate reference genes, but there was substantial agreement between the groups of genes with the most and least stable expression. Three newly identified reference genes appear more effective than the well-known and frequently used HKGs to normalize gene expression in wheat. Finally, the expression study of a gene encoding a PDI-like protein showed that its correct evaluation relies on the adoption of suitable normalization genes and can be negatively affected by the use of traditional HKGs with unstable expression, such as actin and α-tubulin.
Conclusion
The present research represents the first broad screening aimed at identifying reference genes, and the corresponding primer pairs, specifically designed for gene expression studies in wheat, in particular for qRT-PCR analyses. Several of the newly identified reference genes outperformed the traditional HKGs in terms of expression stability under all the tested conditions. The new reference genes will enable more accurate normalization and quantification of gene expression in wheat and will be helpful for designing primer pairs targeting orthologous genes in other plant species.
doi:10.1186/1471-2199-10-11
PMCID: PMC2667184
PMID: 19232096
The exploration of quantitative variation in complex traits such as gene expression and drug response in human populations has become one of the major priorities for medical genetics. The International HapMap Project provides a key resource of genotypic data on human lymphoblastoid cell lines derived from four major world populations of European, African, Chinese and Japanese ancestry for researchers to associate with various phenotypic data to find genes affecting health, disease and response to drugs. Recent progress in dissecting genetic contribution to natural variation in gene expression within and among human populations and variation in drug response are two examples in which researchers have utilized the HapMap resource. The HapMap Project provides new insights into the human genome and has applicability to pharmacogenomics studies leading to personalized medicine.
PMCID: PMC2288550
PMID: 18392109
HapMap; lymphoblastoid cell lines; genotype; gene expression; population genetics
Hartman, William R. | Pelleymounter, Linda L. | Moon, Irene | Kalari, Krishna | Liu, Mohan | Wu, Tse-Yu | Escande, Carlos | Nin, Veronica | Chini, Eduardo N. | Weinshilboum, Richard M.
CD38 is an ecto-enzyme that hydrolyzes NAD. Its expression is a prognostic marker for chronic lymphocytic leukemia. We have characterized individual variation in CD38 expression in lymphoblastoid cell lines from 288 healthy subjects of three ethnicities. Expression varied widely, with significant differences among ethnic groups, and was correlated significantly with CD38 enzymatic activity and protein levels. The CD38 gene was then resequenced using DNA from the same cell lines, with the identification of 53 single nucleotide polymorphisms (SNPs) and one indel, of which 39 were novel. One SNP, rs1130169, was significantly associated with CD38 mRNA expression and explained a portion of the difference in expression among ethnic groups. An electrophoretic mobility shift (EMS) assay showed nuclear protein binding at or near this SNP. We also determined that variation in CD38 expression in these cell lines was associated with variation in antineoplastic drug sensitivity. These results represent a step toward understanding mechanisms involved in CD38 expression.
doi:10.3109/10428194.2010.483299
PMCID: PMC2892000
PMID: 20470215
CD38; mRNA expression; gene expression; expression variation; NAD; microarray; single nucleotide polymorphisms; SNPs; polymorphism
Gene expression varies widely between individuals of a population, and regulatory change can underlie phenotypes of evolutionary and biomedical relevance. A key question in the field is how DNA sequence variants impact gene expression, with most mechanistic studies to date focused on the effects of genetic change on regulatory regions upstream of protein-coding sequence. By contrast, the role of RNA 3′-end processing in regulatory variation remains largely unknown, owing in part to the challenge of identifying functional elements in 3′ untranslated regions. In this work, we conducted a genomic survey of transcript ends in lymphoblastoid cells from genetically distinct human individuals. Our analysis mapped the cis-regulatory architecture of 3′ gene ends, finding that transcript end positions did not fall randomly in untranslated regions, but rather preferentially flanked the locations of 3′ regulatory elements, including miRNA sites. The usage of these transcript length forms and motifs varied across human individuals, and polymorphisms in polyadenylation signals and other 3′ motifs were significant predictors of expression levels of the genes in which they lay. Independent single-gene experiments confirmed the effects of polyadenylation variants on steady-state expression of their respective genes, and validated the regulatory function of 3′ cis-regulatory sequence elements that mediated expression of these distinct RNA length forms. Focusing on the immune regulator IRF5, we established the effect of natural variation in RNA 3′-end processing on regulatory response to antigen stimulation. Our results underscore the importance of two mechanisms at play in the genetics of 3′-end variation: the usage of distinct 3′-end processing signals and the effects of 3′ sequence elements that determine transcript fate. 
Our findings suggest that the strategy of integrating observed 3′-end positions with inferred 3′ regulatory motifs will prove to be a critical tool in continued efforts to interpret human genome variation.
Author Summary
Messenger RNAs carry the instructions necessary to synthesize proteins that do work for the cell. Extending beyond the protein-coding sequence of a given mRNA is an additional stretch of sequence, harboring signals that govern how much protein is made and how long the mRNA remains in the cell before it is broken down. The incorporation of this end region into mature mRNA is itself subject to change; for the vast majority of human genes, how and why cells use different mRNA ends remains largely unknown. In this work, we surveyed mRNA ends from ∼10,000 genes in immune cells from genetically distinct human individuals. We found that mRNA end positions were not randomly distributed, but rather preferentially flanked the locations of regulatory signals that govern mRNA fate. The usage of these mRNA length forms and regulatory elements varied across individuals and could be dissected molecularly. Our results uncover key mechanisms and regulatory effects of transcript end processing, particularly as these are perturbed by genetic differences between humans.
doi:10.1371/journal.pgen.1002882
PMCID: PMC3420953
PMID: 22916029
Genome-wide association studies have been successful in identifying common variants for common complex traits in recent years. However, common variants have generally failed to explain substantial proportions of the trait heritabilities. Rare variants, structural variations, and gene-gene and gene-environment interactions, among others, have been suggested as potential sources of the so-called missing heritability. With the advent of exome-wide and whole-genome next-generation sequencing technologies, finding rare variants in functionally important sites (e.g., protein-coding regions) becomes feasible. We investigate the use of linkage information to select families enriched for rare variants using the simulated Genetic Analysis Workshop 17 data. In each replicate of simulated phenotypes Q1 and Q2 on 697 subjects in 8 extended pedigrees, we select the one pedigree with the largest family-specific LOD score. Across all 200 replications, we compare the probability that rare causal alleles will be carried in the selected pedigree versus a randomly chosen pedigree. One example of successful enrichment was exhibited for gene VEGFC. The causal variant had a minor allele frequency of 0.0717% in the simulated unrelated individuals and explained about 0.1% of the phenotypic variance. However, it explained 7.9% of the phenotypic variance in the eight simulated pedigrees and 23.8% in the family that carried the minor allele. The carrier’s family was selected in all 200 replications. Our results therefore show that family-specific linkage information is useful for selecting families for sequencing, thereby ensuring that rare functional variants are segregating in the sequenced samples.
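The selection rule described above (pick, in each replicate, the pedigree with the largest family-specific LOD score, then compare against a random pick) can be sketched as follows. This is a minimal illustration only: the function names, pedigree labels and LOD values are hypothetical and are not taken from the Genetic Analysis Workshop 17 data.

```python
def select_pedigree(lod_by_pedigree):
    """Return the pedigree ID with the largest family-specific LOD score."""
    return max(lod_by_pedigree, key=lod_by_pedigree.get)


def carrier_rate(replicates, carriers):
    """Fraction of replicates in which the selected pedigree carries the causal allele."""
    hits = sum(select_pedigree(lods) in carriers for lods in replicates)
    return hits / len(replicates)


# Made-up LOD scores for three pedigrees in three replicates (assumption).
replicates = [
    {"ped1": 0.3, "ped2": 2.4, "ped3": 0.9},
    {"ped1": 1.1, "ped2": 1.8, "ped3": 0.2},
    {"ped1": 0.5, "ped2": 0.4, "ped3": 0.1},
]
carriers = {"ped2"}  # pedigrees assumed to carry the rare causal allele

# Linkage-guided selection versus the chance that a random pedigree is a carrier.
linkage_rate = carrier_rate(replicates, carriers)
random_rate = len(carriers) / len(replicates[0])
```

Comparing `linkage_rate` with `random_rate` mirrors the comparison between the selected pedigree and a randomly chosen one.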
doi:10.1186/1753-6561-5-S9-S82
PMCID: PMC3287923
PMID: 22373363
Mapping of expression quantitative trait loci (eQTLs) is an important technique for studying how genetic variation affects gene regulation in natural populations. In a previous study using Illumina expression data from human lymphoblastoid cell lines, we reported that cis-eQTLs are especially enriched around transcription start sites (TSSs) and immediately upstream of transcription end sites (TESs). In this paper, we revisit the distribution of eQTLs using additional data from Affymetrix exon arrays and from RNA sequencing. We confirm that most eQTLs lie close to the target genes; that transcribed regions are generally enriched for eQTLs; that eQTLs are more abundant in exons than introns; and that the peak density of eQTLs occurs at the TSS. However, we find that the intriguing TES peak is greatly reduced or absent in the Affymetrix and RNA-seq data. Instead our data suggest that the TES peak observed in the Illumina data is mainly due to exon-specific QTLs that affect 3′ untranslated regions, where most of the Illumina probes are positioned. Nonetheless, we do observe an overall enrichment of eQTLs in exons versus introns in all three data sets, consistent with an important role for exonic sequences in gene regulation.
doi:10.1371/journal.pone.0030629
PMCID: PMC3281037
PMID: 22359548
Stranger, Barbara E. | Nica, Alexandra C. | Forrest, Matthew S. | Dimas, Antigone | Bird, Christine P. | Beazley, Claude | Ingle, Catherine E. | Dunning, Mark | Flicek, Paul | Koller, Daphne | Montgomery, Stephen | Tavaré, Simon | Deloukas, Panagiotis | Dermitzakis, Emmanouil T.
Genetic variation influences gene expression, and this can be efficiently mapped to specific genomic regions and variants. We used gene expression profiling of EBV-transformed lymphoblastoid cell lines of all 270 individuals of the HapMap consortium to elucidate the detailed features of genetic variation underlying gene expression variation. We find that gene expression levels are heritable and differentiated between populations, in agreement with earlier small-scale studies. A detailed association analysis of over 2.2 million common SNPs per population (5% frequency HapMap) with gene expression identified at least 1348 genes with association signals in cis and at least 180 in trans. Replication in at least one independent population was achieved for 37% of cis signals and 15% of trans signals, respectively. Our results strongly support an abundance of cis-regulatory variation in the human genome. Detection of trans effects is limited but suggests that regulatory variation may be the key primary effect contributing to phenotypic variation in humans. Finally, we explore a variety of methodologies that improve the current state of analysis of gene expression variation.
doi:10.1038/ng2142
PMCID: PMC2683249
PMID: 17873874
Stranger, Barbara E. | Montgomery, Stephen B. | Dimas, Antigone S. | Parts, Leopold | Stegle, Oliver | Ingle, Catherine E. | Sekowska, Magda | Smith, George Davey | Evans, David | Gutierrez-Arcelus, Maria | Price, Alkes | Raj, Towfique | Nisbett, James | Nica, Alexandra C. | Beazley, Claude | Durbin, Richard | Deloukas, Panos | Dermitzakis, Emmanouil T. | Barsh, Gregory S.
The genetic basis of gene expression variation has long been studied with the aim to understand the landscape of regulatory variants, but also more recently to assist in the interpretation and elucidation of disease signals. To date, many studies have looked in specific tissues and population-based samples, but there has been limited assessment of the degree of inter-population variability in regulatory variation. We analyzed genome-wide gene expression in lymphoblastoid cell lines from a total of 726 individuals from 8 global populations from the HapMap3 project and correlated gene expression levels with HapMap3 SNPs located in cis to the genes. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We further dissect the specific functional pathways differentiated between populations. We also identify 5,691 expression quantitative trait loci (eQTLs) after controlling for both non-genetic factors and population admixture and observe that half of the cis-eQTLs are replicated in one or more of the populations. We highlight patterns of eQTL-sharing between populations, which are partially determined by population genetic relatedness, and discover significant sharing of eQTL effects between Asians, European-admixed, and African subpopulations. Specifically, we observe that both the effect size and the direction of effect for eQTLs are highly conserved across populations. We observe an increasing proximity of eQTLs toward the transcription start site as sharing of eQTLs among populations increases, highlighting that variants close to TSS have stronger effects and therefore are more likely to be detected across a wider panel of populations. 
Together these results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation and provide an estimate for the transferability of complex trait variants across populations.
Author Summary
Variation among individuals in the degree to which genes are expressed (i.e. turned on or off) is a characteristic exhibited by all species, and studies have identified regions of the genome harboring genetic variation affecting gene expression levels. To assess the degree of human inter-population variability in regulatory variation, we describe mapping of regions of the genome that have functional effects on gene expression levels. We analyzed genome-wide gene expression in human cell lines derived from 726 unrelated individuals representing 8 global populations that have been genetically well-characterized by the International HapMap Project. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We identify ∼5,700 genes whose expression levels are associated with genetic variation located physically close to the gene, and we observe significant sharing of associations that is partially dependent on population genetic relatedness, among Asians, European-admixed, and African subpopulations. We identify biological functions affected by regulatory variation and describe common and unique characteristics of population-specific and population-shared associations. These results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation.
doi:10.1371/journal.pgen.1002639
PMCID: PMC3330104
PMID: 22532805
The importance of gene regulation in animal evolution is a matter of long-standing interest, but measuring the impact of selection on gene expression has proven a challenge. Here, we propose a selection index of gene expression as a straightforward method for assessing the mode and strength of selection operating on gene expression levels. The index is based on the widely used McDonald-Kreitman test and requires the estimation of four quantities: the within-species and between-species expression variances as well as the sequence heterozygosity and divergence of neutrally evolving sequences. We apply the method to data from human and chimpanzee lymphoblastoid cell lines and show that gene expression is in general under strong stabilizing selection. We also demonstrate how the same framework can be used to estimate the proportion of adaptive gene expression evolution.
doi:10.1371/journal.pone.0034935
PMCID: PMC3329554
PMID: 22529958