The transmission of information from DNA to RNA is a critical process. We compared RNA sequences from human B cells of 27 individuals to the corresponding DNA sequences from the same individuals and uncovered more than 10,000 exonic sites where the RNA sequences do not match those of the DNA. All 12 possible categories of discordances were observed. These differences were nonrandom, as many sites were found in multiple individuals and in different cell types, including primary skin cells and brain tissues. Using mass spectrometry, we detected peptides that are translated from the discordant RNA sequences and thus do not correspond exactly to the DNA sequences. These widespread RNA-DNA differences in the human transcriptome represent a previously unexplored aspect of genome variation.
To study the genetic basis of natural variation in gene expression, we previously carried out genome-wide linkage analysis and mapped the determinants of ~1,000 expression phenotypes¹. In the present study, we carried out association analysis with dense sets of single-nucleotide polymorphism (SNP) markers from the International HapMap Project². For 374 phenotypes, the association study was performed with markers only from regions with strong linkage evidence; these regions all mapped close to the expressed gene. For a subset of 27 phenotypes, genome-wide association analysis was performed with >770,000 markers. The association analysis with markers under the linkage peaks confirmed the linkage results and narrowed the candidate regulatory regions for many phenotypes with strong linkage evidence. The genome-wide association analysis yielded highly significant results that point to the same locations as the genome scans for about 50% of the phenotypes. For one candidate determinant, we carried out functional analyses and confirmed the variation in cis-acting regulatory activity. Our findings suggest that association studies with dense SNP maps will identify susceptibility loci or other determinants for some complex traits or diseases.
Humans are exposed to radiation through the environment and in medical settings. To deal with radiation-induced damage, cells mount complex responses that rely on changes in gene expression. These gene expression responses differ greatly between individuals¹ and contribute to individual differences in response to radiation². Here we identify regulators that influence expression levels of radiation-responsive genes. We treated radiation-induced changes in gene expression as quantitative phenotypes³,⁴, and conducted genetic linkage and association studies to map their regulators. For more than 1,200 of these phenotypes there was significant evidence of linkage to specific chromosomal regions. Nearly all of the regulators act in trans to influence the expression of their target genes; there are very few cis-acting regulators. Some of the trans-acting regulators are transcription factors, but others are genes that were not known to have a regulatory function in radiation response. These results have implications for our basic and clinical understanding of how human cells respond to radiation.
Our genotype inference method combines sparse marker data from a linkage scan and high-resolution SNP genotypes for several individuals to infer genotypes for related individuals. We illustrate the method’s utility by inferring over 53 million SNP genotypes for 78 children in the Centre d’Etude du Polymorphisme Humain families. The method can be used to obtain high-density genotypes in different family structures, including nuclear families commonly used in complex disease gene mapping studies.
Variation in DNA sequence contributes to individual differences in quantitative traits, but in humans the specific sequence variants are known for very few traits. We characterized variation in gene expression in cells from individuals belonging to three major population groups. This quantitative phenotype differs significantly between European-derived and Asian-derived populations for 1,097 of 4,197 genes tested. For the phenotypes with the strongest evidence of cis determinants, most of the variation is due to allele frequency differences at cis-linked regulators. The results show that specific genetic variation among populations contributes appreciably to differences in gene expression phenotypes. Populations differ in prevalence of many complex genetic diseases, such as diabetes and cardiovascular disease. As some of these are probably influenced by the level of gene expression, our results suggest that allele frequency differences at regulatory polymorphisms also account for some population differences in prevalence of complex diseases.
There is extensive natural variation in human gene expression. As quantitative phenotypes, expression levels of genes are heritable. Genetic linkage and association mapping have identified cis- and trans-acting DNA variants that influence expression levels of human genes. New insights into human gene regulation are emerging from genetic analyses of gene expression in cells at rest and following exposure to stimuli. The integration of these genetic mapping results with data from co-expression networks is leading to a better understanding of how expression levels of individual genes are regulated and how genes interact with each other. These findings are important for basic understanding of gene regulation and of diseases that result from disruption of normal gene regulation.
Natural variation in gene expression is extensive in humans and other organisms, and variation in the baseline expression level of many genes has a heritable component. To localize the genetic determinants of these quantitative traits (expression phenotypes) in humans, we used microarrays to measure gene expression levels and performed genome-wide linkage analysis for expression levels of 3,554 genes in 14 large families. For approximately 1,000 expression phenotypes, there was significant evidence of linkage to specific chromosomal regions. Both cis- and trans-acting loci regulate variation in the expression levels of genes, although most act in trans. Many gene expression phenotypes are influenced by several genetic determinants. Furthermore, we found hotspots of transcriptional regulation where significant evidence of linkage for several expression phenotypes (up to 31) coincides, and expression levels of many genes that share the same regulatory region are significantly correlated. The combination of microarray techniques for phenotyping and linkage analysis for quantitative traits allows the genetic mapping of determinants that contribute to variation in human gene expression.
Using genetic and molecular analyses, we identified over 1,000 polymorphic regulators of the expression levels of human genes.
Expression levels of human genes vary extensively among individuals. This variation facilitates analyses of expression levels as quantitative phenotypes in genetic studies where the entire genome can be scanned for regulators without prior knowledge of the regulatory mechanisms, thus enabling the identification of unknown regulatory relationships. Here, we carried out such genetic analyses with a large sample size and identified cis- and trans-acting polymorphic regulators for about 1,000 human genes. We validated the cis-acting regulators by demonstrating differential allelic expression with sequencing of transcriptomes (RNA-Seq) and the trans-regulators by gene knockdown, metabolic assays, and chromosome conformation capture analysis. The majority of the regulators act in trans to the target (regulated) genes. Most of these trans-regulators were not known to play a role in gene expression regulation. The identification of these regulators enabled the characterization of polymorphic regulation of human gene expression at a resolution that was unattainable in the past.
Cellular characteristics and functions are determined largely by gene expression, and expression levels differ among individuals; however, it is not clear how these levels are regulated. While many cis-acting DNA sequence variants in promoters and enhancers that influence gene expression have been identified, only a few polymorphic trans-regulators of human genes are known. Here, we used human B-cells from individuals belonging to large families and identified polymorphic trans-regulators for about 1,000 human genes. We validated these results by gene knockdown, metabolic perturbation studies and chromosome conformation capture assays. Although these regulatory relationships were identified in cultured B-cells, we show that some of the relationships were also found in primary fibroblasts. The large number of regulators allowed us to better understand gene expression regulation, to uncover new gene functions, and to identify their roles in disease processes. This study shows that genetic variation is a powerful tool not only for gene mapping but also for studying gene interaction and regulation.
Single nucleotide polymorphisms (SNPs) in transcription factor binding sites (TFBSs) may affect the binding of transcription factors, lead to differences in gene expression and phenotypes, and therefore affect susceptibility to environmental exposure. We developed an integrated computational system for discovering functional SNPs in TFBSs in the human genome and predicting their impact on the expression of target genes. In this system we: (1) construct a position weight matrix (PWM) from a collection of experimentally discovered TFBSs; (2) predict TFBSs in SNP sequences using the PWM and map SNPs to the upstream regions of genes; (3) examine the evolutionary conservation of putative TFBSs by phylogenetic footprinting; (4) prioritize candidate SNPs based on microarray expression profiles from tissues in which the transcription factor of interest is either deleted or over-expressed; and (5) analyze the association of SNP genotypes with gene expression phenotypes. We tested the system by applying it to identify functional polymorphisms in the antioxidant response element (ARE), a cis-acting enhancer sequence found in the promoter region of many genes that encode antioxidant and Phase II detoxification enzymes/proteins. In response to oxidative stress, the transcription factor NRF2 (nuclear factor erythroid-derived 2-like 2) binds to AREs, mediating transcriptional activation of its responsive genes and modulating in vivo defense mechanisms against oxidative damage. Using our novel computational tools, we identified a set of polymorphic AREs with functional evidence, showing the utility of our system for directing further experimental validation of genomic sequence variants that could be useful for identifying high-risk individuals.
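Steps (1) and (2) above can be illustrated with a minimal Python sketch: build a log-odds PWM from aligned binding sites, find the best-scoring window in a sequence, and compare that best score between the reference and alternate alleles of a SNP. The pseudocount, the uniform background, the forward-strand-only scan, the function names and the toy sites are all assumptions made for illustration, not the authors' implementation; steps (3) to (5) are not shown.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=None):
    """Build a log2-odds position weight matrix from equal-length, aligned sites."""
    if background is None:
        # Assumption: uniform nucleotide background; a genomic background could be used instead.
        background = {b: 0.25 for b in BASES}
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for pos in range(width):
        # Pseudocounts keep bases never seen at a position from producing log2(0).
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[pos].upper()] += 1  # non-ACGT characters raise KeyError
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background[b]) for b in BASES})
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds scores of a window as long as the motif."""
    return sum(column[base] for column, base in zip(pwm, window.upper()))


def best_hit(pwm, seq):
    """Return (best score, start offset) over all windows of seq on the forward strand."""
    width = len(pwm)
    return max((score(pwm, seq[i:i + width]), i) for i in range(len(seq) - width + 1))


def snp_score_change(pwm, left, ref, alt, right):
    """Best-hit score with the alternate allele minus best-hit score with the reference allele."""
    ref_score, _ = best_hit(pwm, left + ref + right)
    alt_score, _ = best_hit(pwm, left + alt + right)
    return alt_score - ref_score


# Hypothetical toy sites for illustration only (not a curated ARE collection).
toy_sites = ["TGACTCAGC", "TGACTAAGC", "TGACATAGC", "TGACTTGGC"]
pwm = build_pwm(toy_sites)
delta = snp_score_change(pwm, "AAAATGA", "C", "G", "TCAGCAAAA")
print(f"score change caused by the C>G substitution: {delta:.2f}")
```

A negative score change suggests that the alternate allele weakens the predicted site. A fuller scan would presumably also score the reverse complement and apply a score threshold before calling a putative TFBS.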
The number of recombination events per meiosis varies extensively among individuals. This recombination phenotype differs between females and males, and also among individuals of each gender. In this study, we used high-density SNP genotypes of over 2,300 individuals and their offspring in two datasets to characterize the recombination landscape and to map the genetic variants that contribute to variation in recombination phenotypes. We found six genetic loci that are associated with recombination phenotypes. Two of these (RNF212 and an inversion on chromosome 17q21.31) were previously reported in the Icelandic population, and this is the first replication in any other population. For the four newly identified loci (KIAA1462, PDZK1, UGCG, NUB1), results from expression studies support roles in meiosis. Each of the variants that we identified explains only a small fraction of the individual variation in recombination. Notably, we found different sequence variants associated with female and male recombination phenotypes, suggesting that they are regulated by different genes. Characterization of genetic variants that influence natural variation in meiotic recombination will lead to a better understanding of normal meiotic events as well as of non-disjunction, the primary cause of pregnancy loss.
Meiotic recombination is essential for the formation of human gametes and is a key process that generates genetic diversity. Given its importance, we would expect the number and location of exchanges to be tightly regulated. However, studies show significant gender and inter-individual variation in genome-wide recombination rates. The genetic basis for this variation is poorly understood. In this study, we used genotypes from high-density single nucleotide polymorphism (SNP) markers of 2,315 individuals and their children from two Caucasian samples in a genome-wide association study to identify genetic variants that influence the number of meiotic recombination events per gamete. We found three loci that influence female recombination and three different loci that influence male recombination. Our results suggest that gender differences in recombination result from differences in the genetic regulation of female and male meiosis. Each identified locus explains only a small proportion of the variance; together, each gender-specific set of loci explains about 10% of the variation in the corresponding recombination phenotype. This suggests a mechanism for variability in recombination, which is essential for genetic diversity, while maintaining the number of recombination events within a range that ensures proper chromosome segregation.
Variation in gene expression is a fundamental aspect of human phenotypic variation. Several recent studies have analyzed gene expression levels in populations of different continental ancestry and reported population differences at a large number of genes. However, these differences could largely be due to non-genetic (e.g., environmental) effects. Here, we analyze gene expression levels in African American cell lines, which differ from previously analyzed cell lines in that individuals from this population inherit variable proportions of two continental ancestries. We first relate gene expression levels in individual African Americans to their genome-wide proportion of European ancestry. The results provide strong evidence of a genetic contribution to expression differences between European and African populations, validating previous findings. Second, we infer local ancestry (0, 1, or 2 European chromosomes) at each location in the genome and investigate the effects of ancestry proximal to the expressed gene (cis) versus ancestry elsewhere in the genome (trans). Both effects are highly significant, and we estimate that 12±3% of all heritable variation in human gene expression is due to cis variants.
Variation in gene expression is a fundamental aspect of human phenotypic variation, and understanding how this variation is apportioned among human populations is an important aim. Previous studies have compared gene expression levels between distinct populations, but it is unclear whether the differences that were observed have a genetic or nongenetic basis. Admixed populations, such as African Americans, offer a solution to this problem because individuals vary in their proportion of European ancestry while the analysis of a single population minimizes nongenetic factors. Here, we show that differences in gene expression among African Americans of different ancestry proportions validate gene expression differences between European and African populations. Furthermore, by drawing a distinction between an African American individual's ancestry at the location of a gene whose expression is being analyzed (cis) versus at distal locations (trans), we can use ancestry effects to quantify the relative contributions of cis and trans regulation to human gene expression. We estimate that 12±3% of all heritable variation in human gene expression is due to cis variants.
Heterogeneity poses a challenge to linkage mapping. Here, we apply a latent class extension of Haseman-Elston regression to expression phenotypes with significant evidence of linkage to trans regulators in 14 large pedigrees. We test for linkage, accounting for heterogeneity, and classify individual families as "linked" or "unlinked" on the basis of their contribution to the overall evidence of linkage.
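As background, the original Haseman-Elston method regresses the squared trait difference of each sib pair on the estimated proportion of alleles the pair shares identical by descent (IBD) at a marker; a significantly negative slope suggests linkage. The minimal Python sketch below implements only that basic regression, under the assumption that IBD estimates are already available. The latent class extension described above, which additionally assigns each family to a "linked" or "unlinked" class, is not shown, and the function name and data are hypothetical.

```python
import math


def haseman_elston(trait_pairs, pi_hat):
    """Original (unextended) Haseman-Elston regression for sib pairs.

    trait_pairs: list of (x1, x2) trait values, one tuple per sib pair.
    pi_hat: estimated proportion of alleles shared IBD at the marker, one per pair.
    Returns (intercept, slope, t statistic of the slope). Under linkage, pairs
    sharing more alleles IBD have more similar traits, so the slope is negative.
    """
    y = [(x1 - x2) ** 2 for x1, x2 in trait_pairs]  # squared trait difference
    n = len(y)
    if n != len(pi_hat) or n < 3:
        raise ValueError("need at least 3 pairs, each with an IBD estimate")
    mean_pi = sum(pi_hat) / n
    mean_y = sum(y) / n
    sxx = sum((p - mean_pi) ** 2 for p in pi_hat)
    if sxx == 0:
        raise ValueError("IBD estimates do not vary; slope is undefined")
    sxy = sum((p - mean_pi) * (v - mean_y) for p, v in zip(pi_hat, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_pi
    rss = sum((v - intercept - slope * p) ** 2 for p, v in zip(pi_hat, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # ordinary least-squares standard error
    t = slope / se if se > 0 else float("nan")  # t is undefined for a perfect fit
    return intercept, slope, t


# Hypothetical data: six sib pairs sharing 0, 0.5 or 1 of their alleles IBD at the marker.
pairs = [(0.0, 3.0), (1.0, 3.2), (0.0, 2.0), (0.0, 1.8), (0.0, 1.0), (0.0, 0.5)]
pi = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
b0, b1, t_stat = haseman_elston(pairs, pi)
print(f"intercept={b0:.2f} slope={b1:.2f} t={t_stat:.2f}")
```

In a one-sided test, only a negative t statistic counts as evidence of linkage; a p-value would come from the t distribution with n - 2 degrees of freedom.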
Here we describe the data provided for Problem 1 of Genetic Analysis Workshop 15. The data provided for Problem 1 were unusual in two ways. First, the phenotype was the level of gene expression for each gene, not a conventional phenotype like height or disease, and second, there were more than 3500 such phenotypes. Natural variation in gene expression was a new idea in 2004 when these data were collected and published. Because the phenotypes were measured in members of 14 Centre d'Etude du Polymorphisme Humain (CEPH) families, there was an opportunity for linkage mapping on a very large scale. For this purpose, 2882 single-nucleotide polymorphism genotypes were also provided for each family member.
GenMapDB (http://genomics.med.upenn.edu/genmapdb) is a repository of human bacterial artificial chromosome (BAC) clones mapped by our laboratory to sequence-tagged site markers. Currently, GenMapDB contains over 3000 mapped clones that span 19 chromosomes: chromosomes 2, 4, 5, 9–22, X and Y. This database provides positional information about human BAC clones from the RPCI-11 human male BAC library. It also contains restriction fragment analysis data and end sequences of the clones. GenMapDB is freely available to the public. The main purpose of GenMapDB is to organize the mapping data and to allow the research community to search for mapped BAC clones that can be used in gene mapping studies and chromosomal mutation
Many disease-associated variants affect gene expression levels (expression quantitative trait loci, eQTLs) and expression profiling using next generation sequencing (NGS) technology is a powerful way to detect these eQTLs. We analyzed 94 total blood samples from healthy volunteers with DeepSAGE to gain specific insight into how genetic variants affect the expression of genes and lengths of 3′-untranslated regions (3′-UTRs). We detected previously unknown cis-eQTL effects for GWAS hits in disease- and physiology-associated traits. Apart from cis-eQTLs that are typically easily identifiable using microarrays or RNA-sequencing, DeepSAGE also revealed many cis-eQTLs for antisense and other non-coding transcripts, often in genomic regions containing retrotransposon-derived elements. We also identified and confirmed SNPs that affect the usage of alternative polyadenylation sites, thereby potentially influencing the stability of messenger RNAs (mRNA). We then combined the power of RNA-sequencing with DeepSAGE by performing a meta-analysis of three datasets, leading to the identification of many more cis-eQTLs. Our results indicate that DeepSAGE data is useful for eQTL mapping of known and unknown transcripts, and for identifying SNPs that affect alternative polyadenylation. Because of the inherent differences between DeepSAGE and RNA-sequencing, our complementary, integrative approach leads to greater insight into the molecular consequences of many disease-associated variants.
Many genetic variants that are associated with diseases also affect gene expression levels. We used a next generation sequencing approach targeting 3′ transcript ends (DeepSAGE) to gain specific insight into how genetic variants affect the expression of genes and the usage and length of 3′-untranslated regions. We detected many associations for antisense and other non-coding transcripts, often in genomic regions containing retrotransposon-derived elements. Some of these variants are also associated with disease. We also identified and confirmed variants that affect the usage of alternative polyadenylation sites, thereby potentially influencing the stability of mRNAs. We conclude that DeepSAGE is useful for detecting eQTL effects on both known and unknown transcripts, and for identifying variants that affect alternative polyadenylation.
Recently it has become clear that only a small percentage (7%) of disease-associated single nucleotide polymorphisms (SNPs) are located in protein-coding regions, while the remaining 93% are located in gene regulatory regions or in intergenic regions. Thus, understanding how genetic variation controls the expression of non-coding RNAs (in a tissue-dependent manner) has far-reaching implications. We tested the association of SNPs with expression levels (eQTLs) of large intergenic non-coding RNAs (lincRNAs), using genome-wide gene expression and genotype data from five different tissues. We identified 112 cis-regulated lincRNAs, of which 45% could be replicated in an independent dataset. We observed that 75% of the SNPs affecting lincRNA expression (lincRNA cis-eQTLs) were specific to lincRNA alone and did not affect the expression of neighboring protein-coding genes. We show that this specific genotype-lincRNA expression correlation is tissue-dependent and that many of these lincRNA cis-eQTL SNPs are also associated with complex traits and diseases.
Large intergenic non-coding RNAs (lincRNAs) are the largest class of non-coding RNA molecules in the human genome. Many genome-wide association studies (GWAS) have mapped disease-associated genetic variants (SNPs) to such lincRNA regions or their vicinity. However, it is not clear how these SNPs affect disease. We tested whether SNPs were also associated with lincRNA expression levels in five different human primary tissues. We observed a strong genotype-lincRNA expression correlation that is tissue-dependent. Many of the observed lincRNA cis-eQTLs are disease- or trait-associated SNPs. Our results suggest that lincRNA-eQTLs represent a novel link between non-coding SNPs and the expression of protein-coding genes, which can be exploited to understand the process of gene regulation through lincRNAs in more detail.
Genetic variants have been found to be associated with many complex traits. However, it is still mostly unclear through which downstream mechanisms these variants cause these phenotypes. Knowledge of these intermediate steps is crucial to understand pathogenesis, while also providing leads for potential pharmacological intervention. Here we relied upon natural human genetic variation to identify effects of these variants on trans-gene expression (expression quantitative trait locus mapping, eQTL) in whole peripheral blood from 1,469 unrelated individuals. We looked at 1,167 published trait- or disease-associated SNPs and observed trans-eQTL effects on 113 different genes, of which we replicated 46 in monocytes of 1,490 different individuals and 18 in a smaller dataset that comprised subcutaneous adipose, visceral adipose, liver tissue, and muscle tissue. HLA single-nucleotide polymorphisms (SNPs) were 10-fold enriched for trans-eQTLs: 48% of the trans-acting SNPs map within the HLA, including ulcerative colitis susceptibility variants that affect plausible candidate genes AOAH and TRBV18 in trans. We identified 18 pairs of unlinked SNPs associated with the same phenotype and affecting expression of the same trans-gene (21 times more than expected, P < 10⁻¹⁶). This was particularly pronounced for mean platelet volume (MPV): two independent SNPs significantly affect the well-known blood coagulation genes GP9 and F13A1 but also C19orf33, SAMD14, VCL, and GNG11. Several of these SNPs have a substantially larger effect on the downstream trans-genes than on the eventual phenotypes, supporting the concept that the effects of these SNPs on expression seem to be much less multifactorial. Therefore, these trans-eQTLs could well represent some of the intermediate genes that connect genetic variants with their eventual complex phenotypic outcomes.
Many genetic variants have been found to be associated with diseases. However, for many of these genetic variants, it remains unclear how they exert their effect on the eventual phenotype. We investigated genetic variants that are known to be associated with diseases and complex phenotypes and assessed whether these variants were also associated with gene expression levels in a set of 1,469 unrelated whole blood samples. For several diseases, such as type 1 diabetes and ulcerative colitis, we observed that genetic variants affect the expression of genes not previously implicated. For complex traits, such as mean platelet volume and mean corpuscular volume, we observed that independent genetic variants on different chromosomes influence the expression of exactly the same genes. For mean platelet volume, these genes include well-known blood coagulation genes but also genes with still unknown functions. These results indicate that, by systematically correlating genetic variation with gene expression levels, it is possible to identify downstream genes, which provide important avenues for further research.
We conducted a comprehensive study of copy number variants (CNVs) well-tagged by SNPs (r² ≥ 0.8) by analyzing their effect on gene expression and their association with disease susceptibility and other complex human traits. We tested whether these CNVs were more likely to be functional than frequency-matched SNPs as trait-associated loci or as expression quantitative trait loci (eQTLs) influencing phenotype by altering gene regulation. Our study found that CNV-tagging SNPs are significantly enriched for cis eQTLs; furthermore, we observed that trait associations from the NHGRI catalog show an overrepresentation of SNPs tagging CNVs relative to frequency-matched SNPs. We found that these SNPs tagging CNVs are more likely to affect multiple expression traits than frequency-matched variants. Given these findings on the functional relevance of CNVs, we created an online resource of expression-associated CNVs (eCNVs) using the most comprehensive population-based map of CNVs to inform future studies of complex traits. Although previous studies of common CNVs that can be typed on existing platforms and/or interrogated by SNPs in genome-wide association studies concluded that such CNVs appear unlikely to have a major role in the genetic basis of several complex diseases examined, our findings indicate that it would be premature to dismiss the possibility that even common CNVs may contribute to complex phenotypes and at least some common diseases.
Despite the large number of SNPs found to be reproducibly associated with complex diseases, they collectively account for only a small proportion of the overall heritability of such traits. CNVs have thus been proposed to explain some of the missing heritability and to alter disease susceptibility. However, a recent study of the genetics of 8 common diseases involving 16,000 cases and 3,000 controls failed to identify any novel CNVs associated with disease and concluded that CNVs are unlikely to play a major role in their etiology. The studies we report here show that we must be careful not to dismiss the possibility that CNVs may indeed underlie some of the observed associations with complex disease. Our findings show that well-tagged CNVs are disproportionately more likely than frequency-matched SNPs to be eQTLs, as well as cis-eQTLs; furthermore, reproducible trait associations, as represented in the NHGRI catalog, are enriched for well-tagged CNVs relative to frequency-matched SNPs. Because of these findings on the strong functional relevance of these CNVs, we created a database (available at http://www.scandb.org/) of expression-associated CNVs to supplement our earlier studies of SNP eQTLs and to contribute to future studies of the genetics of complex traits.
Transcriptional signatures are an indispensable source of correlative information on disease-related molecular alterations on a genome-wide level. Numerous candidate genes involved in disease, as well as factors of predictive and prognostic value, have been deduced from such molecular portraits, e.g., in cancer. However, mechanistic insights into the regulatory principles governing global transcriptional changes lag behind extensive compilations of deregulated genes. To identify regulators of transcriptome alterations, we used an integrated approach combining transcriptional profiling of colorectal cancer cell lines treated with inhibitors targeting the receptor tyrosine kinase (RTK)/RAS/mitogen-activated protein kinase (MAPK) pathway, computational prediction of regulatory elements in promoters of co-regulated genes, and chromatin-based and functional cellular assays. We identified commonly co-regulated, proliferation-associated target genes that respond to the MAPK pathway. We recognized E2F and NFY transcription factor binding sites as prevalent motifs in those pathway-responsive genes and confirmed the predicted regulatory role of Y-box binding protein 1 (YBX1) by reporter gene, gel shift, and chromatin immunoprecipitation assays. We also validated the MAPK-dependent gene signature in colorectal cancers and provided evidence for the association of YBX1 with poor prognosis in colorectal cancer patients. This suggests that MEK/ERK-dependent, YBX1-regulated target genes are involved in executing malignant properties.
The simultaneous analysis of gene expression in cancer using microarrays is a standard approach for monitoring disease-related modifications involved in tumorigenesis, triggering malignant properties and clinical behavior. However, the factors that drive these alterations most often remain elusive. We sought to identify transcription factors that mediate the transcriptional effects of the receptor tyrosine kinase/RAS oncoprotein pathway, a frequently activated oncogenic signaling system, in cultured colorectal cancer cells. We used an integrated approach combining molecular and functional assays, as well as computational tools, to identify regulatory factors that trigger the alterations of gene expression and modulate cellular growth. We identified the YBX1 protein, a member of the highly conserved family of cold shock domain transcription factors, as a regulator of signaling effects triggered by the RAS cancer gene. We then assayed the messenger RNA expression of YBX1 and YBX1-responsive target genes using microarrays, and the expression of the YBX1 protein by immunohistochemistry, in colorectal tumors. We found that YBX1 expression is correlated with a poor clinical outcome in colon cancer patients.
A reciprocal translocation involving chromosomes 8 and 21 generates the AML1/ETO oncogenic transcription factor that initiates acute myeloid leukemia by recruiting co-repressor complexes to DNA. AML1/ETO interferes with the function of its wild-type counterpart, AML1, by directly targeting AML1 binding sites. However, transcriptional regulation determined by AML1/ETO probably relies on a more complex network, since the fusion protein has been shown to interact with a number of other transcription factors, in particular E-proteins, and may therefore target other sites on DNA. Genome-wide chromatin immunoprecipitation and expression profiling were exploited to identify AML1/ETO-dependent transcriptional regulation. AML1/ETO was found to co-localize with AML1, demonstrating that the fusion protein follows the binding pattern of the wild-type protein but does not function primarily by displacing it. The DNA binding profile of the E-protein HEB was grossly rearranged upon expression of AML1/ETO, and the fusion protein was found to co-localize with both AML1 and HEB on many of its regulated targets. Furthermore, the level of HEB protein was increased in both primary cells and cell lines expressing AML1/ETO. Our results suggest a major role for the functional interaction of AML1/ETO with AML1 and HEB in transcriptional regulation determined by the fusion protein.
Acute myeloid leukemias (AML) are a group of hematologic malignancies initiated by chromosomal abnormalities that often give rise to oncogenic proteins with transcriptional regulatory functions. These aberrant transcription factors bind to specific sequences on DNA and influence the activity of adjacent genes. The result is that leukemic blasts display abnormalities in their gene expression programs, which are ultimately responsible for the malignant phenotype. In this study, genome-wide approaches were used not only to identify target genes, but also to discover interactions among different transcription factors, with the aim of defining disease-linked regulatory networks. We performed a detailed analysis of the DNA binding pattern of an oncogenic transcription factor, AML1/ETO, which is responsible for approximately 10–15% of AML. We identified a specific signature, which is characterized by the presence of binding regions for AML1/ETO and for other transcription factors, AML1 and HEB, and found that the DNA binding pattern of AML1 and HEB is significantly affected in cells expressing AML1/ETO. Our results, therefore, describe genes regulated by AML1/ETO and demonstrate that this oncogenic protein can significantly interfere with the function of other transcriptional regulators.
High levels of serum IgE are considered markers of parasite and helminth exposure. In addition, they are associated with allergic disorders, play a key role in anti-tumoral defence, and are crucial mediators of autoimmune diseases. Total IgE is a strongly heritable trait. In a genome-wide association study (GWAS), we tested 353,569 SNPs for association with serum IgE levels in 1,530 individuals from the population-based KORA S3/F3 study. Replication was performed in four independent population-based study samples (total n = 9,769 individuals). Functional variants in the gene encoding the alpha chain of the high-affinity receptor for IgE (FCER1A) on chromosome 1q23 (rs2251746 and rs2427837) were strongly associated with total IgE levels in all cohorts, with P values of 1.85×10⁻²⁰ and 7.08×10⁻¹⁹ in a combined analysis. In a post-hoc analysis, these variants also showed associations with allergic sensitization (P = 7.78×10⁻⁴ and P = 1.95×10⁻³). The “top” SNP significantly influenced the cell-surface expression of FCER1A on basophils, and genome-wide expression profiles indicated a novel regulatory mechanism of FCER1A expression via GATA-2. Polymorphisms within the RAD50 gene on chromosome 5q31 were consistently associated with IgE levels (P values from 6.28×10⁻⁷ to 4.46×10⁻⁸) and increased the risk of atopic eczema and asthma. Furthermore, STAT6 was confirmed as a susceptibility locus modulating IgE levels. In this first GWAS on total IgE, FCER1A was identified and replicated as a new susceptibility locus at which common genetic variation influences serum IgE levels. In addition, variants within the RAD50 gene might represent additional factors within the cytokine gene cluster on chromosome 5q31, emphasizing the need for further investigation of this intriguing region. Our data furthermore confirm the association of STAT6 variation with serum IgE levels.
High levels of serum IgE are considered markers of parasite and helminth exposure. In addition, they are associated with allergic disorders, play a key role in anti-tumoral defence, and are crucial mediators of autoimmune diseases. There is strong evidence that serum IgE levels are under strong genetic control. However, although numerous loci and candidate genes have been linked to and associated with atopy-related traits, very few have been associated consistently with total IgE. This study describes the first large-scale, genome-wide scan on total IgE. By examining more than 11,000 German individuals from four independent population-based cohorts, we show that functional variants in the gene encoding the alpha chain of the high-affinity receptor for IgE (FCER1A) on chromosome 1q23 are strongly associated with total IgE levels. In addition, our data confirm the association of STAT6 variation with serum IgE levels. They also suggest that variants within the RAD50 gene might represent additional factors within the cytokine gene cluster on chromosome 5q31, emphasizing the need for further investigation of this intriguing region.
The MYC oncogene has been implicated in the regulation of up to thousands of genes involved in many cellular programs, including proliferation, growth, differentiation, self-renewal, and apoptosis. MYC is thought to induce cancer through an exaggerated effect on these physiologic programs. It is not clear which of these genes are responsible for the ability of MYC to initiate and/or maintain tumorigenesis. We have previously shown that some tumors undergo sustained regression upon brief MYC inactivation. Here we demonstrate that MYC inactivation is followed by global, permanent changes in gene expression, as detected by microarray analysis. By applying StepMiner analysis, we identified genes whose expression most strongly correlated with the ability of MYC to induce a neoplastic state. Notably, we identified genes that exhibited permanent changes in mRNA expression upon MYC inactivation. Importantly, chromatin immunoprecipitation (ChIP) showed that these permanent changes in gene expression were associated with permanent changes in the ability of MYC to bind to the promoter regions. We further refined our list of candidate genes associated with tumor maintenance by comparing our analysis with other published results, generating a gene signature associated with MYC-induced tumorigenesis in mice. To validate the role of MYC-associated gene signatures in human tumorigenesis, we examined the expression of human homologs in 273 published human lymphoma microarray datasets in Affymetrix U133A format. One large functional group of these genes included the ribosomal structural proteins. In addition, we identified a group of genes involved in a diverse array of cellular functions, including BZW2, H2AFY, SFRS3, NAP1L1, NOLA2, UBE2D2, CCNG1, LIFR, FABP3, and EDG1. Hence, through our analysis of gene expression in murine tumor models and human lymphomas, we have identified a novel gene signature correlated with the ability of MYC to maintain tumorigenesis.
The targeted inactivation of oncogenes may be a specific and effective treatment for cancer. However, it is not clear how oncogene inactivation leads to tumor regression. We have previously shown that even brief inactivation of the MYC oncogene can result in the sustained regression of at least some tumors. To understand the mechanism, we used several novel genomic analyses to define a set of genes that strongly correlate with the ability of the MYC oncogene to maintain tumorigenesis. First, we generated a novel data set from microarray analyses of murine tumors. We then analyzed it with StepMiner to identify discrete step changes in gene expression after inactivation or reactivation of the MYC oncogene. Second, we used Boolean Network Analysis to further define the subset of genes highly correlated with MYC in human tumorigenesis. Third, we used ChIP analysis to demonstrate that, in many cases, the permanent changes in gene expression we uncovered were associated with changes in the ability of MYC to occupy the promoter locus. Our general strategy could similarly be used in other experimental model systems to understand how specific oncogenes contribute to the maintenance of tumorigenesis.
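The paragraph above describes StepMiner only as a way to find discrete step changes in expression across an ordered series of samples (here, measurements after MYC inactivation or reactivation). As a minimal illustration of that idea, the hypothetical function `fit_one_step` below assumes a single up or down step and a least-squares criterion. It tries every breakpoint and keeps the one whose two segment means best explain the data. This is not the StepMiner implementation. The published tool also considers two-step patterns and assesses the significance of each fit, and neither is reproduced here.

```python
from typing import List, Sequence, Tuple


def fit_one_step(values: Sequence[float]) -> Tuple[int, float, float, float]:
    """Fit a single step (two constant segments) to an ordered series.

    Hypothetical helper for illustration only. Every breakpoint k in 1..n-1
    splits the series into values[:k] and values[k:]; each segment is modelled
    by its own mean, and the breakpoint with the smallest sum of squared
    errors (SSE) is kept. Returns (k, mean_before, mean_after, sse).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two ordered measurements")
    best = None
    for k in range(1, n):
        left: List[float] = list(values[:k])
        right: List[float] = list(values[k:])
        m_left = sum(left) / len(left)
        m_right = sum(right) / len(right)
        sse = sum((v - m_left) ** 2 for v in left) + sum(
            (v - m_right) ** 2 for v in right
        )
        # Keep the earliest breakpoint with the lowest SSE seen so far.
        if best is None or sse < best[3]:
            best = (k, m_left, m_right, sse)
    return best
```

For the toy series `[1.0, 1.1, 0.9, 3.0, 3.1, 2.9]`, the best breakpoint is index 3, with segment means of about 1.0 and 3.0. Comparing the two means gives the direction of the step, here upward.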
The functional consequences of missense variants in disease genes are difficult to predict. We assessed whether gene expression profiles could distinguish among carriers of BRCA1 or BRCA2 pathogenic truncating or missense mutations and familial breast cancer cases whose disease was not attributable to BRCA1 or BRCA2 mutations (BRCAX cases). Seventy-two cell lines from affected women in high-risk breast/ovarian cancer families were assayed after exposure to ionising radiation, including 23 BRCA1 carriers, 22 BRCA2 carriers, and 27 BRCAX individuals. A subset of 10 BRCAX individuals carried rare BRCA1/2 sequence variants considered to be of low clinical significance (LCS). BRCA1 and BRCA2 mutation carriers had similar expression profiles, with some subclustering of missense mutation carriers. Most BRCAX individuals formed a distinct cluster, but BRCAX individuals with LCS variants had expression profiles similar to those of BRCA1/2 mutation carriers. A Gaussian Process Classifier predicted BRCA1, BRCA2 and BRCAX status with a maximum accuracy of 62%. Prediction accuracy decreased with the inclusion of BRCAX samples carrying an LCS variant and of pathogenic missense carriers. Similarly, prediction of mutation status with gene lists derived using Support Vector Machines was good for BRCAX samples without an LCS variant (82–94%) and poor for BRCAX samples with an LCS variant (40–50%). It improved for pathogenic BRCA1/2 mutation carriers when the gene list used for prediction was appropriate to the mutation effect being tested (71–100%). This study indicates that mutation effect, and the presence of rare variants possibly associated with a low risk of cancer, must be considered in the development of array-based assays of variant pathogenicity.
Inherited mutations in the BRCA1 and BRCA2 genes increase the risk of breast cancer and account for a proportion of breast cancer families. However, more than half of the reported sequence alterations in BRCA1 and BRCA2 are currently of unknown clinical significance. We analysed gene expression in lymphoblastoid cell lines derived from the blood of patients with sequence alterations in BRCA1 and BRCA2. We compared these profiles with those of lymphoblastoid cells from familial breast cancer patients without such alterations, and then classified the cell lines based on their gene expression profiles. We found that BRCA1 and BRCA2 samples were more similar to each other than to samples from familial breast cancer patients without BRCA1/2 mutations. The type of sequence change in BRCA1 and BRCA2 (missense or truncating) also influenced gene expression. We also included ten familial breast cancer samples carrying BRCA1 or BRCA2 sequence changes that are believed to be of little clinical significance. Interestingly, these samples were distinct from other familial breast cancer cases without any sequence alteration in BRCA1 or BRCA2. Further work is therefore needed to determine whether these “low clinical significance” sequence changes are associated with a low to moderate risk of cancer.
There is considerable evidence that human genetic variation influences gene expression. Genome-wide studies have revealed that mRNA levels are associated with genetic variation in or close to the gene coding for those mRNA transcripts (cis effects) and with variation elsewhere in the genome (trans effects). The role of genetic variation in determining protein levels has not been systematically assessed. Using a genome-wide association approach, we show that common genetic variation influences levels of clinically relevant proteins in human serum and plasma. We evaluated the effect of 496,032 polymorphisms on levels of 42 proteins measured in 1200 fasting individuals from the population-based InCHIANTI study. Proteins included insulin, several interleukins, adipokines, chemokines, and liver function markers that are implicated in many common diseases, including metabolic, inflammatory, and infectious conditions. We identified eight cis effects involving variants in or near the IL6R (p = 1.8×10⁻⁵⁷), CCL4L1 (p = 3.9×10⁻²¹), IL18 (p = 6.8×10⁻¹³), LPA (p = 4.4×10⁻¹⁰), GGT1 (p = 1.5×10⁻⁷), SHBG (p = 3.1×10⁻⁷), CRP (p = 6.4×10⁻⁶) and IL1RN (p = 7.3×10⁻⁶) genes. Each of these variants was associated with its respective protein product, with effect sizes ranging from 0.19 to 0.69 standard deviations per allele. Mechanisms implicated include altered rates of cleavage of bound to unbound soluble receptor (IL6R), altered secretion rates of different-sized proteins (LPA), variation in gene copy number (CCL4L1) and altered transcription (GGT1). We identified one novel trans effect, an association between ABO blood group and tumour necrosis factor alpha (TNF-alpha) levels (p = 6.8×10⁻⁴⁰). However, this association was not present when TNF-alpha was measured using a different assay or in a second study, suggesting that it is assay-specific. Our results show that protein levels share some features of the genetics of gene expression, including the presence of strong genetic effects in cis locations. The identification of protein quantitative trait loci (pQTLs) may be a powerful complementary method of improving our understanding of disease pathways.
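Effect sizes in "standard deviations per allele" are conventionally obtained from an additive genetic model. In this model, the protein level is standardized and regressed on the number of copies (0, 1 or 2) of the tested allele that each individual carries. The hypothetical function `per_allele_effect_sd` below is a minimal sketch of that calculation, assuming a simple least-squares fit with no covariates. The covariates, transformations and software used in the InCHIANTI analysis are not described here and are not reproduced.

```python
import statistics
from typing import Sequence


def per_allele_effect_sd(dosages: Sequence[int], levels: Sequence[float]) -> float:
    """Additive-model effect size in trait standard deviations per allele.

    Hypothetical helper for illustration only: `dosages` holds allele counts
    (0, 1 or 2) and `levels` the measured protein levels for the same
    individuals. Levels are standardized to mean 0 and SD 1, then the ordinary
    least-squares slope of the standardized level on dosage is returned.
    No covariates are included.
    """
    if len(dosages) != len(levels):
        raise ValueError("dosages and levels must have the same length")
    if len(levels) < 3:
        raise ValueError("need at least three individuals")
    mean_y = statistics.fmean(levels)
    sd_y = statistics.stdev(levels)  # sample SD (n - 1 denominator)
    if sd_y == 0:
        raise ValueError("protein levels do not vary")
    z = [(y - mean_y) / sd_y for y in levels]
    mean_x = statistics.fmean(dosages)
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all individuals share the same genotype")
    sxy = sum((x - mean_x) * zi for x, zi in zip(dosages, z))
    return sxy / sxx
```

In the toy data `dosages=[0, 0, 1, 1, 2, 2]`, `levels=[0, 0, 2, 2, 4, 4]`, each extra allele raises the level by exactly 2 units. The function returns 2 divided by the sample SD of the levels, about 1.12 SD per allele.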
According to the central dogma of molecular genetics, DNA is transcribed into RNA, which is translated into protein, and alterations to proteins can in turn influence human diseases. Genome-wide association studies have recently revealed many new DNA variants that influence human diseases. To complement these efforts, several genome-wide studies have established that DNA variation influences mRNA expression levels. Loci influencing mRNA levels have been termed “eQTLs”. In this study, we performed the first genome-wide association study of the third piece in this jigsaw: the role of DNA variation in determining protein levels, or “pQTLs”. We analysed 42 proteins measured in blood fractions from the InCHIANTI study. We identified eight cis effects, including common variants in or near the IL6R, CCL4, IL18, LPA, GGT1, SHBG, CRP and IL1RN genes, all associated with blood levels of their respective protein products. Mechanisms implicated included altered transcription (GGT1), as well as altered rates of cleavage of bound to unbound soluble receptor (IL6R), altered secretion rates of different-sized proteins (LPA) and variation in gene copy number (CCL4). Blood levels of many of these proteins are correlated with human diseases, and the identification of “pQTLs” may in turn help our understanding of disease.
Prenatal exposure to arsenic has been associated with increased long-term mortality in human populations. In this study, we addressed the extent to which maternal arsenic exposure affects gene expression in the newborn. We monitored gene expression profiles in a population of newborns whose mothers experienced varying levels of arsenic exposure during pregnancy. Using machine learning–based two-class prediction algorithms, we identified expression signatures from babies born to arsenic-unexposed and -exposed mothers that were highly predictive of prenatal arsenic exposure in a subsequent test population. Furthermore, we identified 11 transcripts that captured the maximal predictive capacity to classify prenatal arsenic exposure. Network analysis of the arsenic-modulated transcripts identified the activation of extensive molecular networks indicative of stress, inflammation, metal exposure, and apoptosis in the newborn. Exposure to arsenic is an important health hazard both in the United States and around the world, and is associated with increased risk for several types of cancer and other chronic diseases. These studies clearly demonstrate the robust impact of a mother's arsenic consumption on fetal gene expression, as evidenced by transcript levels in newborn cord blood.
Arsenic is an environmental pollutant and a known human carcinogen. Chronic exposure to arsenic-contaminated water is an important public health hazard around the world, including the United States. Millions of people are exposed to drinking water with arsenic levels that far exceed World Health Organization (WHO) guidelines. This study aimed to establish the extent to which maternal arsenic exposure in a human population affects newborn gene expression, given the implications of prenatal exposure for human health and the known hazard of chronic arsenic exposure. The authors show that prenatal arsenic exposure in a human population results in alarming gene expression changes in newborn babies. The gene expression changes monitored in babies born to mothers exposed to arsenic during pregnancy are highly predictive of prenatal arsenic exposure in a subsequent test population. The study establishes a subset of just 11 transcripts that captures maximal predictive capability and could prove promising as genetic biomarkers of prenatal arsenic exposure. Pathway analysis of the genome-wide response in babies exposed to arsenic in utero indicates robust activation of an integrated network of pathways involving NF-κB, inflammation, cell proliferation, stress, and apoptosis. This study contributes to our understanding of biological responses to arsenic exposure.