Related Articles

1.  Identification of cis-regulatory sequence variations in individual genome sequences 
Genome Medicine  2011;3(10):65.
Functional contributions of cis-regulatory sequence variations to human genetic disease are numerous. For instance, disrupting variations in an HNF4A transcription factor binding site upstream of the Factor IX gene contributes causally to hemophilia B Leyden. Although clinical genome sequence analysis currently focuses on the identification of protein-altering variation, the impact of cis-regulatory mutations can be similarly strong. New technologies are now enabling genome sequencing beyond exomes, revealing variation across the non-coding 98% of the genome responsible for developmental and physiological patterns of gene activity. The capacity to identify causal regulatory mutations is improving, but predicting functional changes in regulatory DNA sequences remains a great challenge. Here we explore the existing methods and software for prediction of functional variation situated in the cis-regulatory sequences governing gene transcription and RNA processing.
doi:10.1186/gm281
PMCID: PMC3239227  PMID: 21989199
2.  Identifying regulatory elements in eukaryotic genomes 
Proper development and functioning of an organism depends on precise spatial and temporal expression of all its genes. These coordinated expression-patterns are maintained primarily through the process of transcriptional regulation. Transcriptional regulation is mediated by proteins binding to regulatory elements on the DNA in a combinatorial manner, where particular combinations of transcription factor binding sites establish specific regulatory codes. In this review, we survey experimental and computational approaches geared towards the identification of proximal and distal gene regulatory elements in the genomes of complex eukaryotes. Available approaches that decipher the genetic structure and function of regulatory elements by exploiting various sources of information like gene expression data, chromatin structure, DNA-binding specificities of transcription factors, cooperativity of transcription factors, etc. are highlighted. We also discuss the relevance of regulatory elements in the context of human health through examples of mutations in some of these regions having serious implications in misregulation of genes and being strongly associated with human disorders.
doi:10.1093/bfgp/elp014
PMCID: PMC2764519  PMID: 19498043
transcriptional regulation; enhancers; silencers; tissue-specific regulatory elements; population variation; non-coding diseases; computational analysis of regulatory element sequence composition
3.  Regulatory polymorphisms underlying complex disease traits 
There is growing evidence that genetic variation plays an important role in the determination of individual susceptibility to complex disease traits. In contrast to coding sequence polymorphisms, where the consequences of non-synonymous variation may be resolved at the level of the protein phenotype, defining specific functional regulatory polymorphisms has proved problematic. This has arisen for a number of reasons, including difficulties with fine mapping due to linkage disequilibrium, together with a paucity of experimental tools to resolve the effects of non-coding sequence variation on gene expression. Recent studies have shown that variation in gene expression is heritable and can be mapped as a quantitative trait. Allele-specific effects on gene expression appear relatively common, typically of modest magnitude and context specific. The role of regulatory polymorphisms in determining susceptibility to a number of complex disease traits is discussed, including variation at the VNTR of INS, encoding insulin, in type 1 diabetes and polymorphism of CTLA4, encoding cytotoxic T lymphocyte antigen, in autoimmune disease. Examples where regulatory polymorphisms have been found to play a role in monogenic traits such as factor VII deficiency are discussed, and contrasted with those polymorphisms associated with ischaemic heart disease at the same gene locus. Molecular mechanisms operating in an allele-specific manner at the level of transcription are illustrated, with examples including the role of Duffy binding protein in malaria. The difficulty of resolving specific functional regulatory variants arising from linkage disequilibrium is demonstrated using a number of examples including polymorphism of CCR5, encoding CC chemokine receptor 5, and HIV-1 infection.
The importance of understanding haplotypic structure to the design and interpretation of functional assays of putative regulatory variation is highlighted, together with discussion of the strategic use of experimental tools to resolve regulatory polymorphisms at a transcriptional level. A number of examples are discussed including work on the TNF locus which demonstrate biological and experimental context specificity. Regulatory variation may also operate at other levels of control of gene expression and the modulation of splicing at PTPRC, encoding protein tyrosine phosphatase receptor-type C, and of translational efficiency at F12, encoding factor XII, are discussed.
doi:10.1007/s00109-004-0603-7
PMCID: PMC3132451  PMID: 15592805
Gene expression; Genetics; Gene polymorphism; Promoter; Transcription
4.  ChIP-seq accurately predicts tissue-specific activity of enhancers 
Nature  2009;457(7231):854-858.
A major yet unresolved quest in decoding the human genome is the identification of the regulatory sequences that control the spatial and temporal expression of genes. Distant-acting transcriptional enhancers are particularly challenging to uncover because they are scattered among the vast non-coding portion of the genome. Evolutionary sequence constraint can facilitate the discovery of enhancers, but fails to predict when and where they are active in vivo. Here we present the results of chromatin immunoprecipitation with the enhancer-associated protein p300 followed by massively parallel sequencing, and map several thousand in vivo binding sites of p300 in mouse embryonic forebrain, midbrain and limb tissue. We tested 86 of these sequences in a transgenic mouse assay, which in nearly all cases demonstrated reproducible enhancer activity in the tissues that were predicted by p300 binding. Our results indicate that in vivo mapping of p300 binding is a highly accurate means for identifying enhancers and their associated activities, and suggest that such data sets will be useful to study the role of tissue-specific enhancers in human biology and disease on a genome-wide scale.
doi:10.1038/nature07730
PMCID: PMC2745234  PMID: 19212405
5.  Local Regulatory Variation in Saccharomyces cerevisiae 
PLoS Genetics  2005;1(2):e25.
Naturally occurring sequence variation that affects gene expression is an important source of phenotypic differences among individuals within a species. We and others have previously shown that such regulatory variation can occur both at the same locus as the gene whose expression it affects (local regulatory variation) and elsewhere in the genome at trans-acting factors. Here we present a detailed analysis of genome-wide local regulatory variation in Saccharomyces cerevisiae. We used genetic linkage analysis to show that nearly a quarter of all yeast genes contain local regulatory variation between two divergent strains. We measured allele-specific expression in a diploid hybrid of the two strains for 77 genes showing strong self-linkage and found that in 52%–78% of these genes, local regulatory variation acts directly in cis. We also experimentally confirmed one example in which local regulatory variation in the gene AMN1 acts in trans through a feedback loop. Genome-wide sequence analysis revealed that genes subject to local regulatory variation show increased polymorphism in the promoter regions, and that some but not all of this increase is due to polymorphisms in predicted transcription factor binding sites. Increased polymorphism was also found in the 3′ untranslated regions of these genes. These findings point to the importance of cis-acting variation, but also suggest that there is a diverse set of mechanisms through which local variation can affect gene expression levels.
Synopsis
Variation in DNA sequences in and around a gene can contribute to differences between individuals by affecting the gene's expression. The authors have used a variety of methods to characterize this local DNA sequence variation on a large scale in two strains of the budding yeast Saccharomyces cerevisiae. Their results suggest that the expression levels of a sizeable fraction of genes are affected by local sequence variation. Many local variants alter the expression of only one of two copies of a gene in diploid hybrid yeast, but other local variants can affect both copies equally. The authors also found that sequence variation in particular regions of DNA near genes, both upstream and downstream of coding sequences and especially in transcription factor binding sites, is most likely to affect gene expression. These results provide a detailed view of local sequence variation that affects the expression of nearby genes in S. cerevisiae.
doi:10.1371/journal.pgen.0010025
PMCID: PMC1189075  PMID: 16121257
7.  An Immune Response Network Associated with Blood Lipid Levels 
PLoS Genetics  2010;6(9):e1001113.
While recent scans for genetic variation associated with human disease have been immensely successful in uncovering large numbers of loci, far fewer studies have focused on the underlying pathways of disease pathogenesis. Many loci which are associated with disease and complex phenotypes map to non-coding, regulatory regions of the genome, indicating that modulation of gene transcription plays a key role. Thus, this study generated genome-wide profiles of both genetic and transcriptional variation from the total blood extracts of over 500 randomly-selected, unrelated individuals. Using measurements of blood lipids, key players in the progression of atherosclerosis, three levels of biological information are integrated in order to investigate the interactions between circulating leukocytes and proximal lipid compounds. Pair-wise correlations between gene expression and lipid concentration indicate a prominent role for basophil granulocytes and mast cells, cell types central to powerful allergic and inflammatory responses. Network analysis of gene co-expression showed that the top associations function as part of a single, previously unknown gene module, the Lipid Leukocyte (LL) module. This module replicated in T cells from an independent cohort while also displaying potential tissue specificity. Further, genetic variation driving LL module expression included the single nucleotide polymorphism (SNP) most strongly associated with serum immunoglobulin E (IgE) levels, a key antibody in allergy. Structural Equation Modeling (SEM) indicated that LL module is at least partially reactive to blood lipid levels. Taken together, this study uncovers a gene network linking blood lipids and circulating cell types and offers insight into the hypothesis that the inflammatory response plays a prominent role in metabolism and the potential control of atherogenesis.
Author Summary
Circulating lipid concentrations are important predictors of coronary artery disease. The main pathology of coronary artery disease is atherosclerosis, a cycle of lipid adherence to the walls of arteries and an inflammatory response resulting in more adhesion. To investigate the link between lipids and immune cells in circulation, we have generated both genomic and whole blood gene expression profiles for a population-based collection of individuals from the capital region of Finland. Key mediators of inflammation and allergy were shown to be correlated with lipid levels. Further, the expressions of these genes operated in such a highly coordinated fashion that they appeared to function as part of a single pathway, which itself was both highly correlated with and reactive to lipid levels. Our findings offer insight into how lipids activate circulating immune cells, potentially contributing to the pathogenesis of coronary artery disease.
doi:10.1371/journal.pgen.1001113
PMCID: PMC2936545  PMID: 20844574
8.  Transcriptome and genome sequencing uncovers functional variation in humans 
Nature  2013;501(7468):506-511.
Summary
Genome sequencing projects are discovering millions of genetic variants in humans, and interpretation of their functional effects is essential for understanding the genetic basis of variation in human traits. Here we report sequencing and deep analysis of mRNA and miRNA from lymphoblastoid cell lines of 462 individuals from the 1000 Genomes Project – the first uniformly processed RNA-seq data from multiple human populations with high-quality genome sequences. We discovered extremely widespread genetic variation affecting regulation of the majority of genes, with transcript structure and expression level variation being equally common but genetically largely independent. Our characterization of causal regulatory variation sheds light on cellular mechanisms of regulatory and loss-of-function variation, and allowed us to infer putative causal variants for dozens of disease-associated loci. Altogether, this study provides a deep understanding of the cellular mechanisms of transcriptome variation and of the landscape of functional variants in the human genome.
doi:10.1038/nature12531
PMCID: PMC3918453  PMID: 24037378
9.  Regulatory variation in a TBX5 enhancer leads to isolated congenital heart disease 
Human Molecular Genetics  2012;21(14):3255-3263.
Recent studies have identified the genetic underpinnings of a growing number of diseases through targeted exome sequencing. However, this strategy ignores the large component of the genome that does not code for proteins, but is nonetheless biologically functional. To address the possible involvement of regulatory variation in congenital heart diseases (CHDs), we searched for regulatory mutations impacting the activity of TBX5, a dosage-dependent transcription factor with well-defined roles in the heart and limb development that has been associated with the Holt–Oram syndrome (heart–hand syndrome), a condition that affects 1/100 000 newborns. Using a combination of genomics, bioinformatics and mouse genetic engineering, we scanned ∼700 kb of the TBX5 locus in search of cis-regulatory elements. We uncovered three enhancers that collectively recapitulate the endogenous expression pattern of TBX5 in the developing heart. We re-sequenced these enhancer elements in a cohort of non-syndromic patients with isolated atrial and/or ventricular septal defects, the predominant cardiac defects of the Holt–Oram syndrome, and identified a patient with a homozygous mutation in an enhancer ∼90 kb downstream of TBX5. Notably, we demonstrate that this single-base-pair mutation abrogates the ability of the enhancer to drive expression within the heart in vivo using both mouse and zebrafish transgenic models. Given the population-wide frequency of this variant, we estimate that 1/100 000 individuals would be homozygous for this variant, highlighting that a significant number of CHD associated with TBX5 dysfunction might arise from non-coding mutations in TBX5 heart enhancers, effectively decoupling the heart and hand phenotypes of the Holt–Oram syndrome.
doi:10.1093/hmg/dds165
PMCID: PMC3384386  PMID: 22543974
10.  Polymorphic Cis- and Trans-Regulation of Human Gene Expression 
PLoS Biology  2010;8(9):e1000480.
Using genetic and molecular analyses, we identified over 1,000 polymorphic regulators that regulate expression levels of human genes.
Expression levels of human genes vary extensively among individuals. This variation facilitates analyses of expression levels as quantitative phenotypes in genetic studies where the entire genome can be scanned for regulators without prior knowledge of the regulatory mechanisms, thus enabling the identification of unknown regulatory relationships. Here, we carried out such genetic analyses with a large sample size and identified cis- and trans-acting polymorphic regulators for about 1,000 human genes. We validated the cis-acting regulators by demonstrating differential allelic expression with sequencing of transcriptomes (RNA-Seq) and the trans-regulators by gene knockdown, metabolic assays, and chromosome conformation capture analysis. The majority of the regulators act in trans to the target (regulated) genes. Most of these trans-regulators were not known to play a role in gene expression regulation. The identification of these regulators enabled the characterization of polymorphic regulation of human gene expression at a resolution that was unattainable in the past.
Author Summary
Cellular characteristics and functions are determined largely by gene expression, and expression levels differ among individuals; however, it is not clear how these levels are regulated. While many cis-acting DNA sequence variants in promoters and enhancers that influence gene expression have been identified, only a few polymorphic trans-regulators of human genes are known. Here, we used human B-cells from individuals belonging to large families and identified polymorphic trans-regulators for about 1,000 human genes. We validated these results by gene knockdown, metabolic perturbation studies and chromosome conformation capture assays. Although these regulatory relationships were identified in cultured B-cells, we show that some of the relationships were also found in primary fibroblasts. The large number of regulators allowed us to better understand gene expression regulation, to uncover new gene functions, and to identify their roles in disease processes. This study shows that genetic variation is a powerful tool not only for gene mapping but also to study gene interaction and regulation.
doi:10.1371/journal.pbio.1000480
PMCID: PMC2939022  PMID: 20856902
11.  Cell-type-specific long-range looping interactions identify distant regulatory elements of the CFTR gene 
Nucleic Acids Research  2010;38(13):4325-4336.
Identification of regulatory elements and their target genes is complicated by the fact that regulatory elements can act over large genomic distances. Identification of long-range acting elements is particularly important in the case of disease genes as mutations in these elements can result in human disease. It is becoming increasingly clear that long-range control of gene expression is facilitated by chromatin looping interactions. These interactions can be detected by chromosome conformation capture (3C). Here, we employed 3C as a discovery tool for identification of long-range regulatory elements that control the cystic fibrosis transmembrane conductance regulator gene, CFTR. We identified four elements in a 460-kb region around the locus that loop specifically to the CFTR promoter exclusively in CFTR expressing cells. The elements are located 20 and 80 kb upstream; and 109 and 203 kb downstream of the CFTR promoter. These elements contain DNase I hypersensitive sites and histone modification patterns characteristic of enhancers. The elements also interact with each other and the latter two activate the CFTR promoter synergistically in reporter assays. Our results reveal novel long-range acting elements that control expression of CFTR and suggest that 3C-based approaches can be used for discovery of novel regulatory elements.
doi:10.1093/nar/gkq175
PMCID: PMC2910055  PMID: 20360044
12.  Dissecting complex phenotypes using the genomics of twins 
Functional & Integrative Genomics  2010;10(3):321-327.
Genetics in the post-genomic period is shifting from structural to functional genetics or genomics. Meanwhile, the use of twins is largely expanding from traditional heritability estimation for disease phenotypes to the study of both diseases and various molecular phenotypes, such as the regulatory phenotypes in functional genomics concerning gene expression and regulation, by engaging both classical twin design and marker-based gene mapping techniques in genetic epidemiology. New research designs have been proposed for making novel uses of twins in studying the molecular basis in the epigenetics of human diseases. Besides, twins not only serve as ideal samples for disease gene mapping using conventional genetic markers but also represent an excellent model for associating DNA copy number variations, a structural genetic marker, with human diseases. It is believed that, with the rapid development in biotechniques and new advances in bioinformatics, the unique samples of twins will make new contributions to our understanding of the nature and nurture in complex disease development and in human health. This paper aims at summarizing the new uses of twins in current genetic studies and suggesting novel proposals together with useful design and analytical strategies.
doi:10.1007/s10142-010-0160-9
PMCID: PMC3629377  PMID: 20145969
Twins; Genetics; Genomics
13.  Patterns of Cis Regulatory Variation in Diverse Human Populations 
PLoS Genetics  2012;8(4):e1002639.
The genetic basis of gene expression variation has long been studied with the aim to understand the landscape of regulatory variants, but also more recently to assist in the interpretation and elucidation of disease signals. To date, many studies have looked in specific tissues and population-based samples, but there has been limited assessment of the degree of inter-population variability in regulatory variation. We analyzed genome-wide gene expression in lymphoblastoid cell lines from a total of 726 individuals from 8 global populations from the HapMap3 project and correlated gene expression levels with HapMap3 SNPs located in cis to the genes. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We further dissect the specific functional pathways differentiated between populations. We also identify 5,691 expression quantitative trait loci (eQTLs) after controlling for both non-genetic factors and population admixture and observe that half of the cis-eQTLs are replicated in one or more of the populations. We highlight patterns of eQTL-sharing between populations, which are partially determined by population genetic relatedness, and discover significant sharing of eQTL effects between Asians, European-admixed, and African subpopulations. Specifically, we observe that both the effect size and the direction of effect for eQTLs are highly conserved across populations. We observe an increasing proximity of eQTLs toward the transcription start site as sharing of eQTLs among populations increases, highlighting that variants close to TSS have stronger effects and therefore are more likely to be detected across a wider panel of populations. 
Together these results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation and provide an estimate for the transferability of complex trait variants across populations.
Author Summary
Variation among individuals in the degree to which genes are expressed (i.e. turned on or off) is a characteristic exhibited by all species, and studies have identified regions of the genome harboring genetic variation affecting gene expression levels. To assess the degree of human inter-population variability in regulatory variation, we describe mapping of regions of the genome that have functional effects on gene expression levels. We analyzed genome-wide gene expression in human cell lines derived from 726 unrelated individuals representing 8 global populations that have been genetically well-characterized by the International HapMap Project. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We identify ∼5,700 genes whose expression levels are associated with genetic variation located physically close to the gene, and we observe significant sharing of associations that is partially dependent on population genetic relatedness, among Asians, European-admixed, and African subpopulations. We identify biological functions affected by regulatory variation and describe common and unique characteristics of population-specific and population-shared associations. These results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation.
doi:10.1371/journal.pgen.1002639
PMCID: PMC3330104  PMID: 22532805
14.  Using gene expression to investigate the genetic basis of complex disorders 
Human Molecular Genetics  2008;17(R2):R129-R134.
The identification of complex disease susceptibility loci through genome-wide association studies (GWAS) has recently become possible and is now a method of choice for investigating the genetic basis of complex traits. The number of results from such studies is constantly increasing, but the challenge lying ahead is to identify the biological context in which these statistically significant candidate variants act. Regulatory variation plays an important role in shaping phenotypic differences among individuals and thus is very likely to also influence disease susceptibility. As such, integrating gene expression data and other disease-relevant intermediate phenotypes with GWAS results could potentially help prioritize fine-mapping efforts and provide a shortcut to disease biology. Combining these different levels of information in a meaningful way is, however, not trivial. In the present review, we outline the several approaches that have been explored so far to this end and their achievements. We also discuss the limitations of the methods and how upcoming technological developments could help circumvent these limitations. Overall, such efforts will be very helpful in understanding, initially, regulatory effects on disease and, more generally, disease etiology.
doi:10.1093/hmg/ddn285
PMCID: PMC2570059  PMID: 18852201
16.  A universal framework for regulatory element discovery across all genomes and data-types 
Molecular Cell  2007;28(2):337-350.
Summary
Deciphering the non-coding regulatory genome has proved a formidable challenge. Despite the wealth of available gene expression data, there currently exists no broadly applicable method for characterizing the regulatory elements that shape the rich underlying dynamics. We present a general framework for detecting such regulatory DNA and RNA motifs that relies on directly assessing the mutual information between sequence and gene expression measurements. Our approach makes minimal assumptions about the background sequence model and the mechanisms by which elements affect gene expression. This provides a versatile motif discovery framework, across all data types and genomes, with exceptional sensitivity and near-zero false-positive rates. Applications from yeast to human uncover putative and established transcription-factor binding and miRNA target sites, revealing rich diversity in their spatial configurations, pervasive co-occurrences of DNA and RNA motifs, context-dependent selection for motif avoidance, and the strong impact of post-transcriptional processes on eukaryotic transcriptomes.
doi:10.1016/j.molcel.2007.09.027
PMCID: PMC2900317  PMID: 17964271
cis-regulatory element discovery; transcription factor binding sites; miRNA regulation; computational genomics; transcriptional regulation; post-transcriptional regulation; information-theory; mutual information; motif-discovery
17.  Genetic Variation of Pre-mRNA Alternative Splicing in Human Populations 
The precise splicing outcome of a transcribed gene is controlled by complex interactions between cis regulatory splicing signals and trans-acting regulators. In higher eukaryotes, alternative splicing is a prevalent mechanism for generating transcriptome and proteome diversity. Alternative splicing can modulate gene function, affect organismal phenotype and cause disease. Common genetic variation that affects splicing regulation can lead to differences in alternative splicing between human individuals and consequently impact expression level or protein function. In several well-documented examples, such natural variation of alternative splicing has indeed been shown to influence disease susceptibility and drug response. With new microarray- and sequencing-based genomic technologies that can analyze eukaryotic transcriptomes at the exon- or nucleotide-level, it has become possible to globally compare the alternative splicing profiles across human individuals in any tissue or cell type of interest. Recent large-scale transcriptome studies using high-density splicing-sensitive microarray and deep RNA sequencing (RNA-Seq) have revealed widespread genetic variation of alternative splicing in humans. In the future, an extensive catalogue of alternative splicing variation in human populations will help elucidate the molecular underpinnings of complex traits and human diseases, and shed light on the mechanisms of splicing regulation in human cells.
doi:10.1002/wrna.120
PMCID: PMC3339278  PMID: 22095823
18.  Mapping of numerous disease-associated expression polymorphisms in primary peripheral blood CD4+ lymphocytes 
Human Molecular Genetics  2010;19(23):4745-4757.
Genome-wide association studies of human gene expression promise to identify functional regulatory genetic variation that contributes to phenotypic diversity. However, it is unclear how useful this approach will be for the identification of disease-susceptibility variants. We generated gene expression profiles for 22 184 mRNA transcripts using RNA derived from peripheral blood CD4+ lymphocytes, and genome-wide genotype data for 516 512 autosomal markers in 200 subjects. We screened for cis-acting variants by testing variants mapping within 50 kb of expressed transcripts for association with transcript abundance using generalized linear models. Significant associations were identified for 1585 genes at a false discovery rate of 0.05 (corresponding to P-values ranging from 1 × 10^−91 to 7 × 10^−4). Importantly, we identified evidence of regulatory variation for 119 previously mapped disease genes, including 24 examples where the variant with the strongest evidence of disease-association demonstrates strong association with specific transcript abundance. The prevalence of cis-acting variants among disease-associated genes was 63% higher than the genome-wide rate in our data set (P = 6.41 × 10^−6), and although many of the implicated loci were associated with immune-related diseases (including asthma, connective tissue disorders and inflammatory bowel disease), associations with genes implicated in non-immune-related diseases including lipid profiles, anthropometric measurements, cancer and neurologic disease were also observed. Genetic variants that confer inter-individual differences in gene expression represent an important subset of variants that contribute to disease susceptibility. Population-based integrative genetic approaches can help identify such variation and enhance our understanding of the genetic basis of complex traits.
doi:10.1093/hmg/ddq392
PMCID: PMC2972694  PMID: 20833654
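The cis screen described in the abstract above (variants within 50 kb of a transcript, tested against transcript abundance) can be sketched as follows. This is a simplified illustration, not the study's analysis: an ordinary least-squares fit on genotype dosage stands in for the generalized linear models, the window is assumed to extend 50 kb beyond each transcript boundary, all variants are assumed to lie on the transcript's chromosome, and every name and value is hypothetical.

```python
import math


def cis_variants(variants, tx_start, tx_end, window=50_000):
    """Keep (variant_id, position) pairs lying within `window` bp of the transcript."""
    return [v for v in variants if tx_start - window <= v[1] <= tx_end + window]


def ols_slope_and_r(dosages, expression):
    """Least-squares slope and Pearson r of expression on genotype dosage (0/1/2)."""
    n = len(dosages)
    if n < 2 or n != len(expression):
        raise ValueError("need at least two paired observations")
    mx = sum(dosages) / n
    my = sum(expression) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("dosage and expression must both vary")
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Toy inputs (hypothetical): a transcript at 100,000-120,000 bp and three variants.
variants = [("var_a", 60_000), ("var_b", 200_000), ("var_c", 165_000)]
nearby = cis_variants(variants, tx_start=100_000, tx_end=120_000)
slope, r = ols_slope_and_r([0, 1, 2, 1, 0], [1.0, 2.0, 3.0, 2.0, 1.0])
```

A real screen would also compute a P-value for each variant-transcript pair and control the false discovery rate across all tests, as the abstract reports; both steps are omitted here.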
19.  The interplay between evolution, regulation and tissue specificity in the Human Hereditary Diseasome 
BMC Genomics  2010;11(Suppl 4):S23.
Background
Human disease genes can be distinguished from essential (embryonically lethal) and non-disease genes using gene attributes. Such attributes include gene age, tissue specificity of expression, regulatory capacity, sequence length, rate of sequence variation and capacity for interaction. The resulting information has been used to inform data mining approaches seeking to identify novel disease genes. Given the dynamic nature of this field and the rapid rise in relevant information, we have chosen to perform a single integrated mining approach to explore relationships among gene attributes and thereby characterise evolutionary trends associated with disease genes.
Results
All-against-all cross-comparison of 2,522 disease gene attributes revealed significant relationships between the age, disease association and expression pattern of genes and the tissues within which they are expressed. We found that the over-representation of disease genes among old genes holds for tissue-specific genes, but the correlation between age and disease association vanishes when conditioning on tissue specificity. Of the 32 tissues studied, the genes expressed in pancreas are on average older than the genes expressed in any other tissue, while the testis expresses the lowest proportion of old genes. Following a focussed analysis of the impact of regulatory apparatus on the evolution of disease genes, we show that regulators, comprising transcription factors and post-translationally modified proteins, are over-represented among ancient disease genes. In addition, we show that the proportion of regulator genes is affected by gene age among disease genes and by tissue specificity among non-disease genes. Finally, using 55,606 true positive gene interaction data, we find that old disease genes interact with other old disease genes and that interacting new genes interact with genes originating from higher phylostrata.
Conclusion
This study supports the non-random nature of the human diseasome. We have identified a variety of distinct features and correlations to other molecular attributes that can be used to distinguish the set of disease causing genes. This was achieved by harnessing the power of mining large scale datasets from OMIM and other databases. Ultimately such knowledge may contribute to the identification of novel human disease genes and an enhanced understanding of human biology.
doi:10.1186/1471-2164-11-S4-S23
PMCID: PMC3005915  PMID: 21143807
20.  Gene Promoter Scan Methodology for Identifying and Classifying Coregulated Promoters 
Methods in enzymology  2007;422:361-385.
A critical challenge of the postgenomic era is to understand how genes are differentially regulated. Genetic and genomic approaches have been used successfully to assign genes to distinct regulatory networks in both prokaryotes and eukaryotes. However, little is known about what determines the differential expression of genes within a particular network, even when it involves a single transcription factor. The fact that coregulated genes may be differentially expressed suggests that subtle differences in the shared cis-acting regulatory elements are likely to be significant. This chapter describes a method, termed gene promoter scan (GPS), that discriminates among coregulated promoters by simultaneously considering a variety of cis-acting regulatory features. Application of this method to the PhoP/PhoQ two-component regulatory system of Escherichia coli and Salmonella enterica uncovered novel members of the PhoP regulon, as well as regulatory interactions that had not been discovered using previous approaches. The predictions made by GPS were validated experimentally to establish that the PhoP protein uses multiple mechanisms to control gene transcription and is a central element in a highly connected network.
doi:10.1016/S0076-6879(06)22018-4
PMCID: PMC3755887  PMID: 17628149
21.  Genetics and Regulatory Impact of Alternative Polyadenylation in Human B-Lymphoblastoid Cells 
PLoS Genetics  2012;8(8):e1002882.
Gene expression varies widely between individuals of a population, and regulatory change can underlie phenotypes of evolutionary and biomedical relevance. A key question in the field is how DNA sequence variants impact gene expression, with most mechanistic studies to date focused on the effects of genetic change on regulatory regions upstream of protein-coding sequence. By contrast, the role of RNA 3′-end processing in regulatory variation remains largely unknown, owing in part to the challenge of identifying functional elements in 3′ untranslated regions. In this work, we conducted a genomic survey of transcript ends in lymphoblastoid cells from genetically distinct human individuals. Our analysis mapped the cis-regulatory architecture of 3′ gene ends, finding that transcript end positions did not fall randomly in untranslated regions, but rather preferentially flanked the locations of 3′ regulatory elements, including miRNA sites. The usage of these transcript length forms and motifs varied across human individuals, and polymorphisms in polyadenylation signals and other 3′ motifs were significant predictors of expression levels of the genes in which they lay. Independent single-gene experiments confirmed the effects of polyadenylation variants on steady-state expression of their respective genes, and validated the regulatory function of 3′ cis-regulatory sequence elements that mediated expression of these distinct RNA length forms. Focusing on the immune regulator IRF5, we established the effect of natural variation in RNA 3′-end processing on regulatory response to antigen stimulation. Our results underscore the importance of two mechanisms at play in the genetics of 3′-end variation: the usage of distinct 3′-end processing signals and the effects of 3′ sequence elements that determine transcript fate. 
Our findings suggest that the strategy of integrating observed 3′-end positions with inferred 3′ regulatory motifs will prove to be a critical tool in continued efforts to interpret human genome variation.
Author Summary
Messenger RNAs carry the instructions necessary to synthesize proteins that do work for the cell. Extending beyond the protein-coding sequence of a given mRNA is an additional stretch of sequence, harboring signals that govern how much protein is made and how long the mRNA remains in the cell before it is broken down. The incorporation of this end region into mature mRNA is itself subject to change; for the vast majority of human genes, how and why cells use different mRNA ends remains largely unknown. In this work, we surveyed mRNA ends from ∼10,000 genes in immune cells from genetically distinct human individuals. We found that mRNA end positions were not randomly distributed, but rather preferentially flanked the locations of regulatory signals that govern mRNA fate. The usage of these mRNA length forms and regulatory elements varied across individuals and could be dissected molecularly. Our results uncover key mechanisms and regulatory effects of transcript end processing, particularly as these are perturbed by genetic differences between humans.
doi:10.1371/journal.pgen.1002882
PMCID: PMC3420953  PMID: 22916029
22.  Allelic Expression Changes in Medaka (Oryzias latipes) Hybrids between Inbred Strains Derived from Genetically Distant Populations 
PLoS ONE  2012;7(5):e36875.
Variation in allelic expression between genetically distant populations is one of the most important factors affecting their morphological and physiological variation. This variation is caused by natural mutations accumulated in their habitats. It has been reported that allelic expression in the hybrids of genetically distant populations differs from that in the parental strains, raising the possibility that allelic expression changes lead to novel phenotypes in hybrids. Given genomic information for the genetically distant populations, quantifying and comparing allelic expression changes clarifies the importance of regulatory sequences (cis-acting factors) and upstream regulatory factors (trans-acting modulators) for these changes. In this study, we focused on two Medaka inbred strains, Hd-rR and HNI, derived from genetically distant populations, and their hybrids. The strains are highly polymorphic, and whole-genome information is available for them. To analyze allelic expression changes, we established a method to quantify and compare allele-specific expression of 11 genes between the parental strains and their reciprocal hybrids. In intestines of reciprocal hybrids, allelic expression was either similar to or different from that in the parental strains. Total expression in Hd-rR and HNI was tissue-dependent in the case of HPRT1, with high up-regulation of Hd-rR allele expression in liver. The proportion of genes with differential allelic expression in Medaka hybrids seems to be the same as that in other animals, despite the high SNP rate in the genomes of the two inbred strains. The results suggest that, in each tissue, strain differences in trans-acting modulators are more important than polymorphisms in cis-regulatory sequences in producing the allelic expression changes in reciprocal hybrids.
doi:10.1371/journal.pone.0036875
PMCID: PMC3349633  PMID: 22590630
23.  Genomic architecture of sickle cell disease in West African children 
Sickle cell disease (SCD) is a congenital blood disease, affecting predominantly children from sub-Saharan Africa, but also populations world-wide. Although the causal mutation of SCD is known, the sources of clinical variability of SCD remain poorly understood, with only a few highly heritable traits associated with SCD having been identified. Phenotypic heterogeneity in the clinical expression of SCD is problematic for follow-up (FU), management, and treatment of patients. Here we used the joint analysis of gene expression and whole genome genotyping data to identify the genetic regulatory effects contributing to gene expression variation among groups of patients exhibiting clinical variability, as well as unaffected siblings, in Benin, West Africa. We characterized and replicated patterns of whole blood gene expression variation within and between SCD patients at entry to clinic, as well as in follow-up programs. We present a global map of genes involved in the disease through analysis of whole blood sampled from the cohort. Genome-wide association mapping of gene expression revealed 390 peak genome-wide significant expression SNPs (eSNPs) and 6 significant eSNP-by-clinical status interaction effects. The strong modulation of the transcriptome implicates pathways affecting core circulating cell functions and shows how genotypic regulatory variation likely contributes to the clinical variation observed in SCD.
doi:10.3389/fgene.2014.00026
PMCID: PMC3924578  PMID: 24592274
sickle cell disease; genomics; transcriptome; eSNP mapping; gene-by-environment interactions
24.  Short non-coding RNA biology and neurodegenerative disorders: novel disease targets and therapeutics 
Human Molecular Genetics  2009;18(R1):R27-R39.
Genomic studies in model organisms and in humans have shown that complexity in biological systems arises not from the absolute number of genes, but from the differential use of combinations of genetic programmes and the myriad ways in which these are regulated spatially and temporally during development, senescence and in disease. Nowhere is this lesson in biological complexity likely to be more apparent than in the human nervous system. Increasingly, the role of genomic non-protein coding small regulatory RNAs, in particular the microRNAs (miRNAs), in regulating cellular pathways controlling fundamental functions in the nervous system and in neurodegenerative disease is being appreciated. Not only might dysregulated expression of miRNAs serve as potential disease biomarkers but increasingly such short regulatory RNAs are being implicated directly in the pathogenesis of complex, sporadic neurodegenerative disease. Moreover, the targeting and exploitation of short RNA silencing pathways, commonly known as RNA interference, and the development of related tools offer novel therapeutic approaches to target upstream disease components with the promise of providing future disease modifying therapies for neurodegenerative disorders.
doi:10.1093/hmg/ddp070
PMCID: PMC2657944  PMID: 19297399
25.  Passive and active DNA methylation and the interplay with genetic variation in gene regulation 
eLife  2013;2:e00523.
DNA methylation is an essential epigenetic mark whose role in gene regulation and its dependency on genomic sequence and environment are not fully understood. In this study we provide novel insights into the mechanistic relationships between genetic variation, DNA methylation and transcriptome sequencing data in three different cell-types of the GenCord human population cohort. We find that the association between DNA methylation and gene expression variation among individuals is likely due to different mechanisms from those establishing methylation-expression patterns during differentiation. Furthermore, cell-type differential DNA methylation may delineate a platform in which local inter-individual changes may respond to or act in gene regulation. We show that unlike genetic regulatory variation, DNA methylation alone does not significantly drive allele specific expression. Finally, inferred mechanistic relationships using genetic variation as well as correlations with TF abundance reveal both a passive and an active role of DNA methylation in regulatory interactions influencing gene expression.
DOI: http://dx.doi.org/10.7554/eLife.00523.001
eLife digest
Variations occur throughout our genome. These variations can cause genes to be expressed (switched on) in slightly different ways among individuals. Moreover, the same gene can also be expressed in different ways in different cells within an individual. A third level of variation is supplied by epigenetic markers: these are molecules that bind to the DNA at specific points and can have profound effects on the expression of nearby genes. One such epigenetic marker is the addition of a methyl group to a cytosine base, a process that is known as DNA methylation.
DNA methylation usually happens when a cytosine base is next to a guanine base, forming a CpG site. In mammals, most CpG sites have methyl groups attached, although regions with a lot of CpG sites (called CpG islands) are mostly unmethylated. Initial studies suggested that methylation prevented particular genes from being expressed, but more recent work has indicated that methylation can be associated with both reduced and increased expression of genes. Moreover, it is not clear if this association is active (i.e., changes in methylation drive changes in gene expression) or passive (DNA methylation is the result of gene regulation).
Now, Gutierrez-Arcelus et al. have carried out a large-scale study to clarify the relationships between three different types of gene-related variations among individuals. They extracted fibroblasts, T-cells and lymphoblastoid cells from the umbilical cords of 204 babies, and analysed them for variations in DNA sequence, gene expression and DNA methylation. Their results show that the associations between the three are more complex than was previously thought.
Gutierrez-Arcelus et al. show that the mechanisms that control the association between the variations in DNA methylation and gene expression in individuals are likely to be different to those that are responsible for the establishment of methylation patterns during the process of cell differentiation. They also find that the association between DNA methylation and gene expression can be either active or passive, and can depend on the context in which they occur in our genome. Finally, where the two copies or alleles of a gene are not equally expressed in a given cell, the difference in expression is primarily regulated by DNA sequence variation, with DNA methylation having little or no role on its own. Equally complex interactions and effects are expected in further studies of genetic and epigenetic variation.
DOI: http://dx.doi.org/10.7554/eLife.00523.002
doi:10.7554/eLife.00523
PMCID: PMC3673336  PMID: 23755361
methylation; gene regulation; epigenetics; genome variation; Human
