Human growth has an estimated heritability of about 80%–90%. Nevertheless, the underlying cause of shortness of stature remains unknown in the majority of individuals. Genome-wide association studies (GWAS) showed that both common single nucleotide polymorphisms and copy number variants (CNVs) contribute to height variation under a polygenic model, although explaining only a small fraction of overall genetic variability in the general population. Under the hypothesis that severe forms of growth retardation might also be caused by major gene effects, we searched for rare CNVs in 200 families, 92 sporadic and 108 familial, with idiopathic short stature compared to 820 control individuals. Although similar in number, patients had overall significantly larger CNVs (p-value<1×10⁻⁷). In a gene-based analysis of all non-polymorphic CNVs>50 kb for gene function, tissue expression, and murine knock-out phenotypes, we identified 10 duplications and 10 deletions ranging in size from 109 kb to 14 Mb, of which 7 were de novo (p<0.03) and 13 inherited from the likewise affected parent but absent in controls. Patients with these 20 likely disease-causing CNVs were smaller than the remaining group (p<0.01). Eleven (55%) of these CNVs either overlapped with known microaberration syndromes associated with short stature or contained GWAS loci for height. Haploinsufficiency (HI) score and further expression profiling suggested dosage sensitivity of major growth-related genes at these loci. Overall, 10% of patients carried a disease-causing CNV, indicating that, as in neurodevelopmental disorders, rare CNVs are a frequent cause of severe growth retardation.
With a frequency of 3%, shortness of stature is a common medical concern. Although family studies have clearly shown that gene defects play a pivotal role in the development of short stature, the underlying genetic variants involved remain unknown in about 80% of cases. In contrast to recent studies which aimed at the identification of common genetic variants to explain minor differences in the height variation in the general population, we targeted rare genomic variants where we expected a major gene effect on growth. By examining 200 patients clinically evaluated for short stature, we show that rare structural chromosomal aberrations (CNVs) are associated with shortness of stature in 10% of the cases. The identified CNVs were either de novo or segregated with short stature in the families and include genes that are functionally involved in growth regulation in humans or mice. We furthermore demonstrate an overlap of these CNVs with known microdeletion syndromes. Interestingly, 3 CNVs contain positions of common variants and confirm the localization of major growth-related genes. These findings are particularly important for identification of biological pathways leading to short stature, but also for further therapeutic approaches.
We describe a novel approach to capturing the covariance structure of peripheral blood gene expression that relies on the identification of highly conserved Axes of variation. Starting with a comparison of microarray transcriptome profiles for a new dataset of 189 healthy adult participants in the Emory-Georgia Tech Center for Health Discovery and Well-Being (CHDWB) cohort, with a previously published study of 208 adult Moroccans, we identify nine Axes each with between 99 and 1,028 strongly co-regulated transcripts in common. Each axis is enriched for gene ontology categories related to sub-classes of blood and immune function, including T-cell and B-cell physiology and innate, adaptive, and anti-viral responses. Conservation of the Axes is demonstrated in each of five additional population-based gene expression profiling studies, and one Axis is robustly associated with Body Mass Index in the CHDWB as well as Finnish and Australian cohorts. Furthermore, ten tightly co-regulated genes can be used to define each Axis as “Blood Informative Transcripts” (BITs), generating scores that define an individual with respect to the represented immune activity and blood physiology. We show that environmental factors, including lifestyle differences in Morocco and infection leading to active or latent tuberculosis, significantly impact specific axes, but that there is also significant heritability for the Axis scores. In the context of personalized medicine, reanalysis of the longitudinal profile of one individual during and after infection with two respiratory viruses demonstrates that specific axes also characterize clinical incidents. This mode of analysis suggests that, rather than unique subsets of genes marking each class of disease, differential expression reflects movement along the major normal Axes in response to environmental and genetic stimuli.
Gene expression profiling of human tissues typically reveals a complex structure of co-regulation of gene expression that has yet to be explored with regard to the genetic and environmental sources of covariance or its implications for quantitative and clinical traits. Here we show that peripheral blood samples from multiple studies can be described by nine common axes of variation that collectively explain up to one half of all transcriptional variance in blood. Specific axes diverge according to environmental variables such as lifestyle and infectious disease exposure, but a strong genetic component to axis regulation is also inferred. As few as 10 “blood-informative transcripts” (BITs) can be used to define each axis and potentially classify individuals with respect to multiple aspects of their blood and immune function. The analysis of longitudinal profiles of one individual shows how these change relative to clinical shifts in metabolic profile following viral infection. The notion that gene expression diverges along genetic paths of least resistance defined by these axes has important implications for interpreting differential expression in case-control studies of disease.
The phylogeographic population structure of Mycobacterium tuberculosis suggests local adaptation to sympatric human populations. We hypothesized that HIV infection, which induces immunodeficiency, will alter the sympatric relationship between M. tuberculosis and its human host. To test this hypothesis, we performed a nine-year nation-wide molecular-epidemiological study of HIV–infected and HIV–negative patients with tuberculosis (TB) between 2000 and 2008 in Switzerland. We analyzed 518 TB patients of whom 112 (21.6%) were HIV–infected and 233 (45.0%) were born in Europe. We found that among European-born TB patients, recent transmission was more likely to occur in sympatric compared to allopatric host–pathogen combinations (adjusted odds ratio [OR] 7.5, 95% confidence interval [95% CI] 1.21–infinity, p = 0.03). HIV infection was significantly associated with TB caused by an allopatric (as opposed to sympatric) M. tuberculosis lineage (OR 7.0, 95% CI 2.5–19.1, p<0.0001). This association remained when adjusting for frequent travelling, contact with foreigners, age, sex, and country of birth (adjusted OR 5.6, 95% CI 1.5–20.8, p = 0.01). Moreover, it became stronger with greater immunosuppression as defined by CD4 T-cell depletion and was not the result of increased social mixing in HIV–infected patients. Our observation was replicated in a second independent panel of 440 M. tuberculosis strains collected during a population-based study in the Canton of Bern between 1991 and 2011. In summary, these findings support a model for TB in which the stable relationship between the human host and its locally adapted M. tuberculosis is disrupted by HIV infection.
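The odds ratios quoted above summarize how strongly an exposure (here, HIV infection) is associated with an outcome (an allopatric lineage). As a minimal sketch of the underlying arithmetic, the following computes an unadjusted odds ratio and an approximate 95% confidence interval (Woolf's log method) from a 2×2 table. The counts are made up for illustration and are not the study's data; the adjusted estimates reported above would come from a regression model that this sketch does not attempt to reproduce.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Woolf (log-method) confidence interval.

    2x2 table layout (hypothetical labels):
        a = exposed with outcome,    b = exposed without outcome
        c = unexposed with outcome,  d = unexposed without outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this method")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts, chosen only to keep the arithmetic easy to follow.
or_value, ci_low, ci_high = odds_ratio_with_ci(20, 10, 10, 20)
print(f"OR = {or_value:.1f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

With sparse cells this approximation becomes unreliable, and exact methods can give an unbounded interval such as the "1.21–infinity" reported above.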
Human tuberculosis (TB) caused by Mycobacterium tuberculosis kills 1.5 million people each year. M. tuberculosis has been affecting humans for millennia, suggesting that different strain lineages may be adapted to specific human populations. The combination of a particular strain lineage and its corresponding patient population can be classified as sympatric (e.g. Euro-American lineage in Europeans) or allopatric (e.g. East-Asian lineage in Europeans). We hypothesized that infection with the human immunodeficiency virus (HIV), which impairs the human immune system, will interfere with this host–pathogen relationship. We performed a nation-wide molecular-epidemiological study of HIV–infected and HIV–negative TB patients between 2000 and 2008 in Switzerland. We found that HIV infection was associated with the less adapted allopatric lineages among patients born in Europe, and this was not explained by social or other patient factors such as increased social mixing in HIV–infected individuals. Strikingly, the association between HIV infection and less adapted M. tuberculosis lineages was stronger in patients with more pronounced immunodeficiency. Our observation was replicated in a second independent panel of M. tuberculosis strains collected during a population-based study in the Canton of Bern. In summary, our study provides evidence that the sympatric host–pathogen relationship in TB is disrupted by HIV infection.
Myopia, or nearsightedness, is the most common eye disorder, resulting primarily from excess elongation of the eye. The etiology of myopia, although known to be complex, is poorly understood. Here we report the largest ever genome-wide association study (45,771 participants) on myopia in Europeans. We performed a survival analysis on age of myopia onset and identified 22 significant associations, two of which are replications of earlier associations with refractive error. Ten of the 20 novel associations identified replicate in a separate cohort of 8,323 participants who reported if they had developed myopia before age 10. These 22 associations in total explain 2.9% of the variance in myopia age of onset and point toward a number of different mechanisms behind the development of myopia. One association is in the gene PRSS56, which has previously been linked to abnormally small eyes; one is in a gene that forms part of the extracellular matrix (LAMA2); two are in or near genes involved in the regeneration of 11-cis-retinal (RGR and RDH5); two are near genes known to be involved in the growth and guidance of retinal ganglion cells (ZIC2, SFRP1); and five are in or near genes involved in neuronal signaling or development. These novel findings point toward multiple genetic factors involved in the development of myopia and suggest that complex interactions between extracellular matrix remodeling, neuronal development, and visual signals from the retina may underlie the development of myopia in humans.
The genetic basis of myopia, or nearsightedness, is believed to be complex and affected by multiple genes. Two genetic association studies have each identified a single genetic region associated with myopia in European populations. Here we report the results of the largest ever genetic association study on myopia in over 45,000 people of European ancestry. We identified 22 genetic regions significantly associated with myopia age of onset. Two are replications of the previously identified associations, and 20 are novel. Ten of the novel associations replicate in a small separate cohort. Sixteen of the novel associations are in or near genes implicated in eye development, neuronal development and signaling, the visual cycle of the retina, and general morphology: BMP3, BMP4, DLG2, DLX1, KCNMA1, KCNQ5, LAMA2, LRRC4C, PRSS56, RBFOX1, RDH5, RGR, SFRP1, TJP2, ZBTB38, and ZIC2. These findings point to numerous biological pathways involved in the development of myopia and, in particular, suggest that early eye and neuronal development may lead to the eventual development of myopia in humans.
Using a phenome-wide association study (PheWAS) approach, we comprehensively tested genetic variants for association with phenotypes available for 70,061 study participants in the Population Architecture using Genomics and Epidemiology (PAGE) network. Our aim was to better characterize the genetic architecture of complex traits and identify novel pleiotropic relationships. This PheWAS drew on five population-based studies representing four major racial/ethnic groups (European Americans (EA), African Americans (AA), Hispanics/Mexican-Americans, and Asian/Pacific Islanders) in PAGE, each site with measurements for multiple traits, associated laboratory measures, and intermediate biomarkers. A total of 83 single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWAS) were genotyped across two or more PAGE study sites. Comprehensive tests of association, stratified by race/ethnicity, were performed, encompassing 4,706 phenotypes mapped to 105 phenotype-classes, and association results were compared across study sites. A total of 111 PheWAS results had significant associations for two or more PAGE study sites with consistent direction of effect with a significance threshold of p<0.01 for the same racial/ethnic group, SNP, and phenotype-class. Among results identified for SNPs previously associated with phenotypes such as lipid traits, type 2 diabetes, and body mass index, 52 replicated previously published genotype–phenotype associations, 26 represented phenotypes closely related to previously known genotype–phenotype associations, and 33 represented potentially novel genotype–phenotype associations with pleiotropic effects. The majority of the potentially novel results were for single PheWAS phenotype-classes, for example, for CDKN2A/B rs1333049 (previously associated with type 2 diabetes in EA) a PheWAS association was identified for hemoglobin levels in AA. 
Of note, however, GALNT2 rs2144300 (previously associated with high-density lipoprotein cholesterol levels in EA) had multiple potentially novel PheWAS associations, with hypertension related phenotypes in AA and with serum calcium levels and coronary artery disease phenotypes in EA. PheWAS identifies associations for hypothesis generation and exploration of the genetic architecture of complex traits.
In phenome-wide association studies (PheWAS) all potential genetic variants in a dataset are systematically tested for association with all available phenotypes and traits that have been measured in study participants. By investigating the relationship between genetic variation and a diversity of phenotypes, there is the potential for uncovering novel relationships between single nucleotide polymorphisms (SNPs), phenotypes, and networks of interrelated phenotypes. PheWAS also can expose pleiotropy, provide novel mechanistic insights, and foster hypothesis generation. This approach is complementary to genome-wide association studies (GWAS) that test the association between hundreds of thousands, to over a million, single nucleotide polymorphisms and a single phenotype or limited phenotypic domain. The Population Architecture using Genomics and Epidemiology (PAGE) network has measures for a wide array of phenotypes and traits, including prevalent and incident status for clinical conditions and risk factors, as well as clinical parameters and intermediate biomarkers. We performed tests of association between a series of genome-wide association study (GWAS)–identified SNPs and a comprehensive range of phenotypes from the PAGE network in a high-throughput manner. We replicated a number of previously reported associations, validating the PheWAS approach. We also identified novel genotype–phenotype associations possibly representing pleiotropic effects.
Glycosylation of immunoglobulin G (IgG) influences IgG effector function by modulating binding to Fc receptors. To identify genetic loci associated with IgG glycosylation, we quantitated N-linked IgG glycans using two approaches. After isolating IgG from human plasma, we performed 77 quantitative measurements of N-glycosylation using ultra-performance liquid chromatography (UPLC) in 2,247 individuals from four European discovery populations. In parallel, we measured IgG N-glycans using MALDI-TOF mass spectrometry (MS) in a replication cohort of 1,848 Europeans. Meta-analysis of genome-wide association study (GWAS) results identified 9 genome-wide significant loci (P<2.27×10⁻⁹) in the discovery analysis and two of the same loci (B4GALT1 and MGAT3) in the replication cohort. Four loci contained genes encoding glycosyltransferases (ST6GAL1, B4GALT1, FUT8, and MGAT3), while the remaining 5 contained genes that have not been previously implicated in protein glycosylation (IKZF1, IL6ST-ANKRD55, ABCF2-SMARCD3, SUV420H1, and SMARCB1-DERL3). However, most of them have been strongly associated with autoimmune and inflammatory conditions (e.g., systemic lupus erythematosus, rheumatoid arthritis, ulcerative colitis, Crohn's disease, diabetes type 1, multiple sclerosis, Graves' disease, celiac disease, nodular sclerosis) and/or haematological cancers (acute lymphoblastic leukaemia, Hodgkin lymphoma, and multiple myeloma). Follow-up functional experiments in haplodeficient Ikzf1 knock-out mice showed the same general pattern of changes in IgG glycosylation as identified in the meta-analysis. As IKZF1 was associated with multiple IgG N-glycan traits, we explored the biomarker potential of the affected N-glycans in 101 cases with SLE and 183 matched controls and demonstrated substantial discriminative power in a ROC-curve analysis (area under the curve = 0.842). Our study shows that it is possible to identify new loci that control glycosylation of a single plasma protein using GWAS. 
The results may also provide an explanation for the reported pleiotropy and antagonistic effects of loci involved in autoimmune diseases and haematological cancer.
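The area under the ROC curve reported above can be read as the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control. A minimal sketch of that pairwise definition follows; the scores are invented toy values, not measurements from the SLE cohort, and the function name `roc_auc` is our own.

```python
def roc_auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (case, control) pairs in which the case has
    the higher score; tied pairs contribute one half.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy scores for illustration only.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2, 0.4]
auc = roc_auc(cases, controls)
print(auc)  # prints 0.875
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which puts the reported value of 0.842 in context.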
After analysing glycans attached to human immunoglobulin G in 4,095 individuals, we performed the first genome-wide association study (GWAS) of the glycome of an individual protein. Nine genetic loci were found to associate with glycans with genome-wide significance. Of these, four contained genes encoding enzymes that directly participate in IgG glycosylation, thus the observed associations were biologically founded. The remaining five genetic loci were not previously implicated in protein glycosylation, but most of them have been reported to be relevant for autoimmune and inflammatory conditions and/or haematological cancers. A particularly interesting gene, IKZF1, was found to be associated with multiple IgG N-glycans. This gene has been implicated in numerous diseases, including systemic lupus erythematosus (SLE). We analysed N-glycans in 101 cases with SLE and 183 matched controls and demonstrated their substantial biomarker potential. Our study shows that it is possible to identify new loci that control glycosylation of a single plasma protein using GWAS. Our results may also provide an explanation for opposite effects of some genes in autoimmune diseases and haematological cancer.
In order to assess whether gene expression variability could be influenced by several SNPs acting in cis, either through additive or more complex haplotype effects, a systematic genome-wide search for cis haplotype expression quantitative trait loci (eQTL) was conducted in a sample of 758 individuals, part of the Cardiogenics Transcriptomic Study, for which genome-wide monocyte expression and GWAS data were available. 19,805 RNA probes were assessed for cis haplotypic regulation through investigation of ∼2.1×10⁹ haplotypic combinations. 2,650 probes demonstrated haplotypic p-values >10⁴-fold smaller than the best single SNP p-value. Replication of significant haplotype effects was tested for 412 probes for which SNPs (or proxies) that defined the detected haplotypes were available in the Gutenberg Health Study composed of 1,374 individuals. At the Bonferroni correction level of 1.2×10⁻⁴ (∼0.05/412), 193 haplotypic signals replicated. 1000G imputation was then conducted, and 105 haplotypic signals still remained more informative than imputed SNPs. In-depth analysis of these 105 cis eQTL revealed that at 76 loci genetic associations were compatible with additive effects of several SNPs, while for the 29 remaining regions data could be compatible with a more complex haplotypic pattern. As 24 of the 105 cis eQTL have previously been reported to be disease-associated loci, this work highlights the need for conducting haplotype-based and 1000G imputed cis eQTL analysis before commencing functional studies at disease-associated loci.
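The replication threshold quoted above follows directly from the Bonferroni rule: divide the family-wise error rate by the number of tests. A short sketch reproducing that arithmetic with the figures given in the text (0.05 and 412 probes); the function name is our own.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Figures taken from the text: family-wise alpha of 0.05 over 412 probes.
threshold = bonferroni_threshold(0.05, 412)
print(f"{threshold:.1e}")  # prints 1.2e-04
```

A haplotypic signal counts as replicated only if its p-value falls below this per-test threshold.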
In order to assess whether gene expression variability could be influenced by the presence of more than one cis-acting SNP, we have conducted a systematic genome-wide search for haplotypic cis eQTL effects in a sample of 758 individuals and replicated the findings in an independent sample of 1,374 subjects. In both studies, genome-wide monocyte expression and genotype data were available. We identified 105 genes whose monocyte expression was under the influence of multiple cis-acting SNPs. About 75% of the detected genetic effects were related to independent additive SNP effects and the remaining quarter to more complex haplotype effects. Of note, 24 of the genes identified to be affected by multiple cis eSNPs have been previously reported to reside at disease-associated loci. This suggests that such multiple locus-specific genetic effects could contribute to the susceptibility to human diseases.
Infection with Epstein-Barr virus (EBV) is highly prevalent worldwide, and it has been associated with infectious mononucleosis and severe diseases including Burkitt lymphoma, Hodgkin lymphoma, nasopharyngeal lymphoma, and lymphoproliferative disorders. Although EBV has been the focus of extensive research, much still remains unknown concerning what makes some individuals more sensitive to infection and to adverse outcomes as a result of infection. Here we use an integrative genomics approach in order to localize genetic factors influencing levels of EBV nuclear antigen-1 (EBNA-1) IgG antibodies, as a measure of history of infection with this pathogen, in large Mexican American families. Genome-wide evidence of both significant linkage and association was obtained on chromosome 6 in the human leukocyte antigen (HLA) region and replicated in an independent Mexican American sample of large families (minimum p-value in combined analysis of both datasets is 1.4×10⁻¹⁵ for SNPs rs477515 and rs2516049). Conditional association analyses indicate the presence of at least two separate loci within MHC class II, and along with lymphocyte expression data suggest genes HLA-DRB1 and HLA-DQB1 as the best candidates. The association signals are specific to EBV and are not found with IgG antibodies to 12 other pathogens examined, and therefore do not simply reveal a general HLA effect. We investigated whether SNPs significantly associated with diseases in which EBV is known or suspected to play a role (namely nasopharyngeal lymphoma, Hodgkin lymphoma, systemic lupus erythematosus, and multiple sclerosis) also show evidence of association with EBNA-1 antibody levels, finding an overlap only for the HLA locus, but none elsewhere in the genome. The significance of this work is that a major locus related to EBV infection has been identified, which may ultimately reveal the underlying mechanisms by which the immune system regulates infection with this pathogen.
Many factors influence individual differences in susceptibility to infectious disease, including genetic factors of the host. Here we use several genome-wide investigative tools (linkage, association, joint linkage and association, and the analysis of gene expression data) to search for host genetic factors influencing Epstein-Barr virus (EBV) infection. EBV is a human herpes virus that infects up to 90% of adults worldwide, and infection has been associated with severe complications including malignancies and autoimmune disorders. In a sample of >1,300 Mexican American family members, we found significant evidence of association of anti–EBV antibody levels with loci on chromosome 6 in the human leukocyte antigen region, which contains genes related to immune function. The top two independent loci in this region were HLA-DRB1 and HLA-DQB1, both of which are involved in the presentation of foreign antigens to T cells. This finding was specific to EBV and not to 12 other pathogens we examined. We also report an overlap of genetic factors influencing both EBV antibody level and EBV–related cancers and autoimmune disorders. This work demonstrates the presence of EBV susceptibility loci and provides impetus for further investigation to better understand the underlying mechanisms related to differences in disease progression among individuals infected with this pathogen.
Next-generation sequencing has made possible the detection of rare variant (RV) associations with quantitative traits (QT). Due to high sequencing cost, many studies can only sequence a modest number of selected samples with extreme QT. Therefore, association testing in individual studies can be underpowered. Besides the primary trait, many clinically important secondary traits are often measured. It is highly beneficial if multiple studies can be jointly analyzed for detecting associations with commonly measured traits. However, analyzing secondary traits in selected samples can be biased if sample ascertainment is not properly modeled. Some methods exist for analyzing secondary traits in selected samples, where some burden tests can be implemented. However, p-values can only be evaluated analytically via asymptotic approximations, which may not be accurate. Additionally, potentially more powerful sequence kernel association tests, variable selection-based methods, and burden tests that require permutations cannot be incorporated. To overcome these limitations, we developed a unified method for analyzing secondary trait associations with RVs (STAR) in selected samples, incorporating all RV tests. Statistical significance can be evaluated either through permutations or analytically. STAR makes it possible to apply more powerful RV tests to analyze secondary trait associations. It also enables jointly analyzing multiple cohorts ascertained under different study designs, which greatly boosts power. The performance of STAR and commonly used RV association tests was comprehensively evaluated using simulation studies. STAR was also implemented to analyze a dataset from the SardiNIA project where samples with extreme low-density lipoprotein levels were sequenced. A significant association between LDLR and systolic blood pressure was identified, which is supported by pharmacogenetic studies. 
In summary, for sequencing studies, STAR is an important tool for detecting secondary-trait RV associations.
Next-generation sequencing has greatly expanded our ability to identify missing heritability due to rare variants. In order to increase the power to detect associations, one desirable study design is to combine samples from multiple cohorts for mapping commonly measured traits. However, many current studies sequence selected samples (e.g. samples with extreme QT), which can bias the analysis of secondary traits, unless the sampling ascertainment mechanisms are properly adjusted. We developed a unified method for detecting secondary trait associations with rare variants (STAR) in selected and random samples, which can flexibly incorporate all rare variant association tests and allow joint analysis of multiple cohorts ascertained under different study designs. We demonstrate via simulations that STAR greatly boosts the power for detecting secondary trait associations. As an application of STAR, a dataset from the SardiNIA project was analyzed, where DNA samples from well-phenotyped individuals with extreme low-density lipoprotein levels were sequenced. LDLR was identified to be significantly associated with systolic blood pressure, which is supported by a previous pharmacogenetics study. In conclusion, STAR is an important tool for sequence-based association studies.
In previous geographical genomics studies of the impact of lifestyle on gene expression inferred from microarray analysis of peripheral blood samples, we described the complex influences of culture, ethnicity, and gender in Morocco, and of pregnancy in Brisbane. Here we describe the use of nanofluidic Fluidigm quantitative RT-PCR arrays targeted at a set of 96 transcripts that are broadly informative of the major axes of immune gene expression, to explore the population structure of transcription in Fiji. As in Morocco, major differences are seen between the peripheral blood transcriptomes of rural villagers and residents of the capital city, Suva. The effect is much greater in Indian villages than in Melanesian highlanders and appears to be similar with respect to the nature of at least two axes of variation. Gender differences are much smaller than ethnicity or lifestyle effects. Body mass index is shown to associate with one of the axes as it does in Atlanta and Brisbane, establishing a link between the epidemiological transition of human metabolic disease, and gene expression profiles.
epidemiological transition; gene expression profiling; body mass index; TLR signaling; axes of variation
Recent gene expression QTL (eQTL) mapping studies have provided considerable insight into the genetic basis for inter-individual regulatory variation. However, a limitation of all eQTL studies to date, which have used measurements of steady-state gene expression levels, is the inability to directly distinguish between variation in transcription and decay rates. To address this gap, we performed a genome-wide study of variation in gene-specific mRNA decay rates across individuals. Using a time-course study design, we estimated mRNA decay rates for over 16,000 genes in 70 Yoruban HapMap lymphoblastoid cell lines (LCLs), for which extensive genotyping data are available. Considering mRNA decay rates across genes, we found that: (i) as expected, highly expressed genes are generally associated with lower mRNA decay rates, (ii) genes with rapid mRNA decay rates are enriched with putative binding sites for miRNA and RNA binding proteins, and (iii) genes with similar functional roles tend to exhibit correlated rates of mRNA decay. Focusing on variation in mRNA decay across individuals, we estimate that steady-state expression levels are significantly correlated with variation in decay rates in 10% of genes. Somewhat counter-intuitively, for about half of these genes, higher expression is associated with faster decay rates, possibly due to a coupling of mRNA decay with transcriptional processes in genes involved in rapid cellular responses. Finally, we used these data to map genetic variation that is specifically associated with variation in mRNA decay rates across individuals. We found 195 such loci, which we named RNA decay quantitative trait loci (“rdQTLs”). All the observed rdQTLs are located near the regulated genes and therefore are assumed to act in cis. By analyzing our data within the context of known steady-state eQTLs, we estimate that a substantial fraction of eQTLs are associated with inter-individual variation in mRNA decay rates.
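To make the idea of estimating a decay rate from a time course concrete, here is a minimal sketch. It assumes simple first-order decay, m(t) = m0·exp(−k·t), and fits k by least squares on log-transformed levels. This model choice, the function names, and the synthetic data are all assumptions for illustration; the abstract does not specify the estimation procedure actually used.

```python
import math


def estimate_decay_rate(times, levels):
    """Estimate a first-order decay rate k from a time course.

    Fits a straight line to log(level) versus time by ordinary least
    squares and returns the negated slope, so faster decay gives larger k.
    """
    if len(times) != len(levels) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    log_levels = [math.log(level) for level in levels]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(log_levels) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, log_levels))
    return -sxy / sxx


def half_life(k):
    """Half-life implied by a first-order decay rate k."""
    return math.log(2) / k


# Synthetic, noise-free time course generated with k = 0.5 per hour.
times = [0.0, 0.5, 1.0, 2.0, 4.0]
levels = [100.0 * math.exp(-0.5 * t) for t in times]
k_hat = estimate_decay_rate(times, levels)
print(round(k_hat, 6), round(half_life(k_hat), 3))
```

Under this model, a steady-state level is the ratio of the transcription rate to k, which is why steady-state measurements alone cannot separate the two processes.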
Recent studies of functional genetic variation in humans have identified numerous loci that are associated with variation in gene expression levels, called expression quantitative trait loci (eQTLs). The mechanisms by which these loci affect gene expression, however, are still largely unknown. Specifically, since most studies rely on measures of steady-state gene expression levels, they are unable to distinguish between the relative influences of transcriptional and decay-related processes. To address this gap, we examined the specific impact of mRNA decay processes on steady-state gene expression levels for over 16,000 genes in human lymphoblastoid cell lines. By characterizing decay rates in 70 individuals, we show that steady-state expression levels are significantly influenced by variation in decay rates for 10% of genes. Yet, for roughly half of these genes, we find that individuals with higher expression levels also have faster decay rates. This pattern points to a complex mechanistic interplay between transcriptional and decay processes, especially for genes involved in rapid cellular responses. Finally, we identify 195 genetic variants that are significantly associated with both gene expression variation and variation in mRNA decay rates. Using these data, we estimate that a substantial fraction of eQTLs are associated with inter-individual variation in mRNA decay rates.
Family samples, which can be enriched for rare causal variants by focusing on families with multiple extreme individuals and which facilitate detection of de novo mutation events, provide an attractive resource for next-generation sequencing studies. Here, we describe, implement, and evaluate a likelihood-based framework for analysis of next generation sequence data in family samples. Our framework is able to identify variant sites accurately and to assign individual genotypes, and can handle de novo mutation events, increasing the sensitivity and specificity of variant calling and de novo mutation detection. Through simulations we show explicit modeling of family relationships is especially useful for analyses of low-frequency variants and that genotype accuracy increases with the number of individuals sequenced per family. Compared with the standard approach of ignoring relatedness, our methods identify and accurately genotype more variants, and have high specificity for detecting de novo mutation events. The improvement in accuracy using our methods over the standard approach is particularly pronounced for low-frequency variants. Furthermore the family-aware calling framework dramatically reduces Mendelian inconsistencies and is beneficial for family-based analysis. We hope our framework and software will facilitate continuing efforts to identify genetic factors underlying human diseases.
New sequencing methods can be used to study how genetic variation contributes to disease. For studies of rare variation, family designs are especially attractive because they allow even very rare variants to be observed in multiple individuals and because they can be used to study the impact of de novo mutation events. An important challenge is that most raw sequencing data include many errors. Here, we develop a new approach for interpreting sequence data. We show that by analyzing sequence data across many family members together it is possible to greatly reduce error rates (measured either as the number of true variants that are missed or the number of false variants that are claimed). In addition to facilitating detection and genotyping of SNPs, our methods can interface with existing tools to improve the accuracy of more challenging short insertion deletion polymorphisms and other types of variants. Our methods should make studies of families even more attractive because, in addition to making it easy to study rare variants and de novo mutation events, family studies will now be able to better transform sequence data into accurate genotypes.
Weight control diets favorably affect parameters of the metabolic syndrome and delay the onset of diabetic complications. The adaptations occurring in adipose tissue (AT) are likely to have a profound impact on the whole body response as AT is a key target of dietary intervention. Identification of environmental and individual factors controlling AT adaptation is therefore essential. Here, expression of 271 transcripts, selected for regulation according to obesity and weight changes, was determined in 515 individuals before, after 8-week low-calorie diet-induced weight loss, and after 26-week ad libitum weight maintenance diets. For 175 genes, opposite regulation was observed during calorie restriction and weight maintenance phases, independently of variations in body weight. Metabolism and immunity genes showed inverse profiles. During the dietary intervention, network-based analyses revealed strong interconnection between expression of genes involved in de novo lipogenesis and components of the metabolic syndrome. Sex had a marked influence on AT expression of 88 transcripts, which persisted during the entire dietary intervention and after control for fat mass. In women, the influence of body mass index on expression of a subset of genes persisted during the dietary intervention. Twenty-two genes revealed a metabolic syndrome signature common to men and women. Genetic control of AT gene expression by cis signals was observed for 46 genes. Dietary intervention, sex, and cis genetic variants independently controlled AT gene expression. These analyses help in understanding the relative importance of environmental and individual factors that control the expression of human AT genes and therefore may foster strategies aimed at improving AT function in metabolic diseases.
In obesity, an excess of adipose tissue is associated with dyslipidemia and diabetic complications. Gene expression is under the control of various genetic and environmental factors. Because adipose tissue is a central organ for the control of metabolic disturbances in conditions of both weight gain and loss, a comprehensive understanding of the control of its gene expression is of paramount interest. We analyzed adipose tissue gene expression in obese individuals from the DiOGenes protocol, one of the largest dietary interventions worldwide. We found evidence for composite control of adipose tissue gene expression by nutrition, metabolic syndrome, body mass index, sex, and genotype with two main novel features. First, we observed a preeminent effect of sex on adipose tissue gene expression, which was independent of nutritional status, fat mass, and sex chromosomes. Second, the control of gene expression by cis genetic factors was unaffected by sex and nutritional status. Altogether, the effects of the investigated factors were most often independent of each other. Comprehension of the relative importance of environmental and individual factors that control the expression of human adipose tissue genes may help in deciphering strategies aimed at controlling adipose tissue function during metabolic disorders.
Inter-individual variation in facial shape is one of the most noticeable phenotypes in humans, and it is clearly under genetic regulation; however, almost nothing is known about the genetic basis of normal human facial morphology. We therefore conducted a genome-wide association study for facial shape phenotypes in multiple discovery and replication cohorts, considering almost ten thousand individuals of European descent from several countries. Phenotyping of facial shape features was based on landmark data obtained from three-dimensional head magnetic resonance images (MRIs) and two-dimensional portrait images. We identified five independent genetic loci associated with different facial phenotypes, suggesting the involvement of five candidate genes—PRDM16, PAX3, TP63, C5orf50, and COL17A1—in the determination of the human face. Three of them have been implicated previously in vertebrate craniofacial development and disease, and the remaining two genes potentially represent novel players in the molecular networks governing facial development. Our finding at PAX3 influencing the position of the nasion replicates a recent GWAS of facial features. In addition to the reported GWA findings, we established links between common DNA variants previously associated with NSCL/P at 2p21, 8q24, 13q31, and 17q22 and normal facial-shape variations based on a candidate gene approach. Overall our study implies that DNA variants in genes essential for craniofacial development contribute with relatively small effect size to the spectrum of normal variation in human facial morphology. This observation has important consequences for future studies aiming to identify more genes involved in the human facial morphology, as well as for potential applications of DNA prediction of facial shape such as in future forensic applications.
Monozygotic twins look more alike than dizygotic twins or other siblings, and siblings in turn look more alike than unrelated individuals, indicating that human facial morphology has a strong genetic component. We quantitatively assessed human facial shape phenotypes based on statistical shape analyses of facial landmarks obtained from three-dimensional magnetic resonance images of the head. These phenotypes turned out to be highly promising for studying the genetic basis of human facial variation in that they showed high heritability in our twin data. A subsequent genome-wide association study (GWAS) identified five candidate genes affecting facial shape in Europeans: PRDM16, PAX3, TP63, C5orf50, and COL17A1. In addition, our data suggest that genetic variants associated with NSCL/P also influence normal facial shape variation. Overall, this study provides novel and confirmatory links between common DNA variants and normal variation in human facial morphology. Our results also suggest that the high heritability of facial phenotypes seems to be explained by a large number of DNA variants with relatively small individual effect size, a phenomenon well known for other complex human traits, such as adult body height.
To identify genetic loci influencing bone accrual, we performed a genome-wide association scan for total-body bone mineral density (TB-BMD) variation in 2,660 children of different ethnicities. We discovered variants in 7q31.31 associated with BMD measurements, with the lowest P = 4.1×10−11 observed for rs917727 with minor allele frequency of 0.37. We sought replication for all SNPs located ±500 kb from rs917727 in 11,052 additional individuals from five independent studies including children and adults, together with de novo genotyping of rs3801387 (in perfect linkage disequilibrium (LD) with rs917727) in 1,014 mothers of children from the discovery cohort. The top signal mapping in the surroundings of WNT16 was replicated across studies with a meta-analysis P = 2.6×10−31 and an effect size explaining between 0.6%–1.8% of TB-BMD variance. Conditional analyses on this signal revealed a secondary signal for total body BMD (P = 1.42×10−10) for rs4609139 and mapping to C7orf58. We also examined the genomic region for association with skull BMD to test if the associations were independent of skeletal loading. We identified two signals influencing skull BMD variation, including rs917727 (P = 1.9×10−16) and rs7801723 (P = 8.9×10−28), also mapping to C7orf58 (r2 = 0.50 with rs4609139). Wnt16 knockout (KO) mice with reduced total body BMD and gene expression profiles in human bone biopsies support a role of C7orf58 and WNT16 on the BMD phenotypes observed at the human population level. In summary, we detected two independent signals influencing total body and skull BMD variation in children and adults, thus demonstrating the presence of allelic heterogeneity at the WNT16 locus. One of the skull BMD signals mapping to C7orf58 is mostly driven by children, suggesting temporal determination on peak bone mass acquisition. Our life-course approach postulates that these genetic effects influencing peak bone mass accrual may impact the risk of osteoporosis later in life.
Genetic investigations on bone mineral density (BMD) variation in children allow the identification of factors determining peak bone mass and their influence on developing osteoporosis later in life. We ran a genome-wide association study (GWAS) for total body BMD based on 2,660 children of different ethnic backgrounds, followed by replication in an additional 12,066 individuals comprising children, young adults, and elderly populations. Our GWAS meta-analysis identified two independent signals in the 7q31.31 locus, arising from SNPs in the vicinity of WNT16, FAM3C, and C7orf58. These variants were also associated with skull BMD, a skeletal trait with much less environmental influence for which one of the signals displayed age-specific effects. Integration of functional studies in a Wnt16 knockout mouse model and gene expression profiles in human bone tissue provided additional evidence that WNT16 and C7orf58 underlie the described associations. Altogether, our findings demonstrate the relevance of these factors for bone biology, the attainment of peak bone mass, and their likely impact on bone fragility later in life.
Parkinson disease (PD) is a complex neurodegenerative disorder with largely unknown genetic mechanisms. While the degeneration of dopaminergic neurons in PD mainly takes place in the substantia nigra pars compacta (SN) region, other brain areas, including the prefrontal cortex, develop Lewy bodies, the neuropathological hallmark of PD. We generated and analyzed expression data from the prefrontal cortex Brodmann Area 9 (BA9) of 27 PD and 26 control samples using the 44K One-Color Agilent 60-mer Whole Human Genome Microarray. All samples were male, without significant Alzheimer disease pathology and with extensive pathological annotation available. 507 of the 39,122 analyzed expression probes were different between PD and control samples at false discovery rate (FDR) of 5%. One of the genes with significantly increased expression in PD was the forkhead box O1 (FOXO1) transcription factor. Notably, genes carrying the FoxO1 binding site were significantly enriched in the FDR–significant group of genes (177 genes covered by 189 probes), suggesting a role for FoxO1 upstream of the observed expression changes. Single-nucleotide polymorphisms (SNPs) selected from a recent meta-analysis of PD genome-wide association studies (GWAS) were successfully genotyped in 50 out of the 53 microarray brains, allowing a targeted expression–SNP (eSNP) analysis for 52 SNPs associated with PD affection at genome-wide significance and the 189 probes from FoxO1 regulated genes. A significant association was observed between a SNP in the cyclin G associated kinase (GAK) gene and a probe in the spermine oxidase (SMOX) gene. Further examination of the FOXO1 region in a meta-analysis of six available GWAS showed two SNPs significantly associated with age at onset of PD. These results implicate FOXO1 as a PD–relevant gene and warrant further functional analyses of its transcriptional regulatory mechanisms.
Parkinson disease (PD) is a neurodegenerative disease, which impairs the motor and cognitive abilities of affected individuals. Although the involvement of specific genes in the disease process has been recognized, the underlying genetic mechanisms are not yet understood. One common investigation approach for PD has been the comparison of gene expression levels in brain tissue from PD cases with those from neurologically healthy controls. We performed such an expression analysis in prefrontal cortex tissue from a set of 27 PD and 26 control samples. One of the 489 differentially expressed genes, forkhead box O1 (FOXO1), is involved in transcriptional regulation. Notably, the set of differentially expressed genes identified in our study was enriched for genes regulated by the FoxO1 protein. Analyses of DNA sequence variants known as single-nucleotide polymorphisms (SNPs) in the FOXO1 region, as well as of PD–relevant SNPs across the genome, suggest functional connections between this gene and 1) the age at onset in PD, and 2) the spermine oxidase (SMOX) gene. These findings implicate the involvement of FOXO1 in PD pathogenesis.
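The expression comparison above reports probes significant at a false discovery rate of 5%. A minimal sketch of one common way to apply such a threshold is the Benjamini-Hochberg step-up procedure, shown below. The choice of this particular procedure, the function name, and the example p-values are assumptions for illustration; the text does not state which FDR method was used.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    by the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Find the largest rank k with p_(k) <= alpha * k / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected

# Hypothetical p-values for four probes.
example = [0.04, 0.001, 0.03, 0.5]
print(benjamini_hochberg(example))
```

Because the procedure is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the ranking meets its threshold.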
Genetic variants that modify brain gene expression may also influence risk for human diseases. We measured expression levels of 24,526 transcripts in brain samples from the cerebellum and temporal cortex of autopsied subjects with Alzheimer's disease (AD, cerebellar n = 197, temporal cortex n = 202) and with other brain pathologies (non–AD, cerebellar n = 177, temporal cortex n = 197). We conducted an expression genome-wide association study (eGWAS) using 213,528 cisSNPs within ±100 kb of the tested transcripts. We identified 2,980 cerebellar cisSNP/transcript level associations (2,596 unique cisSNPs) significant in both ADs and non–ADs (q<0.05, p = 7.70×10−5–1.67×10−82). Of these, 2,089 were also significant in the temporal cortex (p = 1.85×10−5–1.70×10−141). The top cerebellar cisSNPs had 2.4-fold enrichment for human disease-associated variants (p<10−6). We identified novel cisSNP/transcript associations for human disease-associated variants, including progressive supranuclear palsy SLCO1A2/rs11568563, Parkinson's disease (PD) MMRN1/rs6532197, Paget's disease OPTN/rs1561570; and we confirmed others, including PD MAPT/rs242557, systemic lupus erythematosus and ulcerative colitis IRF5/rs4728142, and type 1 diabetes mellitus RPS26/rs1701704. In our eGWAS, there was 2.9–3.3 fold enrichment (p<10−6) of significant cisSNPs with suggestive AD–risk association (p<10−3) in the Alzheimer's Disease Genetics Consortium GWAS. These results demonstrate the significant contributions of genetic factors to human brain gene expression, which are reliably detected across different brain regions and pathologies. The significant enrichment of brain cisSNPs among disease-associated variants advocates gene expression changes as a mechanism for many central nervous system (CNS) and non–CNS diseases. Combined assessment of expression and disease GWAS may provide complementary information in discovery of human disease variants with functional implications. Our findings have implications for the design and interpretation of eGWAS in general and the use of brain expression quantitative trait loci in the study of human disease genetics.
Genetic variants that regulate gene expression levels can also influence human disease risk. Discovery of genomic loci that alter brain gene expression levels (brain expression quantitative trait loci = eQTLs) can be instrumental in the identification of genetic risk underlying both central nervous system (CNS) and non–CNS diseases. To systematically assess the role of brain eQTLs in human disease and to evaluate the influence of brain region and pathology in eQTL mapping, we performed an expression genome-wide association study (eGWAS) in 773 brain samples from the cerebellum and temporal cortex of ∼200 autopsied subjects with Alzheimer's disease (AD) and ∼200 with other brain pathologies (non–AD). We identified ∼3,000 significant associations between cisSNPs near ∼700 genes and their cerebellar transcript levels, which replicate in ADs and non–ADs. More than 2,000 of these associations were reproducible in the temporal cortex. The top cisSNPs are enriched for both CNS and non–CNS disease-associated variants. We identified novel and confirmed previous cisSNP/transcript associations for many disease loci, suggesting gene expression regulation as their mechanism of action. These findings demonstrate the reproducibility of the eQTL approach across different brain regions and pathologies, and advocate the combined use of gene expression and disease GWAS for identification and functional characterization of human disease-associated variants.
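The cisSNP definition used in this study (SNPs within ±100 kb of the tested transcripts) can be sketched as a simple coordinate check. The sketch below assumes the window extends 100 kb beyond each end of the transcript on the same chromosome; the function name, argument names, and example coordinates are hypothetical.

```python
CIS_WINDOW_BP = 100_000  # +/-100 kb, as stated in the abstract

def is_cis(snp_chrom, snp_pos, tx_chrom, tx_start, tx_end, window=CIS_WINDOW_BP):
    """True if the SNP lies on the transcript's chromosome and within
    `window` base pairs of the transcript boundaries (an assumed reading
    of "within +/-100 kb of the tested transcripts")."""
    if snp_chrom != tx_chrom:
        return False
    return tx_start - window <= snp_pos <= tx_end + window

# Hypothetical transcript spanning 1,100,000-1,150,000 on chromosome 7.
print(is_cis("7", 1_050_000, "7", 1_100_000, 1_150_000))  # inside the window
print(is_cis("7", 999_999, "7", 1_100_000, 1_150_000))    # just outside
```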
Telomere length, an indicator of ageing and longevity, has been correlated with several biomarkers of cardiometabolic disease in both Arab children and adults. It is not known, however, whether telomere length is a highly conserved heritable trait in this homogeneous cohort, where age-related diseases are highly prevalent. As such, the aim of this study was to address the heritability of telomere length in Saudi families and the impact of cardiometabolic disease biomarkers on telomere length.
A total of 119 randomly selected Saudi families (123 adults and 131 children) were included in this cross-sectional study. Anthropometrics were obtained and fasting blood samples were taken for routine analyses of fasting glucose and lipid profile. Leukocyte telomere length was determined using quantitative real time PCR.
Telomere length was highly heritable as assessed by a parent-offspring regression [h2 = 0.64 (p = 0.0006)]. Telomere length was modestly associated with BMI (R2 = 0.07; p = 0.0087), total cholesterol (R2 = 0.08; p = 0.0033), and LDL-cholesterol (R2 = 0.15; p = 3×10−5) after adjustments for gender, age and age within generation.
The high heritability of telomere length in Arab families and the associations of telomere length with various cardiometabolic parameters suggest heritable genetic fetal and/or epigenetic influences on the early predisposition of Arab children to age-related diseases and accelerated ageing.
Telomere length; Heritability; Arabs; Ageing
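For readers unfamiliar with parent-offspring regression, the following minimal sketch shows the standard relationship between the regression slope and narrow-sense heritability: the slope of offspring on the mid-parent value estimates h2 directly, while the slope on a single parent estimates h2/2. Which variant the study used, and any covariate adjustment, is not stated above; the function names and the toy data are assumptions for illustration only.

```python
def regression_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx

def h2_from_midparent(midparent, offspring):
    """Slope of offspring on mid-parent value estimates h2."""
    return regression_slope(midparent, offspring)

def h2_from_single_parent(parent, offspring):
    """Slope of offspring on one parent estimates h2 / 2."""
    return 2.0 * regression_slope(parent, offspring)

# Hypothetical relative telomere lengths, constructed so the slope is 0.64.
midparent = [1.0, 1.2, 0.8, 1.5, 0.9]
offspring = [0.3 + 0.64 * m for m in midparent]
print(f"h2 estimate: {h2_from_midparent(midparent, offspring):.2f}")
```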
Organisms in the wild are subject to multiple, fluctuating environmental factors, and it is in complex natural environments that genetic regulatory networks actually function and evolve. We assessed genome-wide gene expression patterns in the wild in two natural accessions of the model plant Arabidopsis thaliana and examined the nature of transcriptional variation throughout its life cycle and gene expression correlations with natural environmental fluctuations. We grew plants in a natural field environment and measured genome-wide time-series gene expression from the plant shoot every three days, spanning the seedling to reproductive stages. We find that 15,352 genes were expressed in the A. thaliana shoot in the field, and accession and flowering status (vegetative versus flowering) were strong components of transcriptional variation in this plant. We identified between ∼110 and 190 time-varying gene expression clusters in the field, many of which were significantly enriched for genes regulated by abiotic and biotic environmental stresses. The two main principal components of vegetative shoot gene expression (PCveg) correlate with temperature and precipitation occurrence in the field. The largest PCveg axis included thermoregulatory genes, while the second major PCveg axis was associated with precipitation and contained drought-responsive genes. By exposing A. thaliana to natural environments in an open field, we provide a framework for further understanding the genetic networks that are deployed in natural environments, and we connect plant molecular genetics in the laboratory to plant organismal ecology in the wild.
Plants in the real world are continuously exposed to multiple environmental signals and must respond appropriately to the dynamic conditions found in nature. Environmental signals can fluctuate during an individual's life cycle with varying degrees of predictability, and complex natural environments are where gene activity evolves. We grew two natural accessions of the model plant Arabidopsis thaliana in an open field in New York in the spring and examined genome-wide gene expression patterns in the wild. We find nearly 200 gene expression clusters in these field-grown plants, and many of these clusters were enriched in genes that had previously been shown to be associated with expression under various abiotic or biotic environmental stress conditions. Two major principal components of gene expression were associated with environmental fluctuations in temperature and rainfall, and we identified several genes (such as the thermoregulatory nucleosome occupancy gene ARP6 and the drought-sensitive hormone biosynthetic gene AAO3) that could be found in these principal components. By exploring genome-wide gene expression in plants in the wild, we were able to connect mechanistic aspects of plant molecular biology with ecological responses in nature and to begin to understand how organisms behave and adapt in their natural environments.
Autozygosity occurs when two chromosomal segments that are identical from a common ancestor are inherited from each parent. This occurs at high rates in the offspring of mates who are closely related (inbreeding), but also occurs at lower levels among the offspring of distantly related mates. Here, we use runs of homozygosity in genome-wide SNP data to estimate the proportion of the autosome that exists in autozygous tracts in 9,388 cases with schizophrenia and 12,456 controls. We estimate that the odds of schizophrenia increase by ∼17% for every 1% increase in genome-wide autozygosity. This association is not due to one or a few regions, but results from many autozygous segments spread throughout the genome, and is consistent with a role for multiple recessive or partially recessive alleles in the etiology of schizophrenia. Such a bias towards recessivity suggests that alleles that increase the risk of schizophrenia have been selected against over evolutionary time.
Inbreeding occurs when genetic relatives have offspring. Because all humans are related to one another, even if very distantly, all people are inbred to various degrees. From a genetic standpoint, it is well known that inbreeding increases the risk that a child will have a rare recessive genetic disease, but there is also increasing interest in understanding whether inbreeding is a risk factor for more common, complex disorders such as schizophrenia. In this investigation, we used single-nucleotide polymorphism data to quantify the degree to which 9,388 schizophrenia cases and 12,456 controls were inbred, and we tested the hypothesis that people whose genome shows higher evidence of being inbred are at higher risk of having schizophrenia. We estimate that the odds of schizophrenia increase by ∼17% for every 1% increase in inbreeding. This finding is consistent with a role for multiple recessive or partially recessive alleles in the etiology of schizophrenia, and it suggests that genetic variants that increase the risk of schizophrenia have been selected against over evolutionary time.
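To make the two quantities in this summary concrete, the sketch below computes the proportion of the autosome covered by runs of homozygosity and the odds multiplier implied by a 17% increase in odds per 1% increase in autozygosity. It assumes the effect is multiplicative on the odds scale (as in a logistic model), so a change of x percentage points scales the odds by 1.17 to the power x. The function names, segment coordinates, and the toy autosome length are hypothetical.

```python
def froh(roh_segments, autosome_length_bp):
    """Fraction of the autosome lying in runs of homozygosity.

    roh_segments: iterable of (start_bp, end_bp) pairs, assumed non-overlapping.
    """
    total = sum(end - start for start, end in roh_segments)
    return total / autosome_length_bp

def odds_multiplier(delta_percent, or_per_percent=1.17):
    """Odds multiplier for a change of `delta_percent` percentage points in
    autozygosity, assuming a constant odds ratio per percentage point."""
    return or_per_percent ** delta_percent

# Toy example: 3 Mb of ROH in a 300 Mb "autosome" gives 1% autozygosity.
segments = [(0, 1_000_000), (5_000_000, 7_000_000)]
fraction = froh(segments, 300_000_000)
print(f"F_ROH = {fraction:.2%}, odds multiplier = {odds_multiplier(fraction * 100):.2f}")
```

Under this assumption, a 2 percentage point increase corresponds to an odds multiplier of about 1.37 rather than 1.34, because the per-point effects compound.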
Autism is a highly heritable neurodevelopmental disorder, yet the genetic underpinnings of the disorder are largely unknown. Aberrant brain overgrowth is a well-replicated observation in the autism literature; but association, linkage, and expression studies have not identified genetic factors that explain this trajectory. Few studies have had sufficient statistical power to investigate whole-genome gene expression and genotypic variation in the autistic brain, especially in regions that display the greatest growth abnormality. Previous functional genomic studies have identified possible alterations in transcript levels of genes related to neurodevelopment and immune function. Thus, there is a need for genetic studies involving key brain regions to replicate these findings and solidify the role of particular functional pathways in autism pathogenesis. We therefore sought to identify abnormal brain gene expression patterns via whole-genome analysis of mRNA levels and copy number variations (CNVs) in autistic and control postmortem brain samples. We focused on prefrontal cortex tissue where excess neuron numbers and cortical overgrowth are pronounced in the majority of autism cases. We found evidence for dysregulation in pathways governing cell number, cortical patterning, and differentiation in young autistic prefrontal cortex. In contrast, adult autistic prefrontal cortex showed dysregulation of signaling and repair pathways. Genes regulating cell cycle also exhibited autism-specific CNVs in DNA derived from prefrontal cortex, and these genes were significantly associated with autism in genome-wide association study datasets. Our results suggest that CNVs and age-dependent gene expression changes in autism may reflect distinct pathological processes in the developing versus the mature autistic prefrontal cortex. Our results raise the hypothesis that genetic dysregulation in the developing brain leads to abnormal regional patterning, excess prefrontal neurons, cortical overgrowth, and neural dysfunction in autism.
Autism is a disorder characterized by aberrant social, communication, and restricted and repetitive behaviors. It develops clinically in the first years of life. Toddlers and children with autism often exhibit early brain enlargement and excess neuron numbers in the prefrontal cortex. Adults with autism generally do not display enlargement but instead may have a smaller brain size. Thus, we investigated DNA and mRNA patterns in prefrontal cortex from young versus adult postmortem individuals with autism to identify age-related gene expression differences as well as possible genetic correlates of abnormal brain enlargement, excess neuron numbers, and abnormal functioning in this disorder. We found abnormalities in genetic pathways governing cell number, neurodevelopment, and cortical lateralization in autism. We also found that the key pathways associated with autism are different between younger and older autistic individuals. These findings suggest that dysregulated gene pathways in the early stages of neurodevelopment could lead to later behavioral and cognitive deficits associated with autism.
Autism spectrum disorders (ASD) are neurodevelopmental disorders with phenotypic and genetic heterogeneity. Recent studies have reported rare and de novo mutations in ASD, but the allelic architecture of ASD remains unclear. To assess the role of common and rare variations in ASD, we constructed a gene co-expression network based on a widespread survey of gene expression in the human brain. We identified modules associated with specific cell types and processes. By integrating known rare mutations and the results of an ASD genome-wide association study (GWAS), we identified two neuronal modules that are perturbed by both rare and common variations. These modules contain highly connected genes that are involved in synaptic and neuronal plasticity and that are expressed in areas associated with learning and memory and sensory perception. The enrichment of common risk variants was replicated in two additional samples which include both simplex and multiplex families. An analysis of the combined contribution of common variants in the neuronal modules revealed a polygenic component to the risk of ASD. The results of this study point toward a contribution of minor and major perturbations in the two sub-networks of neuronal genes to ASD risk.
Autism spectrum disorders (ASD) are neurodevelopmental syndromes with a strong genetic basis, but are influenced by many different genes. Recent studies have identified multiple genetic risk factors, including rare mutations and genetic variations common in the population. To identify possible connections between different genetic risk factors, we constructed a network based on the expression pattern of genes across different brain areas. We identified groups of genes that are expressed in a similar pattern across the brain, suggesting that they are involved in the same processes or types of cells. We found that the genetic risk factors were enriched in specific groups of connected genes. Of these, the strongest enrichment was discovered in a group of neuronal genes that are involved in processes of learning and memory, and are highly expressed during infancy. Further study of this group of genes has the potential to reveal a more detailed picture of the neuronal mechanisms leading to ASD and to provide knowledge required for developing diagnostic tools and effective therapies.