We expand our previous deterministic power calculations by calculating the required sample size to detect C in ACE models. The theoretical expected value of the maximum log-likelihood for the AE model was derived using two optimisation methods and these gave near-identical results. Theoretical predictions were verified by computer simulation and the results agreed very well. We have developed a user-friendly web-based tool, TwinPower, to perform power calculations to detect either A or C for the classical twin design. This new tool can be found at http://genepi.qimr.edu.au/cgi-bin/twinpower.cgi
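As a rough illustration of the kind of calculation described above, the following Python sketch computes a non-centrality parameter (NCP) for detecting C. It compares the expected −2 log-likelihood of the best-fitting AE model with that of the true ACE model, using a simple grid search rather than either of the optimisation methods the authors used. It assumes standardised phenotypes (a² + c² + e² = 1), zero means, and a 1-df chi-square test that ignores the boundary constraint on C. The function names and defaults are hypothetical, and this is not the TwinPower implementation.

```python
from math import log, sqrt
from statistics import NormalDist


def _profile_ae(h, n_mz, n_dz, r_mz, r_dz):
    """Expected -2lnL (constants dropped) of an AE model with A-proportion h.

    The total variance V is profiled out analytically for the given h.
    """
    # (number of pairs, model-implied correlation, true correlation)
    groups = [(n_mz, h, r_mz), (n_dz, h / 2.0, r_dz)]
    n_total = n_mz + n_dz
    v = sum(n * (1 - rho * r) / (1 - rho ** 2) for n, rho, r in groups) / n_total
    return sum(
        n * (2 * log(v) + log(1 - rho ** 2)
             + 2 * (1 - rho * r) / (v * (1 - rho ** 2)))
        for n, rho, r in groups
    )


def ncp_detect_c(a2, c2, n_mz, n_dz, grid=10000):
    """NCP for the test of C = 0 given true a2, c2 and numbers of twin pairs."""
    if a2 + c2 >= 1:
        raise ValueError("a2 + c2 must be below 1 so that e2 > 0")
    r_mz, r_dz = a2 + c2, 0.5 * a2 + c2  # expected twin correlations under ACE
    # The full ACE model reproduces the true covariances exactly.
    true_val = (n_mz * (log(1 - r_mz ** 2) + 2)
                + n_dz * (log(1 - r_dz ** 2) + 2))
    # Grid search over the AE heritability h in [0, 1).
    best = min(_profile_ae(i / grid, n_mz, n_dz, r_mz, r_dz) for i in range(grid))
    return max(best - true_val, 0.0)


def power_1df(ncp, alpha=0.05):
    """Power of a 1-df chi-square test with the given NCP (assumed test)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    s = sqrt(ncp)
    return nd.cdf(s - z) + nd.cdf(-s - z)


if __name__ == "__main__":
    lam = ncp_detect_c(a2=0.3, c2=0.3, n_mz=500, n_dz=500)
    print(lam, power_1df(lam))
```

Because the NCP in this sketch scales linearly with the number of pairs when the MZ:DZ ratio is fixed, the NCP for one sample size can be rescaled to find the sample size that gives a target power.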
Genome-wide association analysis on monozygotic twin pairs offers a route to discovery of gene–environment interactions through testing for variability loci associated with sensitivity to individual environment/lifestyle. We present a genome-wide scan of loci associated with intra-pair differences in serum lipid and apolipoprotein levels. We report data for 1,720 monozygotic female twin pairs from the GenomEUtwin project with 2.5 million SNPs, imputed or genotyped, and measured serum lipid fractions for both twins. We found one locus associated with intra-pair differences in high density lipoprotein (HDL) cholesterol, rs2483058 in an intron of SRGAP2, where twins carrying the C allele are more sensitive to environmental factors (p = 3.98 × 10^−8). We followed up the association in further genotyped monozygotic twins (N = 1,261), which showed a moderate association for the variant (p = .002, same direction of effect). In addition, we report a new association on the level of apolipoprotein A-II (p = 4.03 × 10^−8).
twins; association; lipids; apolipoproteins; interaction
Schizophrenia is a highly heritable disorder. Genetic risk is conferred by a large number of alleles, including common alleles of small effect that might be detected by genome-wide association studies. Here, we report a multi-stage schizophrenia genome-wide association study of up to 36,989 cases and 113,075 controls. We identify 128 independent associations spanning 108 conservatively defined loci that meet genome-wide significance, 83 of which have not been previously reported. Associations were enriched among genes expressed in brain providing biological plausibility for the findings. Many findings have the potential to provide entirely novel insights into aetiology, but associations at DRD2 and multiple genes involved in glutamatergic neurotransmission highlight molecules of known and potential therapeutic relevance to schizophrenia, and are consistent with leading pathophysiological hypotheses. Independent of genes expressed in brain, associations were enriched among genes expressed in tissues that play important roles in immunity, providing support for the hypothesized link between the immune system and schizophrenia.
A balanced t(1;11) translocation which transects the Disrupted in schizophrenia 1 (DISC1) gene shows genome-wide significant linkage for schizophrenia and recurrent major depressive disorder in a single large Scottish family, but genome-wide and exome sequencing-based association studies have not supported a role for DISC1 in psychiatric illness. To explore DISC1 in more detail, we sequenced 528 kb of the DISC1 locus in 653 cases and 889 controls. We report 2,718 validated single nucleotide polymorphisms of which 2,010 have a minor allele frequency of less than 1%. Only 38% of these variants are reported in the 1000 Genomes Project European subset. This suggests that many DISC1 SNPs remain undiscovered and are essentially private. Rare coding variants identified exclusively in patients were found in likely functional protein domains. Significant region-wide association was observed between rs16856199 and recurrent major depressive disorder (P=0.026, unadjusted P=6.3 × 10^−5, OR=3.48). This was not replicated in additional recurrent major depression samples (replication P=0.11). Combined analysis of both the original and replication set supported the original association (P=0.0058, OR=1.46). Evidence for segregation of this variant with disease in families was limited to those of rMDD individuals referred from primary care. Burden analysis for coding and non-coding variants gave nominal associations with diagnosis and measures of mood and cognition. Together, these observations are likely to generalise to other candidate genes for major mental illness and may thus provide guidelines for the design of future studies.
The main genetic determinant of soluble IL-6R levels is the missense variant rs2228145, which maps to the cleavage site of IL-6R. For each Ala allele, sIL-6R serum levels increase by ~20 ng/ml and asthma risk by 1.09-fold. However, this variant does not explain the total heritability for sIL-6R levels. Additional independent variants in IL6R may therefore contribute to variation in sIL-6R levels and influence asthma risk. We imputed 471 variants in IL6R and tested these for association with sIL-6R serum levels in 360 individuals. An intronic variant (rs12083537) was associated with sIL-6R levels independently of rs4129267 (P = 0.0005), a proxy SNP for rs2228145. A significant and consistent association for rs12083537 was observed in a replication panel of 354 individuals (P = 0.033). Each rs12083537:A allele increased sIL-6R serum levels by 2.4 ng/ml. Analysis of mRNA levels in two cohorts did not identify significant associations between rs12083537 and IL6R transcription levels. On the other hand, results from 16,705 asthmatics and 30,809 controls showed that the rs12083537:A allele increased asthma risk by 1.04-fold (P = 0.0419). Genetic risk scores based on IL6R regulatory variants may prove useful in explaining variation in clinical response to tocilizumab, an anti-IL-6R monoclonal antibody.
allergy; eQTL; expression; disease
Heritability is a population parameter of importance in evolution, plant and animal breeding, and human medical genetics. It can be estimated using pedigree designs and, more recently, using relationships estimated from markers. We derive the sampling variance of the estimate of heritability for a wide range of experimental designs, assuming that estimation is by maximum likelihood and that the resemblance between relatives is solely due to additive genetic variation. We show that well-known results for balanced designs are special cases of a more general unified framework. For pedigree designs, the sampling variance is inversely proportional to the variance of relationship in the pedigree and it is proportional to 1/N, whereas for population samples it is approximately proportional to 1/N^2, where N is the sample size. Variation in relatedness is a key parameter in the quantification of the sampling variance of heritability. Consequently, the sampling variance is high for populations with large recent effective population size (e.g., humans) because this causes low variation in relationship. However, even using human population samples, low sampling variance is possible with high N.
experimental design; genomic relationship; heritability; maximum likelihood; sampling variance
Epistasis is the phenomenon whereby one polymorphism’s effect on a trait depends on other polymorphisms present in the genome. The extent to which epistasis influences complex traits [1] and contributes to their variation [2,3] is a fundamental question in evolution and human genetics. Though it is often demonstrated in artificial gene manipulation studies in model organisms [4,5], and some examples have been reported in other species [6], few examples exist for epistasis amongst natural polymorphisms in human traits [7,8]. Its absence from empirical findings may simply be due to low incidence in the genetic control of complex traits [2,3], but an alternative view is that it has previously been too technically challenging to detect due to statistical and computational issues [9]. Here we show that, using advanced computation [10] and a gene expression study design, many instances of epistasis are found between common single nucleotide polymorphisms (SNPs). In a cohort of 846 individuals with 7339 gene expression levels measured in peripheral blood, we found 501 significant pairwise interactions between common SNPs influencing the expression of 238 genes (p < 2.91 × 10^−16). Replication of these interactions in two independent data sets [11,12] showed both concordance of direction of epistatic effects (p = 5.56 × 10^−31) and enrichment of interaction p-values, with 30 being significant at a conservative threshold of p < 0.05/501. Forty-four of the genetic interactions are located within 2 Mb of regions of known physical chromosome interactions [13] (p = 1.8 × 10^−10). Epistatic networks of three SNPs or more influence the expression levels of 129 genes, whereby one cis-acting SNP is modulated by several trans-acting SNPs. For example, MBNL1 is influenced by an additive effect at rs13069559 which itself is masked by trans-SNPs on 14 different chromosomes, with nearly identical genotype-phenotype (GP) maps for each cis-trans interaction.
This study presents the first evidence for multiple instances of segregating common polymorphisms interacting to influence human traits.
In Mendelian randomization (MR) studies, where genetic variants are used as proxy measures for an exposure trait of interest, obtaining adequate statistical power is frequently a concern due to the small amount of variation in a phenotypic trait that is typically explained by genetic variants. A range of power estimates based on simulations and specific parameters for two-stage least squares (2SLS) MR analyses based on continuous variables has previously been published. However, there are presently no specific equations or software tools one can implement for calculating power of a given MR study. Using asymptotic theory, we show that in the case of continuous variables and a single instrument, for example a single-nucleotide polymorphism (SNP) or multiple SNP predictor, statistical power for a fixed sample size is a function of two parameters: the proportion of variation in the exposure variable explained by the genetic predictor and the true causal association between the exposure and outcome variable. We demonstrate that power for 2SLS MR can be derived using the non-centrality parameter (NCP) of the statistical test that is employed to test whether the 2SLS regression coefficient is zero. We show that the previously published power estimates from simulations can be represented theoretically using this NCP-based approach, with similar estimates observed when the simulation-based estimates are compared with our NCP-based approach. General equations for calculating statistical power for 2SLS MR using the NCP are provided in this note, and we implement the calculations in a web-based application.
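The NCP-based calculation can be sketched as follows. This is a minimal Python illustration, not the authors' web application. It assumes the standard asymptotic variance of the 2SLS estimator with a single instrument, so that NCP = N × R²(xz) × β² × var(x) / var(residual), and a two-sided 1-df chi-square test. The function name and the `var_x` and `var_resid` arguments are assumptions introduced here.

```python
from math import sqrt
from statistics import NormalDist


def mr_power(n, r2_xz, beta_yx, var_x=1.0, var_resid=1.0, alpha=0.05):
    """Approximate power of a single-instrument 2SLS MR analysis.

    n         : sample size
    r2_xz     : proportion of exposure variance explained by the instrument
    beta_yx   : true causal effect of exposure on outcome
    var_x     : variance of the exposure (assumed 1 if standardised)
    var_resid : residual variance of the outcome given the exposure (assumed)
    """
    # Asymptotic var(beta_2SLS) = var_resid / (n * r2_xz * var_x),
    # so NCP = beta^2 / var(beta_2SLS).
    ncp = n * r2_xz * beta_yx ** 2 * var_x / var_resid
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    s = sqrt(ncp)
    # A 1-df non-central chi-square variable is (Z + sqrt(ncp))^2.
    return nd.cdf(s - z) + nd.cdf(-s - z)


if __name__ == "__main__":
    for n in (1000, 10000, 50000):
        print(n, round(mr_power(n, r2_xz=0.01, beta_yx=0.2), 3))
```

With the variances fixed, power in this sketch depends only on N, the variance explained by the instrument, and the causal effect, as the text states.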
Power; Mendelian randomization; non-centrality parameter; instrumental variable
genome-wide association study; epidemiology; Mendelian randomization; interaction; polygene score
A major challenge in human genetics is to devise a systematic strategy to integrate disease-associated variants with diverse genomic and biological datasets to provide insight into disease pathogenesis and guide drug discovery for complex traits such as rheumatoid arthritis (RA) [1]. Here, we performed a genome-wide association study (GWAS) meta-analysis in a total of >100,000 subjects of European and Asian ancestries (29,880 RA cases and 73,758 controls), by evaluating ~10 million single nucleotide polymorphisms (SNPs). We discovered 42 novel RA risk loci at a genome-wide level of significance, bringing the total to 101 [2–4]. We devised an in-silico pipeline using established bioinformatics methods based on functional annotation [5], cis-acting expression quantitative trait loci (cis-eQTL) [6], and pathway analyses [7–9], as well as novel methods based on genetic overlap with human primary immunodeficiency (PID), hematological cancer somatic mutations and knock-out mouse phenotypes, to identify 98 biological candidate genes at these 101 risk loci. We demonstrate that these genes are the targets of approved therapies for RA, and further suggest that drugs approved for other indications may be repurposed for the treatment of RA. Together, this comprehensive genetic study sheds light on fundamental genes, pathways and cell types that contribute to RA pathogenesis, and provides empirical evidence that the genetics of RA can provide important information for drug discovery.
Mixed linear models are emerging as a method of choice for conducting genetic association studies in humans and other organisms. The advantages of mixed linear model association (MLMA) include preventing false-positive associations due to population or relatedness structure, and increasing power by applying a correction that is specific to this structure. An underappreciated point is that MLMA can also increase power in studies without sample structure, by implicitly conditioning on associated loci other than the candidate locus. Numerous variations on the standard MLMA approach have recently been published, with a focus on reducing computational cost. These advances provide researchers applying MLMA methods with many options to choose from, but we caution that MLMA methods are still subject to potential pitfalls. Here, we describe and quantify the advantages and pitfalls of MLMA methods as a function of study design, and provide recommendations for the application of these methods in practical settings.
Family studies are consistent with genetic effects making substantial contributions to risk of psychiatric disorders such as schizophrenia, yet robust identification of specific genetic variants that explain variation in population risk had been disappointing until the advent of technologies that assay the entire genome in large samples. We highlight recent progress that has led to a better understanding of the number of risk variants in the population and the interaction of allele frequency and effect size. The emerging genetic architecture implies a large number of contributing loci (that is, a high genome-wide mutational target) and suggests that genetic risk to psychiatric disorders involves the combined effects of many common variants of small effect, as well as rare and de novo variants of large effect. The capture of a substantial proportion of genetic risk facilitates new study designs to investigate the combined effects of genes and the environment.
The success of genome-wide association studies has led to increasing interest in making predictions of complex trait phenotypes including disease from genotype data. Rigorous assessment of the value of predictors is critical before implementation. Here we discuss some of the limitations and pitfalls of prediction analysis and show how naïve implementations can lead to severe bias and misinterpretation of results.
Despite the important role DNA methylation plays in transcriptional regulation, the transgenerational inheritance of DNA methylation is not well understood. The genetic heritability of DNA methylation has been estimated using twin pairs, although concern has been expressed about whether the underlying assumption of equal common environmental effects is applicable, given intrauterine differences between monozygotic and dizygotic twins. We estimate the heritability of DNA methylation in peripheral blood leukocytes, measured with the Illumina HumanMethylation450 array, in a family-based sample of 614 people from 117 families, allowing comparison both within and across generations.
The correlations from the various available relative pairs indicate that on average the similarity in DNA methylation between relatives is predominantly due to genetic effects, with any common environmental or zygotic effects being limited. The average heritability of DNA methylation measured at probes with no known SNPs is estimated as 0.187. The ten most heritable methylation probes were investigated with a genome-wide association study, all showing highly statistically significant cis mQTLs. Further investigation of one of these cis mQTLs, found in the MHC region of chromosome 6, showed that the most significantly associated SNP was also associated with over 200 other DNA methylation probes in this region and with the gene expression level of 9 genes.
The majority of transgenerational similarity in DNA methylation is attributable to genetic effects, and approximately 20% of individual differences in DNA methylation in the population are caused by DNA sequence variation that is not located within CpG sites.
Understanding genetic variation of complex traits in human populations has moved from the quantification of the resemblance between close relatives to the dissection of genetic variation into the contributions of individual genomic loci. But major questions remain unanswered: how much phenotypic variation is genetic, how much of the genetic variation is additive and what is the joint distribution of effect size and allele frequency at causal variants? We review and compare three whole-genome analysis methods that use mixed linear models (MLM) to estimate genetic variation, using the relationship between close or distant relatives based on pedigree or SNPs. We discuss theory, estimation procedures, bias and precision of each method and review recent advances in the dissection of additive genetic variation of complex traits in human populations that are based upon the application of MLM. Using genome wide data, SNPs account for far more of the genetic variation than the highly significant SNPs associated with a trait, but they do not account for all of the genetic variance estimated by pedigree based methods. We explain possible reasons for this ‘missing’ heritability.
Quantitative traits; whole genome methods; additive variance; genomic relationship; mixed linear model; genetic architecture
Education, socioeconomic status, and intelligence are commonly used as predictors of health outcomes, social environment, and mortality. Education and socioeconomic status are typically viewed as environmental variables although both correlate with intelligence, which has a substantial genetic basis. Using data from 6815 unrelated subjects from the Generation Scotland study, we examined the genetic contributions to these variables and their genetic correlations. Subjects underwent genome-wide testing for common single nucleotide polymorphisms (SNPs). DNA-derived heritability estimates and genetic correlations were calculated using the ‘Genome-wide Complex Trait Analyses’ (GCTA) procedures. 21% of the variation in education, 18% of the variation in socioeconomic status, and 29% of the variation in general cognitive ability was explained by variation in common SNPs (SEs ~ 5%). The SNP-based genetic correlations of education and socioeconomic status with general intelligence were 0.95 (SE 0.13) and 0.26 (0.16), respectively. There are genetic contributions to intelligence and education with near-complete overlap between common additive SNP effects on these traits (genetic correlation ~ 1). Genetic influences on socioeconomic status are also associated with the genetic foundations of intelligence. The results are also compatible with substantial environmental contributions to socioeconomic status.
• Generation Scotland is a large family-based cohort of ~ 24,000 people.
• We investigate the genetic influences on education, SES, and intelligence.
• Both DNA-based (subset of ~ 6500) and pedigree-based analyses are used.
• Genetic effects on SES and education are linked to the genetic basis of intelligence.
• There are also substantial environmental effects on all three traits.
Generation Scotland; Intelligence; Education; Socioeconomic status; Genetics
Identifying the downstream effects of disease-associated single nucleotide polymorphisms (SNPs) is challenging: the causal gene is often unknown or it is unclear how the SNP affects the causal gene, making it difficult to design experiments that reveal functional consequences. To help overcome this problem, we performed the largest expression quantitative trait locus (eQTL) meta-analysis so far reported in non-transformed peripheral blood samples of 5,311 individuals, with replication in 2,775 individuals. We identified and replicated trans-eQTLs for 233 SNPs (reflecting 103 independent loci) that were previously associated with complex traits at genome-wide significance. Although we did not study specific patient cohorts, we identified trait-associated SNPs that affect multiple trans-genes that are known to be markedly altered in patients: for example, systemic lupus erythematosus (SLE) SNP rs4917014 [1] altered C1QB and five type 1 interferon response genes, both hallmarks of SLE [2-4]. Subsequent ChIP-seq data analysis on these trans-genes implicated transcription factor IKZF1 as the causal gene at this locus, with DeepSAGE RNA-sequencing revealing that rs4917014 strongly alters 3’ UTR levels of IKZF1. Variants associated with cholesterol metabolism and type 1 diabetes showed similar phenomena, indicating that large-scale eQTL mapping provides insight into the downstream effects of many trait-associated variants.
We have recently developed analysis methods (GREML) to estimate the genetic variance of a complex trait/disease and the genetic correlation between two complex traits/diseases using genome-wide single nucleotide polymorphism (SNP) data in unrelated individuals. Here we use analytical derivations and simulations to quantify the sampling variance of the estimate of the proportion of phenotypic variance captured by all SNPs for quantitative traits and case-control studies. We also derive the approximate sampling variance of the estimate of a genetic correlation in a bivariate analysis, when two complex traits are either measured on the same or different individuals. We show that the sampling variance is inversely proportional to the number of pairwise contrasts in the analysis and to the variance in SNP-derived genetic relationships. For bivariate analysis, the sampling variance of the genetic correlation additionally depends on the harmonic mean of the proportion of variance explained by the SNPs for the two traits and the genetic correlation between the traits, and depends on the phenotypic correlation when the traits are measured on the same individuals. We provide an online tool for calculating the power of detecting genetic (co)variation using genome-wide SNP data. The new theory and online tool will be helpful to plan experimental designs to estimate the missing heritability that has not yet been fully revealed through genome-wide association studies, and to estimate the genetic overlap between complex traits (diseases) in particular when the traits (diseases) are not measured on the same samples.
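The stated dependence on the number of pairwise contrasts and on the variance of SNP-derived relationships can be illustrated with a short Python sketch. It assumes the simple approximation var(ĥ²) ≈ 1 / (n_pairs × var(A_ij)) for a quantitative trait in unrelated individuals, with a default var(A_ij) of 2 × 10^−5. Both the default value and the function names are assumptions made here for illustration, and this is not the authors' online tool.

```python
from math import ceil, sqrt


def se_snp_heritability(n, var_rel=2e-5):
    """Approximate SE of the SNP-heritability estimate for n unrelated people.

    var_rel is the variance of the off-diagonal genomic relationships
    (assumed value; it depends on the population and SNP set).
    """
    n_pairs = n * (n - 1) / 2  # number of pairwise contrasts
    return sqrt(1.0 / (n_pairs * var_rel))


def n_for_target_se(target_se, var_rel=2e-5):
    """Smallest n whose approximate SE does not exceed target_se."""
    # Solve n * (n - 1) >= 2 / (var_rel * target_se^2) for n.
    k = 2.0 / (var_rel * target_se ** 2)
    return ceil((1 + sqrt(1 + 4 * k)) / 2)


if __name__ == "__main__":
    for n in (2000, 5000, 10000):
        print(n, round(se_snp_heritability(n), 4))
    print(n_for_target_se(0.05))
```

Under this approximation the standard error falls roughly in proportion to 1/N, so doubling the sample size about halves the standard error.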
Genome-wide association studies (GWAS) have identified thousands of genetic variants for hundreds of traits and diseases. However, the genetic variants discovered from GWAS only explained a small fraction of the heritability, resulting in the question of “missing heritability”. We have recently developed approaches (called GREML) to estimate the overall contribution of all SNPs to the phenotypic variance of a trait (disease) and the proportion of genetic overlap between traits (diseases). A frequently asked question is how many samples are required to estimate the proportion of variance attributable to all SNPs and the proportion of genetic overlap with useful precision. In this study, we derive the standard errors of the estimated parameters from theory and find that they are highly consistent with the values observed in published results and those obtained from simulation. The theory together with an online application tool will be helpful to plan experimental design to quantify the missing heritability, and to estimate the genetic overlap between traits (diseases) especially when it is unfeasible to have the traits (diseases) measured on the same individuals.
Greater height and higher intelligence test scores are predictors of better health outcomes. Here, we used molecular (single-nucleotide polymorphism) data to estimate the genetic correlation between height and general intelligence (g) in 6,815 unrelated subjects (median age 57, IQR 49–63) from the Generation Scotland: Scottish Family Health Study cohort. The phenotypic correlation between height and g was 0.16 (SE 0.01). The genetic correlation between height and g was 0.28 (SE 0.09) with a bivariate heritability estimate of 0.71. Understanding the molecular basis of the correlation between height and intelligence may help explain any shared role in determining health outcomes. This study identified a modest genetic correlation between height and intelligence with the majority of the phenotypic correlation being explained by shared genetic influences.
Electronic supplementary material
The online version of this article (doi:10.1007/s10519-014-9644-z) contains supplementary material, which is available to authorized users.
Height; Intelligence; Molecular genetics; Genetic correlation; Generation Scotland
People meeting diagnostic criteria for anxiety or depressive disorders tend to score high on the personality scale of neuroticism. Studying this personality dimension can give insights into the aetiology of these important psychiatric disorders.
To undertake a comprehensive genome-wide linkage study of neuroticism, using large study samples that have been measured multiple times. To compare the results between countries for replication and across time within countries for consistency.
Genome wide linkage scan.
Twin individuals and their family members from Australia (AU) and the Netherlands (NL).
19,635 sibling pairs completed self-report questionnaires for neuroticism up to five times over a period of up to 22 years. 5,069 sibling pairs were genotyped with microsatellite markers.
Non-parametric linkage analyses were conducted in Merlin-Regress for the mean neuroticism scores averaged across time. Additional analyses were conducted for the time specific measures of neuroticism from each country to investigate consistency of linkage results.
Three chromosomal regions exceeded empirically-derived thresholds for suggestive linkage using mean neuroticism scores: 10p 5 cM (NL), 14q 103 cM (NL) and 18q 117 cM (AU & NL combined), but only 14q retains significance after correction for multiple testing. These regions all showed evidence for linkage in individual time-specific measures of neuroticism and one (18q) showed some evidence for replication between countries. Linkage intervals for these regions all overlap with regions identified in other studies of neuroticism or related traits and/or in studies of anxiety in mice.
Our results demonstrate the value of the availability of multiple measures over time and add to the optimism reported in recent reviews for replication of linkage regions for neuroticism. These regions are likely to harbour causal variants for neuroticism and its related psychiatric disorders and can inform prioritisation of results from genome-wide association studies.
Personality traits are basic dimensions of behavioural variation, and twin, family, and adoption studies show that around 30% of the between-individual variation is due to genetic variation. There is rapidly-growing interest in understanding the evolutionary basis of this genetic variation. Several evolutionary mechanisms could explain how genetic variation is maintained in traits, and each of these makes predictions in terms of the relative contribution of rare and common genetic variants to personality variation, the magnitude of nonadditive genetic influences, and whether personality is affected by inbreeding. Using genome-wide SNP data from >8,000 individuals, we estimated that little variation in the Cloninger personality dimensions (7.2% on average) is due to the combined effect of common, additive genetic variants across the genome, suggesting that most heritable variation in personality is due to rare variant effects and/or a combination of dominance and epistasis. Furthermore, higher levels of inbreeding were associated with less socially-desirable personality trait levels in three of the four personality dimensions. These findings are consistent with genetic variation in personality traits having been maintained by mutation-selection balance.
balancing selection; mutation-selection balance; antagonistic pleiotropy; correlational selection; neutral; trade-offs; personality; temperament; mutation; evolution; behavioural syndromes
A genome-wide association study of educational attainment was conducted in a discovery sample of 101,069 individuals and a replication sample of 25,490. Three independent SNPs are genome-wide significant (rs9320913, rs11584700, rs4851266), and all three replicate. Estimated effect sizes are small (R^2 ≈ 0.02%), approximately 1 month of schooling per allele. A linear polygenic score from all measured SNPs accounts for ≈ 2% of the variance in both educational attainment and cognitive function. Genes in the region of the loci have previously been associated with health, cognitive, and central nervous system phenotypes, and bioinformatics analyses suggest the involvement of the anterior caudate nucleus. These findings provide promising candidate SNPs for follow-up work, and our effect size estimates can anchor power analyses in social-science genetics.
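To illustrate how an effect size of this magnitude might anchor a power analysis, the Python sketch below estimates the sample size needed to detect a single SNP explaining a given fraction of trait variance. It assumes a linear-regression association test with per-sample non-centrality R²/(1 − R²), a genome-wide significance threshold of 5 × 10^−8, and the usual normal approximation that ignores the negligible opposite tail. The function name and defaults are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def gwas_required_n(r2, power=0.8, alpha=5e-8):
    """Approximate sample size to detect a SNP explaining r2 of trait variance.

    r2    : proportion of phenotypic variance explained (e.g. 0.0002 for 0.02%)
    power : desired power
    alpha : two-sided significance threshold (assumed genome-wide level)
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_power = nd.inv_cdf(power)
    # Required NCP is (z_alpha + z_power)^2; NCP = n * r2 / (1 - r2).
    return ceil((z_alpha + z_power) ** 2 * (1 - r2) / r2)


if __name__ == "__main__":
    print(gwas_required_n(0.0002))
```

Because the required sample size is roughly inversely proportional to R², effects this small need samples in the hundreds of thousands under these assumptions.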