Related Articles

1.  Genetic Predisposition to Increased Blood Cholesterol and Triglyceride Lipid Levels and Risk of Alzheimer Disease: A Mendelian Randomization Analysis 
PLoS Medicine  2014;11(9):e1001713.
In this study, Proitsi and colleagues use a Mendelian randomization approach to dissect the causal nature of the association between circulating lipid levels and late-onset Alzheimer disease (LOAD) and find that genetic predisposition to increased plasma cholesterol and triglyceride lipid levels is not associated with elevated LOAD risk.
Please see later in the article for the Editors' Summary
Background
Although altered lipid metabolism has been extensively implicated in the pathogenesis of Alzheimer disease (AD) through cell biological, epidemiological, and genetic studies, the molecular mechanisms linking cholesterol and AD pathology are still not well understood and contradictory results have been reported. We have used a Mendelian randomization approach to dissect the causal nature of the association between circulating lipid levels and late onset AD (LOAD) and test the hypothesis that genetically raised lipid levels increase the risk of LOAD.
Methods and Findings
We included 3,914 patients with LOAD, 1,675 older individuals without LOAD, and 4,989 individuals from the general population from six genome-wide studies of white participants (total n = 10,578). We constructed weighted genotype risk scores (GRSs) for four blood lipid phenotypes (high-density lipoprotein cholesterol [HDL-c], low-density lipoprotein cholesterol [LDL-c], triglycerides, and total cholesterol) using well-established SNPs in 157 loci for blood lipids reported by Willer and colleagues (2013). We developed both full GRSs, using all SNPs associated with each trait at p<5×10−8, and trait-specific scores, using SNPs associated exclusively with each trait at p<5×10−8. We used logistic regression to test whether the GRSs were associated with LOAD in each study, and combined the results by meta-analysis. We found no association between any of the full GRSs and LOAD (meta-analysis results: odds ratio [OR] = 1.005, 95% CI 0.82–1.24, p = 0.962 per 1 unit increase in HDL-c; OR = 0.901, 95% CI 0.65–1.25, p = 0.530 per 1 unit increase in LDL-c; OR = 1.104, 95% CI 0.89–1.37, p = 0.362 per 1 unit increase in triglycerides; and OR = 0.954, 95% CI 0.76–1.21, p = 0.688 per 1 unit increase in total cholesterol). Results for the trait-specific scores were similar; however, the trait-specific scores explained a much smaller proportion of the phenotypic variance.
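The abstract pools per-study logistic-regression results by meta-analysis without naming the pooling model. A minimal sketch of one common choice, fixed-effect inverse-variance pooling on the log-odds scale, is shown below. The fixed-effect model is an assumption, as is recovering each study's standard error from its 95% CI on the assumption that the interval was computed as log(OR) ± 1.96 × SE.

```python
import math

Z_975 = 1.959964  # standard-normal quantile for a two-sided 95% CI


def se_from_or_ci(lower: float, upper: float) -> float:
    """SE of log(OR) implied by a 95% CI reported on the odds-ratio scale."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_975)


def fixed_effect_meta(studies):
    """Inverse-variance fixed-effect pooling of per-study odds ratios.

    studies: list of (odds_ratio, ci_lower, ci_upper) tuples.
    Returns (pooled OR, CI lower, CI upper, two-sided p-value).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, lower, upper in studies:
        beta = math.log(odds_ratio)  # pool on the log-odds scale
        weight = 1.0 / se_from_or_ci(lower, upper) ** 2  # 1 / variance
        weighted_sum += weight * beta
        total_weight += weight
    beta_pooled = weighted_sum / total_weight
    se_pooled = math.sqrt(1.0 / total_weight)
    z = beta_pooled / se_pooled
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return (
        math.exp(beta_pooled),
        math.exp(beta_pooled - Z_975 * se_pooled),
        math.exp(beta_pooled + Z_975 * se_pooled),
        p,
    )


# Two hypothetical studies (OR, 95% CI lower, 95% CI upper).
print(fixed_effect_meta([(1.10, 0.80, 1.51), (0.95, 0.70, 1.29)]))
```

A random-effects model would add an estimated between-study variance to each study's variance before weighting. That choice matters when study-level estimates disagree more than their standard errors would predict.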
Conclusions
Genetic predisposition to increased blood cholesterol and triglyceride lipid levels is not associated with elevated LOAD risk. The observed epidemiological associations between abnormal lipid levels and LOAD risk could therefore be attributed to biological pleiotropy or could be secondary to LOAD. Limitations of this study include the small proportion of lipid variance explained by the GRS, biases in case-control ascertainment, and the limitations inherent in Mendelian randomization studies. Future studies should focus on larger LOAD datasets with longitudinally sampled peripheral lipid measures and other markers of lipid metabolism, which have been shown to be altered in LOAD.
Editors' Summary
Currently, about 44 million people worldwide have dementia, a group of brain disorders characterized by an irreversible decline in memory, communication, and other “cognitive” functions. Dementia mainly affects older people and, because people are living longer, experts estimate that more than 135 million people will have dementia by 2050. The commonest form of dementia is Alzheimer disease. In this type of dementia, protein clumps called plaques and neurofibrillary tangles form in the brain and cause its degeneration. The earliest sign of Alzheimer disease is usually increasing forgetfulness. As the disease progresses, affected individuals gradually lose their ability to deal with normal daily activities such as dressing. They may become anxious or aggressive or begin to wander. They may also eventually lose control of their bladder and of other physical functions. At present, there is no cure for Alzheimer disease although some of its symptoms can be managed with drugs. Most people with the disease are initially cared for at home by relatives and other unpaid carers, but many patients end their days in a care home or specialist nursing home.
Why Was This Study Done?
Several lines of evidence suggest that lipid metabolism (how the body handles cholesterol and other fats) is altered in patients whose Alzheimer disease develops after the age of 60 years (late onset Alzheimer disease, LOAD). In particular, epidemiological studies (observational investigations that examine the patterns and causes of disease in populations) have found an association between high amounts of cholesterol in the blood in midlife and the risk of LOAD. However, observational studies cannot prove that abnormal lipid metabolism (dyslipidemia) causes LOAD. People with dyslipidemia may share other characteristics that cause both dyslipidemia and LOAD (confounding) or LOAD might actually cause dyslipidemia (reverse causation). Here, the researchers use “Mendelian randomization” to examine whether lifetime changes in lipid metabolism caused by genes have a causal impact on LOAD risk. In Mendelian randomization, causality is inferred from associations between genetic variants that mimic the effect of a modifiable risk factor and the outcome of interest. Because gene variants are inherited randomly, they are not prone to confounding and are free from reverse causation. So, if dyslipidemia causes LOAD, genetic variants that affect lipid metabolism should be associated with an altered risk of LOAD.
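The logic of Mendelian randomization can be illustrated with a small simulation. This is a sketch under stated assumptions rather than anything from the study. An unmeasured confounder `u` raises both the exposure and the outcome, so the observational regression slope is biased. The ratio of the gene-outcome slope to the gene-exposure slope (the Wald ratio) recovers the true causal effect, which is set to zero here. The allele frequency (0.3), the per-allele effect (0.5) and the error terms are all hypothetical, and the simulated instrument satisfies the core Mendelian randomization assumptions by construction.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x, i.e. cov(x, y) / var(x)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_mr(n=100_000, true_effect=0.0, seed=1):
    """Return (observational slope, Wald-ratio estimate) for simulated data."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        # Genotype: count of effect alleles (0, 1 or 2), allele frequency 0.3.
        gi = int(rng.random() < 0.3) + int(rng.random() < 0.3)
        u = rng.gauss(0, 1)  # unmeasured confounder, independent of genotype
        xi = 0.5 * gi + u + rng.gauss(0, 1)  # exposure, e.g. a blood lipid level
        yi = true_effect * xi + u + rng.gauss(0, 1)  # outcome (continuous liability)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    observational = ols_slope(x, y)
    # Wald ratio: gene-outcome slope divided by gene-exposure slope.
    wald = ols_slope(g, y) / ols_slope(g, x)
    return observational, wald


obs, wald = simulate_mr()
print(f"observational slope: {obs:.3f}, Wald ratio: {wald:.3f}")
```

With these settings, Var(X) = 0.25 × 0.42 + 1 + 1 = 2.105 and Cov(X, Y) = Var(u) = 1, so the observational slope is expected to be about 0.475 even though the true effect is zero. The Wald ratio should stay close to zero.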
What Did the Researchers Do and Find?
The researchers investigated whether genetic predisposition to raised lipid levels increased the risk of LOAD in 10,578 participants (3,914 patients with LOAD, 1,675 elderly people without LOAD, and 4,989 population controls) using data collected in six genome-wide studies looking for gene variants associated with Alzheimer disease. The researchers constructed a genotype risk score (GRS) for each participant using genetic risk markers for four types of blood lipids, on the basis of the presence of single nucleotide polymorphisms (SNPs, a type of gene variant) in their DNA. When the researchers used statistical methods to investigate the association between the GRS and LOAD among all the study participants, they found no association between the GRS and LOAD.
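In its usual form, a weighted GRS of the kind described here sums, over SNPs, each allele count multiplied by that SNP's published per-allele effect on the lipid trait. The sketch below assumes that form. It also shows how the full and trait-specific SNP sets described in the abstract differ. The SNP identifiers, effect sizes, trait labels ("LDL", "HDL", "TC" for total cholesterol) and the dictionary layout are hypothetical, and the study's exact weighting may differ.

```python
from typing import Dict

# Hypothetical per-SNP effects: SNP id -> {trait: per-allele effect} for every
# lipid trait the SNP is genome-wide significantly associated with.
Associations = Dict[str, Dict[str, float]]


def full_weights(assoc: Associations, trait: str) -> Dict[str, float]:
    """All SNPs associated with `trait`, regardless of other associations."""
    return {snp: eff[trait] for snp, eff in assoc.items() if trait in eff}


def trait_specific_weights(assoc: Associations, trait: str) -> Dict[str, float]:
    """Only SNPs associated with `trait` and with no other lipid trait."""
    return {snp: eff[trait] for snp, eff in assoc.items() if set(eff) == {trait}}


def weighted_grs(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Sum of (count of trait-raising alleles) x (per-allele effect).

    SNPs missing from `dosages` are skipped in this sketch; real pipelines
    typically impute missing genotypes instead.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


assoc = {
    "snp_a": {"LDL": 0.10},
    "snp_b": {"LDL": 0.20, "TC": 0.15},
    "snp_c": {"HDL": 0.05},
}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
print(weighted_grs(person, full_weights(assoc, "LDL")))
print(weighted_grs(person, trait_specific_weights(assoc, "LDL")))
```

Dropping pleiotropic SNPs such as `snp_b` makes the trait-specific score a cleaner instrument. The cost is that it uses fewer SNPs and so explains less of the trait's variance, which matches the trade-off the abstract reports.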
What Do These Findings Mean?
These findings suggest that genetic predisposition to raised blood levels of four types of lipid is not causally associated with LOAD risk. The accuracy of this finding may be affected by several limitations of this study, including the small proportion of lipid variance explained by the GRS and the validity of several assumptions that underlie all Mendelian randomization studies. Moreover, because all the participants in this study were white, these findings may not apply to people of other ethnic backgrounds. Given their findings, the researchers suggest that the observed epidemiological associations between abnormal blood lipid levels and LOAD risk could be secondary to variation in lipid levels for reasons other than genetics, or to LOAD itself, a possibility that can be investigated by studying blood lipid levels and other markers of lipid metabolism over time in large groups of patients with LOAD. Importantly, however, these findings provide new information about the role of lipids in LOAD development that may eventually lead to new therapeutic and public-health interventions for Alzheimer disease.
Additional Information
Please access these websites via the online version of this summary at
The UK National Health Service Choices website provides information (including personal stories) about Alzheimer's disease
The UK not-for-profit organization Alzheimer's Society provides information for patients and carers about dementia, including personal experiences of living with Alzheimer's disease
The US not-for-profit organization Alzheimer's Association also provides information for patients and carers about dementia and personal stories about dementia
Alzheimer's Disease International is the international federation of Alzheimer disease associations around the world; it provides links to individual associations, information about dementia, and links to World Alzheimer Reports
MedlinePlus provides links to additional resources about Alzheimer's disease (in English and Spanish)
Wikipedia has a page on Mendelian randomization (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
PMCID: PMC4165594  PMID: 25226301
2.  The Role of Adiposity in Cardiometabolic Traits: A Mendelian Randomization Analysis 
Fall, Tove | Hägg, Sara | Mägi, Reedik | Ploner, Alexander | Fischer, Krista | Horikoshi, Momoko | Sarin, Antti-Pekka | Thorleifsson, Gudmar | Ladenvall, Claes | Kals, Mart | Kuningas, Maris | Draisma, Harmen H. M. | Ried, Janina S. | van Zuydam, Natalie R. | Huikari, Ville | Mangino, Massimo | Sonestedt, Emily | Benyamin, Beben | Nelson, Christopher P. | Rivera, Natalia V. | Kristiansson, Kati | Shen, Huei-yi | Havulinna, Aki S. | Dehghan, Abbas | Donnelly, Louise A. | Kaakinen, Marika | Nuotio, Marja-Liisa | Robertson, Neil | de Bruijn, Renée F. A. G. | Ikram, M. Arfan | Amin, Najaf | Balmforth, Anthony J. | Braund, Peter S. | Doney, Alexander S. F. | Döring, Angela | Elliott, Paul | Esko, Tõnu | Franco, Oscar H. | Gretarsdottir, Solveig | Hartikainen, Anna-Liisa | Heikkilä, Kauko | Herzig, Karl-Heinz | Holm, Hilma | Hottenga, Jouke Jan | Hyppönen, Elina | Illig, Thomas | Isaacs, Aaron | Isomaa, Bo | Karssen, Lennart C. | Kettunen, Johannes | Koenig, Wolfgang | Kuulasmaa, Kari | Laatikainen, Tiina | Laitinen, Jaana | Lindgren, Cecilia | Lyssenko, Valeriya | Läärä, Esa | Rayner, Nigel W. | Männistö, Satu | Pouta, Anneli | Rathmann, Wolfgang | Rivadeneira, Fernando | Ruokonen, Aimo | Savolainen, Markku J. | Sijbrands, Eric J. G. | Small, Kerrin S. | Smit, Jan H. | Steinthorsdottir, Valgerdur | Syvänen, Ann-Christine | Taanila, Anja | Tobin, Martin D. | Uitterlinden, Andre G. | Willems, Sara M. | Willemsen, Gonneke | Witteman, Jacqueline | Perola, Markus | Evans, Alun | Ferrières, Jean | Virtamo, Jarmo | Kee, Frank | Tregouet, David-Alexandre | Arveiler, Dominique | Amouyel, Philippe | Ferrario, Marco M. | Brambilla, Paolo | Hall, Alistair S. | Heath, Andrew C. | Madden, Pamela A. F. | Martin, Nicholas G. | Montgomery, Grant W. | Whitfield, John B. | Jula, Antti | Knekt, Paul | Oostra, Ben | van Duijn, Cornelia M. | Penninx, Brenda W. J. H. | Davey Smith, George | Kaprio, Jaakko | Samani, Nilesh J. 
| Gieger, Christian | Peters, Annette | Wichmann, H.-Erich | Boomsma, Dorret I. | de Geus, Eco J. C. | Tuomi, TiinaMaija | Power, Chris | Hammond, Christopher J. | Spector, Tim D. | Lind, Lars | Orho-Melander, Marju | Palmer, Colin Neil Alexander | Morris, Andrew D. | Groop, Leif | Järvelin, Marjo-Riitta | Salomaa, Veikko | Vartiainen, Erkki | Hofman, Albert | Ripatti, Samuli | Metspalu, Andres | Thorsteinsdottir, Unnur | Stefansson, Kari | Pedersen, Nancy L. | McCarthy, Mark I. | Ingelsson, Erik | Prokopenko, Inga
PLoS Medicine  2013;10(6):e1001474.
In this study, Prokopenko and colleagues use a Mendelian randomization study design to provide novel evidence for a causal relationship between adiposity and heart failure, and between adiposity and increased liver enzymes.
Please see later in the article for the Editors' Summary
Background
The association between adiposity and cardiometabolic traits is well known from epidemiological studies. Whilst the causal relationship is clear for some of these traits, for others it is not. We aimed to determine whether adiposity is causally related to various cardiometabolic traits using the Mendelian randomization approach.
Methods and Findings
We used the adiposity-associated variant rs9939609 at the FTO locus as an instrumental variable (IV) for body mass index (BMI) in a Mendelian randomization design. Thirty-six population-based studies of individuals of European descent contributed to the analyses.
Age- and sex-adjusted regression models were fitted to test for association between (i) rs9939609 and BMI (n = 198,502), (ii) rs9939609 and 24 traits, and (iii) BMI and 24 traits. The causal effect of BMI on the outcome measures was quantified by IV estimators. The estimators were compared to the BMI–trait associations derived from the same individuals. In the IV analysis, we demonstrated novel evidence for a causal relationship between adiposity and incident heart failure (hazard ratio, 1.19 per BMI-unit increase; 95% CI, 1.03–1.39) and replicated earlier reports of a causal association with type 2 diabetes, metabolic syndrome, dyslipidemia, and hypertension (odds ratio for IV estimator, 1.1–1.4; all p<0.05). For quantitative traits, our results provide novel evidence for a causal effect of adiposity on the liver enzymes alanine aminotransferase and gamma-glutamyl transferase and confirm previous reports of a causal effect of adiposity on systolic and diastolic blood pressure, fasting insulin, 2-h post-load glucose from the oral glucose tolerance test, C-reactive protein, triglycerides, and high-density lipoprotein cholesterol levels (all p<0.05). The estimated causal effects were in agreement with traditional observational measures in all instances except for type 2 diabetes, where the causal estimate was larger than the observational estimate (p = 0.001).
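The abstract does not spell out which IV estimator was used. One standard single-variant estimator is the Wald ratio: the variant's effect on the outcome divided by its effect on the exposure. The sketch below assumes that estimator together with the simple first-order standard error, which ignores uncertainty in the gene-exposure estimate. For a time-to-event or binary outcome, the gene-outcome effect is on the log-hazard or log-odds scale, so exponentiating the ratio gives a hazard or odds ratio per BMI unit. All numbers in the example are hypothetical.

```python
import math


def wald_ratio(beta_gx: float, beta_gy: float, se_gy: float):
    """Ratio IV estimate of the causal effect of exposure X on outcome Y.

    beta_gx: per-allele effect of the variant on the exposure (e.g. BMI units)
    beta_gy: per-allele effect on the outcome (log-hazard or log-odds scale)
    se_gy:   standard error of beta_gy
    Returns (estimate, first-order SE); the SE ignores uncertainty in beta_gx.
    """
    estimate = beta_gy / beta_gx
    se = se_gy / abs(beta_gx)
    return estimate, se


# Hypothetical inputs: 0.4 BMI units per allele, log-HR 0.07 (SE 0.03) per allele.
est, se = wald_ratio(0.4, 0.07, 0.03)
hr = math.exp(est)
ci = (math.exp(est - 1.96 * se), math.exp(est + 1.96 * se))
print(f"HR per BMI unit: {hr:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f})")
```

Because the per-allele effect on BMI is small, dividing by it inflates the outcome's standard error. This is why single-variant Mendelian randomization studies need very large samples, such as the roughly 200,000 individuals pooled here.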
Conclusions
We provide novel evidence for a causal relationship between adiposity and heart failure as well as between adiposity and increased liver enzymes.
Editors' Summary
Cardiovascular disease (CVD)—disease that affects the heart and/or the blood vessels—is a major cause of illness and death worldwide. In the US, for example, coronary heart disease—a CVD in which narrowing of the heart's blood vessels by fatty deposits slows the blood supply to the heart and may eventually cause a heart attack—is the leading cause of death, and stroke—a CVD in which the brain's blood supply is interrupted—is the fourth leading cause of death. Globally, both the incidence of CVD (the number of new cases in a population every year) and its prevalence (the proportion of the population with CVD) are increasing, particularly in low- and middle-income countries. This increasing burden of CVD is occurring in parallel with a global increase in the incidence and prevalence of obesity—having an unhealthy amount of body fat (adiposity)—and of metabolic diseases—conditions such as diabetes in which metabolism (the processes that the body uses to make energy from food) is disrupted, with resulting high blood sugar and damage to the blood vessels.
Why Was This Study Done?
Epidemiological studies—investigations that record the patterns and causes of disease in populations—have reported an association between adiposity (indicated by an increased body mass index [BMI], which is calculated by dividing body weight in kilograms by height in meters squared) and cardiometabolic traits such as coronary heart disease, stroke, heart failure (a condition in which the heart is incapable of pumping sufficient amounts of blood around the body), diabetes, high blood pressure (hypertension), and high blood cholesterol (dyslipidemia). However, observational studies cannot prove that adiposity causes any particular cardiometabolic trait because overweight individuals may share other characteristics (confounding factors) that are the real causes of both obesity and the cardiometabolic disease. Moreover, it is possible that having CVD or a metabolic disease causes obesity (reverse causation). For example, individuals with heart failure cannot do much exercise, so heart failure may cause obesity rather than vice versa. Here, the researchers use “Mendelian randomization” to examine whether adiposity is causally related to various cardiometabolic traits. Because gene variants are inherited randomly, they are not prone to confounding and are free from reverse causation. It is known that a genetic variant (rs9939609) within the genome region that encodes the fat-mass- and obesity-associated gene (FTO) is associated with increased BMI. Thus, an investigation of the associations between rs9939609 and cardiometabolic traits can indicate whether obesity is causally related to these traits.
What Did the Researchers Do and Find?
The researchers analyzed the association between rs9939609 (the “instrumental variable,” or IV) and BMI, between rs9939609 and 24 cardiometabolic traits, and between BMI and the same traits using genetic and health data collected in 36 population-based studies of nearly 200,000 individuals of European descent. They then quantified the strength of the causal association between BMI and the cardiometabolic traits by calculating “IV estimators.” Higher BMI showed a causal relationship with heart failure, metabolic syndrome (a combination of medical disorders that increases the risk of developing CVD), type 2 diabetes, dyslipidemia, hypertension, increased blood levels of liver enzymes (an indicator of liver damage; some metabolic disorders involve liver damage), and several other cardiometabolic traits. All the IV estimators were similar to the BMI–cardiovascular trait associations (observational estimates) derived from the same individuals, with the exception of diabetes, where the causal estimate was higher than the observational estimate, probably because the observational estimate is based on a single BMI measurement, whereas the causal estimate considers lifetime changes in BMI.
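The comparison of causal (IV) and observational estimates described here can be sketched as a z-test on the difference between two estimates on the same scale. This is an assumption about the method, not the authors' procedure. The sketch treats the two estimates as independent, which they are not when both come from the same individuals, so it is only an approximation. The inputs in the example are hypothetical log-odds ratios.

```python
import math


def compare_estimates(b_iv: float, se_iv: float, b_obs: float, se_obs: float):
    """z statistic and two-sided p-value for b_iv - b_obs.

    Assumes the two estimates are independent (a simplification).
    """
    z = (b_iv - b_obs) / math.sqrt(se_iv ** 2 + se_obs ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Hypothetical log-odds ratios per BMI unit: causal 0.5 (SE 0.1), observational 0.2 (SE 0.05).
z, p = compare_estimates(0.5, 0.1, 0.2, 0.05)
print(f"z = {z:.2f}, p = {p:.4f}")
```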
What Do These Findings Mean?
Like all Mendelian randomization studies, the reliability of the causal associations reported here depends on several assumptions made by the researchers. Nevertheless, these findings provide support for many previously suspected and biologically plausible causal relationships, such as that between adiposity and hypertension. They also provide new insights into the causal effect of obesity on liver enzyme levels and on heart failure. In the latter case, these findings suggest that a one-unit increase in BMI might increase the incidence of heart failure by 17%. In the US, this corresponds to 113,000 additional cases of heart failure for every unit increase in BMI at the population level. Although additional studies are needed to confirm and extend these findings, these results suggest that global efforts to reduce the burden of obesity will likely also reduce the occurrence of CVD and metabolic disorders.
Additional Information
Please access these websites via the online version of this summary at
The American Heart Association provides information on all aspects of cardiovascular disease and tips on keeping the heart healthy, including weight management (in several languages); its website includes personal stories about stroke and heart attacks
The US Centers for Disease Control and Prevention has information on heart disease, stroke, and all aspects of overweight and obesity (in English and Spanish)
The UK National Health Service Choices website provides information about cardiovascular disease and obesity, including a personal story about losing weight
The World Health Organization provides information on obesity (in several languages)
The International Obesity Taskforce provides information about the global obesity epidemic
Wikipedia has a page on Mendelian randomization (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
MedlinePlus provides links to other sources of information on heart disease, on vascular disease, on obesity, and on metabolic disorders (in English and Spanish)
The International Association for the Study of Obesity provides maps and information about obesity worldwide
The International Diabetes Federation has a web page that describes types, complications, and risk factors of diabetes
PMCID: PMC3692470  PMID: 23824655
3.  A Meta-Analysis of Genome-Wide Association Scans Identifies IL18RAP, PTPN2, TAGAP, and PUS10 As Shared Risk Loci for Crohn's Disease and Celiac Disease 
PLoS Genetics  2011;7(1):e1001283.
Crohn's disease (CD) and celiac disease (CelD) are chronic intestinal inflammatory diseases, involving genetic and environmental factors in their pathogenesis. The two diseases can co-occur within families, and studies suggest that CelD patients have a higher risk of developing CD than the general population. These observations suggest that CD and CelD may share common genetic risk loci. Two such shared loci, IL18RAP and PTPN2, have already been identified independently in these two diseases. The aim of our study was to explicitly identify shared risk loci for these diseases by combining results from genome-wide association study (GWAS) datasets of CD and CelD. Specifically, GWAS results from CelD (768 cases, 1,422 controls) and CD (3,230 cases, 4,829 controls) were combined in a meta-analysis. Nine independent regions had a nominal association p-value <1.0×10−5 in this meta-analysis and showed evidence of association with the individual diseases in the original scans (p-value <1×10−2 in CelD and <1×10−3 in CD). These include the two previously reported shared loci, IL18RAP and PTPN2, with p-values of 3.37×10−8 and 6.39×10−9, respectively, in the meta-analysis. The other seven had not been reported as shared loci and were therefore tested in additional CelD (3,149 cases and 4,714 controls) and CD (1,835 cases and 1,669 controls) cohorts. Two of these loci, TAGAP and PUS10, showed significant evidence of replication (Bonferroni-corrected p-values <0.0071) in the combined CelD and CD replication cohorts and were firmly established as shared risk loci of genome-wide significance, with overall combined p-values of 1.55×10−10 and 1.38×10−11, respectively. Through a meta-analysis of GWAS data from CD and CelD, we have identified four shared risk loci: PTPN2, IL18RAP, TAGAP, and PUS10. The combined analysis of the two datasets provided the power, lacking in the individual GWAS for single diseases, to detect shared loci with a relatively small effect.
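The replication threshold quoted here is consistent with a Bonferroni correction of a 0.05 family-wise error rate across the seven loci taken forward: 0.05 / 7 ≈ 0.00714. That reading is inferred from the numbers rather than stated in the abstract. A minimal sketch follows; the locus names and p-values are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def passing_loci(p_values: dict, alpha: float = 0.05) -> list:
    """Loci whose replication p-value is below the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [locus for locus, p in p_values.items() if p < threshold]


# Hypothetical replication p-values for seven candidate loci.
replication = {
    "locus_1": 0.0004, "locus_2": 0.03, "locus_3": 0.2, "locus_4": 0.006,
    "locus_5": 0.5, "locus_6": 0.09, "locus_7": 0.7,
}
print(round(bonferroni_threshold(0.05, 7), 4))
print(passing_loci(replication))
```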
Author Summary
Celiac disease (CelD) and Crohn's disease (CD) are both chronic inflammatory diseases of the digestive tract. Both of these diseases are complex genetic traits with multiple genetic and non-genetic risk factors. Recent genome-wide association (GWA) studies have identified some of the genetic risk factors for these diseases. Interestingly, in addition to some similarities in phenotype, these studies have shown that CelD and CD share some genetic risk factors. Specifically, by comparing the results of independent GWA studies of CD and CelD, two genetic risk loci were found in common: the PTPN2 locus and the IL18RAP locus. Therefore, in order to directly test for additional shared genetic risk factors, we combined the GWA results from two large studies of CelD and CD, essentially creating a combined phenotype in which anyone with CD or CelD was coded as affected. Association results were then replicated in additional cohorts of CelD and CD. Shared risk loci are expected to show association in this analysis, whereas the signal of risk loci specific to either of the two diseases should be diluted. Using this meta-analysis, we identified, in addition to PTPN2 and IL18RAP, two loci harbouring TAGAP and PUS10 as shared risk loci for Crohn's disease and celiac disease at genome-wide significance.
PMCID: PMC3029251  PMID: 21298027
4.  Proteins Encoded in Genomic Regions Associated with Immune-Mediated Disease Physically Interact and Suggest Underlying Biology 
PLoS Genetics  2011;7(1):e1001273.
Genome-wide association studies (GWAS) have defined over 150 genomic regions unequivocally containing variation predisposing to immune-mediated disease. Inferring disease biology from these observations, however, hinges on our ability to discover the molecular processes being perturbed by these risk variants. It has previously been observed that different genes harboring causal mutations for the same Mendelian disease often physically interact. We sought to evaluate the degree to which this is true of genes within strongly associated loci in complex disease. Using sets of loci defined in rheumatoid arthritis (RA) and Crohn's disease (CD) GWAS, we build protein–protein interaction (PPI) networks for genes within associated loci and find abundant physical interactions between protein products of associated genes. We apply multiple permutation approaches to show that these networks are more densely connected than expected by chance. To confirm biological relevance, we show that the components of the networks tend to be expressed in similar tissues relevant to the phenotypes in question, suggesting that the networks reflect common underlying processes perturbed by risk loci. Furthermore, we show that the RA and CD networks have predictive power: proteins in these networks that are not encoded in the confirmed list of disease-associated loci are significantly enriched for association with the phenotypes in question in extended GWAS analysis. Finally, we test our method in three non-immune traits to assess its applicability to complex traits in general. We find that genes in loci associated with height and lipid levels assemble into significantly connected networks, but we do not detect excess connectivity among type 2 diabetes (T2D) loci beyond chance.
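The permutation idea can be sketched as follows, under simplifying assumptions that are not the paper's. The interaction network is a set of undirected gene pairs, connectivity is the number of interactions among a gene set, and the null distribution comes from uniformly random gene sets of the same size. The paper reports using multiple permutation approaches; practical versions usually also match random sets on properties such as node degree, which this sketch does not. Gene names and edges in the example are hypothetical.

```python
import itertools
import random


def count_edges(genes, edges):
    """Number of known interactions among the protein products of `genes`."""
    return sum(frozenset(pair) in edges for pair in itertools.combinations(genes, 2))


def permutation_p(gene_set, all_genes, edges, n_perm=1000, seed=0):
    """Empirical p-value: is `gene_set` more densely connected than random sets of the same size?"""
    rng = random.Random(seed)
    observed = count_edges(gene_set, edges)
    hits = 0
    for _ in range(n_perm):
        null_set = rng.sample(all_genes, len(gene_set))
        if count_edges(null_set, edges) >= observed:
            hits += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical network: a ring over 100 genes plus a fully connected 5-gene module.
genes = [f"G{i}" for i in range(100)]
edges = {frozenset((genes[i], genes[(i + 1) % 100])) for i in range(100)}
edges |= {frozenset(pair) for pair in itertools.combinations(genes[:5], 2)}
observed, p = permutation_p(genes[:5], genes, edges)
print(observed, p)
```

Uniform sampling treats every gene as equally likely to have interactions. Highly studied genes tend to have many recorded partners, so degree-matched permutations are the more conservative choice in practice.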
Taken together, our results constitute evidence that, for many of the complex diseases studied here, common genetic associations implicate regions encoding proteins that physically interact in a preferential manner, in line with observations in Mendelian disease.
Author Summary
Genome-wide association studies have uncovered hundreds of DNA changes associated with complex disease. The ultimate promise of these studies is the understanding of disease biology; this goal, however, is not easily achieved because each disease has yielded numerous associations, each one pointing to a region of the genome rather than to a specific causal mutation. Presumably, the causal variants affect components of common molecular processes, and a first step in understanding the disease biology perturbed in patients is to identify connections among regions associated with disease. Since it has been reported in numerous Mendelian diseases that protein products of causal genes tend to physically bind each other, we chose to approach this problem using known protein–protein interactions to test whether the products of genes in loci associated with five complex traits bind each other. We applied several permutation methods and found robustly significant connectivity within four of the traits. In Crohn's disease and rheumatoid arthritis, we are able to show that these genes are co-expressed and that other proteins emerging in the network are enriched for association with disease. These findings suggest that, for the complex traits studied here, associated loci contain variants that affect common molecular processes, rather than distinct mechanisms specific to each association.
PMCID: PMC3020935  PMID: 21249183
5.  Integrating Functional Data to Prioritize Causal Variants in Statistical Fine-Mapping Studies 
PLoS Genetics  2014;10(10):e1004722.
Standard statistical approaches for prioritization of variants for functional testing in fine-mapping studies either use marginal association statistics or estimate posterior probabilities for variants to be causal under simplifying assumptions. Here, we present a probabilistic framework that integrates association strength with functional genomic annotation data to improve accuracy in selecting plausible causal variants for functional validation. A key feature of our approach is that it empirically estimates the contribution of each functional annotation to the trait of interest directly from summary association statistics while allowing for multiple causal variants at any risk locus. We devise efficient algorithms that estimate the parameters of our model across all risk loci to further increase performance. Using simulations starting from the 1000 Genomes data, we find that our framework consistently outperforms the current state-of-the-art fine-mapping methods, reducing the number of variants that need to be selected to capture 90% of the causal variants from an average of 13.3 to 10.4 SNPs per locus (as compared to the next-best performing strategy). Furthermore, we introduce a cost-to-benefit optimization framework for determining the number of variants to be followed up in functional assays and assess its performance using real and simulation data. We validate our findings using a large-scale meta-analysis of four blood lipid traits and find that the relative probability of causality is increased for variants in exons and transcription start sites and decreased in repressed genomic regions at the risk loci of these traits. Using these highly predictive, trait-specific functional annotations, we estimate causality probabilities across all traits and variants, reducing the size of the 90% confidence set from an average of 17.5 to 13.5 variants per locus in these data.
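The "90% confidence set" here is the smallest set of variants whose posterior probabilities of causality sum to at least 0.9. The sketch below shows that construction and how a prior that up-weights one variant (for example, one lying in an exon) can shrink the set. It is a deliberate simplification, not the authors' framework. It assumes a single causal variant per locus and takes Bayes factors and prior weights as given, whereas the paper estimates annotation effects from summary statistics and allows multiple causal variants. All inputs are hypothetical.

```python
from typing import List, Sequence


def posteriors(bayes_factors: Sequence[float], priors: Sequence[float]) -> List[float]:
    """Posterior causal probabilities at one locus, assuming exactly one causal variant."""
    weights = [bf * pr for bf, pr in zip(bayes_factors, priors)]
    total = sum(weights)
    return [w / total for w in weights]


def credible_set(post: Sequence[float], level: float = 0.9) -> List[int]:
    """Indices of the fewest variants whose posteriors sum to at least `level`."""
    order = sorted(range(len(post)), key=lambda i: post[i], reverse=True)
    chosen: List[int] = []
    cumulative = 0.0
    for i in order:
        chosen.append(i)
        cumulative += post[i]
        if cumulative >= level:
            break
    return chosen


bfs = [60.0, 50.0, 40.0, 30.0]  # hypothetical per-variant Bayes factors
uniform = credible_set(posteriors(bfs, [1, 1, 1, 1]))
annotated = credible_set(posteriors(bfs, [10, 1, 1, 1]))  # variant 0 up-weighted
print(len(uniform), len(annotated))  # prints: 4 2
```

With flat priors, no three variants reach 0.9, so all four must be followed up. Up-weighting variant 0 concentrates the posterior enough that two variants suffice.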
Author Summary
Genome-wide association studies (GWAS) have successfully identified numerous regions in the genome that harbor genetic variants that increase risk for various complex traits and diseases. However, GWAS risk variants generally do not causally affect the trait themselves, but rather are correlated with the true causal variant through linkage disequilibrium (LD). Plausible causal variants are identified in fine-mapping studies through targeted sequencing followed by prioritization of variants for functional validation. In this work, we propose methods that leverage two independent sources of information, association strength and genomic functional location, to prioritize causal variants. We demonstrate in simulations and empirical data that our approach reduces the number of SNPs that need to be selected for follow-up to identify the true causal variants at GWAS risk loci.
PMCID: PMC4214605  PMID: 25357204
6.  The Causal Effect of Vitamin D Binding Protein (DBP) Levels on Calcemic and Cardiometabolic Diseases: A Mendelian Randomization Study 
PLoS Medicine  2014;11(10):e1001751.
In this study, Richards and colleagues undertook a Mendelian randomization study to determine whether vitamin D binding protein (DBP) levels have a causal effect on common calcemic and cardiometabolic diseases. They concluded that DBP has no demonstrable causal effect on any of the diseases or traits investigated, except 25-hydroxy-vitamin D (25OHD) levels.
Please see later in the article for the Editors' Summary
Background
Observational studies have shown that vitamin D binding protein (DBP) levels, a key determinant of 25-hydroxy-vitamin D (25OHD) levels, and 25OHD levels themselves are both associated with risk of disease. If 25OHD levels have a causal influence on disease, and DBP lies in this causal pathway, then DBP levels should likewise be causally associated with disease. We undertook a Mendelian randomization study to determine whether DBP levels have causal effects on common calcemic and cardiometabolic diseases.
Methods and Findings
We measured DBP and 25OHD levels in 2,254 individuals, followed for up to 10 y, in the Canadian Multicentre Osteoporosis Study (CaMos). Using the single nucleotide polymorphism rs2282679 as an instrumental variable, we applied Mendelian randomization methods to determine the causal effect of DBP on calcemic (osteoporosis and hyperparathyroidism) and cardiometabolic diseases (hypertension, type 2 diabetes, coronary artery disease, and stroke) and related traits, first in CaMos and then in large-scale genome-wide association study consortia. The effect allele was associated with an age- and sex-adjusted decrease in DBP level of 27.4 mg/l (95% CI 24.7, 30.0; n = 2,254). DBP had a strong observational and causal association with 25OHD levels (p = 3.2×10−19). While DBP levels were observationally associated with calcium and body mass index (BMI), these associations were not supported by causal analyses. Despite well-powered sample sizes from consortia, there were no associations of rs2282679 with any other traits and diseases: fasting glucose (0.00 mmol/l [95% CI −0.01, 0.01]; p = 1.00; n = 46,186); fasting insulin (0.01 pmol/l [95% CI −0.00, 0.01]; p = 0.22; n = 46,186); BMI (0.00 kg/m2 [95% CI −0.01, 0.01]; p = 0.80; n = 127,587); bone mineral density (0.01 g/cm2 [95% CI −0.01, 0.03]; p = 0.36; n = 32,961); mean arterial pressure (−0.06 mm Hg [95% CI −0.19, 0.07]; p = 0.36; n = 28,775); ischemic stroke (odds ratio [OR] = 1.00 [95% CI 0.97, 1.04]; p = 0.92; n = 12,389/62,004 cases/controls); coronary artery disease (OR = 1.02 [95% CI 0.99, 1.05]; p = 0.31; n = 22,233/64,762); or type 2 diabetes (OR = 1.01 [95% CI 0.97, 1.05]; p = 0.76; n = 9,580/53,810).
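As a rough check of instrument strength, the reported per-allele effect on DBP and its 95% CI can be converted into an approximate standard error and a single-instrument F statistic (F ≈ (β/SE)²). This assumes the interval is a symmetric normal-approximation CI. The reported interval (24.7, 30.0) is centred on 27.35 rather than 27.4, presumably because of rounding, so the result is approximate and is not a figure reported by the authors. The "F > 10" rule of thumb for a strong instrument is likewise a general convention, not something the abstract invokes.

```python
Z_975 = 1.959964  # standard-normal quantile for a two-sided 95% CI


def se_from_symmetric_ci(lower: float, upper: float) -> float:
    """SE implied by an interval of the form estimate +/- 1.96 * SE."""
    return (upper - lower) / (2 * Z_975)


def approx_f_statistic(beta: float, se: float) -> float:
    """Approximate first-stage F for a single instrument: (beta / SE) squared."""
    return (beta / se) ** 2


# Per-allele decrease in DBP (mg/l) and its 95% CI, as reported in the abstract.
se_dbp = se_from_symmetric_ci(24.7, 30.0)
f_dbp = approx_f_statistic(27.4, se_dbp)
print(f"SE ~ {se_dbp:.2f} mg/l, F ~ {f_dbp:.0f}")
```

This gives an SE of about 1.35 mg/l and an F statistic of roughly 410, far above 10. A strong instrument is what makes the null causal results in the large consortia informative rather than simply underpowered.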
DBP has no demonstrable causal effect on any of the diseases or traits investigated here, except 25OHD levels. It remains to be determined whether 25OHD has a causal effect on these outcomes independent of DBP.
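With a single SNP as the instrument, as with rs2282679 here, the simplest causal estimate is the Wald ratio: the SNP-outcome effect divided by the SNP-exposure effect. The abstract does not say which estimator the authors used, so the sketch below is a generic illustration with hypothetical inputs, not the study's data. Its first-order standard error ignores uncertainty in the SNP-exposure estimate, which is usually acceptable only for a strong instrument.

```python
import math


def wald_ratio(beta_gx, beta_gy, se_gy):
    """Single-instrument Mendelian randomization estimate (Wald ratio).

    beta_gx: per-allele effect of the SNP on the exposure (e.g. DBP, mg/l)
    beta_gy: per-allele effect of the same SNP on the outcome
             (e.g. a log odds ratio for disease)
    se_gy:   standard error of beta_gy
    Returns (causal effect per unit of exposure, first-order standard error).
    The first-order SE ignores uncertainty in beta_gx.
    """
    if beta_gx == 0:
        raise ValueError("the instrument must shift the exposure")
    estimate = beta_gy / beta_gx
    se = se_gy / abs(beta_gx)
    return estimate, se


def ci95(estimate, se):
    """Normal-approximation 95% confidence interval."""
    return estimate - 1.96 * se, estimate + 1.96 * se


# Hypothetical inputs, NOT the study's data:
beta_gx = -25.0           # effect allele lowers the exposure by 25 units
beta_gy = math.log(1.02)  # effect allele multiplies disease odds by 1.02
se_gy = 0.015
est, se = wald_ratio(beta_gx, beta_gy, se_gy)
lo, hi = ci95(est, se)
print(f"log-OR per unit of exposure: {est:.5f} (95% CI {lo:.5f}, {hi:.5f})")
```

Because the causal effect is expressed per unit of exposure, a SNP-outcome association whose interval spans zero yields a causal interval that also spans zero, which is the pattern reported for the diseases above.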
Please see later in the article for the Editors' Summary
Editors' Summary
Vitamin D deficiency is an increasingly common public health concern. According to some estimates, more than a billion people worldwide may be vitamin D deficient. Indeed, many people living in the US and Europe (in particular, elderly people, breastfed infants, people with dark skin, and obese individuals) have serum (circulating) 25-hydroxy-vitamin D (25OHD) levels below 50 nmol/l, the threshold for vitamin D deficiency. Vitamin D helps the body absorb calcium, a mineral that is essential for healthy bones. Consequently, vitamin D deficiency can lead to calcemic diseases such as rickets (a condition that affects bone development in children), osteomalacia (soft bones in adults), and osteoporosis (a condition in which the bones weaken and become susceptible to fracture). Most of the vitamin D we need is made in our skin after exposure to sunlight. Vitamin D is also found naturally in oily fish and eggs, and is added to some other foods, including cereals and milk, but some people need to take vitamin D supplements to avoid vitamin D deficiency.
Why Was This Study Done?
Observational studies have reported that low levels of serum 25OHD and of serum vitamin D binding protein (DBP, a key determinant of serum 25OHD level) are both associated with the risk of several common diseases and traits. Such studies have implicated vitamin D deficiency in cardiometabolic disease (cardiovascular diseases that affect the heart and/or blood vessels and metabolic diseases that affect the cellular chemical reactions needed to sustain life), in some cancers, and in Alzheimer disease. But observational studies cannot prove that vitamin D deficiency or DBP levels actually cause any of these diseases. For example, an observational study might report an association between vitamin D deficiency and type 2 diabetes (a metabolic disease), but the individuals who develop type 2 diabetes might share another unknown characteristic that is actually responsible for disease development (a confounding factor). Alternatively, type 2 diabetes might reduce circulating vitamin D levels (reverse causation). Here, the researchers undertake a Mendelian randomization study to determine whether circulating DBP levels have causal effects on calcemic and cardiometabolic diseases. In Mendelian randomization, causality is inferred from associations between genetic variants that mimic the influence of a modifiable environmental exposure and the outcome of interest. Because gene variants are inherited randomly, they are not prone to confounding and are free from reverse causation. So, if low DBP levels lead to low serum 25OHD levels, and vitamin D levels have a causal effect on common diseases, genetic variants associated with low DBP levels should be associated with the development of common diseases.
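The confounding argument in the paragraph above can be checked with a toy simulation. In the sketch below, all effect sizes are invented for illustration: an unmeasured factor U raises both the exposure and the outcome, while the exposure itself has no causal effect. A naive regression of outcome on exposure picks up the confounded association; the Mendelian randomization (Wald ratio) estimate, which uses the randomly inherited genotype as the instrument, stays near zero.

```python
import numpy as np


def simulate(n=200_000, causal_effect=0.0, seed=1):
    """Toy data in which a confounder U drives both exposure X and outcome Y.

    G counts copies of an effect allele (0/1/2) and is drawn independently
    of U, mimicking random inheritance. X affects Y only if causal_effect != 0.
    """
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, size=n)
    u = rng.normal(size=n)
    x = -0.5 * g + u + rng.normal(size=n)
    y = causal_effect * x + u + rng.normal(size=n)
    return g, x, y


def slope(a, b):
    """Least-squares slope of b regressed on a (with intercept)."""
    a_c = a - a.mean()
    return float(a_c @ (b - b.mean()) / (a_c @ a_c))


g, x, y = simulate()
naive = slope(x, y)             # confounded observational estimate
mr = slope(g, y) / slope(g, x)  # Wald ratio with G as the instrument
print(f"observational slope: {naive:.3f}, MR estimate: {mr:.3f}")
```

Passing `causal_effect=0.3` to `simulate` moves the instrument-based estimate to about 0.3, while the naive slope remains biased upward by the confounder.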
What Did the Researchers Do and Find?
The researchers analyzed the association between a genetic variant called single nucleotide polymorphism (SNP) rs2282679, which is known to alter DBP levels, and calcemic and cardiometabolic diseases and related traits in 2,254 participants in the Canadian Multicentre Osteoporosis Study (CaMos). The researchers report that there was a strong association between SNP rs2282679 and both serum DBP and 25OHD levels among the CaMos participants. However, there were no significant associations (associations unlikely to have occurred by chance) between SNP rs2282679 and calcium level, osteoporosis, or several cardiometabolic diseases, including heart attacks and diabetes. Moreover, when the researchers examined publicly available genome-wide association study data collected by several international consortia investigating genetic influences on disease, they found no significant associations between rs2282679 and a wide range of calcemic and cardiometabolic diseases.
What Do These Findings Mean?
In this Mendelian randomization study, DBP level had no demonstrable causal effect on any of the calcemic or cardiometabolic diseases or traits investigated, except 25OHD level. Because most of the participants in CaMos and the international consortia were of European descent, these findings are applicable only to people of European ancestry. Moreover, like all Mendelian randomization studies, the reliability of these findings depends on several assumptions made by the researchers. Notably, although this study strongly suggests that DBP level does not have a causal influence on several common diseases, it remains to be determined whether 25OHD has a causal effect on any calcemic or cardiometabolic outcomes independent of DBP level.
Additional Information
Please access these websites via the online version of this summary at
The UK National Health Service Choices website provides information about vitamin D and about how to get vitamin D from sunshine; “Behind the Headlines” articles describe a recent observational study that reported an association between vitamin D deficiency and Alzheimer disease and the media coverage of this study, other health claims made for vitamin D, and a randomized controlled trial that questioned the role of vitamin D in disease
The US National Institutes of Health Office of Dietary Supplements provides information about vitamin D (in English and Spanish)
The US Centers for Disease Control and Prevention provides information about the vitamin D status of the US population
MedlinePlus has links to further information about vitamin D (in English and Spanish)
Information about the Canadian Multicentre Osteoporosis Study is available
Wikipedia has a page on Mendelian randomization (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
PMCID: PMC4211663  PMID: 25350643
7.  Investigation of altering single-nucleotide polymorphism density on the power to detect trait loci and frequency of false positive in nonparametric linkage analyses of qualitative traits 
BMC Genetics  2005;6(Suppl 1):S20.
Genome-wide linkage analysis using microsatellite markers has been successful in the identification of numerous Mendelian and complex disease loci. The recent availability of high-density single-nucleotide polymorphism (SNP) maps provides a potentially more powerful option. Using the simulated and Collaborative Study on the Genetics of Alcoholism (COGA) datasets from the Genetics Analysis Workshop 14 (GAW14), we examined how altering the density of SNP marker sets impacted the overall information content, the power to detect trait loci, and the number of false positive results. For the simulated data we used SNP maps with densities of 0.3 cM, 1 cM, 2 cM, and 3 cM. For the COGA data we combined the marker sets from Illumina and Affymetrix to create a map with an average density of 0.25 cM and then, using a sub-sample of these markers, created maps with densities of 0.3 cM, 0.6 cM, 1 cM, 2 cM, and 3 cM. For each marker set, multipoint linkage analysis using MERLIN was performed for both dominant and recessive traits derived from marker loci. Our results showed that information content increased with increased map density. For the homogeneous, completely penetrant traits we created, there was only a modest difference in ability to detect trait loci. Additionally, as map density increased there was only a slight increase in the number of false positive results when there was linkage disequilibrium (LD) between markers. The presence of LD between markers may have led to an increased number of false positive regions, but no clear relationship between regions of high LD and locations of false positive linkage signals was observed.
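One simple way to derive sparser maps from a dense marker set is greedy thinning along a chromosome: keep a marker only if it lies at least a given distance from the last kept marker. The abstract does not describe how the sub-samples were drawn, so this procedure, the helper names, and the evenly spaced example map are illustrative assumptions. Greedy thinning guarantees a minimum spacing, so the realized average spacing can exceed the target.

```python
def thin_map(positions_cm, min_spacing_cm):
    """Greedily keep markers so consecutive kept markers are at least
    min_spacing_cm apart (positions in centimorgans on one chromosome)."""
    kept = []
    for pos in sorted(positions_cm):
        if not kept or pos - kept[-1] >= min_spacing_cm:
            kept.append(pos)
    return kept


def mean_spacing(positions_cm):
    """Average distance between consecutive markers."""
    gaps = [b - a for a, b in zip(positions_cm, positions_cm[1:])]
    return sum(gaps) / len(gaps) if gaps else float("nan")


# Hypothetical dense map: one marker every 0.25 cM over 100 cM.
dense = [i * 0.25 for i in range(401)]
for spacing in (0.3, 0.6, 1, 2, 3):
    sparse = thin_map(dense, spacing)
    print(spacing, len(sparse), round(mean_spacing(sparse), 2))
```

On this hypothetical 0.25 cM grid, a 0.3 cM minimum spacing already yields a marker every 0.5 cM, which is why target and realized densities are worth reporting separately.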
PMCID: PMC1866766  PMID: 16451629
8.  Two-Stage Two-Locus Models in Genome-Wide Association 
PLoS Genetics  2006;2(9):e157.
Studies in model organisms suggest that epistasis may play an important role in the etiology of complex diseases and traits in humans. With the era of large-scale genome-wide association studies fast approaching, it is important to quantify whether it will be possible to detect interacting loci using realistic sample sizes in humans and to what extent undetected epistasis will adversely affect power to detect association when single-locus approaches are employed. We therefore investigated the power to detect association for an extensive range of two-locus quantitative trait models that incorporated varying degrees of epistasis. We compared the power to detect association using a single-locus model that ignored interaction effects, a full two-locus model that allowed for interactions, and, most important, two two-stage strategies whereby a subset of loci initially identified using single-locus tests were analyzed using the full two-locus model. Despite the penalty introduced by multiple testing, fitting the full two-locus model performed better than single-locus tests for many of the situations considered, particularly when compared with attempts to detect both individual loci. Using a two-stage strategy reduced the computational burden associated with performing an exhaustive two-locus search across the genome but was not as powerful as the exhaustive search when loci interacted. Two-stage approaches also increased the risk of missing interacting loci that contributed little effect at the margins. Based on our extensive simulations, our results suggest that an exhaustive search involving all pairwise combinations of markers across the genome might provide a useful complement to single-locus scans in identifying interacting loci that contribute to moderate proportions of the phenotypic variance.
Although there is growing appreciation that attempting to map genetic interactions in humans may be a fruitful endeavor, there is no consensus as to the best strategy for their detection, particularly in the case of genome-wide association where the number of potential comparisons is enormous. In this article, the authors compare the performance of four different search strategies to detect loci which interact in genome-wide association: a single-locus search, an exhaustive two-locus search, and two two-stage procedures in which a subset of loci initially identified with single-locus tests are analyzed using a full two-locus model. Their results show that when loci interact, an exhaustive two-locus search across the genome is superior to a two-stage strategy, and in many situations can identify loci which would not have been identified solely using a single-locus search. Their findings suggest that an exhaustive search involving all pairwise combinations of markers across the genome may provide a useful complement to single-locus scans in identifying interacting loci that contribute to moderate proportions of the phenotypic variance.
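The exhaustive and two-stage strategies compared above can be sketched in a few lines. This illustration makes several simplifying assumptions: simulated genotypes, ANOVA F statistics used for ranking instead of p-values, and a two-stage variant in which only pairs among the top-k single-locus hits reach the second stage. The trait depends on a purely epistatic product of two loci with no marginal effect, the situation in which the abstract notes that two-stage approaches can miss interacting loci.

```python
import itertools

import numpy as np


def anova_f(groups):
    """One-way ANOVA F statistic for a list of 1-D arrays (empty groups skipped)."""
    groups = [grp for grp in groups if len(grp) > 0]
    all_y = np.concatenate(groups)
    grand, k, n = all_y.mean(), len(groups), len(all_y)
    ss_between = sum(len(grp) * (grp.mean() - grand) ** 2 for grp in groups)
    ss_within = sum(((grp - grp.mean()) ** 2).sum() for grp in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def single_locus_f(geno, y, j):
    return anova_f([y[geno[:, j] == a] for a in range(3)])


def two_locus_f(geno, y, j, k):
    # Full genotypic model: a separate mean for each of the 3x3 genotype cells.
    cells = geno[:, j] * 3 + geno[:, k]
    return anova_f([y[cells == c] for c in range(9)])


def exhaustive_search(geno, y):
    pairs = itertools.combinations(range(geno.shape[1]), 2)
    return max(pairs, key=lambda p: two_locus_f(geno, y, *p))


def two_stage_search(geno, y, top_k):
    m = geno.shape[1]
    ranked = sorted(range(m), key=lambda j: single_locus_f(geno, y, j), reverse=True)
    kept = sorted(ranked[:top_k])
    return max(itertools.combinations(kept, 2), key=lambda p: two_locus_f(geno, y, *p))


# Simulated data: loci 3 and 7 interact; with allele frequency 0.5 neither
# has a marginal effect on the trait.
rng = np.random.default_rng(0)
n, m = 2000, 20
geno = rng.binomial(2, 0.5, size=(n, m))
y = 1.0 * (geno[:, 3] - 1) * (geno[:, 7] - 1) + rng.normal(size=n)
print("exhaustive:", exhaustive_search(geno, y))
print("two-stage (top 5):", two_stage_search(geno, y, top_k=5))
```

With m markers the exhaustive search fits m(m-1)/2 two-locus models (190 here) versus k(k-1)/2 in the second stage; that computational saving is what the two-stage strategies trade against power when interacting loci have little marginal effect.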
PMCID: PMC1570380  PMID: 17002500
9.  Candidate Causal Regulatory Effects by Integration of Expression QTLs with Complex Trait Genetic Associations 
PLoS Genetics  2010;6(4):e1000895.
The recent success of genome-wide association studies (GWAS) is now followed by the challenge of determining how the reported susceptibility variants mediate complex traits and diseases. Expression quantitative trait loci (eQTLs) have been implicated in disease associations through overlaps between eQTLs and GWAS signals. However, the abundance of eQTLs and the strong correlation structure (LD) in the genome make it likely that some of these overlaps are coincidental and not driven by the same functional variants. In the present study, we propose an empirical methodology, which we call Regulatory Trait Concordance (RTC), that accounts for local LD structure and integrates eQTLs and GWAS results in order to reveal the subset of association signals that are due to cis eQTLs. We simulate genomic regions of various LD patterns with either one or two causal variants and show that our score outperforms SNP correlation metrics, be they statistical (r²) or historical (D'). Following the observation of a significant abundance of regulatory signals among currently published GWAS loci, we apply our method with the goal of prioritizing relevant genes for each of the respective complex traits. We detect several potential disease-causing regulatory effects, with a strong enrichment for immunity-related conditions, consistent with the nature of the cell line tested (LCLs). Furthermore, we present an extension of the method in trans, where interrogating the whole genome for downstream effects of the disease variant can be informative regarding its unknown primary biological effect. We conclude that integrating cellular phenotype associations with organismal complex traits will facilitate the biological interpretation of the genetic effects on these traits.
Author Summary
Genome-wide association studies have led to the identification of susceptibility loci for a variety of human complex traits. What is still largely missing, however, is the understanding of the biological context in which these candidate variants act and of how they determine each trait. Given the localization of many GWAS loci outside coding regions and the important role of regulatory variation in shaping phenotypic variance, gene expression has been proposed as a plausible informative intermediate phenotype. Here we show that for a subset of the currently published GWAS this is indeed the case, by observing a significant excess of regulatory variants among disease loci. We propose an empirical methodology (regulatory trait concordance, RTC) able to integrate expression and disease data in order to detect causal regulatory effects. We show that the RTC outperforms simple correlation metrics under various simulated linkage disequilibrium (LD) scenarios. Our method is able to recover previously suspected causal regulatory effects from the literature and, as expected given the nature of the tested tissue, an overrepresentation of immunity-related candidates is observed. As the number of available tissues increases, this prioritization approach will become even more useful in understanding the implication of regulatory variants in disease etiology.
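The core RTC idea, removing each SNP's effect from the expression trait and asking how much of the eQTL signal survives, can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the SNPs passed in already lie in one region bounded by recombination hotspots, uses absolute correlation as the association measure, and scores the GWAS SNP as (N − rank)/N, where rank counts SNPs that remove more of the eQTL signal than the GWAS SNP does. The exact ranking convention in the published method may differ.

```python
import numpy as np


def residualize(y, x):
    """Residuals of y after least-squares regression on x (with intercept)."""
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def abs_corr(a, b):
    return abs(np.corrcoef(a, b)[0, 1])


def rtc_score(genotypes, expression, eqtl_idx, gwas_idx):
    """RTC-style concordance score (sketch).

    genotypes:  (n_samples, n_snps) genotypes for the SNPs in one region
    expression: (n_samples,) expression of the gene
    For every SNP, regress its effect out of the expression and measure how
    strongly the eQTL SNP still associates with the residuals. A GWAS SNP
    tagging the same functional variant should remove most of the signal.
    """
    n_snps = genotypes.shape[1]
    residual_assoc = np.array([
        abs_corr(genotypes[:, eqtl_idx], residualize(expression, genotypes[:, j]))
        for j in range(n_snps)
    ])
    # Number of SNPs whose removal leaves a weaker eQTL signal than the GWAS SNP's.
    rank = int(np.sum(residual_assoc < residual_assoc[gwas_idx]))
    return (n_snps - rank) / n_snps


# Simulated region: 10 unlinked SNPs, SNP 2 drives expression.
rng = np.random.default_rng(42)
G = rng.binomial(2, 0.4, size=(500, 10)).astype(float)
expr = 0.8 * G[:, 2] + rng.normal(size=500)
print(rtc_score(G, expr, eqtl_idx=2, gwas_idx=2))  # GWAS SNP is the eQTL SNP
print(rtc_score(G, expr, eqtl_idx=2, gwas_idx=7))  # unrelated GWAS SNP
```

In real data the interesting case lies between these extremes: a GWAS SNP in partial LD with the eQTL SNP removes part of the signal and receives an intermediate score.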
PMCID: PMC2848550  PMID: 20369022
10.  Identifying the genetic determinants of transcription factor activity 
Genome-wide messenger RNA expression levels are highly heritable. However, the molecular mechanisms underlying this heritability are poorly understood. The influence of trans-acting polymorphisms is often mediated by changes in the regulatory activity of one or more sequence-specific transcription factors (TFs). We use a method that exploits prior information about the DNA-binding specificity of each TF to estimate its genotype-specific regulatory activity. To this end, we perform linear regression of genotype-specific differential mRNA expression on TF-specific promoter-binding affinity. Treating inferred TF activity as a quantitative trait and mapping it across a panel of segregants from an experimental genetic cross allows us to identify trans-acting loci (‘aQTLs’) whose allelic variation modulates the TF. A few of these aQTL regions contain the gene encoding the TF itself; several others contain a gene whose protein product is known to interact with the TF. Our method is strictly causal, as it only uses sequence-based features as predictors. Application to budding yeast demonstrates a dramatic increase in statistical power, compared with existing methods, to detect locus-TF associations and trans-acting loci. Our aQTL mapping strategy also succeeds in mouse.
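The two steps just described, estimating a TF's activity in each segregant by regressing differential expression on promoter affinity and then mapping that activity as a quantitative trait, can be sketched as below. Everything here is simulated and simplified, and the function names are hypothetical: one TF, a haploid cross with markers coded 0/1, and a Welch-type t statistic standing in for whatever linkage test the authors used.

```python
import numpy as np


def infer_tf_activity(log_ratios, affinity):
    """Estimate one TF's activity in each segregant.

    log_ratios: (n_genes, n_segregants) differential mRNA expression
    affinity:   (n_genes,) predicted promoter-binding affinity of the TF
    For each segregant, fit log_ratio_g = c + a * affinity_g by least
    squares; the slope a is that segregant's inferred TF activity.
    """
    X = np.column_stack([np.ones_like(affinity), affinity])
    coef, *_ = np.linalg.lstsq(X, log_ratios, rcond=None)  # shape (2, n_segregants)
    return coef[1]


def map_activity(activity, markers):
    """Marker scan: Welch t statistic comparing inferred activity between
    segregants carrying parental allele 1 vs allele 0 at each marker."""
    stats = []
    for m in markers.T:
        a, b = activity[m == 0], activity[m == 1]
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        stats.append((b.mean() - a.mean()) / se)
    return np.array(stats)


# Simulated cross: marker 17 modulates the TF's activity.
rng = np.random.default_rng(7)
n_genes, n_seg, n_markers = 2000, 100, 50
markers = rng.integers(0, 2, size=(n_seg, n_markers))
true_act = 0.5 + 1.0 * markers[:, 17] + rng.normal(scale=0.1, size=n_seg)
affinity = rng.exponential(1.0, size=n_genes)
log_ratios = np.outer(affinity, true_act) + rng.normal(size=(n_genes, n_seg))
act_hat = infer_tf_activity(log_ratios, affinity)
scan = map_activity(act_hat, markers)
print("strongest marker:", int(np.argmax(np.abs(scan))))
```

Because each activity estimate pools information across all 2,000 genes, it is far less noisy than any single gene's expression, which mirrors the power argument made in the summary.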
Genetic sequence variation naturally perturbs mRNA expression levels in the cell. In recent years, analysis of parallel genotyping and expression profiling data for segregants from genetic crosses between parental strains has revealed that mRNA expression levels are highly heritable. Expression quantitative trait loci (eQTLs), whose allelic variation regulates the expression level of individual genes, have successfully been identified (Brem et al, 2002; Schadt et al, 2003). The molecular mechanisms underlying the heritability of mRNA expression are poorly understood. However, they are likely to involve mediation by transcription factors (TFs). We present a new transcription-factor-centric method that greatly increases our ability to understand what drives the genetic variation in mRNA expression (Figure 1). Our method identifies genomic loci (‘aQTLs') whose allelic variation modulates the protein-level activity of specific TFs. To map aQTLs, we integrate genotyping and expression profiling data with quantitative prior information about DNA-binding specificity of transcription factors in the form of position-specific affinity matrices (Bussemaker et al, 2007). We applied our method in two different organisms: budding yeast and mouse.
In our approach, the inferred TF activity is explicitly treated as a quantitative trait, and genetically mapped. The decrease of ‘phenotype space' from that of all genes (in the eQTL approach) to that of all TFs (in our aQTL approach) increases the statistical power to detect trans-acting loci in two distinct ways. First, as each inferred TF activity is derived from a large number of genes, it is far less noisy than mRNA levels of individual genes. Second, the number of trait/marker combinations that needs to be tested for statistical significance in parallel is roughly two orders of magnitude smaller than for eQTLs. We identified a total of 103 locus-TF associations, a more than six-fold improvement over the 17 locus-TF associations identified by several existing methods (Brem et al, 2002; Yvert et al, 2003; Lee et al, 2006; Smith and Kruglyak, 2008; Zhu et al, 2008). The total number of distinct genomic loci identified as an aQTL equals 31, which includes 11 of the 13 previously identified eQTL hotspots (Smith and Kruglyak, 2008).
To better understand the mechanisms underlying the identified genetic linkages, we examined the genes within each aQTL region. First, we found four ‘local' aQTLs, which encompass the gene encoding the TF itself. This includes the known polymorphism in the HAP1 gene (Brem et al, 2002), but also novel predictions of trans-acting polymorphisms in RFX1, STB5, and HAP4. Second, using high-throughput protein–protein interaction data, we identified putative causal genes for several aQTLs. For example, we predict that a polymorphism in the cyclin-dependent kinase CDC28 antagonistically modulates the functionally distinct cell cycle regulators Fkh1 and Fkh2. In this and other cases, our approach naturally accounts for post-translational modulation of TF activity at the protein level.
We validated our ability to predict locus-TF associations in yeast using gene expression profiles of allele replacement strains from a previous study (Smith and Kruglyak, 2008). Chromosome 15 contains an aQTL whose allelic status influences the activity of no fewer than 30 distinct TFs. This locus includes IRA2, which controls intracellular cAMP levels. We used the gene expression profile of IRA2 replacement strains to confirm that the polymorphism within IRA2 indeed modulates a subset of the TFs whose activity was predicted to link to this locus, and no other TFs.
Application of our approach to mouse data identified an aQTL modulating the activity of a specific TF in liver cells. We identified an aQTL on mouse chromosome 7 for Zscan4, a transcription factor containing four zinc finger domains and a SCAN domain. Even though we could not detect a candidate causal gene for Zscan4p because of a lack of information about the mouse genome, our result demonstrates that our method also works in higher eukaryotes.
In summary, aQTL mapping has a greatly improved sensitivity to detect molecular mechanisms underlying the heritability of gene expression. The successful application of our approach to yeast and mouse data underscores the value of explicitly treating the inferred TF activity as a quantitative trait for increasing statistical power of detecting trans-acting loci. Furthermore, our method is computationally efficient, and easily applicable to any other organism whenever prior information about the DNA-binding specificity of TFs is available.
Analysis of parallel genotyping and expression profiling data has shown that mRNA expression levels are highly heritable. Currently, only a tiny fraction of this genetic variance can be mechanistically accounted for. The influence of trans-acting polymorphisms on gene expression traits is often mediated by transcription factors (TFs). We present a method that exploits prior knowledge about the in vitro DNA-binding specificity of a TF in order to map the loci (‘aQTLs') whose inheritance modulates its protein-level regulatory activity. Genome-wide regression of differential mRNA expression on predicted promoter affinity is used to estimate segregant-specific TF activity, which is subsequently mapped as a quantitative phenotype. In budding yeast, our method identifies six times as many locus-TF associations and more than twice as many trans-acting loci as all existing methods combined. Application to mouse data from an F2 intercross identified an aQTL on chromosome VII modulating the activity of Zscan4 in liver cells. Our method has greatly improved statistical power over existing methods, is mechanism based, strictly causal, computationally efficient, and generally applicable.
PMCID: PMC2964119  PMID: 20865005
gene expression; gene regulatory networks; genetic variation; quantitative trait loci; transcription factors
11.  Multifactor-dimensionality reduction versus family-based association tests in detecting susceptibility loci in discordant sib-pair studies 
BMC Genetics  2005;6(Suppl 1):S146.
Complex diseases are generally thought to be under the influence of multiple, and possibly interacting, genes. Many association methods have been developed to identify susceptibility genes assuming a single-gene disease model, referred to as single-locus methods. Multilocus methods consider joint effects of multiple genes and environmental factors. One commonly used method for family-based association analysis is implemented in FBAT. The multifactor-dimensionality reduction method (MDR) is a multilocus method, which identifies multiple genetic loci associated with the occurrence of complex disease. Many studies of late onset complex diseases employ a discordant sib pairs design. We compared the FBAT and MDR in their ability to detect susceptibility loci using a discordant sib-pair dataset generated from the simulated data made available to participants in the Genetic Analysis Workshop 14. Using FBAT, we were able to identify the effect of one susceptibility locus. However, the finding was not statistically significant. We were not able to detect any of the interactions using this method. This is probably because the FBAT test is designed to find loci with major effects, not interactions. Using MDR, the best result we obtained identified two interactions. However, neither of these reached a level of statistical significance. This is mainly due to the heterogeneity of the disease trait and noise in the data.
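The central MDR step pools each two-locus genotype combination into a high-risk or low-risk class and scores the resulting single attribute by how well it separates affected from unaffected individuals. The sketch below shows that step for unrelated cases and controls on a tiny hypothetical data set. It omits the cross-validation and permutation testing that full MDR uses, and it does not model the sib-pair matching of the study design.

```python
from collections import Counter
from itertools import combinations


def mdr_pair_accuracy(geno_a, geno_b, status):
    """Core MDR step for one pair of loci.

    geno_a, geno_b: genotype codes (0/1/2) per individual
    status: 1 for affected, 0 for unaffected
    A two-locus genotype cell is labeled high-risk when its case:control
    ratio exceeds the overall case:control ratio; the resulting high/low-risk
    attribute is scored by balanced accuracy.
    """
    cases = Counter((a, b) for a, b, s in zip(geno_a, geno_b, status) if s == 1)
    controls = Counter((a, b) for a, b, s in zip(geno_a, geno_b, status) if s == 0)
    n_case, n_ctrl = sum(cases.values()), sum(controls.values())
    threshold = n_case / n_ctrl
    high_risk = {cell for cell in set(cases) | set(controls)
                 if cases[cell] > threshold * controls[cell]}
    true_pos = sum(cases[c] for c in high_risk)
    true_neg = sum(controls[c] for c in set(controls) - high_risk)
    return 0.5 * (true_pos / n_case + true_neg / n_ctrl)


def best_pair(genotypes, status):
    """Score every pair of loci; genotypes is a list of per-locus genotype lists."""
    return max(combinations(range(len(genotypes)), 2),
               key=lambda p: mdr_pair_accuracy(genotypes[p[0]], genotypes[p[1]], status))


# Hypothetical data: affection status depends jointly on loci a and b.
a = [0, 1, 2, 0, 1, 2, 0, 1, 2]
b = [0, 0, 0, 1, 1, 1, 2, 2, 2]
c = [0] * 9
status = [1, 0, 1, 0, 1, 0, 1, 0, 1]
print(mdr_pair_accuracy(a, b, status))  # 1.0
print(best_pair([a, b, c], status))     # (0, 1)
```

The accuracy here is training accuracy, which is optimistic; this is why full MDR evaluates the chosen model on held-out folds.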
PMCID: PMC1866789  PMID: 16451606
12.  Bayesian Test for Colocalisation between Pairs of Genetic Association Studies Using Summary Statistics 
PLoS Genetics  2014;10(5):e1004383.
Genetic association studies, in particular the genome-wide association study (GWAS) design, have provided a wealth of novel insights into the aetiology of a wide range of human diseases and traits, in particular cardiovascular diseases and lipid biomarkers. The next challenge consists of understanding the molecular basis of these associations. The integration of multiple association datasets, including gene expression datasets, can contribute to this goal. We have developed a novel statistical methodology to assess whether two association signals are consistent with a shared causal variant. An application is the integration of disease scans with expression quantitative trait locus (eQTL) studies, but any pair of GWAS datasets can be integrated in this framework. We demonstrate the value of the approach by re-analysing a gene expression dataset in 966 liver samples with a published meta-analysis of lipid traits including >100,000 individuals of European ancestry. Combining all lipid biomarkers, our re-analysis supported 26 out of 38 reported colocalisation results with eQTLs and identified 14 new colocalisation results, hence highlighting the value of a formal statistical test. In three cases of reported eQTL-lipid pairs (SYPL2, IFT172, TBKBP1) for which our analysis suggests that the eQTL pattern is not consistent with the lipid association, we identify alternative colocalisation results with SORT1, GCKR, and KPNB1, indicating that these genes are more likely to be causal in these genomic intervals. A key feature of the method is the ability to derive the output statistics from single SNP summary statistics, hence making it possible to perform systematic meta-analysis type comparisons across multiple GWAS datasets (implemented online at Our methodology provides information about candidate causal genes in associated intervals and has direct implications for the understanding of complex diseases as well as the design of drugs to target disease pathways.
Author Summary
Genome-wide association studies (GWAS) have found a large number of genetic regions (“loci”) affecting clinical end-points and phenotypes, many outside coding intervals. One approach to understanding the biological basis of these associations has been to explore whether GWAS signals from intermediate cellular phenotypes, in particular gene expression, are located in the same loci (“colocalise”) and are potentially mediating the disease signals. However, it is not clear how to assess whether the same variants are responsible for the two GWAS signals or whether it is distinct causal variants close to each other. In this paper, we describe a statistical method that requires only single-variant summary statistics to test for colocalisation of GWAS signals. We describe one application of our method to a meta-analysis of blood lipids and liver expression, although any two datasets resulting from association studies can be used. Our method is able to detect the subset of GWAS signals explained by regulatory effects and identify candidate genes affected by the same GWAS variants. As summary GWAS data are increasingly available, applications of colocalisation methods to integrate the findings will be essential for functional follow-up, and will also be particularly useful to identify tissue specific signals in eQTL datasets.
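A common way to build such a test from single-SNP summary statistics is to compute Wakefield's approximate Bayes factor (ABF) for each SNP and each trait, then weigh five hypotheses: no association (H0), association with trait 1 only (H1) or trait 2 only (H2), two distinct causal variants (H3), or one shared causal variant (H4). The sketch below follows that formulation and assumes at most one causal variant per trait in the region. The prior probabilities and prior effect-size SD are commonly quoted defaults, not values taken from this abstract, and the example inputs are hypothetical.

```python
import math


def log_abf(beta, se, prior_sd):
    """Wakefield's approximate Bayes factor for one SNP, on the log scale."""
    v, w = se ** 2, prior_sd ** 2
    r = w / (v + w)
    z = beta / se
    return 0.5 * (math.log(1 - r) + r * z * z)


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_diff_exp(a, b):
    """log(exp(a) - exp(b)) for a >= b, computed stably."""
    if a <= b:
        return float("-inf")
    d = a - b
    if d < 1.0:
        return b + math.log(math.expm1(d))
    return a + math.log1p(-math.exp(-d))


def coloc_posteriors(beta1, se1, beta2, se2, prior_sd1=0.15, prior_sd2=0.15,
                     p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 for two traits measured on the same SNPs."""
    l1 = [log_abf(b, s, prior_sd1) for b, s in zip(beta1, se1)]
    l2 = [log_abf(b, s, prior_sd2) for b, s in zip(beta2, se2)]
    s1, s2 = logsumexp(l1), logsumexp(l2)
    s12 = logsumexp([a + b for a, b in zip(l1, l2)])
    # H3 sums ABF1_i * ABF2_j over pairs of different SNPs (i != j):
    # (sum ABF1)(sum ABF2) - sum_i ABF1_i * ABF2_i.
    h3 = log_diff_exp(s1 + s2, s12)
    log_h = [0.0,
             math.log(p1) + s1,
             math.log(p2) + s2,
             math.log(p1) + math.log(p2) + h3,
             math.log(p12) + s12]
    total = logsumexp(log_h)
    return [math.exp(h - total) for h in log_h]


# Hypothetical five-SNP region; both traits peak at the same SNP.
pp = coloc_posteriors([0, 0, 0.3, 0, 0], [0.02] * 5, [0, 0, 0.3, 0, 0], [0.02] * 5)
print([round(p, 3) for p in pp])
```

Moving the trait-2 peak to a different SNP shifts almost all posterior mass from H4 to H3, which is the distinction between a shared variant and two nearby ones that the summary describes.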
PMCID: PMC4022491  PMID: 24830394
13.  The genetic study of three population microisolates in South Tyrol (MICROS): study design and epidemiological perspectives 
BMC Medical Genetics  2007;8:29.
There is increasing evidence of the important role that small, isolated populations could play in finding genes involved in the etiology of diseases. For historical and political reasons, South Tyrol, the northernmost Italian region, includes several small villages which remained isolated over the centuries.
The MICROS study is a population-based survey of three small, isolated villages, characterized by: old settlement; small number of founders; high endogamy rates; slow/null population expansion. During stage 1 (2002/03), genealogical data, screening questionnaires, clinical measurements, blood and urine samples, and DNA were collected for 1175 adult volunteers. Stage 2, concerning trait diagnoses, linkage analysis and association studies, is ongoing. The selection of the traits is being driven by expert clinicians. Preliminary descriptive statistics were obtained. Power simulations for detecting linkage to a quantitative trait locus (QTL) were undertaken.
Starting from participants, genealogies were reconstructed for 50,037 subjects, going back to the early 1600s. Within the last five generations, subjects were clustered in one pedigree of 7049 subjects plus 178 smaller pedigrees (3 to 85 subjects each). A significant probability of familial clustering was assessed for many traits, especially among the cardiovascular, neurological and respiratory traits. Simulations showed that the MICROS pedigree has a substantial power to detect a LOD score ≥ 3 when the QTL specific heritability is ≥ 20%.
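The power criterion above is stated on the LOD scale. A LOD score is a base-10 log likelihood ratio, so multiplying it by 2 ln 10 (about 4.6) gives the usual likelihood-ratio statistic. The sketch below also converts that statistic to a pointwise p-value under the 50:50 mixture of chi-square(0) and chi-square(1) often assumed when testing a variance component such as QTL heritability at zero. The abstract does not state which linkage model the simulations used, so treat that null distribution as an assumption.

```python
import math


def lod_to_chi2(lod):
    """Likelihood-ratio statistic equivalent to a LOD score: 2 * ln(10) * LOD."""
    return 2 * math.log(10) * lod


def lod_to_pointwise_p(lod):
    """Pointwise p-value assuming a 50:50 mixture of chi-square(0) and
    chi-square(1) under the null (one-sided variance-component test)."""
    chi2 = lod_to_chi2(lod)
    p_chi2_1df = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
    return 0.5 * p_chi2_1df


for lod in (1, 2, 3):
    print(lod, round(lod_to_chi2(lod), 2), f"{lod_to_pointwise_p(lod):.2e}")
```

Under these assumptions a LOD of 3 corresponds to a likelihood-ratio statistic of about 13.8 and a pointwise p-value of roughly 1e-4.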
The MICROS study is an extensive, ongoing, two-stage survey aimed at characterizing the genetic epidemiology of Mendelian and complex diseases. Our approach, involving different scientific disciplines, is an advantageous strategy to define and to study population isolates. The isolation of the Alpine populations, together with the extensive data collected so far, makes the MICROS study a powerful resource for the study of diseases in many fields of medicine. Recent successes and simulation studies give us confidence that our pedigrees can be valuable both in finding new candidate loci and in confirming existing candidate genes.
PMCID: PMC1913911  PMID: 17550581
14.  Incorporating Single-Locus Tests into Haplotype Cladistic Analysis in Case-Control Studies 
PLoS Genetics  2007;3(3):e46.
In case-control studies, genetic associations for complex diseases may be probed either with single-locus tests or with haplotype-based tests. Although there are different views on the relative merits of the two test strategies, haplotype-based analyses are generally believed to be more powerful for detecting genes with modest effects. However, a main drawback of haplotype-based association tests is the large number of distinct haplotypes, which increases the degrees of freedom for the corresponding test statistics and thus reduces statistical power. To decrease the degrees of freedom and enhance the efficiency and power of haplotype analysis, we propose an improved haplotype clustering method that is based on the haplotype cladistic analysis developed by Durrant et al. In our method, we attempt to combine the strengths of single-locus analysis and haplotype-based analysis into a single test framework. The novel element of our method is a more informative haplotype similarity measure: p-values obtained from single-locus association tests are used to construct weights, which to some extent incorporate information on disease outcome. The weights are then used to compute similarity measures, from which distance metrics between haplotype pairs are constructed for the haplotype cladistic analysis. To assess our proposed new method, we performed simulation analyses to compare the relative performances of (1) conventional haplotype-based analysis using original haplotypes, (2) single-locus allele-based analysis, (3) original haplotype cladistic analysis (CLADHC) by Durrant et al., and (4) our weighted haplotype cladistic analysis method, under different scenarios. Our weighted cladistic analysis method shows increased statistical power and robustness compared with haplotype cladistic analysis, single-locus tests, and traditional haplotype-based analyses.
The real data analyses also show that our proposed method has practical significance in the human genetics field.
Author Summary
Methods of haplotype-based analysis and single-locus analysis are widely used in genetic association studies. There is no consensus as to which of the two strategies performs better. Although haplotype-based analysis is a powerful tool, the large number of distinct haplotypes may reduce its efficiency. Haplotype clustering analysis is a promising way of decreasing haplotype dimensionality. A potential limitation of many existing clustering methods is that they do not allow the clustering to adapt to the position of the underlying trait locus. In this study, we proposed a weighted haplotype cladistic analysis method by incorporating a single-locus test into haplotype clustering. Under this framework, relationships between single loci and the disease outcomes can be considered when creating the hierarchical tree of haplotypes. The extensive simulations show that our method is robust against varied simulation conditions and is more powerful than either the original unweighted cladistic analysis method or single-locus analysis methods in case-control studies. Our hybrid method combining haplotype-based and single-locus analyses can be readily extended to whole genome association studies.
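To make the weighting idea concrete, the sketch below turns single-locus p-values into per-locus weights and uses them in a haplotype distance, so that differences at disease-associated loci count for more than differences elsewhere. The abstract does not give the authors' weight or similarity formulas; the -log10(p) weights, the allele-matching similarity, and all helper names and inputs are hypothetical choices made purely for illustration.

```python
import math


def locus_weights(p_values):
    """Hypothetical weights: -log10(p), rescaled to sum to the number of loci."""
    raw = [-math.log10(p) for p in p_values]
    total = sum(raw)
    if total == 0:  # every p-value equals 1: fall back to equal weights
        return [1.0] * len(raw)
    return [len(raw) * w / total for w in raw]


def weighted_similarity(h1, h2, weights):
    """Weighted count of loci at which two haplotypes carry the same allele."""
    return sum(w for a, b, w in zip(h1, h2, weights) if a == b)


def distance_matrix(haplotypes, weights):
    """Distance = total weight minus weighted similarity (0 for identical haplotypes)."""
    total = sum(weights)
    return [[total - weighted_similarity(a, b, weights) for b in haplotypes]
            for a in haplotypes]


# Hypothetical example: locus 3 (index 2) is strongly associated with disease.
w = locus_weights([0.5, 0.5, 1e-4, 0.5])
haps = ["0010", "1010", "0000"]
for row in distance_matrix(haps, w):
    print([round(d, 3) for d in row])
```

Here the first two haplotypes, which differ only at an unassociated locus, end up much closer than the first and third, which differ at the associated one; a distance matrix of this kind is what a cladistic clustering step would then consume.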
PMCID: PMC1829402  PMID: 17381242
15.  Statistical Estimation of Correlated Genome Associations to a Quantitative Trait Network 
PLoS Genetics  2009;5(8):e1000587.
Many complex disease syndromes, such as asthma, consist of a large number of highly related, rather than independent, clinical or molecular phenotypes. This raises a new technical challenge in identifying genetic variations associated simultaneously with correlated traits. In this study, we propose a new statistical framework called graph-guided fused lasso (GFlasso) to directly and effectively incorporate the correlation structure of multiple quantitative traits, such as clinical metrics and gene expression, in association analysis. Our approach represents correlation information among the quantitative traits explicitly as a quantitative trait network (QTN) and then leverages this network to encode structured regularization functions in a multivariate regression model over the genotypes and traits. As a result, genetic markers that jointly influence subgroups of highly correlated traits can be detected with high sensitivity and specificity. While most traditional methods examine each phenotype independently and combine the results afterwards, our approach analyzes all of the traits jointly in a single statistical framework. This allows our method to borrow information across correlated phenotypes to discover genetic markers that perturb a subset of the correlated traits synergistically. Using simulated datasets based on HapMap consortium data and an asthma dataset, we compared the performance of our method with single-marker analysis and with regression-based methods that do not use any of the relational information among the traits. We found that our method showed increased power in detecting causal variants affecting correlated traits. 
Our results showed that, when correlation patterns among traits in a QTN are considered explicitly and directly during a structured multivariate genome association analysis using our proposed methods, the power of detecting true causal SNPs with possibly pleiotropic effects increased significantly without compromising performance on non-pleiotropic SNPs.
Author Summary
An association study examines a phenotype against genotypic variation over a large set of individuals in order to find the genetic variants that give rise to variation in the phenotype. Many complex disease syndromes consist of a large number of highly related clinical phenotypes, and patient cohorts are routinely surveyed for a large number of traits, such as hundreds of clinical phenotypes and genome-wide expression profiles of thousands of genes, many of which are correlated. However, most conventional approaches for association mapping or eQTL analysis consider a single phenotype at a time instead of exploiting the relatedness of traits by analyzing them jointly. Assuming that a group of tightly correlated traits may share a common genetic basis, in this paper we present a new framework for association analysis that searches for genetic variations influencing a group of correlated traits. We explicitly represent the correlation information in multiple quantitative traits as a quantitative trait network and directly incorporate this network information to scan the genome for association. Our results on simulated and asthma data show that our approach has a significant advantage in detecting associations when a genetic marker synergistically perturbs a group of traits.
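The structured regularization can be written as a penalized least-squares objective. The sketch below only evaluates such an objective; it assumes the form ½‖Y − XB‖² + λ Σ|β_jk| + γ Σ over QTN edges (m, l) of |r_ml| Σ_j |β_jm − sign(r_ml) β_jl|. The |r_ml| edge weighting and the name `gflasso_objective` are assumptions for illustration, and no optimizer is included.

```python
import numpy as np


def gflasso_objective(X, Y, B, edges, lam, gamma):
    """Penalized least-squares objective with a graph-guided fusion penalty.

    X: (n, J) genotypes; Y: (n, K) traits; B: (J, K) coefficients.
    edges: (m, l, r_ml) tuples, one per QTN edge between traits m and l,
    where r_ml is their correlation.
    """
    loss = 0.5 * np.sum((Y - X @ B) ** 2)
    lasso = lam * np.sum(np.abs(B))
    fusion = 0.0
    for m, l, r in edges:
        # Positively correlated traits are pulled toward equal coefficients,
        # negatively correlated traits toward coefficients of opposite sign.
        fusion += abs(r) * np.sum(np.abs(B[:, m] - np.sign(r) * B[:, l]))
    return loss + lasso + gamma * fusion


X = np.eye(2)
B_fused = np.array([[1.0, 1.0], [0.0, 0.0]])   # SNP 0 affects traits 0 and 1 equally
B_split = np.array([[1.0, 0.5], [0.0, 0.0]])   # same SNP, unequal effects
obj_fused = gflasso_objective(X, X @ B_fused, B_fused, [(0, 1, 0.8)], lam=0.1, gamma=1.0)
obj_split = gflasso_objective(X, X @ B_split, B_split, [(0, 1, 0.8)], lam=0.1, gamma=1.0)
```

Both coefficient matrices fit their own data exactly, so the difference between the objectives (0.2 versus 0.55) comes entirely from the penalties: the fusion term favors a marker that shifts correlated traits together. Minimizing such an objective requires a convex solver, which is not shown.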
PMCID: PMC2719086  PMID: 19680538
16.  Genome-Wide Interaction-Based Association Analysis Identified Multiple New Susceptibility Loci for Common Diseases 
PLoS Genetics  2011;7(3):e1001338.
Genome-wide interaction-based association (GWIBA) analysis has the potential to identify novel susceptibility loci. These interaction effects could be missed with the prevailing approaches in genome-wide association studies (GWAS). However, no convincing loci have been discovered exclusively from GWIBA methods, and the intensive computation involved is a major barrier for application. Here, we developed a fast, multi-thread/parallel program named “pair-wise interaction-based association mapping” (PIAM) for exhaustive two-locus searches. With this program, we performed a complete GWIBA analysis on seven diseases with stringent control for false positives, and we validated the results for three of these diseases. We identified one pair-wise interaction between a previously identified locus, C1orf106, and one new locus, TEC, that was specific for Crohn's disease, with a Bonferroni corrected P<0.05 (P = 0.039). This interaction was replicated with a pair of proxy linked loci (P = 0.013) on an independent dataset. Five other interactions had corrected P<0.5. We identified the allelic effect of a locus close to SLC7A13 for coronary artery disease. This was replicated with a linked locus on an independent dataset (P = 1.09×10−7). Through a local validation analysis that evaluated association signals, rather than locus-based associations, we found that several other regions showed association/interaction signals with nominal P<0.05. In conclusion, this study demonstrated that the GWIBA approach was successful for identifying novel loci, and the results provide new insights into the genetic architecture of common diseases. In addition, our PIAM program was capable of handling very large GWAS datasets that are likely to be produced in the future.
Author Summary
Recent studies on the genetic basis of common diseases have identified many loci that confer disease susceptibility. However, much of the heritability of these diseases remains unexplained. Loci involved in gene–gene interactions are considered cryptic, because they confer susceptibility, but may not generate a detectable signal on their own. These interactions may account for the “missing heritability” of common diseases. Theoretically, these interactions can be identified with the genome-wide interaction-based association analysis. But, in reality, very few gene–gene interactions have been identified with that method, and most were based on prior biological knowledge. Here, we applied a parallel computing technique that facilitated the identification of multiple new cryptic susceptibility loci involved in common diseases. We applied stringent control for false positives, and we validated our findings with independent datasets. This study demonstrated that interactions between gene loci could be successfully identified with the genome-wide interaction-based approach. With this approach, we also identified cryptic loci with moderate single-locus effects. The identified loci and interactions merit further investigations for fine mapping and functional analyses. Our results extend the current knowledge of common diseases for future studies in genetic mapping. This approach is applicable to current and future genome-wide association datasets.
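An exhaustive two-locus search can be sketched in a few lines. This is not PIAM, which is multi-threaded and uses its own statistics; it is a single-threaded illustration that assumes a 2 × 9 chi-square test of case/control status against the joint genotype of each SNP pair, followed by a Bonferroni threshold over all C(n, 2) pairs. Function names and the toy data are hypothetical, and converting statistics to p-values is left out to keep the sketch dependency-free.

```python
import itertools

import numpy as np


def joint_genotype_chi2(g1, g2, status):
    """Chi-square test of case/control status against the 9 joint genotypes of two SNPs."""
    classes = 3 * g1 + g2                       # joint genotype class 0..8
    table = np.zeros((2, 9))
    np.add.at(table, (status, classes), 1)      # rows: control (0), case (1)
    table = table[:, table.sum(axis=0) > 0]     # drop unobserved genotype classes
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    stat = float(np.sum((table - expected) ** 2 / expected))
    return stat, table.shape[1] - 1             # statistic, degrees of freedom


def exhaustive_pair_scan(genotypes, status, alpha=0.05):
    """Test every SNP pair; return results sorted by statistic and the Bonferroni level."""
    results = []
    for i, j in itertools.combinations(range(genotypes.shape[1]), 2):
        stat, df = joint_genotype_chi2(genotypes[:, i], genotypes[:, j], status)
        results.append((stat, df, i, j))
    results.sort(reverse=True)
    return results, alpha / len(results)


# Toy data: 20 SNPs; risk depends only on the parity of g3 + g11, so neither
# SNP has a marginal effect, but the pair interacts strongly.
rng = np.random.default_rng(1)
genotypes = rng.binomial(2, 0.5, size=(2000, 20))
odd = (genotypes[:, 3] + genotypes[:, 11]) % 2
status = rng.binomial(1, np.where(odd == 1, 0.8, 0.2))
results, threshold = exhaustive_pair_scan(genotypes, status)
```

The toy model mirrors the "cryptic locus" idea in the summary: single-SNP tests would see nothing at SNPs 3 and 11, while the pairwise scan ranks that pair first. The cost grows with the number of pairs, which is why parallelization matters at genome scale.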
PMCID: PMC3060075  PMID: 21437271
17.  Transmission ratio distortion in families from the Framingham Heart Study 
BMC Genetics  2003;4(Suppl 1):S48.
One implicit assumption in most linkage analyses is that live-born siblings unselected for a phenotype do not share alleles at any particular locus in excess of the Mendelian expectation. However, because most families are recruited for genetic studies on the basis of disease, few data are available to confirm that this is the case. We hypothesized that loci behaving in a non-Mendelian fashion could be identified using genotype data from the Framingham Heart Study families. We tested the hypothesis that live-born sibs, either stratified by or irrespective of gender, demonstrate excess sharing of alleles on the autosomes, i.e., transmission ratio distortion. Multipoint linkage analysis of siblings, with and without stratification by gender, was performed using an allele-sharing method. Such observations may have implications for the mapping of loci for complex diseases and quantitative traits in human pedigrees.
No results reached genome-wide significance. However, four regions demonstrated excess sharing of alleles at p < 0.002 when sibships were stratified by gender, three of them in males. Of note, a female-specific locus co-localized with a region that is linked to mean systolic blood pressure in the same families. In addition, three other regions demonstrated excess sharing of alleles in sibships irrespective of gender, including a region on chromosome 10p14-p15 (p = 7.5 × 10−4).
Although no loci demonstrating transmission ratio distortion reached genome-wide significance, loci with suggestive evidence for linkage were detected. These may have implications for the mapping of susceptibility loci for complex diseases in human pedigrees.
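The Mendelian expectation being tested can be made concrete. For a sib pair, the number of alleles shared identical by descent (IBD) at a locus is 0, 1, or 2 with probabilities 1/4, 1/2, 1/4, so the proportion shared has mean 1/2 and variance 1/8. The sketch below is a simple one-sided z-test for excess sharing under the assumptions that IBD counts are known exactly and that sib pairs are independent; the multipoint allele-sharing analysis in the study estimates IBD probabilistically and is not reproduced here. The function name is hypothetical.

```python
import math


def excess_ibd_sharing_z(ibd_counts):
    """One-sided test for excess IBD allele sharing among sib pairs at one locus.

    ibd_counts: number of alleles (0, 1, or 2) each sib pair shares IBD.
    Under Mendelian segregation P(IBD = 0, 1, 2) = 1/4, 1/2, 1/4, so the
    proportion shared (IBD / 2) has mean 1/2 and variance 1/8.
    """
    n = len(ibd_counts)
    mean_share = sum(c / 2 for c in ibd_counts) / n
    z = (mean_share - 0.5) / math.sqrt(0.125 / n)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z > z)
    return z, p_one_sided


# Eight pairs all sharing both alleles: mean share 1.0, z = 0.5 / sqrt(1/64) = 4.
z_high, p_high = excess_ibd_sharing_z([2] * 8)
```

Stratifying by gender, as in the study, amounts to running the same test separately on brother-brother and sister-sister pairs.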
PMCID: PMC1866484  PMID: 14975116
18.  A 2-step strategy for detecting pleiotropic effects on multiple longitudinal traits 
Frontiers in Genetics  2014;5:357.
Genetic pleiotropy refers to the situation in which a single gene influences multiple traits, and it is considered a major factor underlying genetic correlation among traits. To identify pleiotropy, an important focus in genome-wide association studies (GWAS) is finding genetic variants that are simultaneously associated with multiple traits. At the same time, many complex disease studies employ longitudinal designs, in which traits are measured repeatedly over time within the same subject. Performing genetic association analysis simultaneously on multiple longitudinal traits to detect pleiotropic effects is of interest but challenging. In this paper, we propose a 2-step method for simultaneously testing genetic association with multiple longitudinal traits. In the first step, a mixed effects model is used to analyze each longitudinal trait. We focus on estimating the random effect that accounts for the subject-specific genetic contribution to the trait; fixed effects of other confounding covariates are also estimated. This first step separates the genetic effect from other confounding effects for each subject and each longitudinal trait. In the second step, we perform a simultaneous association test on the multiple estimated random effects arising from the multiple longitudinal traits. The proposed method can efficiently detect pleiotropic effects on multiple longitudinal traits and can flexibly handle traits of different data types, such as quantitative, binary, or count data. We apply this method to analyze the 16th Genetic Analysis Workshop (GAW16) Framingham Heart Study (FHS) data. A simulation study is also conducted to validate this 2-step method and evaluate its performance.
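The two steps can be sketched for the simplest case. This minimal version assumes balanced data (every subject measured at the same number of visits), a random-intercept model with no covariates, and method-of-moments variance components. For step 2 it regresses genotype on the K estimated random effects and computes a joint F statistic; this reverse-regression choice is made here for illustration only, and the paper's own step-2 test and its handling of binary or count traits are not reproduced. Function names and simulated data are hypothetical.

```python
import numpy as np


def blup_random_intercepts(y):
    """Step 1 (simplified): predict each subject's random intercept.

    y: (n_subjects, n_visits) balanced repeated measures of one trait.
    Variance components come from one-way ANOVA mean squares; the BLUP
    shrinks each subject's deviation from the grand mean toward zero.
    """
    n, t = y.shape
    grand_mean = y.mean()
    subj_means = y.mean(axis=1)
    ms_between = t * np.sum((subj_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((y - subj_means[:, None]) ** 2) / (n * (t - 1))
    sigma2_u = max((ms_between - ms_within) / t, 0.0)
    if sigma2_u == 0.0:
        return np.zeros(n)
    shrinkage = sigma2_u / (sigma2_u + ms_within / t)
    return shrinkage * (subj_means - grand_mean)


def joint_f_statistic(random_effects, genotype):
    """Step 2 (illustrative): F statistic that none of the K effects predicts genotype."""
    n, k = random_effects.shape
    x = np.column_stack([np.ones(n), random_effects])
    beta, *_ = np.linalg.lstsq(x, genotype, rcond=None)
    rss_full = np.sum((genotype - x @ beta) ** 2)
    rss_null = np.sum((genotype - genotype.mean()) ** 2)
    return ((rss_null - rss_full) / k) / (rss_full / (n - k - 1))


# Usage: two longitudinal traits, both shifted by the same SNP (pleiotropy).
rng = np.random.default_rng(0)
n_subj, n_visits = 500, 4
g = rng.binomial(2, 0.3, n_subj).astype(float)
effects = []
for effect_size in (1.0, 0.8):
    u = effect_size * g + rng.normal(0, 1, n_subj)   # subject-level genetic + residual part
    y = u[:, None] + rng.normal(0, 1, (n_subj, n_visits))
    effects.append(blup_random_intercepts(y))
f_pleio = joint_f_statistic(np.column_stack(effects), g)
f_null = joint_f_statistic(np.column_stack(effects), rng.permutation(g))
```

Permuting the genotype breaks the link between SNP and subjects, so `f_null` shows the statistic's behavior under no association.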
PMCID: PMC4202779  PMID: 25368629
pleiotropic effect; genetic association; multiple traits; longitudinal data; mixed effects model; single nucleotide polymorphisms (SNPs)
19.  Cloud computing for detecting high-order genome-wide epistatic interaction via dynamic clustering 
BMC Bioinformatics  2014;15:102.
Taking advantage of high-throughput single nucleotide polymorphism (SNP) genotyping technology, large genome-wide association studies (GWASs) have been considered to hold promise for unravelling complex relationships between genotype and phenotype. At present, traditional single-locus methods are insufficient to detect multi-locus interactions, which are widespread in complex traits. In addition, statistical tests for high-order epistatic interactions involving more than 2 SNPs pose computational and analytical challenges, because the computation increases exponentially as the cardinality of the SNP combinations grows.
In this paper, we provide a simple, fast and powerful method using dynamic clustering and cloud computing to detect genome-wide multi-locus epistatic interactions. We conducted systematic experiments comparing its power against several recently proposed algorithms, including TEAM, SNPRuler, EDCF and BOOST. Furthermore, we applied our method to two real GWAS datasets, for age-related macular degeneration (AMD) and rheumatoid arthritis (RA), where we found some novel potential disease-related genetic factors that do not show up when only two-locus epistatic interactions are detected.
Experimental results on simulated data demonstrate that our method is more powerful than several recently proposed methods on both two- and three-locus disease models. Our method discovered many novel high-order associations that are significantly enriched in cases in the two real GWAS datasets. Moreover, for detecting two-locus interactions, the running times of the cloud implementation on the AMD and RA datasets were roughly 2 hours and 50 hours, respectively, on a cluster of forty small virtual machines. We therefore believe that our method is suitable and effective for full-scale analysis of multi-locus epistatic interactions in GWAS.
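The computational burden is easy to quantify: an exhaustive scan of order k over n SNPs must evaluate C(n, k) combinations. The sketch assumes a hypothetical panel of 500,000 SNPs; it only counts combinations and does not implement the dynamic clustering method. `math.comb` requires Python 3.8 or later.

```python
from math import comb


def n_snp_combinations(n_snps, order):
    """Number of distinct SNP sets of a given order that an exhaustive scan must test."""
    return comb(n_snps, order)


# Hypothetical panel of 500,000 SNPs: each extra locus multiplies the work
# by roughly n_snps / order.
pairs = n_snp_combinations(500_000, 2)     # about 1.25e11
triples = n_snp_combinations(500_000, 3)   # about 2.08e16
```

Going from pairs to triples multiplies the search space by roughly 167,000, which is why methods that prune the search space, rather than enumerate it, are needed for high-order interactions.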
PMCID: PMC4021249  PMID: 24717145
Cloud computing; Genome-wide association studies; Dynamic clustering
20.  Quantifying Missing Heritability at Known GWAS Loci 
PLoS Genetics  2013;9(12):e1003993.
Recent work has shown that much of the missing heritability of complex traits can be resolved by estimates of heritability explained by all genotyped SNPs. However, it is currently unknown how much heritability is missing due to poor tagging or additional causal variants at known GWAS loci. Here, we use variance components to quantify the heritability explained by all SNPs at known GWAS loci in nine diseases from WTCCC1 and WTCCC2. After accounting for expectation, we observed all SNPs at known GWAS loci to explain more heritability than GWAS-associated SNPs on average. For some diseases, this increase was individually significant, namely Multiple Sclerosis (MS) and Crohn's Disease (CD); all analyses of autoimmune diseases excluded the well-studied MHC region. Additionally, we found that GWAS loci from other related traits also explained significant heritability. The union of all autoimmune disease loci explained more MS heritability than known MS SNPs and more CD heritability than known CD SNPs, with an analogous increase for all autoimmune diseases analyzed. We also observed significant increases in an analysis of Rheumatoid Arthritis (RA) samples typed on ImmunoChip, with more heritability from all SNPs at GWAS loci and more heritability from all autoimmune disease loci compared to known RA SNPs (including those identified in this cohort). Our methods adjust for LD between SNPs, which can bias standard estimates of heritability from SNPs even if all causal variants are typed. By comparing adjusted estimates, we hypothesize that the genome-wide distribution of causal variants is enriched for low-frequency alleles, but that causal variants at known GWAS loci are skewed towards common alleles. These findings have important ramifications for fine-mapping study design and our understanding of complex disease architecture.
Author Summary
Heritable diseases have an unknown underlying “genetic architecture” that defines the distribution of effect sizes for disease-causing mutations. Understanding this genetic architecture is an important first step in designing disease-mapping studies, and many theories have been developed on the nature of this distribution. Here, we evaluate the hypothesis that additional heritable variation lies at previously known associated loci but is not fully explained by the single most associated marker. We develop methods based on variance-components analysis to quantify this type of “local” heritability, demonstrate that standard strategies can be falsely inflated or deflated due to correlation between neighboring markers, and propose a robust adjustment. In analysis of nine common diseases we find a significant average increase of local heritability, consistent with multiple common causal variants at an average locus. Intriguingly, for autoimmune diseases we also observe significant local heritability in loci not associated with the specific disease but with other autoimmune diseases, implying a highly correlated underlying disease architecture. These findings have important implications for the design of future studies and our general understanding of common disease.
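Variance-components estimates of "local" heritability start from a genetic relationship matrix (GRM) built only from the SNPs of interest, for example those at known GWAS loci. The sketch below builds a standardized GRM and estimates h² by Haseman–Elston regression, i.e. regressing products of standardized phenotypes on off-diagonal GRM entries. This is a swapped-in, simpler estimator than the variance-components fitting described in the abstract, and it includes no LD adjustment; the simulated data and function names are hypothetical.

```python
import numpy as np


def standardized_grm(genotypes):
    """Genetic relationship matrix A = Z Z^T / M from standardized allele counts."""
    p = genotypes.mean(axis=0) / 2
    z = (genotypes - 2 * p) / np.sqrt(2 * p * (1 - p))
    return z @ z.T / genotypes.shape[1]


def haseman_elston_h2(y, grm):
    """Slope of y_i * y_j on A_ij over all pairs i < j (standardized phenotype)."""
    y = (y - y.mean()) / y.std()
    i, j = np.triu_indices(len(y), k=1)
    a = grm[i, j]
    prod = y[i] * y[j]
    a_c = a - a.mean()
    return float(a_c @ (prod - prod.mean()) / (a_c @ a_c))


# Simulated example: 200 SNPs "at known loci" jointly explain h2 = 0.5.
rng = np.random.default_rng(1)
n, m, h2 = 1000, 200, 0.5
freqs = rng.uniform(0.1, 0.5, m)
geno = rng.binomial(2, freqs, size=(n, m)).astype(float)
grm = standardized_grm(geno)
z = (geno - geno.mean(axis=0)) / geno.std(axis=0)
genetic = z @ rng.normal(0, np.sqrt(h2 / m), m)
pheno = genetic + rng.normal(0, np.sqrt(1 - h2), n)
h2_hat = haseman_elston_h2(pheno, grm)
```

Comparing such an estimate from all SNPs at a locus with one from only the lead SNPs is the kind of contrast the summary describes; correlation among neighboring markers distorts both unless adjusted for, which this sketch does not do.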
PMCID: PMC3873246  PMID: 24385918
21.  A logistic mixture model for a family-based association study 
BMC Proceedings  2007;1(Suppl 1):S44.
A family-based association study design can not only localize causative genes more precisely than linkage analysis but also help explain the genetic mechanism underlying the trait under study. It can therefore be used to follow up an initial linkage scan. For an association study of binary traits in general pedigrees, we propose a logistic mixture model that regresses the trait value on the genotypic values of the markers under investigation and on other covariates, such as environmental factors. We first tested both the validity and the power of the new model by simulating nuclear families inheriting a simple Mendelian trait. The model is powerful when the correct disease model is specified but loses much power when the dominance of the model is inversely specified, i.e., a dominant model is wrongly specified as recessive or vice versa. We then applied the new model to the Genetic Analysis Workshop (GAW) 15 simulation data to test its performance when adjusting for covariates in the case of complex traits. Adjusting for the covariate that interacts with disease loci improves the power to detect association. The simplest version of the model only takes monogenic inheritance into account, but analysis of the GAW simulation data shows that even this simple model can be powerful for complex traits.
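The loss of power under inverted dominance can be seen even without the family structure. The sketch below is ordinary logistic regression on unrelated individuals, swapped in for the logistic mixture model, which additionally models pedigree correlation: genotypes are coded under an assumed genetic model and the Wald z statistic of the genotype coefficient is compared between the correct (dominant) and the inverted (recessive) coding. Function names and simulation settings are hypothetical.

```python
import numpy as np


def code_genotype(g, model):
    """Map risk-allele counts (0, 1, 2) to a covariate under a given genetic model."""
    g = np.asarray(g)
    if model == "additive":
        return g.astype(float)
    if model == "dominant":
        return (g >= 1).astype(float)   # one or two risk alleles
    if model == "recessive":
        return (g == 2).astype(float)   # two risk alleles only
    raise ValueError(f"unknown model: {model}")


def logistic_wald_z(x, y, n_iter=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return the Wald z for b1."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1 - p))[:, None]))
    return beta[1] / np.sqrt(cov[1, 1])


# Usage: simulate a dominant trait, then test with correct and inverted coding.
rng = np.random.default_rng(0)
g = rng.binomial(2, 0.3, 3000)
risk = 1.0 / (1.0 + np.exp(-(-1.0 + 1.0 * (g >= 1))))
y = rng.binomial(1, risk).astype(float)
z_dom = logistic_wald_z(code_genotype(g, "dominant"), y)
z_rec = logistic_wald_z(code_genotype(g, "recessive"), y)
```

Under recessive coding, heterozygotes (who carry the full dominant risk) are lumped with the low-risk homozygotes, diluting the contrast and shrinking the test statistic.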
PMCID: PMC2359869  PMID: 18466543
22.  Integrating Genome-Wide Genetic Variations and Monocyte Expression Data Reveals Trans-Regulated Gene Modules in Humans 
PLoS Genetics  2011;7(12):e1002367.
One major expectation from the transcriptome in humans is to characterize the biological basis of associations identified by genome-wide association studies. So far, few cis expression quantitative trait loci (eQTLs) have been reliably related to disease susceptibility. Trans-regulating mechanisms may play a more prominent role in disease susceptibility. We analyzed 12,808 genes detected in at least 5% of circulating monocyte samples from a population-based sample of 1,490 European unrelated subjects. We applied a method of extraction of expression patterns, independent component analysis, to identify sets of co-regulated genes. These patterns were then related to 675,350 SNPs to identify major trans-acting regulators. We detected three genomic regions significantly associated with co-regulated gene modules. Association of these loci with multiple expression traits was replicated in Cardiogenics, an independent study in which expression profiles of monocytes were available in 758 subjects. The locus 12q13 (lead SNP rs11171739), previously identified as a type 1 diabetes locus, was associated with a pattern including two cis eQTLs, RPS26 and SUOX, and 5 trans eQTLs, one of which (MADCAM1) is a potential candidate for mediating T1D susceptibility. The locus 12q24 (lead SNP rs653178), which has demonstrated extensive disease pleiotropy, including type 1 diabetes, hypertension, and celiac disease, was associated with a pattern strongly correlated with blood pressure level. The strongest trans eQTL in this pattern was CRIP1, a known marker of cellular proliferation in cancer. The locus 12q15 (lead SNP rs11177644) was associated with a pattern driven by two cis eQTLs, LYZ and YEATS4, and including 34 trans eQTLs, several of them tumor-related genes. 
This study shows that a method exploiting the structure of co-expressions among genes can help identify genomic regions involved in trans regulation of sets of genes and can provide clues for understanding the mechanisms linking genome-wide association loci to disease.
Author Summary
One major expectation from the transcriptome in humans is to help characterize the biological basis of associations identified by genome-wide association studies. Here, we take advantage of recent technical and methodological advances to examine the influence of natural genetic variability on >12,000 genes expressed in the monocyte, a blood cell playing a key role in immunity-related disorders and atherosclerosis. By examining 1,490 European population-based subjects, we identify three regions of the genome reproducibly associated with specific patterns of gene expression. Two of these regions overlap genetic variants previously known to be involved in the susceptibility to type 1 diabetes, celiac disease, and hypertension. Genes whose expression is modulated by these genetic variants may act as mediators in the causal relationship linking the variability of the genome to complex disease. These findings illustrate how integration of genetic and transcriptomic data at an epidemiological scale can help decipher the genetic basis of complex diseases.
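The two stages, extracting co-expression patterns with independent component analysis (ICA) and then relating each pattern to SNPs, can be sketched with a small FastICA implementation (symmetric decorrelation, tanh contrast). Assumptions: the expression matrix is oriented subjects × genes so that each component yields one activity score per subject; association is measured by plain Pearson correlation rather than the regression models and multiple-testing control a real trans-eQTL scan would need; all names and toy data are hypothetical. Production analyses would typically rely on an established ICA implementation.

```python
import numpy as np


def fast_ica_scores(x, n_components, n_iter=200, seed=0):
    """Symmetric FastICA (tanh contrast) on a subjects x genes matrix.

    Returns an (n_subjects, n_components) array of component activity scores.
    """
    n = x.shape[0]
    xc = x - x.mean(axis=0)
    # Whitening: leading left singular vectors, scaled to unit variance.
    u, _, _ = np.linalg.svd(xc, full_matrices=False)
    z = u[:, :n_components] * np.sqrt(n)
    rng = np.random.default_rng(seed)
    w, _ = np.linalg.qr(rng.standard_normal((n_components, n_components)))
    for _ in range(n_iter):
        g = np.tanh(z @ w.T)
        # Fixed-point update: E[z g(w'z)] - E[g'(w'z)] w, one row per component.
        w_new = g.T @ z / n - (1 - g ** 2).mean(axis=0)[:, None] * w
        # Symmetric decorrelation keeps the components uncorrelated.
        uu, _, vt = np.linalg.svd(w_new)
        w = uu @ vt
    return z @ w.T


def component_snp_correlations(scores, genotypes):
    """Pearson correlation between each component's scores and each SNP."""
    s = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    g = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    return s.T @ g / s.shape[0]   # shape (n_components, n_snps)


# Toy data: 2 hidden, non-Gaussian "regulatory programs" mixed into 6 genes.
rng = np.random.default_rng(0)
programs = rng.laplace(size=(5000, 2))
expression = programs @ rng.normal(size=(2, 6))
scores = fast_ica_scores(expression, n_components=2)
genotypes = rng.binomial(2, 0.3, size=(5000, 3)).astype(float)
r = component_snp_correlations(scores, genotypes)
```

ICA recovers components only up to order and sign, so any downstream interpretation should match components to gene loadings rather than to a fixed index.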
PMCID: PMC3228821  PMID: 22144904
23.  A Copy Number Variant at the KITLG Locus Likely Confers Risk for Canine Squamous Cell Carcinoma of the Digit 
PLoS Genetics  2013;9(3):e1003409.
The domestic dog is a robust model for studying the genetics of complex disease susceptibility. The strategies used to develop and propagate modern breeds have resulted in an elevated risk for specific diseases in particular breeds. One example is that of Standard Poodles (STPOs), who have increased risk for squamous cell carcinoma of the digit (SCCD), a locally aggressive cancer that causes lytic bone lesions, sometimes with multiple toe recurrence. However, only STPOs of dark coat color are at high risk; light colored STPOs are almost entirely unaffected, suggesting that interactions between multiple pathways are necessary for oncogenesis. We performed a genome-wide association study (GWAS) on STPOs, comparing 31 SCCD cases to 34 unrelated black STPO controls. The peak SNP on canine chromosome 15 was statistically significant at the genome-wide level (Praw = 1.60×10−7; Pgenome = 0.0066). Additional mapping resolved the region to the KIT Ligand (KITLG) locus. Comparison of STPO cases to other at-risk breeds narrowed the locus to a 144.9-Kb region. Haplotype mapping among 84 STPO cases identified a minimal region of 28.3 Kb. A copy number variant (CNV) containing predicted enhancer elements was found to be strongly associated with SCCD in STPOs (P = 1.72×10−8). Light colored STPOs carry the CNV risk alleles at the same frequency as black STPOs, but are not susceptible to SCCD. A GWAS comparing 24 black and 24 light colored STPOs highlighted only the MC1R locus as significantly different between the two datasets, suggesting that a compensatory mutation within the MC1R locus likely protects light colored STPOs from disease. Our findings highlight a role for KITLG in SCCD susceptibility, as well as demonstrate that interactions between the KITLG and MC1R loci are potentially required for SCCD oncogenesis. These findings highlight how studies of breed-limited diseases are useful for disentangling multigene disorders.
Author Summary
Domesticated dogs offer a unique means of disentangling complex genetic traits, such as cancer. Over 300 breeds exist worldwide, each selected for particular morphologic and behavioral traits. Unfortunately, the breeding programs used to generate such diversity are associated with breed-specific increases in disease. Squamous cell carcinoma of the digit (SCCD) is a locally aggressive cancer that causes lytic bone lesions and, occasionally, death. Among the breeds with the highest risk is the Standard Poodle (STPO), in which the disease is found only in dark-coated dogs. We show that the KITLG locus is highly associated with SCCD and that a 5.7-Kb copy number variant is likely causative for the disease when in an expanded form. Interestingly, light-colored STPOs carry the putative causal variant at the same frequency as black STPOs, but are protected from SCCD. We show this is likely due to a compensatory mutation in the well-known coat color locus, MC1R. This work demonstrates the utility of dog breeds for understanding the genetic causes of complex diseases of interest to both human and animal health.
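A case-control GWAS like this one tests, at each marker, whether allele counts differ between cases and controls. The sketch below is a generic Pearson chi-square test on a 2 × 2 allele-count table; the counts in the usage line are hypothetical, not the study's data, and the study's own genome-wide correction is not reproduced. With samples as small as a few dozen dogs per group, an exact or permutation-based test may be preferable.

```python
import math

import numpy as np


def allelic_chi2(case_alleles, control_alleles):
    """Pearson chi-square test on a 2 x 2 table of allele counts.

    Each argument is (risk allele count, other allele count).
    Returns (statistic, p-value) with 1 degree of freedom.
    """
    table = np.array([case_alleles, control_alleles], dtype=float)
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / table.sum()
    stat = float(np.sum((table - expected) ** 2 / expected))
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 40 alleles per group, risk allele enriched in cases.
stat, p = allelic_chi2((30, 10), (10, 30))
```

Here every expected cell count is 20, so each cell contributes (10²)/20 = 5 and the statistic is 20.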
PMCID: PMC3610924  PMID: 23555311
24.  Application of a New Method for GWAS in a Related Case/Control Sample with Known Pedigree Structure: Identification of New Loci for Nephrolithiasis 
PLoS Genetics  2011;7(1):e1001281.
In contrast to large GWA studies based on thousands of individuals and large meta-analyses combining GWAS results, we analyzed a small case/control sample for uric acid nephrolithiasis. Our cohort of closely related individuals is derived from a small, genetically isolated village in Sardinia, with well-characterized genealogical data linking the extant population up to the 16th century. The number of risk alleles involved in complex disorders is expected to be smaller in isolated founder populations than in more diverse populations, and the power to detect association with complex traits may be increased when related, homogeneous affected individuals are selected, as they are more likely to be enriched for and to share specific risk variants than are unrelated affected individuals from the general population. When related individuals are included in an association study, correlations among relatives must be accurately taken into account to ensure validity of the results. A recently proposed association method uses an empirical genotypic covariance matrix estimated from genome-screen data to allow for additional population structure and cryptic relatedness that may not be captured by the genealogical data. We apply the method to our data, and we also investigate the properties of the method, as well as other association methods, in our highly inbred population, as previous applications were to outbred samples. The more promising regions identified in our initial study in the genetic isolate were then further investigated in an independent sample collected from the Italian population. Among the loci that showed association in this study, we observed evidence of a possible involvement of the region encompassing the gene LRRC16A, already associated with serum uric acid levels in a large meta-analysis of 14 GWAS, suggesting that this locus may be part of a uric acid metabolism pathway involved in gout as well as in nephrolithiasis.
Author Summary
There are a number of factors that contribute to renal stone formation, including diet and obesity, specific drugs, other diseases, climate changes, metabolic disorders, and genetic predisposition. In this article, we focus on identifying genomic regions that may be involved with nephrolithiasis associated with a uric acid component. We analyze data from a genetic isolate in Sardinia to take advantage of the potential improvement in power to detect association with complex traits when related, homogeneous affected individuals are selected. To take into account the correlations among our related sample of cases and controls, we applied a recently proposed method that corrects for both known and unknown population and pedigree structure using genome-wide data. In simulation studies for outbred populations with related individuals and population structure, the method has been demonstrated to provide a substantial improvement over a number of existing methods in terms of power and type 1 error. We investigate the properties of this new method, as well as other association methods, in our inbred sample. To our knowledge, this is the first application of this recently proposed method to a founder population. This study is also the first genome-wide association study carried out for uric acid nephrolithiasis.
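Why relatedness must enter the test can be shown with a simplified statistic. The sketch compares mean allele counts between cases and controls and scales the variance of that difference by a genotype correlation matrix Φ among individuals, so related cases inflate the variance and shrink the statistic. This illustrates the general principle only; it is not the specific method applied in the study. Assumptions: Hardy–Weinberg equilibrium, Φ with 1 on the diagonal for outbred individuals (1 + F for inbred ones) and twice the kinship coefficient off the diagonal, and a naive allele-frequency estimate. The function name is hypothetical.

```python
import numpy as np


def kinship_adjusted_allele_test(g, is_case, phi):
    """Case-control allele-count test with variance inflated for relatedness.

    g: allele counts (0, 1, 2); is_case: booleans; phi: genotype correlation
    matrix among individuals. Returns a statistic that is approximately
    chi-square with 1 df under the null hypothesis.
    """
    g = np.asarray(g, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    # Contrast weights: +1/n_cases for cases, -1/n_controls for controls.
    w = np.where(is_case, 1.0 / is_case.sum(), -1.0 / (~is_case).sum())
    t = w @ g                                   # mean(cases) - mean(controls)
    p = g.mean() / 2                            # naive allele frequency estimate
    var_t = 2 * p * (1 - p) * (w @ phi @ w)     # HWE variance times relatedness term
    return t ** 2 / var_t


g = [2, 1, 0, 1]
is_case = [True, True, False, False]
unrelated = np.eye(4)
sibs = np.eye(4)
sibs[0, 1] = sibs[1, 0] = 0.5                   # the two cases are full sibs
stat_unrelated = kinship_adjusted_allele_test(g, is_case, unrelated)
stat_sibs = kinship_adjusted_allele_test(g, is_case, sibs)
```

The same allele counts give a statistic of 2.0 when the cases are treated as unrelated but only 1.6 when they are sibs: ignoring relatedness would overstate the evidence. Estimating Φ empirically from genome-wide data, as the summary describes, also captures cryptic relatedness the pedigree misses.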
PMCID: PMC3024262  PMID: 21283782
25.  Gene-Based Tests of Association 
PLoS Genetics  2011;7(7):e1002177.
Genome-wide association studies (GWAS) are now used routinely to identify SNPs associated with complex human phenotypes. In several cases, multiple variants within a gene contribute independently to disease risk. Here we introduce a novel Gene-Wide Significance (GWiS) test that uses greedy Bayesian model selection to identify the independent effects within a gene, which are combined to generate a stronger statistical signal. Permutation tests provide p-values that correct for the number of independent tests genome-wide and within each genetic locus. When applied to a dataset comprising 2.5 million SNPs in up to 8,000 individuals measured for various electrocardiography (ECG) parameters, this method identifies more validated associations than conventional GWAS approaches. The method also provides, for the first time, systematic assessments of the number of independent effects within a gene and the fraction of disease-associated genes housing multiple independent effects, observed at 35%–50% of loci in our study. This method can be generalized to other study designs, retains power for low-frequency alleles, and provides gene-based p-values that are directly compatible with pathway-based meta-analysis.
Author Summary
Genome-wide association studies (GWAS) have successfully identified genetic variants associated with complex human phenotypes. Despite a proliferation of analysis methods, most studies rely on simple, robust SNP–by–SNP univariate tests with ever-larger population sizes. Here we introduce a new test motivated by the biological hypothesis that a single gene may contain multiple variants that contribute independently to a trait. Applied to simulated phenotypes with real genotypes, our new method, Gene-Wide Significance (GWiS), has better power to identify true associations than traditional univariate methods, previous Bayesian methods, popular L1 regularized (LASSO) multivariate regression, and other approaches. GWiS retains power for low-frequency alleles that are increasingly important for personal genetics, and it is the only method tested that accurately estimates the number of independent effects within a gene. When applied to human data for multiple ECG traits, GWiS identifies more genome-wide significant loci (verified by meta-analyses of much larger populations) than any other method. We estimate that 35%–50% of ECG trait loci are likely to have multiple independent effects, suggesting that our method will reveal previously unidentified associations when applied to existing data and will improve power for future association studies.
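The greedy structure of the within-gene model search can be sketched as forward selection. Here BIC stands in for GWiS's Bayesian model score (a swapped-in approximation, not the paper's criterion): starting from an empty model, the sketch repeatedly adds the SNP that most lowers BIC and stops when no addition helps. It omits the permutation-based p-values and any LD-specific handling; the function name and simulated data are hypothetical.

```python
import numpy as np


def greedy_bic_selection(X, y):
    """Forward selection of SNPs within one gene using BIC as the stopping rule.

    X: (n_individuals, n_snps) genotypes; y: quantitative phenotype.
    Returns the selected SNP column indices in the order they were added.
    """
    n, m = X.shape

    def rss(cols):
        design = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        return float(resid @ resid)

    selected = []
    current = rss(selected)
    while True:
        best = None
        for c in range(m):
            if c in selected:
                continue
            candidate = rss(selected + [c])
            # Change in BIC = n*log(RSS_new/RSS_old) + log(n) for one extra parameter.
            delta_bic = n * np.log(candidate / current) + np.log(n)
            if delta_bic < 0 and (best is None or delta_bic < best[0]):
                best = (delta_bic, c, candidate)
        if best is None:
            return selected
        selected.append(best[1])
        current = best[2]


# Simulated gene with 10 SNPs, two of which (columns 2 and 7) act independently.
rng = np.random.default_rng(0)
X = rng.binomial(2, 0.3, size=(2000, 10)).astype(float)
y = 0.5 * X[:, 2] + 0.5 * X[:, 7] + rng.normal(0, 1, 2000)
selected = greedy_bic_selection(X, y)
```

The length of `selected` plays the role of the estimated number of independent effects in the gene; a per-SNP univariate scan would instead report two separate hits with no statement about their independence.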
PMCID: PMC3145613  PMID: 21829371
