
1.  Underlying genetic models of inheritance in established type 2 diabetes associations 
American journal of epidemiology  2009;170(5):537-545.
For most associations of common polymorphisms with common diseases, the genetic model of inheritance is unknown. We extended and applied a Bayesian meta-analysis approach to data from 19 studies on 17 replicated associations for type 2 diabetes. For 13 polymorphisms, the data fit very well to an additive model, for 4 polymorphisms the data were consistent with either an additive or dominant model, and for 2 polymorphisms with an additive or recessive model of inheritance for the diabetes risk allele. Results were robust to using different priors and after excluding data where index polymorphisms had been examined indirectly through proxy markers. The Bayesian meta-analysis model yielded point estimates for the genetic effects that are very similar to those previously reported based on fixed or random effects models, but uncertainty about several of the effects was substantially larger. We also examined the extent of between-study heterogeneity in the genetic model and found generally small values of the between-study deviation for the genetic model parameter. Heterosis could not be excluded in 4 SNPs. Information on the genetic model of robustly replicated GWA-derived association signals may be useful for predictive modeling, and for designing biological and functional experiments.
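One way to make the "genetic model parameter" concrete is the ratio of log odds ratios λ = log(OR_het)/log(OR_hom) used in genetic model-free meta-analysis. This parameterization is assumed here; the abstract does not state it. Under it, λ = 0, 0.5 and 1 correspond to recessive, additive and dominant models, and λ outside [0, 1] indicates heterosis. The sketch below only computes λ from point estimates, whereas the abstract's Bayesian model places priors on this parameter and pools it across studies. Function names, the tolerance, and the odds ratios are hypothetical.

```python
import math


def genetic_model_lambda(or_het: float, or_hom: float) -> float:
    """lambda = log(OR_het) / log(OR_hom).

    or_het: heterozygote vs. non-risk homozygote odds ratio.
    or_hom: risk-allele homozygote vs. non-risk homozygote odds ratio.
    """
    if or_het <= 0 or or_hom <= 0:
        raise ValueError("odds ratios must be positive")
    if math.isclose(or_hom, 1.0):
        raise ValueError("lambda is undefined when OR_hom == 1")
    return math.log(or_het) / math.log(or_hom)


def classify_model(lam: float, tol: float = 0.1) -> str:
    """Label a point estimate of lambda; tol is an arbitrary illustrative cut-off."""
    for name, target in (("recessive", 0.0), ("additive", 0.5), ("dominant", 1.0)):
        if abs(lam - target) <= tol:
            return name
    if lam < 0.0 or lam > 1.0:
        return "heterosis"  # heterozygote risk outside the homozygote range
    return "intermediate"


if __name__ == "__main__":
    # Hypothetical odds ratios, not estimates from the study.
    for or_het, or_hom in [(1.2, 1.44), (1.3, 1.3), (1.0, 1.5), (1.6, 1.4)]:
        lam = genetic_model_lambda(or_het, or_hom)
        print(f"OR_het={or_het}, OR_hom={or_hom}: lambda={lam:.2f} ({classify_model(lam)})")
```

An additive model on the log-odds scale means the homozygote odds ratio is the square of the heterozygote one, which is why (1.2, 1.44) lands on λ = 0.5.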
doi:10.1093/aje/kwp145
PMCID: PMC2732984  PMID: 19602701
2.  Inflammation, Insulin Resistance, and Diabetes—Mendelian Randomization Using CRP Haplotypes Points Upstream 
PLoS Medicine  2008;5(8):e155.
Background
Raised C-reactive protein (CRP) is a risk factor for type 2 diabetes. According to the Mendelian randomization method, the association is likely to be causal if genetic variants that affect CRP level are associated with markers of diabetes development and diabetes. Our objective was to examine the nature of the association between CRP phenotype and diabetes development using CRP haplotypes as instrumental variables.
Methods and Findings
We genotyped three tagging SNPs (CRP +2302G>A; CRP +1444T>C; CRP +4899T>G) in the CRP gene and measured serum CRP in 5,274 men and women at mean ages 49 and 61 y (Whitehall II Study). Homeostasis model assessment-insulin resistance (HOMA-IR) and hemoglobin A1c (HbA1c) were measured at age 61 y. Diabetes was ascertained by glucose tolerance test and self-report. Common major haplotypes were strongly associated with serum CRP levels but unrelated to obesity, blood pressure, and socioeconomic position, which may confound the association between CRP and diabetes risk. Serum CRP itself was associated with these potential confounding factors. After adjustment for age and sex, baseline serum CRP was associated with incident diabetes (hazard ratio = 1.39 [95% confidence interval 1.29–1.51]), HOMA-IR, and HbA1c, but the associations were considerably attenuated on adjustment for potential confounding factors. In contrast, CRP haplotypes were not associated with HOMA-IR or HbA1c (p = 0.52–0.92). The associations of CRP with HOMA-IR and HbA1c were all null when examined using instrumental variables analysis, with genetic variants as the instrument for serum CRP. Instrumental variables estimates differed from the directly observed associations (p = 0.007–0.11). Pooled analysis of CRP haplotypes and diabetes in Whitehall II and Northwick Park Heart Study II produced null findings (p = 0.25–0.88). Analyses based on the Wellcome Trust Case Control Consortium (1,923 diabetes cases, 2,932 controls) using three SNPs in tight linkage disequilibrium with our tagging SNPs also demonstrated null associations.
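The instrumental-variables idea can be illustrated with the simplest single-instrument estimator, the Wald ratio: the instrument's effect on the outcome divided by its effect on the exposure. This is an assumption for illustration. The abstract does not say which IV estimator was used, and with three tagging SNPs or several haplotypes a multi-instrument method such as two-stage least squares would be more typical. The inputs below are hypothetical summary statistics, and the standard error is the first-order approximation that ignores uncertainty in the gene-exposure estimate.

```python
import math
from dataclasses import dataclass


@dataclass
class Estimate:
    beta: float
    se: float


def wald_ratio(gene_exposure: Estimate, gene_outcome: Estimate) -> Estimate:
    """Instrumental-variable (Wald ratio) estimate of the exposure -> outcome effect.

    gene_exposure: effect of the genetic instrument on the exposure (e.g. log CRP).
    gene_outcome:  effect of the same instrument on the outcome (e.g. HOMA-IR).
    The SE is the first-order approximation that ignores the SE of gene_exposure.
    """
    if gene_exposure.beta == 0:
        raise ValueError("instrument has no effect on the exposure")
    beta_iv = gene_outcome.beta / gene_exposure.beta
    se_iv = gene_outcome.se / abs(gene_exposure.beta)
    return Estimate(beta_iv, se_iv)


def two_sided_p(est: Estimate) -> float:
    """Normal-approximation two-sided p-value for beta / se."""
    z = abs(est.beta / est.se)
    return math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical summary statistics, not values from the study.
    iv = wald_ratio(Estimate(0.20, 0.02), Estimate(0.004, 0.010))
    print(f"IV estimate {iv.beta:.3f} (SE {iv.se:.3f}), p = {two_sided_p(iv):.2f}")
```

A strong instrument (large gene-exposure effect) with a null gene-outcome effect yields an IV estimate near zero, which is the pattern that points towards a noncausal CRP association.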
Conclusions
Observed associations between serum CRP and insulin resistance, glycemia, and diabetes are likely to be noncausal. Inflammation may play a causal role via upstream effectors rather than the downstream marker CRP.
Using a Mendelian randomization approach, Eric Brunner and colleagues show that the associations between serum C-reactive protein and insulin resistance, glycemia, and diabetes are likely to be noncausal.
Editors' Summary
Background.
Diabetes—a common, long-term (chronic) disease that causes heart, kidney, nerve, and eye problems and shortens life expectancy—is characterized by high levels of sugar (glucose) in the blood. In people without diabetes, blood sugar levels are controlled by the hormone insulin. Insulin is released by the pancreas after eating and “instructs” insulin-responsive muscle and fat cells to take up the glucose from the bloodstream that is produced by the digestion of food. In the early stages of type 2 diabetes (the commonest type of diabetes), the muscle and fat cells become nonresponsive to insulin (a condition called insulin resistance), and blood sugar levels increase. The pancreas responds by making more insulin—people with insulin resistance have high blood levels of both insulin and glucose. Eventually, however, the insulin-producing cells in the pancreas start to malfunction, insulin secretion decreases, and frank diabetes develops.
Why Was This Study Done?
Globally, about 200 million people have diabetes, but experts believe this number will double by 2030. Ways to prevent or delay the onset of diabetes are, therefore, urgently needed. One major risk factor for insulin resistance and diabetes is being overweight. According to one theory, increased body fat causes mild, chronic tissue inflammation, which leads to insulin resistance. Consistent with this idea, people with higher than normal amounts of the inflammatory protein C-reactive protein (CRP) in their blood have a high risk of developing diabetes. If inflammation does cause diabetes, then drugs that inhibit CRP might prevent diabetes. However, simply measuring CRP and determining whether the people with high levels develop diabetes cannot prove that CRP causes diabetes. Those people with high blood levels of CRP might have other unknown factors in common (confounding factors) that are the real causes of diabetes. In this study, the researchers use “Mendelian randomization” to examine whether increased blood CRP causes diabetes. Some variants of CRP (the gene that encodes CRP) increase the amount of CRP in the blood. Because these variants are inherited randomly, they should not be linked to confounding factors, so an association between these variants and the development of insulin resistance and diabetes would indicate that increased CRP levels cause diabetes.
What Did the Researchers Do and Find?
The researchers measured blood CRP levels in more than 5,000 people enrolled in the Whitehall II study, which is investigating factors that affect disease development. They also used the “homeostasis model assessment-insulin resistance” (HOMA-IR) method to estimate insulin resistance from fasting blood glucose and insulin measurements, and measured levels of hemoglobin A1c (HbA1c, hemoglobin with sugar attached, a measure of long-term blood sugar control) in these people. In addition, they examined three “single nucleotide polymorphisms” (SNPs, single nucleotide changes in a gene's DNA sequence; combinations of SNPs that are inherited as a block are called haplotypes) in CRP in each study participant. Common haplotypes of CRP were related to serum CRP levels and, as previously reported, increased blood CRP levels were associated with diabetes and with HOMA-IR and HbA1c values indicative of insulin resistance and poor blood sugar control, respectively. By contrast, CRP haplotypes were not related to HOMA-IR or HbA1c values. Similarly, pooled analysis of CRP haplotypes and diabetes in Whitehall II and another large study on health determinants (the Northwick Park Heart Study II) showed no association between CRP variants and diabetes risk. Finally, data from the Wellcome Trust Case Control Consortium also showed no association between CRP haplotypes and diabetes risk.
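For readers unfamiliar with HOMA-IR, the widely used original (HOMA1) approximation is fasting glucose (mmol/L) × fasting insulin (µU/mL) / 22.5. The summary does not say which HOMA variant Whitehall II used (the updated HOMA2 calculator is a separate computer model), so the sketch below is an illustration under that assumption; function names and input values are hypothetical.

```python
def mg_dl_to_mmol_l(glucose_mg_dl: float) -> float:
    """Convert glucose from mg/dL to mmol/L using the conventional factor of 18."""
    return glucose_mg_dl / 18.0


def homa1_ir(glucose_mmol_l: float, insulin_uU_ml: float) -> float:
    """HOMA1-IR = fasting glucose [mmol/L] * fasting insulin [microU/mL] / 22.5.

    The constant 22.5 makes a glucose of 4.5 mmol/L with an insulin of
    5 microU/mL give exactly 1; larger values mean more insulin resistance.
    """
    if glucose_mmol_l <= 0 or insulin_uU_ml <= 0:
        raise ValueError("fasting glucose and insulin must be positive")
    return glucose_mmol_l * insulin_uU_ml / 22.5


if __name__ == "__main__":
    # Hypothetical fasting values: 90 mg/dL glucose, 10 microU/mL insulin.
    print(round(homa1_ir(mg_dl_to_mmol_l(90.0), 10.0), 2))
```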
What Do These Findings Mean?
Together, these findings suggest that increased blood CRP levels are not responsible for the development of insulin resistance or diabetes, at least in European populations. It may be that there is a causal relationship between CRP levels and diabetes risk in other ethnic populations—further Mendelian randomization studies are needed to discover whether this is the case. For now, though, these findings suggest that drugs targeted against CRP are unlikely to prevent or delay the onset of diabetes. However, they do not discount the possibility that proteins involved earlier in the inflammatory process might cause diabetes and might thus represent good drug targets for diabetes prevention.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0050155.
This study is further discussed in a PLoS Medicine Perspective by Bernard Keavney
The MedlinePlus encyclopedia provides information about diabetes and about C-reactive protein (in English and Spanish)
US National Institute of Diabetes and Digestive and Kidney Diseases provides patient information on all aspects of diabetes, including information on insulin resistance (in English and Spanish)
The International Diabetes Federation provides information about diabetes, including information on the global diabetes epidemic
The US Centers for Disease Control and Prevention provides information for the public and professionals on all aspects of diabetes (in English and Spanish)
Wikipedia has a page on Mendelian randomization (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
doi:10.1371/journal.pmed.0050155
PMCID: PMC2504484  PMID: 18700811
3.  GUESS-ing Polygenic Associations with Multiple Phenotypes Using a GPU-Based Evolutionary Stochastic Search Algorithm 
PLoS Genetics  2013;9(8):e1003657.
Genome-wide association studies (GWAS) have yielded significant advances in defining the genetic architecture of complex traits and disease. Still, a major hurdle of GWAS is narrowing down multiple genetic associations to a few causal variants for functional studies. This becomes critical in multi-phenotype GWAS, where detection and interpretation of SNP(s)-trait(s) associations are complicated by complex linkage disequilibrium patterns between SNPs and by correlation between traits. Here we propose a computationally efficient algorithm (GUESS) to explore complex genetic-association models and maximize genetic variant detection. We integrated our algorithm with a new Bayesian strategy for multi-phenotype analysis to identify the specific contribution of each SNP to different trait combinations and to study genetic regulation of lipid metabolism in the Gutenberg Health Study (GHS). Despite the relatively small size of GHS (n = 3,175) compared with the largest published meta-GWAS (n > 100,000), GUESS recovered most of the major associations and was better at refining multi-trait associations than alternative methods. Among the new findings provided by GUESS, we revealed strong associations of SORT1 with the TG-APOB phenotypic group and of LIPC with the TG-HDL phenotypic group; these were overlooked in the larger meta-GWAS and not revealed by competing approaches, and we replicated them in two independent cohorts. Moreover, we demonstrated the increased power of GUESS over alternative multi-phenotype approaches, both Bayesian and non-Bayesian, in a simulation study that mimics real-case scenarios. We showed that our parallel implementation based on Graphics Processing Units outperforms alternative multi-phenotype methods. Beyond multivariate modelling of multi-phenotypes, our Bayesian model employs a flexible hierarchical prior structure for genetic effects that adapts to any correlation structure of the predictors and increases the power to identify associated variants. This provides a powerful tool for the analysis of diverse genomic features, for instance gene expression and exome sequencing data, where complex dependencies are present in the predictor space.
Author Summary
Nowadays, the availability of cheaper and more accurate assays to quantify multiple (endo)phenotypes in large population cohorts allows multi-trait studies. However, these studies are limited by the lack of flexible models integrated with efficient computational tools for genome-wide multi-SNP, multi-trait analyses. To overcome this problem, we propose a novel Bayesian analysis strategy and a new algorithmic implementation that exploits a parallel processing architecture for fully multivariate modeling of groups of correlated phenotypes at the genome-wide scale. In addition to demonstrating the increased power of our algorithm over alternative Bayesian and well-established non-Bayesian multi-phenotype methods, we provide an application to a real case study of several blood lipid traits, and show how our method recovered most of the major associations and is better at refining multi-trait polygenic associations than alternative methods. We reveal and replicate in independent cohorts new associations with two phenotypic groups that were not detected by competing multivariate approaches and not noticed by a large meta-GWAS. We also discuss the applicability of the proposed method to large meta-analyses involving hundreds of thousands of individuals and to diverse genomic datasets where complex dependencies in the predictor space are present.
doi:10.1371/journal.pgen.1003657
PMCID: PMC3738451  PMID: 23950726
4.  Power to Detect Risk Alleles Using Genome-Wide Tag SNP Panels 
PLoS Genetics  2007;3(10):e170.
Advances in high-throughput genotyping and the International HapMap Project have enabled association studies at the whole-genome level. We have constructed whole-genome genotyping panels of over 550,000 (HumanHap550) and 650,000 (HumanHap650Y) SNP loci by choosing tag SNPs from all populations genotyped by the International HapMap Project. These panels also contain additional SNP content in regions that have historically been overrepresented in disease, such as nonsynonymous sites, the MHC region, copy number variant regions, and mitochondrial DNA. We estimate that the tag SNP loci in these panels cover the majority of all common variation in the genome, as measured by coverage of both all common HapMap SNPs and an independent set of SNPs derived from complete resequencing of genes obtained from SeattleSNPs. We also estimate that, given a sample size of 1,000 cases and 1,000 controls, these panels have the power to detect single disease loci of moderate risk (λ ∼ 1.8–2.0). Relative risks as low as λ ∼ 1.1–1.3 can be detected using 10,000 cases and 10,000 controls, depending on the sample population and disease model. If multiple loci are involved, the power to detect at least one locus increases substantially: relative risks 20%–35% lower can be detected with 80% power if between two and four independent loci are involved. Although our SNP selection was based on HapMap data, which capture only a subset of all common SNPs, these panels effectively capture the majority of all common variation and provide high power to detect risk alleles that are not represented in the HapMap data.
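How power depends on effect size, sample size and tagging can be sketched with a normal approximation to the 1-df allelic test. Everything here is an assumption for illustration rather than the authors' power model: a multiplicative per-allele relative risk, a rare disease (so the control allele frequency equals the population frequency), the common rule of thumb that testing a tag SNP with LD r² to the causal SNP is roughly like testing the causal SNP in r² × N individuals, and an arbitrary significance threshold α = 10⁻⁷. The function name and allele frequency are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def allelic_test_power(
    risk_allele_freq: float,
    relative_risk: float,
    n_cases: int,
    n_controls: int,
    r2_tag: float = 1.0,
    alpha: float = 1e-7,
) -> float:
    """Approximate power of a two-sided 1-df allelic case-control test.

    Assumes a multiplicative per-allele relative risk and a rare disease, and
    models a tag SNP (LD r2 with the causal SNP) as the causal SNP tested in
    r2 times as many individuals.
    """
    if not 0.0 < risk_allele_freq < 1.0:
        raise ValueError("risk_allele_freq must be in (0, 1)")
    if not 0.0 < r2_tag <= 1.0:
        raise ValueError("r2_tag must be in (0, 1]")
    p = risk_allele_freq
    # Risk-allele frequency among cases under genotype risks 1, RR, RR^2.
    p_case = p * relative_risk / (p * relative_risk + 1.0 - p)
    p_bar = (n_cases * p_case + n_controls * p) / (n_cases + n_controls)
    # Allele counts, shrunk by r2 when the causal SNP is only seen through a tag.
    alleles_cases = 2 * n_cases * r2_tag
    alleles_controls = 2 * n_controls * r2_tag
    se = math.sqrt(p_bar * (1.0 - p_bar) * (1.0 / alleles_cases + 1.0 / alleles_controls))
    z_effect = abs(p_case - p) / se
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # The chance of rejecting in the wrong direction is ignored (negligible).
    return STD_NORMAL.cdf(z_effect - z_crit)


if __name__ == "__main__":
    for rr in (1.3, 1.8):
        for n in (1000, 10000):
            print(f"RR={rr}, {n} cases/{n} controls: power={allelic_test_power(0.2, rr, n, n):.3f}")
```

Under these assumptions, a relative risk of 1.8 is detected with high power in 1,000 cases and 1,000 controls, while 1.3 needs roughly ten times the sample, and lowering `r2_tag` reduces power in the same way as losing samples.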
Author Summary
Advances in high-throughput genotyping technology and the International HapMap Project have enabled genetic association studies at the whole-genome level. Our paper describes two genome-wide SNP panels that contain tag SNPs derived from the International HapMap Project. Tag SNPs are proxies for groups of highly correlated SNPs. Information can be captured for the entire group of correlated SNPs by genotyping only one representative SNP, the tag SNP. These whole-genome SNP panels also contain additional content thought to be overrepresented in disease, such as amino acid–changing nonsynonymous SNPs and mitochondrial SNPs. We show that these panels cover the genome with very high efficiency as measured by coverage of all HapMap SNPs and a set of SNPs derived from completely resequenced genes from the Seattle SNPs database. We also show that these panels have high power to detect disease risk alleles for both HapMap and non-HapMap SNPs. In complex disease where multiple risk alleles are believed to be involved, we show that the ability to detect at least one risk allele with the tag SNP panels is also high.
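The idea that one tag SNP can stand in for a group of correlated SNPs is often implemented with a greedy rule: repeatedly pick the SNP that covers (r² ≥ threshold) the most SNPs not yet covered. The sketch below implements that generic procedure under assumed inputs (a precomputed table of pairwise r² values and a 0.8 threshold). It is not necessarily how the HumanHap550 or HumanHap650Y content was chosen, and the SNP names and r² values are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

Pair = FrozenSet[str]


def greedy_tag_snps(snps: List[str], r2: Dict[Pair, float], threshold: float = 0.8) -> List[str]:
    """Pick tag SNPs so every SNP has r2 >= threshold with at least one tag.

    r2 maps unordered SNP pairs to their LD r2; missing pairs count as 0.
    Every SNP trivially tags itself.
    """

    def covers(tag: str, other: str) -> bool:
        return tag == other or r2.get(frozenset((tag, other)), 0.0) >= threshold

    untagged: Set[str] = set(snps)
    tags: List[str] = []
    while untagged:
        # max() keeps the first SNP on ties, so the result follows input order.
        best = max(snps, key=lambda s: sum(covers(s, u) for u in untagged))
        tags.append(best)
        untagged = {u for u in untagged if not covers(best, u)}
    return tags


if __name__ == "__main__":
    snps = ["rs1", "rs2", "rs3", "rs4"]
    r2 = {
        frozenset(("rs1", "rs2")): 0.9,
        frozenset(("rs2", "rs3")): 0.85,
        frozenset(("rs1", "rs3")): 0.6,
    }
    print(greedy_tag_snps(snps, r2))  # ['rs2', 'rs4']
```

Here rs2 alone tags rs1, rs2 and rs3, so genotyping two SNPs captures all four.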
doi:10.1371/journal.pgen.0030170
PMCID: PMC2000969  PMID: 17922574
5.  Geographic Differences in Genetic Susceptibility to IgA Nephropathy: GWAS Replication Study and Geospatial Risk Analysis 
PLoS Genetics  2012;8(6):e1002765.
IgA nephropathy (IgAN), a major cause of kidney failure worldwide, is common in Asians, moderately prevalent in Europeans, and rare in Africans. It is not known whether these differences represent variation in genes, environment, or ascertainment. In a recent GWAS, we localized five IgAN susceptibility loci on Chr.6p21 (HLA-DQB1/DRB1, PSMB9/TAP1, and DPA1/DPB2 loci), Chr.1q32 (CFHR3/R1 locus), and Chr.22q12 (HORMAD2 locus). These IgAN loci are associated with risk of other immune-mediated disorders such as type I diabetes, multiple sclerosis, or inflammatory bowel disease. We tested association of these loci in eight new independent cohorts of Asian, European, and African-American ancestry (N = 4,789), followed by meta-analysis with risk-score modeling in 12 cohorts (N = 10,755) and geospatial analysis in 85 world populations. Four susceptibility loci robustly replicated and all five loci were genome-wide significant in the combined cohort (P = 5×10⁻³² to 3×10⁻¹⁰), with heterogeneity detected only at the PSMB9/TAP1 locus (I² = 0.60). Conditional analyses identified two new independent risk alleles within the HLA-DQB1/DRB1 locus, defining multiple risk and protective haplotypes within this interval. We also detected a significant genetic interaction, whereby the odds ratio for the HORMAD2 protective allele was reversed in homozygotes for a CFHR3/R1 deletion (P = 2.5×10⁻⁴). A seven-SNP genetic risk score, which explained 4.7% of overall IgAN risk, increased sharply with Eastward and Northward distance from Africa (r = 0.30, P = 3×10⁻¹²⁸). This model paralleled the known East–West gradient in disease risk. Moreover, the prediction of a South–North axis was confirmed by registry data showing that the prevalence of IgAN-attributable kidney failure is increased in Northern Europe, similar to multiple sclerosis and type I diabetes. Variation at IgAN susceptibility loci correlates with differences in disease prevalence among world populations. These findings inform genetic, biological, and epidemiological investigations of IgAN and permit cross-comparison with other complex traits that share genetic risk loci and geographic patterns with IgAN.
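The heterogeneity statistic quoted for the PSMB9/TAP1 locus (I² = 0.60) is conventionally derived from Cochran's Q in an inverse-variance meta-analysis, I² = max(0, (Q − df)/Q). The sketch below assumes fixed-effect weights and per-cohort log odds ratios with standard errors; the abstract does not give its exact meta-analysis model, and the function name and inputs are hypothetical.

```python
import math
from typing import List, Tuple


def fixed_effect_meta(betas: List[float], ses: List[float]) -> Tuple[float, float, float, float]:
    """Inverse-variance fixed-effect meta-analysis with Cochran's Q and I^2.

    Returns (pooled_beta, pooled_se, Q, I2), with I2 = max(0, (Q - df) / Q).
    """
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need at least two studies with matching SEs")
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2


if __name__ == "__main__":
    # Hypothetical per-cohort log odds ratios and standard errors.
    print(fixed_effect_meta([0.10, 0.35, 0.22, -0.05], [0.06, 0.08, 0.05, 0.07]))
```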
Author Summary
IgA nephropathy (IgAN) is the most common cause of kidney failure in Asia, has lower prevalence in Europe, and is very infrequent among populations of African ancestry. A long-standing question in the field is whether these differences represent variation in genes, environment, or ascertainment. In a recent genome-wide association study of 5,966 individuals, we identified five susceptibility loci for this trait. In this paper, we study the largest IgAN case-control cohort reported to date, composed of 10,775 individuals of European, Asian, and African-American ancestry. We confirm that all five loci are significant contributors to disease risk across this multi-ethnic cohort. In addition, we identify two novel independent susceptibility alleles within the HLA-DQB1/DRB1 locus and a new genetic interaction between the CFHR3/R1 locus on Chr.1q32 and the HORMAD2 locus on Chr.22q12. We develop a seven-SNP genetic risk score that explains nearly 5% of variation in disease risk. In geospatial analysis of 85 world populations, the genetic risk score closely parallels worldwide patterns of disease prevalence. The genetic risk score also predicts an unsuspected Northward risk gradient in Europe. This genetic prediction is verified by registry data demonstrating a previously unrecognized increase in IgAN-attributable kidney failure in Northern European countries, similar to other immune-mediated diseases such as multiple sclerosis and type I diabetes.
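A weighted genetic risk score of the kind described is commonly computed as the sum, over SNPs, of risk-allele count times the log odds ratio. For a whole population, replacing counts by their expectation (twice the allele frequency) gives a mean score that can be compared across populations. The abstract does not specify the weights or any interaction term used in its seven-SNP score, so this is a generic sketch; SNP labels and odds ratios are hypothetical. Protective alleles simply carry odds ratios below 1 and contribute negatively.

```python
import math
from typing import Dict


def weighted_risk_score(dosages: Dict[str, int], odds_ratios: Dict[str, float]) -> float:
    """Individual score: sum of (risk-allele count) * log(per-allele OR)."""
    missing = set(odds_ratios) - set(dosages)
    if missing:
        raise KeyError(f"missing genotypes for {sorted(missing)}")
    score = 0.0
    for snp, odds_ratio in odds_ratios.items():
        count = dosages[snp]
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: allele count must be 0, 1 or 2")
        score += count * math.log(odds_ratio)
    return score


def population_mean_score(allele_freqs: Dict[str, float], odds_ratios: Dict[str, float]) -> float:
    """Expected score in a population: sum of 2 * freq * log(per-allele OR)."""
    return sum(2.0 * allele_freqs[snp] * math.log(o) for snp, o in odds_ratios.items())


if __name__ == "__main__":
    ors = {"snpA": 1.4, "snpB": 1.2, "snpC": 0.8}  # hypothetical; snpC is protective
    print(weighted_risk_score({"snpA": 2, "snpB": 1, "snpC": 0}, ors))
    print(population_mean_score({"snpA": 0.3, "snpB": 0.5, "snpC": 0.2}, ors))
```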
doi:10.1371/journal.pgen.1002765
PMCID: PMC3380840  PMID: 22737082
6.  Genome-wide association with diabetes-related traits in the Framingham Heart Study 
BMC Medical Genetics  2007;8(Suppl 1):S16.
Background
Susceptibility to type 2 diabetes may be conferred by genetic variants having modest effects on risk. Genome-wide fixed marker arrays offer a novel approach to detect these variants.
Methods
We used the Affymetrix 100K SNP array in 1,087 Framingham Offspring Study family members to examine genetic associations with three diabetes-related quantitative glucose traits (fasting plasma glucose (FPG), hemoglobin A1c, 28-yr time-averaged FPG (tFPG)), three insulin traits (fasting insulin, HOMA-insulin resistance, and 0–120 min insulin sensitivity index), and with risk for diabetes. We used additive generalized estimating equations (GEE) and family-based association test (FBAT) models to test associations of SNP genotypes with sex-, age-, and age²-adjusted residual trait values, and Cox survival models to test incident diabetes.
Results
We found 415 SNPs associated (at p < 0.001) with at least one of the six quantitative traits in GEE and 242 in FBAT (18 overlapped with GEE, for 639 non-overlapping SNPs), and 128 associated with incident diabetes (31 overlapped with the 639), giving 736 non-overlapping SNPs. Of these 736 SNPs, 439 were within 60 kb of a known gene. Additionally, 53 SNPs (of which 42 had r² < 0.80 with each other) had p < 0.01 for incident diabetes AND (all 3 glucose traits OR all 3 insulin traits, OR 2 glucose traits and 2 insulin traits); of these, 36 overlapped with the 736 other SNPs. Of the 100K SNPs, one (rs7100927) was in moderate LD (r² = 0.50) with TCF7L2 (rs7903146) and was associated with risk of diabetes (Cox p-value 0.007, additive hazard ratio for diabetes = 1.56) and with tFPG (GEE p-value 0.03). There were no common (MAF > 1%) 100K SNPs in LD (r² > 0.05) with ABCC8 A1369S (rs757110), KCNJ11 E23K (rs5219), or SNPs in CAPN10 or HNFa. PPARG P12A (rs1801282) was not significantly associated with diabetes or related traits.
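The LD values quoted here (for example, r² = 0.50 between rs7100927 and rs7903146) are squared correlations between alleles at two SNPs. Given a haplotype frequency and the two allele frequencies, r² = D²/(p_A(1 − p_A)p_B(1 − p_B)) with D = p_AB − p_A·p_B. In practice haplotype frequencies are usually estimated from unphased genotypes (for example by an EM algorithm); in this sketch they are assumed known, and the function name and numbers are hypothetical.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared allelic correlation between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2.
    p_a, p_b: frequencies of alleles A and B.
    """
    for name, freq in (("p_a", p_a), ("p_b", p_b)):
        if not 0.0 < freq < 1.0:
            raise ValueError(f"{name} must be in (0, 1)")
    # Haplotype frequencies must be compatible with the allele frequencies.
    if not max(0.0, p_a + p_b - 1.0) <= p_ab <= min(p_a, p_b):
        raise ValueError("p_ab is inconsistent with p_a and p_b")
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


if __name__ == "__main__":
    print(round(ld_r2(0.2, 0.3, 0.4), 3))
```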
Conclusion
Framingham 100K SNP data is a resource for association tests of known and novel genes with diabetes and related traits posted at . Framingham 100K data replicate the TCF7L2 association with diabetes.
doi:10.1186/1471-2350-8-S1-S16
PMCID: PMC1995610  PMID: 17903298
7.  Generalization and Dilution of Association Results from European GWAS in Populations of Non-European Ancestry: The PAGE Study 
PLoS Biology  2013;11(9):e1001661.
A multi-ethnic study demonstrates that differences in genetic risk associations across ethnicities, which compromise the extrapolation of genetic disease risk models from European populations to other ethnicities, are driven more strongly by genetic structure than by environmental or global genetic background.
The vast majority of genome-wide association study (GWAS) findings reported to date are from populations with European Ancestry (EA), and it is not yet clear how broadly the genetic associations described will generalize to populations of diverse ancestry. The Population Architecture Using Genomics and Epidemiology (PAGE) study is a consortium of multi-ancestry, population-based studies formed with the objective of refining our understanding of the genetic architecture of common traits emerging from GWAS. In the present analysis of five common diseases and traits, including body mass index, type 2 diabetes, and lipid levels, we compare the direction and magnitude of effects for GWAS-identified variants in multiple non-EA populations against EA findings. We demonstrate that, in all populations analyzed, a significant majority of GWAS-identified variants have allelic associations in the same direction as in EA, with none showing a statistically significant effect in the opposite direction after adjustment for multiple testing. However, 25% of tagSNPs identified in EA GWAS have significantly different effect sizes in at least one non-EA population, and these differential effects were most frequent in African Americans, where all differential effects were diluted toward the null. We demonstrate that differential LD between tagSNPs and functional variants within populations contributes significantly to the dilution of effect sizes in this population. Although most variants identified from GWAS in EA populations generalize to all non-EA populations assessed, genetic models derived from GWAS findings in EA may generate spurious results in non-EA populations due to differential effect sizes. Regardless of the origin of the differential effects, caution should be exercised in applying any genetic risk prediction model based on tagSNPs outside of the ancestry group in which it was derived. Models based directly on functional variation may generalize more robustly, but the identification of functional variants remains challenging.
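Comparing a tagSNP's effect in a non-EA population with its EA effect can be sketched as a two-sided z-test on the difference between two independent estimates (for example, log odds ratios). This common generic test is assumed here and is not necessarily the PAGE procedure; the estimates are hypothetical, and any multiple-testing correction (such as Bonferroni over all tagSNP-population pairs) would be applied to the resulting p-values.

```python
import math
from typing import Tuple


def effect_difference_test(beta_1: float, se_1: float, beta_2: float, se_2: float) -> Tuple[float, float]:
    """Two-sided z-test of beta_1 == beta_2 for independent, approximately normal estimates.

    Returns (z, p).
    """
    if se_1 <= 0 or se_2 <= 0:
        raise ValueError("standard errors must be positive")
    z = (beta_1 - beta_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Hypothetical log odds ratios: EA estimate vs. a diluted estimate elsewhere.
    z, p = effect_difference_test(0.25, 0.02, 0.10, 0.04)
    print(f"z = {z:.2f}, p = {p:.1e}")
```

A positive z here corresponds to a smaller effect in the second population, the "dilution toward the null" pattern described above.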
Author Summary
The number of known associations between human diseases and common genetic variants has grown dramatically in the past decade, most being identified in large-scale genetic studies of people of Western European origin. But because the frequencies of genetic variants can differ substantially between continental populations, it's important to assess how well these associations can be extended to populations with different continental ancestry. Are the correlations between genetic variants, disease endpoints, and risk factors consistent enough for genetic risk models to be reliably applied across different ancestries? Here we describe a systematic analysis of disease outcome and risk-factor–associated variants (tagSNPs) identified in European populations, in which we test whether the effect size of a tagSNP is consistent across six populations with significant non-European ancestry. We demonstrate that although nearly all such tagSNPs have effects in the same direction across all ancestries (i.e., variants associated with higher risk in Europeans will also be associated with higher risk in other populations), roughly a quarter of the variants tested have significantly different magnitude of effect (usually lower) in at least one non-European population. We therefore advise caution in the use of tagSNP-based genetic disease risk models in populations that have a different genetic ancestry from the population in which original associations were first made. We then show that this differential strength of association can be attributed to population-dependent variations in the correlation between tagSNPs and the variant that actually determines risk—the so-called functional variant. Risk models based on functional variants are therefore likely to be more robust than tagSNP-based models.
doi:10.1371/journal.pbio.1001661
PMCID: PMC3775722  PMID: 24068893
8.  EMR-linked GWAS study: investigation of variation landscape of loci for body mass index in children 
Frontiers in Genetics  2013;4:268.
Common variations at the loci harboring the fat mass and obesity-associated gene (FTO), MC4R, and TMEM18 are consistently reported as being associated with obesity and body mass index (BMI), especially in adult populations. To confirm this effect in a pediatric population, five European-ancestry cohorts from the pediatric eMERGE-II network (CCHMC-BCH) were evaluated.
Method: Data on 5049 samples of European ancestry were obtained from the Electronic Medical Records (EMRs) of two large academic centers in five different genotyped cohorts. For all available samples, gender, age, height, and weight were collected and BMI was calculated. To account for age and sex differences in BMI, BMI z-scores were generated using the 2000 Centers for Disease Control and Prevention (CDC) growth charts. A genome-wide association study (GWAS) was performed with BMI z-score. After removing missing data and outliers based on principal component (PC) analyses, 2860 samples were used for the GWAS. The association between each single nucleotide polymorphism (SNP) and BMI was tested using linear regression adjusting for age, gender, and PCs by cohort. The effects of SNPs were modeled assuming additive, recessive, and dominant effects of the minor allele. Meta-analysis was conducted using a weighted z-score approach.
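Two computations in this Methods paragraph can be sketched. CDC growth-chart z-scores are derived with the LMS method, z = ((X/M)^L − 1)/(L·S), or ln(X/M)/S when L = 0, using age- and sex-specific L, M and S values from the reference tables. A weighted z-score meta-analysis is commonly Z = Σ w_i z_i / √(Σ w_i²) with w_i = √N_i. The sample-size weighting is an assumption (the abstract does not state the weights), and the L, M, S, z and N values in the usage lines are hypothetical, not CDC table entries or study results.

```python
import math
from typing import Sequence


def lms_z_score(value: float, l: float, m: float, s: float) -> float:
    """LMS transformation of a measurement (e.g. BMI) to a z-score.

    L: Box-Cox power, M: median, S: coefficient of variation, all taken from
    the age- and sex-specific growth reference.
    """
    if value <= 0 or m <= 0 or s <= 0:
        raise ValueError("value, M and S must be positive")
    if l == 0:
        return math.log(value / m) / s
    return ((value / m) ** l - 1.0) / (l * s)


def weighted_z_meta(z_scores: Sequence[float], sample_sizes: Sequence[int]) -> float:
    """Sample-size-weighted Z: sum(w_i * z_i) / sqrt(sum(w_i^2)), w_i = sqrt(N_i)."""
    if len(z_scores) != len(sample_sizes) or not z_scores:
        raise ValueError("need matching, non-empty inputs")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / math.sqrt(sum(w * w for w in weights))


if __name__ == "__main__":
    # Hypothetical LMS parameters, not values from the CDC tables.
    print(round(lms_z_score(20.0, -1.5, 17.5, 0.12), 2))
    # Hypothetical per-cohort z-scores and sample sizes.
    print(round(weighted_z_meta([2.1, 1.4, 2.8], [900, 400, 600]), 2))
```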
Results: The mean age of subjects was 9.8 years (range 2–19). The proportion of male subjects was 56%. In these cohorts, 14% of samples had a BMI ≥95th percentile and 28% a BMI ≥85th percentile. Meta-analyses produced a signal at the 16q12 genomic region, with the best result of p = 1.43 × 10⁻⁷ [p(rec) = 7.34 × 10⁻⁸] for the SNP rs8050136 in the first intron of the FTO gene (z = 5.26) and with no heterogeneity between cohorts (p = 0.77). Under a recessive model, another published SNP at this locus, rs1421085, generates the best result [z = 5.782, p(rec) = 8.21 × 10⁻⁹]. Imputation in this region using dense 1000 Genomes and HapMap CEU samples revealed 71 SNPs with p < 10⁻⁶, all in the first intron of the FTO locus. When heterogeneity was permitted between cohorts, signals were also obtained in other previously identified loci, including MC4R (rs12964056, p = 6.87 × 10⁻⁷, z = -4.98), cholecystokinin CCK (rs8192472, p = 1.33 × 10⁻⁶, z = -4.85), Interleukin 15 (rs2099884, p = 1.27 × 10⁻⁵, z = 4.34), low density lipoprotein receptor-related protein 1B [LRP1B (rs7583748, p = 0.00013, z = -3.81)] and near transmembrane protein 18 (TMEM18) (rs7561317, p = 0.001, z = -3.17). We also detected a novel locus on chromosome 3 at COL6A5 [best SNP = rs1542829, minor allele frequency (MAF) of 5%, p = 4.35 × 10⁻⁹, z = 5.89].
Conclusion: An EMR-linked cohort study demonstrates that BMI z-score measurements can be successfully extracted and linked to genomic data with meaningful confirmatory results. We verified the high prevalence of childhood overweight and obesity in our cohort (28%). In addition, our data indicate that genetic variants in the first intron of FTO, a known adult genetic risk factor for BMI, are also robustly associated with BMI in the pediatric population.
doi:10.3389/fgene.2013.00268
PMCID: PMC3847941  PMID: 24348519
BMI; obesity; polymorphism; GWAS
9.  Simultaneous Analysis of All SNPs in Genome-Wide and Re-Sequencing Association Studies 
PLoS Genetics  2008;4(7):e1000130.
Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants, which is a plausible scenario for many complex diseases. We show that simultaneous analysis of the entire set of SNPs from a genome-wide study to identify the subset that best predicts disease outcome is now feasible, thanks to developments in stochastic search methods. We used a Bayesian-inspired penalised maximum likelihood approach in which every SNP can be considered for additive, dominant, and recessive contributions to disease risk. Posterior mode estimates were obtained for regression coefficients that were each assigned a prior with a sharp mode at zero. A non-zero coefficient estimate was interpreted as corresponding to a significant SNP. We investigated two prior distributions and show that the normal-exponential-gamma prior leads to improved SNP selection in comparison with single-SNP tests. We also derived an explicit approximation for type-I error that avoids the need to use permutation procedures. As well as genome-wide analyses, our method is well-suited to fine mapping with very dense SNP sets obtained from re-sequencing and/or imputation. It can accommodate quantitative as well as case-control phenotypes, covariate adjustment, and can be extended to search for interactions. Here, we demonstrate the power and empirical type-I error of our approach using simulated case-control data sets of up to 500 K SNPs, a real genome-wide data set of 300 K SNPs, and a sequence-based dataset, each of which can be analysed in a few hours on a desktop workstation.
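The key idea, that a prior with a sharp mode at zero makes the posterior mode exactly zero for most SNPs, can be sketched with a simpler stand-in. The code below swaps the paper's normal-exponential-gamma prior and case-control likelihood for a Laplace prior on a quantitative trait, whose posterior mode is the lasso estimate, fitted by cyclic coordinate descent. It also shows how each SNP can be coded under additive, dominant or recessive models. It is not the authors' algorithm, and the function names, penalty value and toy genotypes are hypothetical.

```python
from typing import Dict, List


def code_genotype(minor_allele_count: int, model: str) -> float:
    """Code a genotype (0, 1 or 2 minor alleles) under a genetic model."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1 or 2")
    if model == "additive":
        return float(minor_allele_count)
    if model == "dominant":
        return 1.0 if minor_allele_count >= 1 else 0.0
    if model == "recessive":
        return 1.0 if minor_allele_count == 2 else 0.0
    raise ValueError(f"unknown model: {model!r}")


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z towards zero by gamma; returns exactly 0.0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(
    columns: Dict[str, List[float]], y: List[float], lam: float, n_sweeps: int = 100
) -> Dict[str, float]:
    """Minimise (1 / 2n) * ||y - X b||^2 + lam * sum(|b_j|) by cyclic coordinate descent.

    y and every column are mean-centred first, so no intercept is fitted.
    """
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residuals while all coefficients are zero
    xs = {}
    for name, col in columns.items():
        if len(col) != n:
            raise ValueError(f"column {name!r} has the wrong length")
        mu = sum(col) / n
        xs[name] = [v - mu for v in col]
    beta = {name: 0.0 for name in xs}
    for _ in range(n_sweeps):
        for name, x in xs.items():
            ss = sum(v * v for v in x) / n
            if ss == 0.0:
                continue  # monomorphic column: coefficient stays at zero
            # Correlation of x with the partial residual that excludes this SNP.
            rho = sum(xi * (ri + xi * beta[name]) for xi, ri in zip(x, resid)) / n
            new = soft_threshold(rho, lam) / ss
            delta = new - beta[name]
            if delta != 0.0:
                resid = [ri - xi * delta for xi, ri in zip(x, resid)]
                beta[name] = new
    return beta


if __name__ == "__main__":
    # Toy genotypes (minor-allele counts) for 9 individuals; hypothetical data.
    g1 = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    g2 = [0, 1, 2, 0, 1, 2, 0, 1, 2]
    columns = {
        "snp1_additive": [code_genotype(g, "additive") for g in g1],
        "snp2_additive": [code_genotype(g, "additive") for g in g2],
        "snp2_dominant": [code_genotype(g, "dominant") for g in g2],
    }
    y = [float(g) for g in g1]  # trait driven only by snp1
    print(lasso_coordinate_descent(columns, y, lam=0.1))
```

In the toy example only snp1 drives the trait, so its coefficient is shrunk (to about 0.85 with `lam=0.1`) while both codings of snp2 are set exactly to zero; a heavier-tailed prior such as the normal-exponential-gamma shrinks large effects less than this Laplace stand-in does.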
Author Summary
Tests of association with disease status are normally conducted one SNP at a time, ignoring the effects of all other genotyped SNPs. We developed a computationally efficient method to simultaneously analyse all SNPs, either in a genome-wide association (GWA) study, or a fine-mapping study based on re-sequencing and/or imputation. The method selects a subset of SNPs that best predicts disease status, while controlling the type-I error of the selected SNPs. This brings many advantages over standard single-SNP approaches, because the signal from a particular SNP can be more clearly assessed when other SNPs associated with disease status are already included in the model. Thus, in comparison with single-SNP analyses, power is increased and the false positive rate is reduced because of reduced residual variation. Localisation is also greatly improved. We demonstrate these advantages over the widely used single-SNP Armitage Trend Test using GWA simulation studies, a real GWA dataset, and a sequence-based fine-mapping simulation study.
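The single-SNP benchmark named above, the Armitage trend test, is short enough to write out. A minimal sketch of the Cochran-Armitage test in its common large-sample form with additive weights (0, 1, 2); the genotype counts in the usage line are made up and do not come from the study.

```python
import math
from typing import Sequence, Tuple

def armitage_trend_test(cases: Sequence[int], controls: Sequence[int],
                        weights: Sequence[float] = (0, 1, 2)) -> Tuple[float, float]:
    """Cochran-Armitage trend test for a 2 x k genotype table.

    cases, controls: genotype counts ordered like `weights` (default: 0/1/2 copies).
    Returns (z, two-sided p-value) from the large-sample normal approximation.
    """
    if not (len(cases) == len(controls) == len(weights)):
        raise ValueError("cases, controls and weights must have equal length")
    R, S = sum(cases), sum(controls)
    N = R + S
    n = [r + s for r, s in zip(cases, controls)]
    # Trend statistic: weighted contrast between case and control genotype counts
    T = sum(t * (r * S - s * R) for t, r, s in zip(weights, cases, controls))
    # Variance of T under the null hypothesis of no trend
    sum_t2n = sum(t * t * ni for t, ni in zip(weights, n))
    sum_tn = sum(t * ni for t, ni in zip(weights, n))
    var_T = (R * S / N) * (N * sum_t2n - sum_tn ** 2)
    if var_T <= 0:
        raise ValueError("degenerate table: genotype scores do not vary")
    z = T / math.sqrt(var_T)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p

z, p = armitage_trend_test(cases=[20, 50, 30], controls=[40, 45, 15])
print(round(z, 2), p < 0.001)  # 3.43 True
```

A positive z means cases carry more copies of the weighted allele than controls. Because each SNP is tested in isolation, the statistic ignores all other genotyped SNPs, which is the limitation the multi-SNP approach above addresses.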
doi:10.1371/journal.pgen.1000130
PMCID: PMC2464715  PMID: 18654633
10.  Replication of genetic loci for ages at menarche and menopause in the multi-ethnic Population Architecture using Genomics and Epidemiology (PAGE) study 
Human Reproduction (Oxford, England)  2013;28(6):1695-1706.
STUDY QUESTION
Do genetic associations identified in genome-wide association studies (GWAS) of age at menarche (AM) and age at natural menopause (ANM) replicate in women of diverse race/ancestry from the Population Architecture using Genomics and Epidemiology (PAGE) Study?
SUMMARY ANSWER
We replicated GWAS-identified reproductive-trait single nucleotide polymorphisms (SNPs) in our European-descent population and found that many of these SNPs were also associated with AM and ANM in populations of diverse ancestry.
WHAT IS KNOWN ALREADY
Menarche and menopause mark the reproductive lifespan in women and are important risk factors for chronic diseases including obesity, cardiovascular disease and cancer. Both events are believed to be influenced by environmental and genetic factors, and vary in populations differing by genetic ancestry and geography. Most genetic variants associated with these traits have been identified in GWAS of European-descent populations.
STUDY DESIGN, SIZE, DURATION
A total of 42 251 women of diverse ancestry from PAGE were included in cross-sectional analyses of AM and ANM.
MATERIALS, SETTING, METHODS
SNPs previously associated with ANM (n = 5 SNPs) and AM (n = 3 SNPs) in GWAS were genotyped in American Indians, African Americans, Asians, European Americans, Hispanics and Native Hawaiians. To test SNP associations with ANM or AM, we used linear regression models stratified by race/ethnicity and PAGE sub-study. Results were then combined in race-specific fixed effect meta-analyses for each outcome. For replication and generalization analyses, significance was defined at P < 0.01 for ANM analyses and P < 0.017 for AM analyses.
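The pooling and significance steps described above can be illustrated together. A minimal sketch, assuming standard inverse-variance weighting of per-stratum effect estimates (the abstract does not name the software or give the weighting explicitly); any input numbers are made up. The thresholds in the text are consistent with a Bonferroni correction of α = 0.05 over the SNPs tested for each outcome: 0.05/5 = 0.01 for the five ANM SNPs and 0.05/3 ≈ 0.017 for the three AM SNPs.

```python
import math
from typing import List, Tuple

def fixed_effect_meta(betas: List[float], ses: List[float]) -> Tuple[float, float, float]:
    """Inverse-variance weighted fixed-effect meta-analysis.

    Returns (pooled beta, pooled standard error, two-sided p-value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need matching, non-empty lists of betas and SEs")
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal p-value
    return pooled, pooled_se, p

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

print(round(bonferroni_threshold(0.05, 5), 3))  # 0.01 (ANM: five SNPs)
print(round(bonferroni_threshold(0.05, 3), 3))  # 0.017 (AM: three SNPs)
```

Studies with smaller standard errors receive proportionally more weight, so the pooled estimate is pulled toward the most precise strata; a fixed-effect model assumes all strata share one true effect.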
MAIN RESULTS AND THE ROLE OF CHANCE
We replicated findings for AM SNPs in the LIN28B locus and an intergenic region on 9q31 in European Americans. The LIN28B SNPs (rs314277 and rs314280) were also significantly associated with AM in Asians, but not in other race/ethnicity groups. Linkage disequilibrium (LD) patterns at this locus varied widely among the ancestral groups. With the exception of an intergenic SNP at 13q34, all ANM SNPs replicated in European Americans. Three were significantly associated with ANM in other race/ethnicity populations: rs2153157 (6p24.2/SYCP2L), rs365132 (5q35/UIMC1) and rs16991615 (20p12.3/MCM8). While rs1172822 (19q13/BRSK1) was not significant in the populations of non-European descent, effect sizes showed similar trends.
LIMITATIONS, REASONS FOR CAUTION
Lack of association for the GWAS SNPs in groups other than European Americans may be due to differences in locus LD patterns between these groups and the European-descent populations included in the GWAS discovery studies; in some cases, lower power may also contribute to non-significant findings.
WIDER IMPLICATIONS OF THE FINDINGS
The discovery of genetic variants associated with reproductive traits provides an important opportunity to elucidate the biological mechanisms involved in normal variation and disorders of menarche and menopause. In this study we replicated most, but not all, reported SNPs in European-descent populations and examined the epidemiologic architecture of these early reported variants, describing their generalizability and effect size across differing ancestral populations. Such data will be increasingly important for prioritizing GWAS SNPs for follow-up in fine-mapping and resequencing studies, as well as in translational research.
STUDY FUNDING/COMPETING INTEREST(S)
The Population Architecture Using Genomics and Epidemiology (PAGE) program is funded by the National Human Genome Research Institute (NHGRI), supported by U01HG004803 (CALiCo), U01HG004798 (EAGLE), U01HG004802 (MEC), U01HG004790 (WHI) and U01HG004801 (Coordinating Center), and their respective NHGRI ARRA supplements. The authors report no conflicts of interest.
doi:10.1093/humrep/det071
PMCID: PMC3657124  PMID: 23508249
menopause; menarche; genome-wide association study; race/ethnicity; single nucleotide polymorphism
11.  Novel Meta-Analysis-Derived Type 2 Diabetes Risk Loci Do Not Determine Prediabetic Phenotypes 
PLoS ONE  2008;3(8):e3019.
Background
Genome-wide association (GWA) studies identified a series of novel type 2 diabetes risk loci. Most of them were subsequently demonstrated to affect insulin secretion of pancreatic β-cells. Very recently, a meta-analysis of GWA data revealed nine additional risk loci with still undefined roles in the pathogenesis of type 2 diabetes. Using our thoroughly phenotyped cohort of subjects at an increased risk for type 2 diabetes, we assessed the association of the nine latest genetic variants with the predominant prediabetes traits, i.e., obesity, impaired insulin secretion, and insulin resistance.
Methodology/Principal Findings
One thousand five hundred and seventy-eight metabolically characterized non-diabetic German subjects were genotyped for the reported candidate single nucleotide polymorphisms (SNPs) JAZF1 rs864745, CDC123/CAMK1D rs12779790, TSPAN8/LGR5 rs7961581, THADA rs7578597, ADAMTS9 rs4607103, NOTCH2 rs10923931, DCD rs1153188, VEGFA rs9472138, and BCL11A rs10490072. Insulin sensitivity was derived from fasting glucose and insulin concentrations, oral glucose tolerance test (OGTT), and hyperinsulinemic-euglycemic clamp. Insulin secretion was estimated from OGTT data. After appropriate adjustment for confounding variables and Bonferroni correction for multiple comparisons (corrected α-level: p = 0.0014), none of the SNPs was reliably associated with adiposity, insulin sensitivity, or insulin secretion (all p≥0.0117, dominant inheritance model). The risk alleles of ADAMTS9 SNP rs4607103 and VEGFA SNP rs9472138 tended to associate with more than one measure of insulin sensitivity and insulin secretion, respectively, but did not reach formal statistical significance. The study was sufficiently powered (1-β = 0.8) to detect effect sizes of 0.19≤d≤0.25 (α = 0.0014) and 0.13≤d≤0.16 (α = 0.05).
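The link between sample size, α, power and the smallest detectable standardized effect size d can be sketched with the usual normal approximation for a two-sided comparison of two group means: d ≈ (z(1−α/2) + z(1−β))·√(1/n₁ + 1/n₂). This is an assumption about how such figures are typically obtained; the abstract does not report the group sizes, which depend on genotype frequencies, so the 789/789 split below (half of the 1,578 subjects each) is hypothetical.

```python
import math
from statistics import NormalDist

def min_detectable_d(n1: int, n2: int, alpha: float = 0.05, power: float = 0.8) -> float:
    """Smallest standardized mean difference (Cohen's d) detectable with the given
    power in a two-sided, two-sample comparison (normal approximation)."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("group sizes must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = std_normal.inv_cdf(power)           # quantile giving the desired power
    return (z_alpha + z_beta) * math.sqrt(1 / n1 + 1 / n2)

# Hypothetical balanced split of the 1,578 subjects into two genotype groups
print(round(min_detectable_d(789, 789, alpha=0.05), 2))    # about 0.14
print(round(min_detectable_d(789, 789, alpha=0.0014), 2))  # about 0.20
```

Unbalanced groups (for example, few carriers of a rare allele under a dominant model) make √(1/n₁ + 1/n₂) larger and so raise the detectable d, which is one plausible reason the abstract reports ranges rather than single values.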
Conclusions/Significance
In contrast to the first series of GWA-derived type 2 diabetes candidate SNPs, we could not detect reliable associations of the novel risk loci with prediabetic phenotypes. Possible weak effects of ADAMTS9 SNP rs4607103 and VEGFA SNP rs9472138 on insulin sensitivity and insulin secretion, respectively, await further confirmation by larger studies.
doi:10.1371/journal.pone.0003019
PMCID: PMC2500187  PMID: 18714373
12.  Re-Ranking Sequencing Variants in the Post-GWAS Era for Accurate Causal Variant Identification 
PLoS Genetics  2013;9(8):e1003609.
Next generation sequencing has dramatically increased our ability to localize disease-causing variants by providing base-pair level information at costs increasingly feasible for the large sample sizes required to detect complex-trait associations. Yet, identification of causal variants within an established region of association remains a challenge. Counter-intuitively, certain factors that increase power to detect an associated region can decrease power to localize the causal variant. First, combining GWAS with imputation or low coverage sequencing to achieve the large sample sizes required for high power can have the unintended effect of producing differential genotyping error among SNPs. This tends to bias the relative evidence for association toward better genotyped SNPs. Second, re-use of GWAS data for fine-mapping exploits previous findings to ensure genome-wide significance in GWAS-associated regions. However, using GWAS findings to inform fine-mapping analysis can bias evidence away from the causal SNP toward the tag SNP and SNPs in high LD with the tag. Together these factors can reduce power to localize the causal SNP by more than half. Other strategies commonly employed to increase power to detect association, namely increasing sample size and using higher density genotyping arrays, can, in certain common scenarios, actually exacerbate these effects and further decrease power to localize causal variants. We develop a re-ranking procedure that accounts for these adverse effects and substantially improves the accuracy of causal SNP identification, often doubling the probability that the causal SNP is top-ranked. Application to the NCI BPC3 aggressive prostate cancer GWAS with imputation meta-analysis identified a new top SNP at 2 of 3 associated loci and several additional possible causal SNPs at these loci that may have otherwise been overlooked. This method is simple to implement using R scripts provided on the author's website.
Author Summary
As next-generation sequencing (NGS) costs continue to fall and genome-wide association study (GWAS) platform coverage improves, the human genetics community is positioned to identify potentially causal variants. However, current NGS or imputation-based studies of either the whole genome or regions previously identified by GWAS have not yet been very successful in identifying causal variants. A major hurdle is the development of methods to distinguish disease-causing variants from their highly-correlated proxies within an associated region. We show that various common factors, such as differential sequencing or imputation accuracy rates and linkage disequilibrium patterns, with or without GWAS-informed region selection, can substantially decrease the probability of identifying the correct causal SNP, often by more than half. We then describe a novel and easy-to-implement re-ranking procedure that can double the probability that the causal SNP is top-ranked in many settings. Application to the NCI Breast and Prostate Cancer (BPC3) Cohort Consortium aggressive prostate cancer data identified new top SNPs within two associated loci previously established via GWAS, as well as several additional possible causal SNPs that had been previously overlooked.
doi:10.1371/journal.pgen.1003609
PMCID: PMC3738448  PMID: 23950724
13.  Meta-Analysis of Genome-Wide Association Studies in African Americans Provides Insights into the Genetic Architecture of Type 2 Diabetes 
Ng, Maggie C. Y. | Shriner, Daniel | Chen, Brian H. | Li, Jiang | Chen, Wei-Min | Guo, Xiuqing | Liu, Jiankang | Bielinski, Suzette J. | Yanek, Lisa R. | Nalls, Michael A. | Comeau, Mary E. | Rasmussen-Torvik, Laura J. | Jensen, Richard A. | Evans, Daniel S. | Sun, Yan V. | An, Ping | Patel, Sanjay R. | Lu, Yingchang | Long, Jirong | Armstrong, Loren L. | Wagenknecht, Lynne | Yang, Lingyao | Snively, Beverly M. | Palmer, Nicholette D. | Mudgal, Poorva | Langefeld, Carl D. | Keene, Keith L. | Freedman, Barry I. | Mychaleckyj, Josyf C. | Nayak, Uma | Raffel, Leslie J. | Goodarzi, Mark O. | Chen, Y-D Ida | Taylor, Herman A. | Correa, Adolfo | Sims, Mario | Couper, David | Pankow, James S. | Boerwinkle, Eric | Adeyemo, Adebowale | Doumatey, Ayo | Chen, Guanjie | Mathias, Rasika A. | Vaidya, Dhananjay | Singleton, Andrew B. | Zonderman, Alan B. | Igo, Robert P. | Sedor, John R. | Kabagambe, Edmond K. | Siscovick, David S. | McKnight, Barbara | Rice, Kenneth | Liu, Yongmei | Hsueh, Wen-Chi | Zhao, Wei | Bielak, Lawrence F. | Kraja, Aldi | Province, Michael A. | Bottinger, Erwin P. | Gottesman, Omri | Cai, Qiuyin | Zheng, Wei | Blot, William J. | Lowe, William L. | Pacheco, Jennifer A. | Crawford, Dana C. | Grundberg, Elin | Rich, Stephen S. | Hayes, M. Geoffrey | Shu, Xiao-Ou | Loos, Ruth J. F. | Borecki, Ingrid B. | Peyser, Patricia A. | Cummings, Steven R. | Psaty, Bruce M. | Fornage, Myriam | Iyengar, Sudha K. | Evans, Michele K. | Becker, Diane M. | Kao, W. H. Linda | Wilson, James G. | Rotter, Jerome I. | Sale, Michèle M. | Liu, Simin | Rotimi, Charles N. | Bowden, Donald W.
PLoS Genetics  2014;10(8):e1004517.
Type 2 diabetes (T2D) is more prevalent in African Americans than in Europeans. However, little is known about the genetic risk in African Americans despite the recent identification of more than 70 T2D loci primarily by genome-wide association studies (GWAS) in individuals of European ancestry. In order to investigate the genetic architecture of T2D in African Americans, the MEta-analysis of type 2 DIabetes in African Americans (MEDIA) Consortium examined 17 GWAS on T2D comprising 8,284 cases and 15,543 controls in African Americans in stage 1 analysis. Single nucleotide polymorphism (SNP) association analysis was conducted in each study under the additive model after adjustment for age, sex, study site, and principal components. Meta-analysis of approximately 2.6 million genotyped and imputed SNPs in all studies was conducted using an inverse variance-weighted fixed effect model. Replications were performed to follow up 21 loci in up to 6,061 cases and 5,483 controls in African Americans, and 8,130 cases and 38,987 controls of European ancestry. We identified three known loci (TCF7L2, HMGA2 and KCNQ1) and two novel loci (HLA-B and INS-IGF2) at genome-wide significance (4.15×10⁻⁹⁴
Author Summary
Despite the higher prevalence of type 2 diabetes (T2D) in African Americans than in Europeans, recent genome-wide association studies (GWAS) have been conducted primarily in individuals of European ancestry. In this study, we performed a meta-analysis of 17 GWAS in 8,284 cases and 15,543 controls to explore the genetic architecture of T2D in African Americans. Following replication in an additional 6,061 cases and 5,483 controls in African Americans, and 8,130 cases and 38,987 controls of European ancestry, we identified two novel and three previously reported T2D loci reaching genome-wide significance. We also examined 158 loci previously reported to be associated with T2D or with regulation of glucose homeostasis. While 56% of these loci were shared between African Americans and the other populations, the strongest associations in African Americans were often found at nearby single nucleotide polymorphisms (SNPs) rather than at the originally reported SNPs, owing to differences in genetic architecture across populations. Our results highlight the importance of performing genetic studies in non-European populations to fine-map causal genetic variants.
doi:10.1371/journal.pgen.1004517
PMCID: PMC4125087  PMID: 25102180
The lancet. Respiratory medicine  2013;1(4):309-317.
Summary
Background
Idiopathic pulmonary fibrosis (IPF) is a devastating disease that probably involves several genetic loci. Several rare genetic variants and one common single nucleotide polymorphism (SNP) of MUC5B have been associated with the disease. Our aim was to identify additional common variants associated with susceptibility and ultimately mortality in IPF.
Methods
First, we did a three-stage genome-wide association study (GWAS): stage one was a discovery GWAS; and stages two and three were independent case-control studies. DNA samples from European-American patients with IPF meeting standard criteria were obtained from several US centres for each stage. Data for European-American control individuals for stage one were gathered from the database of genotypes and phenotypes; additional control individuals were recruited at the University of Pittsburgh to increase the number. For controls in stages two and three, we gathered data for additional sex-matched European-American control individuals who had been recruited in another study. DNA samples from patients and from control individuals were genotyped to identify SNPs associated with IPF. SNPs identified in stage one were carried forward to stage two, and those that achieved genome-wide significance (p<5 × 10⁻⁸) in a meta-analysis were carried forward to stage three. Three case series with follow-up data were selected from stages one and two of the GWAS. Mortality analyses were done in these case series to assess the SNPs associated with IPF that had achieved genome-wide significance in the meta-analysis of stages one and two. Finally, we obtained gene-expression profiling data for lungs of patients with IPF from the Lung Genomics Research Consortium and analysed correlation with SNP genotypes.
Findings
In stage one of the GWAS (542 patients with IPF, 542 control individuals matched one-by-one to cases by genetic ancestry estimates), we identified 20 loci. Six SNPs reached genome-wide significance in stage two (544 patients, 687 control individuals): three TOLLIP SNPs (rs111521887, rs5743894, rs5743890) and one MUC5B SNP (rs35705950) at 11p15.5; one MDGA2 SNP (rs7144383) at 14q21.3; and one SPPL2C SNP (rs17690703) at 17q21.31. Stage three (324 patients, 702 control individuals) confirmed the associations for all these SNPs, except for rs7144383. Linkage disequilibrium between the MUC5B SNP (rs35705950) and TOLLIP SNPs (rs111521887 [r²=0.07], rs5743894 [r²=0.16], and rs5743890 [r²=0.01]) was low. 683 patients from the GWAS were included in the mortality analysis. Individuals who developed IPF despite having the protective TOLLIP minor allele of rs5743890 carried an increased mortality risk (meta-analysis with fixed-effect model: hazard ratio 1.72 [95% CI 1.24–2.38]; p=0.0012). TOLLIP expression was decreased by 20% in individuals carrying the minor allele of rs5743890 (p=0.097), 40% in those with the minor allele of rs111521887 (p=3.0 × 10⁻⁴), and 50% in those with the minor allele of rs5743894 (p=2.93 × 10⁻⁵) compared with homozygous carriers of common alleles for these SNPs.
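A reported hazard ratio and its 95% CI can be checked for internal consistency on the log scale, where such intervals are usually symmetric: SE = (ln U − ln L)/(2·1.96) and z = ln HR / SE. A minimal sketch of this generic back-calculation, not the authors' analysis; applied to HR 1.72 (1.24–2.38) it gives p ≈ 0.0011, in line with the reported p = 0.0012 once rounding of the published figures is allowed for.

```python
import math

def p_from_ratio_ci(estimate: float, lower: float, upper: float,
                    z_crit: float = 1.959964) -> float:
    """Approximate two-sided p-value for a ratio estimate (HR, OR, RR) from its CI.

    Assumes the CI was computed symmetrically on the log scale (z_crit = 1.96 for 95%).
    """
    if not (0 < lower < estimate < upper):
        raise ValueError("expected 0 < lower < estimate < upper")
    se_log = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # SE of ln(ratio)
    z = math.log(estimate) / se_log
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

print(round(p_from_ratio_ci(1.72, 1.24, 2.38), 4))  # about 0.0011
```

Because the CI limits are published to only two decimals, small discrepancies between a recomputed and a reported p-value are expected; large ones would suggest a transcription error or a non-Wald interval.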
Interpretation
Novel variants in TOLLIP and SPPL2C are associated with IPF susceptibility. One novel variant of TOLLIP, rs5743890, is also associated with mortality. These associations and the reduced expression of TOLLIP in patients with IPF who carry TOLLIP SNPs emphasise the importance of this gene in the disease.
Funding
National Institutes of Health; National Heart, Lung, and Blood Institute; Pulmonary Fibrosis Foundation; Coalition for Pulmonary Fibrosis; and Instituto de Salud Carlos III.
doi:10.1016/S2213-2600(13)70045-6
PMCID: PMC3894577  PMID: 24429156
BMC Genomics  2009;10(Suppl 2):S2.
Background
The genome sequence and a high-density SNP map are now available for the chicken and can be used to identify genetic markers for use in marker-assisted selection (MAS). Effective MAS requires high linkage disequilibrium (LD) between markers and quantitative trait loci (QTL), and sustained marker-QTL LD over generations. This study used data from a 3,000 SNP panel to assess the level and consistency of LD between single nucleotide polymorphisms (SNPs) over consecutive years in two egg-layer chicken lines, and analyzed one line by two methods (SNP-wise association and genome-wise Bayesian analysis) to identify markers associated with egg-quality and egg-production phenotypes.
Results
The LD between marker pairs was high at short distances (r² > 0.2 at < 2 Mb) and remained high after one generation (correlations of 0.80 to 0.92 at < 5 Mb) in both lines. Single- and 3-SNP regression analyses using a mixed model with SNP as a fixed effect resulted in 159 and 76 significant tests (P < 0.01), respectively, across 12 traits. A Bayesian analysis called BayesB, which fits all SNPs simultaneously as random effects and uses model-averaging procedures, identified 33 SNPs that were included in the model >20% of the time (φ > 0.2) and an additional ten 3-SNP windows that had a sum of φ greater than 0.35. Generally, SNPs included in the Bayesian model also had a small P-value in the 1-SNP analyses.
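The pairwise LD measure r² used above can be computed from haplotype frequencies as r² = D²/(p_A·p_a·p_B·p_b), where D = p_AB − p_A·p_B. A minimal sketch, assuming phased haplotypes are available; with unphased genotypes the squared correlation of allele dosages is a common approximation. The abstract does not state which estimator the study used.

```python
from typing import List, Tuple

def ld_r2(haplotypes: List[Tuple[int, int]]) -> float:
    """r^2 between two biallelic loci from phased haplotypes.

    Each haplotype is (allele at locus 1, allele at locus 2), each coded 0 or 1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes")
    p_a = sum(h[0] for h in haplotypes) / n               # freq of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n               # freq of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n  # freq of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("a locus is monomorphic; r^2 is undefined")
    d = p_ab - p_a * p_b                                  # disequilibrium coefficient D
    return d * d / denom

print(ld_r2([(0, 0), (1, 1)] * 5))                  # perfect LD: 1.0
print(ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)]))      # independent loci: 0.0
```

An r² near 1 means one marker almost perfectly predicts the other, which is what allows a marker to stand in for a linked QTL in marker-assisted selection.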
Conclusion
High LD correlations between markers at short distances across two generations indicate that such markers will retain high LD with linked QTL and be effective for MAS. The different association analysis methods used provided consistent results. Multiple single SNPs and 3-SNP windows were significantly associated with egg-related traits, providing genomic positions of QTL that can be useful both for MAS and for identifying causal mutations.
doi:10.1186/1471-2164-10-S2-S2
PMCID: PMC2966334  PMID: 19607653
PLoS Genetics  2011;7(9):e1002293.
Diabetes impacts approximately 200 million people worldwide, of whom approximately 10% are affected by type 1 diabetes (T1D). The application of genome-wide association studies (GWAS) has robustly revealed dozens of genetic contributors to the pathogenesis of T1D, with the most recent meta-analysis identifying in excess of 40 loci. To identify additional genetic loci for T1D susceptibility, we examined associations in the largest meta-analysis to date between the disease and ∼2.54 million SNPs in a combined cohort of 9,934 cases and 16,956 controls. Targeted follow-up of 53 SNPs in 1,120 affected trios uncovered three new loci associated with T1D that reached genome-wide significance. The most significantly associated SNP (rs539514, P = 5.66×10⁻¹¹) resides in an intronic region of the LMO7 (LIM domain only 7) gene on 13q22. The second most significantly associated SNP (rs478222, P = 3.50×10⁻⁹) resides in an intronic region of the EFR3B (protein EFR3 homolog B) gene on 2p23; however, the region of linkage disequilibrium is approximately 800 kb and harbors multiple additional genes, including NCOA1, C2orf79, CENPO, ADCY3, DNAJC27, POMC, and DNMT3A. The third most significantly associated SNP (rs924043, P = 8.06×10⁻⁹) lies in an intergenic region on 6q27, where the region of association is approximately 900 kb and harbors multiple genes including WDR27, C6orf120, PHF10, TCTE3, C6orf208, LOC154449, DLL1, FAM120B, PSMB1, TBP, and PCD2. These latest associated regions add to the growing repertoire of gene networks predisposing to T1D.
Author Summary
Despite the fact that there is clearly a large genetic component to type 1 diabetes (T1D), uncovering the genes contributing to this disease has proven challenging. However, in the past three years there has been considerable progress in this regard, with advances in genetic screening technologies allowing investigators to scan the genome for variants conferring risk for disease without prior hypotheses. Such genome-wide association studies have revealed multiple regions of the genome to be robustly and consistently associated with T1D. More recent findings have come from combining multiple datasets from independent investigators in meta-analyses, which have more power to detect additional variants contributing to the trait. In the current study, we describe the largest meta-analysis of T1D genome-wide genotyped datasets to date, which combines six large studies. As a result, we have uncovered three new signals at the chromosomal locations 13q22, 2p23, and 6q27, which were subsequently replicated in independent sample sets. These latest associated regions add to the growing repertoire of gene networks predisposing to T1D.
doi:10.1371/journal.pgen.1002293
PMCID: PMC3183083  PMID: 21980299
BMC Bioinformatics  2009;10(Suppl 2):S7.
Background
Bayesian networks are powerful instruments for learning genetic models from association study data. They are able to derive the existing correlation between genetic markers and phenotypic traits and, at the same time, to find the relationships between the markers themselves. However, learning Bayesian networks is often non-trivial because of the high number of variables to be taken into account in the model relative to the number of instances in the dataset. It is therefore useful to adopt an abstraction of the variable space that suitably reduces its dimensionality without losing information. In this paper we present a new strategy to achieve this goal by mapping the SNPs related to the same gene to one meta-variable. To assign states to the meta-variables, we employ an approach based on classification trees.
Results
We applied our approach to data coming from a genome-wide scan on 288 individuals affected by arterial hypertension and 271 nonagenarians without history of hypertension. After pre-processing, we focused on a subset of 24 SNPs. We compared the performance of the proposed approach with the Bayesian network learned with SNPs as variables and with the network learned with haplotypes as meta-variables. The results were obtained by running a hold-out experiment five times. The mean accuracy of the new method was 64.28%, while the mean accuracy of the SNPs network was 58.99% and the mean accuracy of the haplotype network was 54.57%.
Conclusion
The new approach presented in this paper is able to derive a gene-based predictive model from SNP data. Such a model is more parsimonious than one based on single SNPs, while preserving the capability of highlighting predictive SNP configurations. The prediction performance of this approach was consistently superior to that of the SNP-based and haplotype-based approaches in all the test sets of the evaluation procedure. The method can therefore be considered an alternative way to analyze data from association studies.
doi:10.1186/1471-2105-10-S2-S7
PMCID: PMC2646249  PMID: 19208195
PLoS Medicine  2013;10(2):e1001383.
A mendelian randomization study based on data from multiple cohorts conducted by Karani Santhanakrishnan Vimaleswaran and colleagues re-examines the causal nature of the relationship between vitamin D levels and obesity.
Background
Obesity is associated with vitamin D deficiency, and both are areas of active public health concern. We explored the causality and direction of the relationship between body mass index (BMI) and 25-hydroxyvitamin D [25(OH)D] using genetic markers as instrumental variables (IVs) in bi-directional Mendelian randomization (MR) analysis.
Methods and Findings
We used information from 21 adult cohorts (up to 42,024 participants) with 12 BMI-related SNPs (combined in an allelic score) to produce an instrument for BMI and four SNPs associated with 25(OH)D (combined in two allelic scores, separately for genes encoding its synthesis or metabolism) as an instrument for vitamin D. Regression estimates for the IVs (allele scores) were generated within-study and pooled by meta-analysis to generate summary effects.
Associations between vitamin D scores and BMI were confirmed in the Genetic Investigation of Anthropometric Traits (GIANT) consortium (n = 123,864). Each 1 kg/m² higher BMI was associated with 1.15% lower 25(OH)D (p = 6.52×10⁻²⁷). The BMI allele score was associated both with BMI (p = 6.30×10⁻⁶²) and 25(OH)D (−0.06% [95% CI −0.10 to −0.02], p = 0.004) in the cohorts that underwent meta-analysis. The two vitamin D allele scores were strongly associated with 25(OH)D (p≤8.07×10⁻⁵⁷ for both scores) but not with BMI (synthesis score, p = 0.88; metabolism score, p = 0.08) in the meta-analysis. A 10% higher genetically instrumented BMI was associated with 4.2% lower 25(OH)D concentrations (IV ratio: −4.2 [95% CI −7.1 to −1.3], p = 0.005). No association was seen for genetically instrumented 25(OH)D with BMI, a finding that was confirmed using data from the GIANT consortium (p≥0.57 for both vitamin D scores).
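The IV ratio reported above is a Wald-type estimate: the instrument's association with the outcome divided by its association with the exposure. A minimal sketch using the common first-order standard error SE_ZY/|β_ZX|, which ignores uncertainty in the denominator; the inputs in the usage line are made up, and the study's specific scaling (per 10% higher BMI, percentage change in 25(OH)D) is not reproduced here.

```python
from typing import Tuple

def wald_ratio(beta_zx: float, beta_zy: float,
               se_zy: float) -> Tuple[float, float, Tuple[float, float]]:
    """Instrumental-variable (Wald) ratio estimate of the causal effect of X on Y.

    beta_zx: instrument-exposure association; beta_zy, se_zy: instrument-outcome
    association and its standard error. Returns (estimate, SE, 95% CI).
    """
    if beta_zx == 0:
        raise ValueError("instrument is not associated with the exposure")
    est = beta_zy / beta_zx
    se = se_zy / abs(beta_zx)  # first-order approximation; ignores SE of beta_zx
    return est, se, (est - 1.96 * se, est + 1.96 * se)

# Made-up summary statistics for illustration only
est, se, ci = wald_ratio(beta_zx=0.5, beta_zy=-0.2, se_zy=0.05)
print(est, se, ci)
```

Running the same calculation in both directions (BMI score → 25(OH)D and vitamin D scores → BMI) is what makes the design bi-directional; only the first direction yielded a clear association in this study.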
Conclusions
On the basis of a bi-directional genetic approach that limits confounding, our study suggests that a higher BMI leads to lower 25(OH)D, while any effects of lower 25(OH)D increasing BMI are likely to be small. Population level interventions to reduce BMI are expected to decrease the prevalence of vitamin D deficiency.
Editors' Summary
Background
Obesity—having an unhealthy amount of body fat—is increasing worldwide. In the US, for example, a third of the adult population is now obese. Obesity is defined as having a body mass index (BMI, an indicator of body fat calculated by dividing a person's weight in kilograms by their height in meters squared) of more than 30.0 kg/m². Although there is a genetic contribution to obesity, people generally become obese by consuming food and drink that contains more energy than they need for their daily activities. Thus, obesity can be prevented by having a healthy diet and exercising regularly. Compared to people with a healthy weight, obese individuals have an increased risk of developing diabetes, heart disease and stroke, and tend to die younger. They also have a higher risk of vitamin D deficiency, another increasingly common public health concern. Vitamin D, which is essential for healthy bones as well as other functions, is made in the skin after exposure to sunlight but can also be obtained through the diet and through supplements.
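The BMI definition above is simple arithmetic. A minimal sketch that applies the text's adult cut-off of more than 30.0 kg/m²; the example weight and height are made up.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2

def is_obese_adult(bmi_value: float) -> bool:
    """Adult obesity as defined in the text: BMI of more than 30.0 kg/m^2."""
    return bmi_value > 30.0

value = bmi(95.0, 1.75)
print(round(value, 1), is_obese_adult(value))  # 31.0 True
```

This adult cut-off does not apply to children, whose BMI is instead interpreted against age- and sex-specific reference values.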
Why Was This Study Done?
Observational studies cannot prove that obesity causes vitamin D deficiency because obese individuals may share other characteristics that reduce their circulating 25-hydroxy vitamin D [25(OH)D] levels (referred to as confounding). Moreover, observational studies cannot indicate whether the larger vitamin D storage capacity of obese individuals (vitamin D is stored in fatty tissues) lowers their 25(OH)D levels or whether 25(OH)D levels influence fat accumulation (reverse causation). If obesity causes vitamin D deficiency, monitoring and treating vitamin D deficiency might alleviate some of the adverse health effects of obesity. Conversely, if low vitamin D levels cause obesity, encouraging people to take vitamin D supplements might help to control the obesity epidemic. Here, the researchers use bi-directional “Mendelian randomization” to examine the direction and causality of the relationship between BMI and 25(OH)D. In Mendelian randomization, causality is inferred from associations between genetic variants that mimic the influence of a modifiable environmental exposure and the outcome of interest. Because gene variants do not change over time and are inherited randomly, they are not prone to confounding and are free from reverse causation. Thus, if a lower vitamin D status leads to obesity, genetic variants associated with lower 25(OH)D concentrations should be associated with higher BMI, and if obesity leads to a lower vitamin D status, then genetic variants associated with higher BMI should be associated with lower 25(OH)D concentrations.
What Did the Researchers Do and Find?
The researchers created a “BMI allele score” based on 12 BMI-related gene variants and two “25(OH)D allele scores,” which are based on gene variants that affect either 25(OH)D synthesis or breakdown. Using information on up to 42,024 participants from 21 studies, the researchers showed that the BMI allele score was associated with both BMI and with 25(OH)D levels among the study participants. Based on this information, they calculated that each 10% increase in BMI will lead to a 4.2% decrease in 25(OH)D concentrations. By contrast, although both 25(OH)D allele scores were strongly associated with 25(OH)D levels, neither score was associated with BMI. This lack of an association between 25(OH)D allele scores and obesity was confirmed using data from more than 100,000 individuals involved in 46 studies that have been collected by the GIANT (Genetic Investigation of Anthropometric Traits) consortium.
What Do These Findings Mean?
These findings suggest that a higher BMI leads to a lower vitamin D status, whereas any effects of low vitamin D status on BMI are likely to be small. That is, they provide evidence for obesity as a causal factor in the development of vitamin D deficiency but not for vitamin D deficiency as a causal factor in the development of obesity. The findings also suggest that population-level interventions to reduce obesity should lead to a reduction in the prevalence of vitamin D deficiency, and they highlight the importance of monitoring and treating vitamin D deficiency as a means of alleviating the adverse influences of obesity on health.
Additional Information
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001383.
The US Centers for Disease Control and Prevention provides information on all aspects of overweight and obesity (in English and Spanish); a data brief provides information about the vitamin D status of the US population
The World Health Organization provides information on obesity (in several languages)
The UK National Health Service Choices website provides detailed information about obesity and a link to a personal story about losing weight; it also provides information about vitamin D
The International Obesity Taskforce provides information about the global obesity epidemic
The US Department of Agriculture's ChooseMyPlate.gov website provides a personal healthy eating plan; the Weight-control Information Network is an information service provided for the general public and health professionals by the US National Institute of Diabetes and Digestive and Kidney Diseases (in English and Spanish)
The US Office of Dietary Supplements provides information about vitamin D (in English and Spanish)
MedlinePlus has links to further information about obesity and about vitamin D (in English and Spanish)
Wikipedia has a page on Mendelian randomization (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
Overview and details of the collaborative large-scale genetic association study (D-CarDia) provide information about vitamin D and the risk of cardiovascular disease, diabetes and related traits
doi:10.1371/journal.pmed.1001383
PMCID: PMC3564800  PMID: 23393431
PLoS Genetics  2011;7(9):e1002264.
Chronic kidney disease (CKD) is an increasing global public health concern, particularly among populations of African ancestry. We interrogated known renal loci and performed genome-wide association (GWA) and IBC candidate-gene SNP association analyses in African Americans from the CARe Renal Consortium. In up to 8,110 participants, we performed meta-analyses of GWA and IBC array data for estimated glomerular filtration rate (eGFR), CKD (eGFR <60 mL/min/1.73 m²), urinary albumin-to-creatinine ratio (UACR), and microalbuminuria (UACR >30 mg/g), and interrogated the 250 kb flanking regions around 24 SNPs previously identified in renal GWAS of European-ancestry populations. Findings were replicated in up to 4,358 African Americans. To assess function, individually identified genes were knocked down in zebrafish embryos using morpholino antisense oligonucleotides. Expression of kidney-specific genes was assessed by in situ hybridization, and glomerular filtration was evaluated by dextran clearance. Overall, 23 of 24 previously identified SNPs had direction-consistent associations with eGFR in African Americans, 2 of which achieved nominal significance (UMOD, PIP5K1B). Interrogation of the flanking regions uncovered 24 new index SNPs in African Americans, 12 of which were replicated (UMOD, ANXA9, GCKR, TFDP2, DAB2, VEGFA, ATXN2, GATM, SLC22A2, TMEM60, SLC6A13, and BCAS3). In addition, we identified 3 suggestive loci: DOK6 (p = 5.3×10⁻⁷) and FNDC1 (p = 3.0×10⁻⁷) for UACR, and KCNQ1 for eGFR (p = 3.6×10⁻⁶). Morpholino knockdown of kcnq1 in zebrafish resulted in abnormal kidney development and filtration capacity. We identified several SNPs associated with eGFR in individuals of African ancestry, as well as 3 suggestive loci for UACR and eGFR. Functional genetic studies support a role for kcnq1 in glomerular development in zebrafish.
Author Summary
Chronic kidney disease (CKD) is an increasing global public health problem that disproportionately affects populations of African ancestry. Many studies have shown that genetic variants are associated with the development of CKD; however, similar studies are lacking in African ancestry populations. The CARe consortium includes more than 8,000 individuals of African ancestry, in whom we conducted genome-wide association analysis of renal-related phenotypes. In cross-ethnicity analyses, we found that 23 of 24 SNPs previously identified in European ancestry populations have the same direction of effect in our samples of African ancestry. We also identified 3 suggestive genetic variants associated with measures of kidney function. We then tested these genes in zebrafish knockdown models and demonstrated that kcnq1 is involved in kidney development in zebrafish. These results highlight the similarity of genetic effects across ethnicities and show that cross-species modeling in zebrafish is feasible for genes associated with chronic human disease.
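One way to gauge the 23-of-24 direction-consistency result is a one-sided sign test. This is a minimal sketch under stated assumptions: the 24 SNPs are treated as independent, and under the null each has a 50% chance of matching the European-ancestry direction. The calculation is illustrative only and is not reported in the abstract.

```python
from math import comb


def sign_test_upper(k, n, p=0.5):
    """One-sided binomial (sign) test: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 23 of 24 SNPs share the direction of effect seen in European ancestry samples.
print(sign_test_upper(23, 24))  # 25 / 2**24, about 1.5e-06
```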
doi:10.1371/journal.pgen.1002264
PMCID: PMC3169523  PMID: 21931561
PLoS ONE  2012;7(5):e37878.
Objective
To investigate the association between common transforming growth factor beta (TGF-β) single nucleotide polymorphisms (SNP) and significant complications of coronary heart disease (CHD).
Method
We performed a meta-analysis of published case-control studies assessing the association of TGF-β SNPs with a range of CHD complications. A random effects model was used to calculate odds ratios and confidence intervals. Analyses were conducted for additive, dominant and recessive modes of inheritance.
Results
Six studies involving 5535 cases and 2970 controls examining the association of common SNPs in TGF-β1 with CHD were identified. Applying a dominant model of inheritance, three TGF-β1 SNPs were significantly associated with CHD complications: The T alleles of rs1800469 (OR = 1.125, 95% CI 1.016–1.247, p = 0.031) and rs1800470 (OR = 1.146, 95% CI 1.026–1.279, p = 0.021); and the C allele of rs1800471 (OR = 1.207, 95% CI 1.037–1.406, p = 0.021).
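The random-effects pooling described in the Method can be sketched with the DerSimonian-Laird estimator, a common choice for random-effects meta-analysis; the abstract does not name the estimator, so this is an assumption. Each study contributes a log odds ratio with Woolf's variance from a 2×2 table of risk-allele carriers vs non-carriers (a dominant coding). The tables in the example are invented for illustration.

```python
import math


def log_or_and_var(a, b, c, d):
    """Log odds ratio and Woolf variance from a 2x2 table:
    a/b = exposed/unexposed cases, c/d = exposed/unexposed controls."""
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def dersimonian_laird(estimates, variances):
    """Pool estimates with the DerSimonian-Laird random-effects model.
    Returns (pooled estimate, standard error, tau^2)."""
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q: heterogeneity around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Between-study variance, truncated at zero (zero for a single study).
    tau2 = max(0.0, (q - df) / c) if df > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2


def pooled_or(tables, z=1.96):
    """Random-effects OR with a 95% confidence interval, plus tau^2."""
    pairs = [log_or_and_var(*t) for t in tables]
    est, se, tau2 = dersimonian_laird([p[0] for p in pairs], [p[1] for p in pairs])
    return math.exp(est), math.exp(est - z * se), math.exp(est + z * se), tau2


# Hypothetical tables (carriers vs non-carriers under a dominant model).
studies = [(120, 80, 100, 95), (60, 55, 50, 60), (200, 150, 170, 160)]
print(pooled_or(studies))
```

Pooling on the log scale and exponentiating at the end keeps the confidence interval asymmetric around the OR, which is how intervals such as 1.016–1.247 arise.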
Conclusion
This meta-analysis suggests that common genetic polymorphisms in TGF-β1 are associated with complications of CHD.
doi:10.1371/journal.pone.0037878
PMCID: PMC3360665  PMID: 22662243
Human Molecular Genetics  2009;18(18):3496-3501.
Recently, the rs6232 (N221D) and rs6235 (S690T) SNPs in the PCSK1 gene were associated with obesity in a meta-analysis comprising more than 13 000 individuals of European ancestry. Each additional minor allele of rs6232 or rs6235 was associated with a 1.34- or 1.22-fold increase in the risk of obesity, respectively. So far, only one relatively small study has aimed to replicate these findings, but could not confirm the association of the rs6235 SNP and did not study the rs6232 variant. In the present study, we examined the associations of the rs6232 and rs6235 SNPs with obesity in a population-based cohort consisting of 20 249 individuals of European descent from Norfolk, UK. Logistic regression and generalized linear models were used to test the associations of the risk alleles with obesity and related quantitative traits, respectively. Neither of the SNPs was significantly associated with obesity, BMI or waist circumference under the additive genetic model (P > 0.05). However, we observed an interaction between rs6232 and age on the level of BMI (P = 0.010) and risk of obesity (P = 0.020). The rs6232 SNP was associated with BMI (P = 0.021) and obesity (P = 0.022) in the younger individuals [less than median age (59 years)], but not among the older age group (P = 0.81 and P = 0.68 for BMI and obesity, respectively). In conclusion, our data suggest that the PCSK1 rs6232 and rs6235 SNPs are not major contributors to common obesity in the general population. However, the effect of rs6232 may be age-dependent.
doi:10.1093/hmg/ddp280
PMCID: PMC2729665  PMID: 19528091
Molecular Vision  2007;13:534-544.
Purpose
To test the association between myocilin gene (MYOC) polymorphisms and high myopia in Hong Kong Chinese using a family-based association study.
Methods
A total of 162 Chinese nuclear families, comprising 557 members, were recruited from an optometry clinic. Each family had two parents and at least one offspring with high myopia (defined as −6.00 D or less in both eyes). All offspring were healthy, with no clinical evidence of syndromic disease or other ocular abnormalities. Genotyping was performed for two MYOC microsatellites (NGA17 and NGA19) and five tag single nucleotide polymorphisms (SNPs) spread across the gene. The genotype data were analyzed with the Family-Based Association Test (FBAT) software to check for linkage and association between the genetic markers and myopia, and with GenAssoc to generate case and pseudocontrol subjects for investigating the main effects of the genetic markers and calculating genotype relative risks (GRR).
Results
FBAT analysis showed linkage and association with high myopia for two microsatellites and two SNPs under one to three genetic models after correction for multiple comparisons by false discovery rate. NGA17 at the promoter was significant under an additive model (p=0.0084), while NGA19 at the 3' flanking region showed significant results under both additive (p=0.0172) and dominant (p=0.0053) models. SNP rs2421853 (C>T) exhibited both linkage and association under additive (p=0.0009) and dominant/recessive (p=0.0041) models. SNP rs235858 (T>C) was also significant under additive (p=4.0E-6) and dominant/recessive (p=2.5E-5) models. Both SNPs were downstream of NGA19 at the 3' flanking region. Positive results for these SNPs were novel findings. A stepwise conditional logistic regression analysis of the case-pseudocontrol dataset generated by GenAssoc from the families showed that either SNP could separately account for the association of NGA17 or NGA19, and that both SNPs contributed separate main effects to high myopia. For rs2421853, with C/C as the reference genotype, the GRR increased from 1.678 for G/A to 2.738 for A/A (p=9.0E-4, global Wald test). For rs235858, with G/G as the reference, the GRR increased from 2.083 for G/A to 3.931 for A/A (p=2.0E-2, global Wald test). The GRR estimates thus suggested an additive model for both SNPs, consistent with the finding that, of the three models tested, the additive model gave the lowest p values in the FBAT analysis.
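The case/pseudocontrol design used with GenAssoc can be illustrated for a single parent-offspring trio: of the four genotypes the parents could transmit, one copy matching the affected child serves as the case and the remaining three serve as matched pseudocontrols. This is a minimal sketch of the idea, not GenAssoc's implementation; the allele labels are hypothetical, and the GRR would then be estimated by conditional logistic regression with each trio as a stratum (not shown).

```python
from itertools import product


def pseudocontrols(father, mother, child):
    """Case and pseudocontrol genotypes for one parent-offspring trio.

    Each genotype is a pair of alleles. The four offspring genotypes the
    parents could produce are enumerated; one copy matching the affected
    child is the case and the remaining three are pseudocontrols.
    """
    possible = [tuple(sorted(pair)) for pair in product(father, mother)]
    case = tuple(sorted(child))
    if case not in possible:
        raise ValueError("child genotype is inconsistent with the parents (Mendelian error)")
    possible.remove(case)  # removes only one matching copy
    return case, possible


case, controls = pseudocontrols(("C", "T"), ("C", "T"), ("T", "T"))
print(case, controls)  # ('T', 'T') [('C', 'C'), ('C', 'T'), ('C', 'T')]
```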
Conclusions
Linkage and association were shown between MYOC polymorphisms and high myopia in our family-based association study. The SNP rs235858 at the 3' flanking region showed the highest degree of confidence for association.
PMCID: PMC2652017  PMID: 17438518
PLoS Genetics  2010;6(5):e1000932.
Genome-wide association studies (GWAS) have demonstrated the ability to identify the strongest causal common variants in complex human diseases. However, to date, the massive amounts of data generated by GWAS have not been fully explored to identify true associations that fail to meet the stringent threshold required for genome-wide significance. Genetics of gene expression (GGE) studies have shown promise for identifying DNA variations associated with disease and for providing a path to functionally characterizing findings from GWAS. Here, we present the first empiric study to systematically characterize the set of single nucleotide polymorphisms associated with expression (eSNPs) in liver, subcutaneous fat, and omental fat tissues, demonstrating that these eSNPs are significantly more enriched for SNPs associated with type 2 diabetes (T2D) in three large-scale GWAS than a matched set of randomly selected SNPs. This enrichment for T2D association increases as we restrict to eSNPs that correspond to genes comprising gene networks constructed from adipose gene expression data isolated from a mouse population segregating a T2D phenotype. Finally, by restricting to eSNPs corresponding to genes comprising an adipose subnetwork strongly predicted as causal for T2D, we dramatically increased the enrichment for SNPs associated with T2D and were able to identify a functionally related set of diabetes susceptibility genes. We identified and validated malic enzyme 1 (Me1) as a key regulator of this T2D subnetwork in mouse and provided support for the association of this gene with T2D in humans. This integration of eSNPs and networks provides a novel approach to identifying disease susceptibility networks rather than the single SNPs or genes traditionally identified through GWAS, thereby extracting additional value from the wealth of data currently being generated by GWAS.
Author Summary
Genome-wide association studies (GWAS) seek to identify loci in which changes in DNA are correlated with disease. However, GWAS do not necessarily lead directly to genes associated with disease, and they do not typically inform the broader context in which disease genes operate, thereby providing limited insights into the mechanisms driving disease. One critical step toward gaining further insight from GWAS is developing an understanding of the genetics of gene expression (GGE). We present the first empiric study demonstrating that SNPs in human cohorts that associate with gene expression in liver and adipose tissues are enriched for association with Type 2 Diabetes (T2D) in humans. By filtering “eSNPs” based on causal gene networks defined in an experimental cross population segregating T2D traits, we demonstrate a dramatically increased enrichment of T2D SNPs that enhances our ability to assess T2D risk. We demonstrate the utility of this approach by identifying malic enzyme 1 (ME1) as a novel T2D susceptibility gene in humans and then functionally validating the causal connection between ME1 and T2D in a mouse knockout model for Me1. This approach provides a path to identifying disease susceptibility networks rather than the single SNPs or genes traditionally identified through GWAS.
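The enrichment comparison above can be approximated with a one-sided hypergeometric test: among all SNPs tested in a GWAS, how surprising is it that a given number of eSNPs pass an association threshold? This is a simplified stand-in, not the authors' method, which compared eSNPs with a matched set of randomly selected SNPs; the hypergeometric version ignores linkage disequilibrium and allele-frequency matching. All counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1))
    return hits / comb(population, draws)


def fold_enrichment(k, draws, successes, population):
    """Observed hit rate among eSNPs relative to the genome-wide hit rate."""
    return (k / draws) / (successes / population)


# Hypothetical: 50,000 SNPs tested, 500 pass the T2D threshold;
# 2,000 are eSNPs, of which 35 pass (about 20 expected by chance).
print(fold_enrichment(35, 2000, 500, 50000))
print(hypergeom_upper_tail(35, 50000, 500, 2000))
```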
doi:10.1371/journal.pgen.1000932
PMCID: PMC2865508  PMID: 20463879
Journal of Korean Medical Science  2013;28(6):840-847.
Lung cancer in never-smokers ranks as the seventh most common cause of cancer death worldwide, and the incidence of lung cancer in non-smoking Korean women appears to be steadily increasing. To identify the effect of genetic polymorphisms on lung cancer risk in non-smoking Korean women, we conducted a genome-wide association study of Korean female non-smokers with lung cancer. We analyzed genotype data for 440,794 SNPs in 285 cases and 1,455 controls, and nineteen SNPs were associated with lung cancer development (P < 0.001). For external validation, these nineteen SNPs were tested in another sample set of 293 cases and 495 controls, and only rs10187911 on 2p16.3 was significantly associated with lung cancer development (dominant model, OR of TG or GG, 1.58, P = 0.025). We confirmed this SNP again in a further replication set of 546 cases and 744 controls (recessive model, OR of GG, 1.32, P = 0.027). In the combined set, the OR and P value were 1.37 and < 0.001 under the additive model, 1.51 and < 0.001 under the dominant model, and 1.54 and < 0.001 under the recessive model. The effect of this SNP was consistent only in patients with adenocarcinoma (1.36 and < 0.001 under the additive model, 1.49 and < 0.001 under the dominant model, and 1.54 and < 0.001 under the recessive model). Furthermore, after imputation with HapMap data, we found regional significance near rs10187911, and five SNPs showed P values lower than that of rs10187911 (rs12478012, rs4377361, rs13005521, rs12475464, and rs7564130). We therefore concluded that a region on chromosome 2 is significantly associated with lung cancer risk in Korean non-smoking women.
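The additive, dominant, and recessive results above correspond to different ways of collapsing genotype counts. A minimal sketch with hypothetical counts: the dominant OR compares carriers of one or two risk alleles with non-carriers (the "OR of TG or GG"), the recessive OR compares risk-allele homozygotes with everyone else (the "OR of GG"), and the allelic OR counts alleles rather than people. The allelic OR is only an approximation to the per-allele OR from an additive logistic regression, which is what such studies typically fit; that substitution is an assumption of this sketch.

```python
def genetic_model_ors(cases, controls):
    """Odds ratios under three genetic models from genotype counts.

    cases, controls: (n0, n1, n2) = counts carrying 0, 1 or 2 copies of the
    risk allele (e.g. TT, TG, GG).
    """
    n0, n1, n2 = cases
    m0, m1, m2 = controls
    dominant = ((n1 + n2) * m0) / (n0 * (m1 + m2))   # 1-2 copies vs 0
    recessive = (n2 * (m0 + m1)) / ((n0 + n1) * m2)  # 2 copies vs 0-1
    # Allelic OR: risk vs non-risk alleles in cases vs controls, an
    # approximation to the per-allele (additive) logistic-regression OR.
    allelic = ((2 * n2 + n1) * (2 * m0 + m1)) / ((2 * n0 + n1) * (2 * m2 + m1))
    return {"dominant": dominant, "recessive": recessive, "allelic": allelic}


# Hypothetical counts, not the study's data.
print(genetic_model_ors((100, 80, 20), (120, 70, 10)))
```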
doi:10.3346/jkms.2013.28.6.840
PMCID: PMC3677999  PMID: 23772147
Lung Neoplasms; Genome-Wide Association Study; Non-Smoking Women
BMC Genomics  2014;15(1):398.
Background
Genetic association studies are conducted to discover genetic loci that contribute to an inherited trait, identify the variants behind these associations and ascertain their functional role in determining the phenotype. To date, functional annotations of the genetic variants have rarely played more than an indirect role in assessing evidence for association. Here, we demonstrate how these data can be systematically integrated into an association study’s analysis plan.
Results
We developed a Bayesian statistical model for the prior probability of phenotype–genotype association that incorporates data from past association studies and publicly available functional annotation data regarding the susceptibility variants under study. The model takes the form of a binary regression of association status on a set of annotation variables whose coefficients were estimated through an analysis of associated SNPs in the GWAS Catalog (GC). The functional predictors examined included measures that have been demonstrated to correlate with the association status of SNPs in the GC and some whose utility in this regard is speculative: summaries of the UCSC Human Genome Browser ENCODE super–track data, dbSNP function class, sequence conservation summaries, proximity to genomic variants in the Database of Genomic Variants and known regulatory elements in the Open Regulatory Annotation database, PolyPhen–2 probabilities and RegulomeDB categories. Because we expected that only a fraction of the annotations would contribute to predicting association, we employed a penalized likelihood method to reduce the impact of non–informative predictors and evaluated the model’s ability to predict GC SNPs not used to construct the model. We show that the functional data alone are predictive of a SNP’s presence in the GC. Further, using data from a genome–wide study of ovarian cancer, we demonstrate that their use as prior data when testing for association is practical at the genome–wide scale and improves power to detect associations.
Conclusions
We show how diverse functional annotations can be efficiently combined to create ‘functional signatures’ that predict the a priori odds of a variant’s association to a trait and how these signatures can be integrated into a standard genome–wide–scale association analysis, resulting in improved power to detect truly associated variants.
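The idea of turning functional annotations into prior odds of association can be sketched as follows. The logistic form mirrors the abstract's "binary regression of association status on a set of annotation variables", but the annotation names, coefficients, and intercept below are hypothetical rather than the fitted GWAS Catalog model, and combining the prior with a Bayes factor to obtain posterior odds is one standard way to use such a prior, assumed here for illustration.

```python
import math


def prior_probability(annotations, coefficients, intercept):
    """Prior probability of association from a logistic model on annotations."""
    eta = intercept + sum(coefficients[k] * v for k, v in annotations.items())
    return 1 / (1 + math.exp(-eta))


def posterior_probability(prior, bayes_factor):
    """Posterior odds = prior odds x Bayes factor, returned as a probability."""
    post_odds = prior / (1 - prior) * bayes_factor
    return post_odds / (1 + post_odds)


# Hypothetical annotation coefficients and values, for illustration only.
coef = {"conservation": 0.8, "regulatory_element": 1.2, "missense": 1.5}
snp = {"conservation": 1.0, "regulatory_element": 1.0, "missense": 0.0}
prior = prior_probability(snp, coef, intercept=-9.0)
print(prior, posterior_probability(prior, bayes_factor=1e5))
```

With the same Bayes factor from the association data, a variant whose annotations raise its prior odds ends up with higher posterior odds, which is the mechanism by which such signatures can improve power.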
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-398) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2164-15-398
PMCID: PMC4041996  PMID: 24886216
Association study; GWAS; SNPs; Functional annotations; Bayesian analysis; ENCODE project
