Related Articles

1.  Population Stratification and Patterns of Linkage Disequilibrium 
Genetic epidemiology  2009;33(Suppl 1):S88-S92.
Although the importance of selecting cases and controls from the same population has been recognized for decades, the recent advent of genome-wide association studies has heightened awareness of this issue. Because these studies typically deal with large samples, small differences in allele frequencies between cases and controls can easily reach statistical significance. When, unbeknownst to a researcher, cases and controls have different substructures, the number of false-positive findings is inflated. There have been three recent developments of purely statistical approaches to assessing the ancestral comparability of case and control samples: genomic control, structured association, and multivariate reduction analyses. The widespread use of high-throughput technology has allowed the quick and accurate genotyping of the large number of markers required by these methods.
Group 13 dealt with four population stratification issues: single-nucleotide polymorphism marker selection, association testing, non-standard methods, and linkage disequilibrium calculations in stratified or mixed ethnicity samples. We demonstrated that there are continuous axes of ethnic variation in both datasets of Genetic Analysis Workshop 16. Furthermore, ignoring this structure created p-value inflation for a variety of phenotypes. Principal components (or multidimensional scaling coordinates) can control this inflation when included as covariates in a logistic regression. One can weight for local ancestry estimation and allow the use of related individuals. Problems arise in the presence of extremely high association or unusually strong linkage disequilibrium (e.g., in chromosomal inversions). Our group also reported a method for performing an association test controlling for substructure when genome-wide markers are not available to explicitly compute stratification.
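The p-value inflation described in this abstract is often summarized by the genomic control inflation factor, the ratio of the observed median association statistic to the median expected under the null. The following is a minimal sketch under stated assumptions, not the workshop group's own code: it assumes 1-degree-of-freedom chi-square test statistics, and the function names are hypothetical.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724


def genomic_inflation(chi2_stats):
    """Return lambda: observed median statistic over the expected null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control_adjust(chi2_stats):
    """Divide every statistic by lambda; lambda is floored at 1 so that
    statistics are never inflated by the correction."""
    lam = max(genomic_inflation(chi2_stats), 1.0)
    return [s / lam for s in chi2_stats]
```

A value of lambda close to 1 suggests little inflation; values well above 1 suggest that stratification or another systematic bias may be present.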
doi:10.1002/gepi.20478
PMCID: PMC3133943  PMID: 19924707
genetic association; genome-wide association study; principal components; multidimensional scaling; ethnic substructure
2.  Inflammation, Insulin Resistance, and Diabetes—Mendelian Randomization Using CRP Haplotypes Points Upstream 
PLoS Medicine  2008;5(8):e155.
Background
Raised C-reactive protein (CRP) is a risk factor for type 2 diabetes. According to the Mendelian randomization method, the association is likely to be causal if genetic variants that affect CRP level are associated with markers of diabetes development and diabetes. Our objective was to examine the nature of the association between CRP phenotype and diabetes development using CRP haplotypes as instrumental variables.
Methods and Findings
We genotyped three tagging SNPs (CRP + 2302G > A; CRP + 1444T > C; CRP + 4899T > G) in the CRP gene and measured serum CRP in 5,274 men and women at mean ages 49 and 61 y (Whitehall II Study). Homeostasis model assessment-insulin resistance (HOMA-IR) and hemoglobin A1c (HbA1c) were measured at age 61 y. Diabetes was ascertained by glucose tolerance test and self-report. Common major haplotypes were strongly associated with serum CRP levels, but unrelated to obesity, blood pressure, and socioeconomic position, which may confound the association between CRP and diabetes risk. Serum CRP was associated with these potential confounding factors. After adjustment for age and sex, baseline serum CRP was associated with incident diabetes (hazard ratio = 1.39 [95% confidence interval 1.29–1.51]), HOMA-IR, and HbA1c, but the associations were considerably attenuated on adjustment for potential confounding factors. In contrast, CRP haplotypes were not associated with HOMA-IR or HbA1c (p = 0.52–0.92). The associations of CRP with HOMA-IR and HbA1c were all null when examined using instrumental variables analysis, with genetic variants as the instrument for serum CRP. Instrumental variables estimates differed from the directly observed associations (p = 0.007–0.11). Pooled analysis of CRP haplotypes and diabetes in Whitehall II and Northwick Park Heart Study II produced null findings (p = 0.25–0.88). Analyses based on the Wellcome Trust Case Control Consortium (1,923 diabetes cases, 2,932 controls) using three SNPs in tight linkage disequilibrium with our tagging SNPs also demonstrated null associations.
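To illustrate the instrumental variables idea in the simplest case, the sketch below computes a Wald ratio: the variant-outcome association divided by the variant-exposure association. This is an assumption-laden illustration with a single instrument and a first-order standard error; it is not necessarily the estimator the authors used for their haplotype instruments, and the function and argument names are hypothetical.

```python
def wald_ratio(beta_variant_exposure, beta_variant_outcome, se_variant_outcome):
    """Single-instrument Mendelian randomization estimate.

    Returns (causal_estimate, standard_error), where the standard error is
    the first-order approximation that ignores uncertainty in the
    variant-exposure association.
    """
    if beta_variant_exposure == 0:
        raise ValueError("The instrument must be associated with the exposure.")
    estimate = beta_variant_outcome / beta_variant_exposure
    standard_error = se_variant_outcome / abs(beta_variant_exposure)
    return estimate, standard_error
```

If the variant raises the exposure but shows no association with the outcome, the ratio is near zero, which is the pattern a noncausal biomarker would be expected to produce.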
Conclusions
Observed associations between serum CRP and insulin resistance, glycemia, and diabetes are likely to be noncausal. Inflammation may play a causal role via upstream effectors rather than the downstream marker CRP.
Using a Mendelian randomization approach, Eric Brunner and colleagues show that the associations between serum C-reactive protein and insulin resistance, glycemia, and diabetes are likely to be noncausal.
Editors' Summary
Background.
Diabetes—a common, long-term (chronic) disease that causes heart, kidney, nerve, and eye problems and shortens life expectancy—is characterized by high levels of sugar (glucose) in the blood. In people without diabetes, blood sugar levels are controlled by the hormone insulin. Insulin is released by the pancreas after eating and “instructs” insulin-responsive muscle and fat cells to take up the glucose from the bloodstream that is produced by the digestion of food. In the early stages of type 2 diabetes (the commonest type of diabetes), the muscle and fat cells become nonresponsive to insulin (a condition called insulin resistance), and blood sugar levels increase. The pancreas responds by making more insulin—people with insulin resistance have high blood levels of both insulin and glucose. Eventually, however, the insulin-producing cells in the pancreas start to malfunction, insulin secretion decreases, and frank diabetes develops.
Why Was This Study Done?
Globally, about 200 million people have diabetes, but experts believe this number will double by 2030. Ways to prevent or delay the onset of diabetes are, therefore, urgently needed. One major risk factor for insulin resistance and diabetes is being overweight. According to one theory, increased body fat causes mild, chronic tissue inflammation, which leads to insulin resistance. Consistent with this idea, people with higher than normal amounts of the inflammatory protein C-reactive protein (CRP) in their blood have a high risk of developing diabetes. If inflammation does cause diabetes, then drugs that inhibit CRP might prevent diabetes. However, simply measuring CRP and determining whether the people with high levels develop diabetes cannot prove that CRP causes diabetes. Those people with high blood levels of CRP might have other unknown factors in common (confounding factors) that are the real causes of diabetes. In this study, the researchers use “Mendelian randomization” to examine whether increased blood CRP causes diabetes. Some variants of CRP (the gene that encodes CRP) increase the amount of CRP in the blood. Because these variants are inherited randomly, there is no likelihood of confounding factors, and an association between these variants and the development of insulin resistance and diabetes indicates, therefore, that increased CRP levels cause diabetes.
What Did the Researchers Do and Find?
The researchers measured blood CRP levels in more than 5,000 people enrolled in the Whitehall II study, which is investigating factors that affect disease development. They also used the “homeostasis model assessment-insulin resistance” (HOMA-IR) method to estimate insulin sensitivity from blood glucose and insulin measurements, and measured levels of hemoglobin A1c (HbA1c, hemoglobin with sugar attached—a measure of long-term blood sugar control) in these people. Finally, they looked at three “single nucleotide polymorphisms” (SNPs, single nucleotide changes in a gene's DNA sequence; combinations of SNPs that are inherited as a block are called haplotypes) in CRP in each study participant. Common haplotypes of CRP were related to blood serum CRP levels and, as previously reported, increased blood CRP levels were associated with diabetes and with HOMA-IR and HbA1c values indicative of insulin resistance and poor blood sugar control, respectively. By contrast, CRP haplotypes were not related to HOMA-IR or HbA1c values. Similarly, pooled analysis of CRP haplotypes and diabetes in Whitehall II and another large study on health determinants (the Northwick Park Heart Study II) showed no association between CRP variants and diabetes risk. Finally, data from the Wellcome Trust Case Control Consortium also showed no association between CRP haplotypes and diabetes risk.
What Do These Findings Mean?
Together, these findings suggest that increased blood CRP levels are not responsible for the development of insulin resistance or diabetes, at least in European populations. It may be that there is a causal relationship between CRP levels and diabetes risk in other ethnic populations—further Mendelian randomization studies are needed to discover whether this is the case. For now, though, these findings suggest that drugs targeted against CRP are unlikely to prevent or delay the onset of diabetes. However, they do not discount the possibility that proteins involved earlier in the inflammatory process might cause diabetes and might thus represent good drug targets for diabetes prevention.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0050155.
This study is further discussed in a PLoS Medicine Perspective by Bernard Keavney
The MedlinePlus encyclopedia provides information about diabetes and about C-reactive protein (in English and Spanish)
US National Institute of Diabetes and Digestive and Kidney Diseases provides patient information on all aspects of diabetes, including information on insulin resistance (in English and Spanish)
The International Diabetes Federation provides information about diabetes, including information on the global diabetes epidemic
The US Centers for Disease Control and Prevention provides information for the public and professionals on all aspects of diabetes (in English and Spanish)
Wikipedia has a page on Mendelian randomization (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
doi:10.1371/journal.pmed.0050155
PMCID: PMC2504484  PMID: 18700811
3.  Trans-population Analysis of Genetic Mechanisms of Ethnic Disparities in Neuroblastoma Survival 
Background
Black patients with neuroblastoma have a higher prevalence of high-risk disease and worse outcome than white patients. We sought to investigate the relationship between genetic variation and the disparities in survival observed in neuroblastoma.
Methods
The analytic cohort was composed of 2709 patients. Principal components were used to assign patients to genomic ethnic clusters for survival analyses. Locus-specific ancestry was calculated for use in association analysis. The shorter spans of linkage disequilibrium in African populations may facilitate the fine mapping of causal variants in regions previously implicated by genome-wide association studies conducted primarily in patients of European descent. Thus, we evaluated 13 single nucleotide polymorphisms known to be associated with susceptibility to high-risk neuroblastoma from genome-wide association studies and all variants with highly divergent allele frequencies in reference African and European populations near the known susceptibility loci. All statistical tests were two-sided.
Results
African genomic ancestry was associated with high-risk neuroblastoma (P = .007) and lower event-free survival (P = .04, hazard ratio = 1.4, 95% confidence interval = 1.05 to 1.80). rs1033069 within SPAG16 (sperm associated antigen 16) was determined to have higher risk allele frequency in the African reference population and statistically significant association with high-risk disease in patients of European and African ancestry (P = 6.42×10^−5, false discovery rate < 0.0015) in the overall cohort. Multivariable analysis using an additive model demonstrated that the SPAG16 single nucleotide polymorphism contributes to the observed ethnic disparities in high-risk disease and survival.
Conclusions
Our study demonstrates that common genetic variation influences neuroblastoma phenotype and contributes to the ethnic disparities in survival observed and illustrates the value of trans-population mapping.
doi:10.1093/jnci/djs503
PMCID: PMC3691940  PMID: 23243203
4.  Adjusting for Population Stratification in a Fine Scale with Principal Components and Sequencing Data 
Genetic epidemiology  2013;37(8):10.1002/gepi.21764.
Population stratification is of primary interest in genetic studies to infer human evolution history and to avoid spurious findings in association testing. Although it is well studied with high-density single nucleotide polymorphisms (SNPs) in genome-wide association studies (GWASs), next-generation sequencing brings both new opportunities and challenges to uncovering population structures in finer scales. Several recent studies have noticed different confounding effects from variants of different minor allele frequencies (MAFs). In this paper, using a low-coverage sequencing dataset from the 1000 Genomes Project, we compared a popular method, principal component analysis (PCA), with a recently proposed spectral clustering technique, called spectral dimensional reduction (SDR), in detecting and adjusting for population stratification at the level of ethnic subgroups. We investigated the varying performance of adjusting for population stratification with different types and sets of variants when testing on different types of variants. One main conclusion is that principal components based on all variants or common variants were generally most effective in controlling inflations caused by population stratification; in particular, contrary to many speculations on the effectiveness of rare variants, we did not find much added value with the use of only rare variants. In addition, SDR was confirmed to be more robust than PCA, especially when applied to rare variants.
doi:10.1002/gepi.21764
PMCID: PMC3864649  PMID: 24123217
1000 Genomes Project; Association testing; Common variants; Principal component analysis; Rare variants; Spectral analysis
5.  Genetic Predisposition to Increased Blood Cholesterol and Triglyceride Lipid Levels and Risk of Alzheimer Disease: A Mendelian Randomization Analysis 
PLoS Medicine  2014;11(9):e1001713.
In this study, Proitsi and colleagues use a Mendelian randomization approach to dissect the causal nature of the association between circulating lipid levels and late onset Alzheimer's Disease (LOAD) and find that genetic predisposition to increased plasma cholesterol and triglyceride lipid levels is not associated with elevated LOAD risk.
Please see later in the article for the Editors' Summary
Background
Although altered lipid metabolism has been extensively implicated in the pathogenesis of Alzheimer disease (AD) through cell biological, epidemiological, and genetic studies, the molecular mechanisms linking cholesterol and AD pathology are still not well understood and contradictory results have been reported. We have used a Mendelian randomization approach to dissect the causal nature of the association between circulating lipid levels and late onset AD (LOAD) and test the hypothesis that genetically raised lipid levels increase the risk of LOAD.
Methods and Findings
We included 3,914 patients with LOAD, 1,675 older individuals without LOAD, and 4,989 individuals from the general population from six genome wide studies drawn from a white population (total n = 10,578). We constructed weighted genotype risk scores (GRSs) for four blood lipid phenotypes (high-density lipoprotein cholesterol [HDL-c], low-density lipoprotein cholesterol [LDL-c], triglycerides, and total cholesterol) using well-established SNPs in 157 loci for blood lipids reported by Willer and colleagues (2013). Both full GRSs using all SNPs associated with each trait at p<5×10^−8 and trait specific scores using SNPs associated exclusively with each trait at p<5×10^−8 were developed. We used logistic regression to investigate whether the GRSs were associated with LOAD in each study and results were combined together by meta-analysis. We found no association between any of the full GRSs and LOAD (meta-analysis results: odds ratio [OR] = 1.005, 95% CI 0.82–1.24, p = 0.962 per 1 unit increase in HDL-c; OR = 0.901, 95% CI 0.65–1.25, p = 0.530 per 1 unit increase in LDL-c; OR = 1.104, 95% CI 0.89–1.37, p = 0.362 per 1 unit increase in triglycerides; and OR = 0.954, 95% CI 0.76–1.21, p = 0.688 per 1 unit increase in total cholesterol). Results for the trait specific scores were similar; however, the trait specific scores explained much smaller phenotypic variance.
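A weighted genotype risk score of the kind described above is usually a weighted sum of risk-allele counts, with each SNP's weight taken from its published effect size on the trait. The sketch below assumes that conventional definition; the abstract does not spell out the exact formula, and the function name and inputs are hypothetical.

```python
def weighted_grs(allele_dosages, effect_weights):
    """Sum of per-SNP allele dosages (0 to 2) multiplied by per-SNP weights.

    allele_dosages: one value per SNP for a single individual.
    effect_weights: published per-allele effect sizes, in the same SNP order.
    """
    if len(allele_dosages) != len(effect_weights):
        raise ValueError("Each SNP needs exactly one dosage and one weight.")
    return sum(d * w for d, w in zip(allele_dosages, effect_weights))
```

For example, an individual carrying 0, 1, and 2 risk alleles at three SNPs with illustrative weights 0.1, 0.2, and 0.3 would score 0.8; each study's scores would then enter a logistic regression against LOAD status.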
Conclusions
Genetic predisposition to increased blood cholesterol and triglyceride lipid levels is not associated with elevated LOAD risk. The observed epidemiological associations between abnormal lipid levels and LOAD risk could therefore be attributed to the result of biological pleiotropy or could be secondary to LOAD. Limitations of this study include the small proportion of lipid variance explained by the GRS, biases in case-control ascertainment, and the limitations implicit to Mendelian randomization studies. Future studies should focus on larger LOAD datasets with longitudinal sampled peripheral lipid measures and other markers of lipid metabolism, which have been shown to be altered in LOAD.
Editors' Summary
Background
Currently, about 44 million people worldwide have dementia, a group of brain disorders characterized by an irreversible decline in memory, communication, and other “cognitive” functions. Dementia mainly affects older people and, because people are living longer, experts estimate that more than 135 million people will have dementia by 2050. The commonest form of dementia is Alzheimer disease. In this type of dementia, protein clumps called plaques and neurofibrillary tangles form in the brain and cause its degeneration. The earliest sign of Alzheimer disease is usually increasing forgetfulness. As the disease progresses, affected individuals gradually lose their ability to deal with normal daily activities such as dressing. They may become anxious or aggressive or begin to wander. They may also eventually lose control of their bladder and of other physical functions. At present, there is no cure for Alzheimer disease although some of its symptoms can be managed with drugs. Most people with the disease are initially cared for at home by relatives and other unpaid carers, but many patients end their days in a care home or specialist nursing home.
Why Was This Study Done?
Several lines of evidence suggest that lipid metabolism (how the body handles cholesterol and other fats) is altered in patients whose Alzheimer disease develops after the age of 60 years (late onset Alzheimer disease, LOAD). In particular, epidemiological studies (observational investigations that examine the patterns and causes of disease in populations) have found an association between high amounts of cholesterol in the blood in midlife and the risk of LOAD. However, observational studies cannot prove that abnormal lipid metabolism (dyslipidemia) causes LOAD. People with dyslipidemia may share other characteristics that cause both dyslipidemia and LOAD (confounding) or LOAD might actually cause dyslipidemia (reverse causation). Here, the researchers use “Mendelian randomization” to examine whether lifetime changes in lipid metabolism caused by genes have a causal impact on LOAD risk. In Mendelian randomization, causality is inferred from associations between genetic variants that mimic the effect of a modifiable risk factor and the outcome of interest. Because gene variants are inherited randomly, they are not prone to confounding and are free from reverse causation. So, if dyslipidemia causes LOAD, genetic variants that affect lipid metabolism should be associated with an altered risk of LOAD.
What Did the Researchers Do and Find?
The researchers investigated whether genetic predisposition to raised lipid levels increased the risk of LOAD in 10,578 participants (3,914 patients with LOAD, 1,675 elderly people without LOAD, and 4,989 population controls) using data collected in six genome wide studies looking for gene variants associated with Alzheimer disease. The researchers constructed a genotype risk score (GRS) for each participant using genetic risk markers for four types of blood lipids on the basis of the presence of single nucleotide polymorphisms (SNPs, a type of gene variant) in their DNA. When the researchers used statistical methods to investigate the association between the GRS and LOAD among all the study participants, they found no association between the GRS and LOAD.
What Do These Findings Mean?
These findings suggest that the genetic predisposition to raised blood levels of four types of lipid is not causally associated with LOAD risk. The accuracy of this finding may be affected by several limitations of this study, including the small proportion of lipid variance explained by the GRS and the validity of several assumptions that underlie all Mendelian randomization studies. Moreover, because all the participants in this study were white, these findings may not apply to people of other ethnic backgrounds. Given their findings, the researchers suggest that the observed epidemiological associations between abnormal lipid levels in the blood and LOAD risk could be secondary to variation in lipid levels for reasons other than genetics, or to LOAD, a possibility that can be investigated by studying blood lipid levels and other markers of lipid metabolism over time in large groups of patients with LOAD. Importantly, however, these findings provide new information about the role of lipids in LOAD development that may eventually lead to new therapeutic and public-health interventions for Alzheimer disease.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001713.
The UK National Health Service Choices website provides information (including personal stories) about Alzheimer's disease
The UK not-for-profit organization Alzheimer's Society provides information for patients and carers about dementia, including personal experiences of living with Alzheimer's disease
The US not-for-profit organization Alzheimer's Association also provides information for patients and carers about dementia and personal stories about dementia
Alzheimer's Disease International is the international federation of Alzheimer disease associations around the world; it provides links to individual associations, information about dementia, and links to World Alzheimer Reports
MedlinePlus provides links to additional resources about Alzheimer's disease (in English and Spanish)
Wikipedia has a page on Mendelian randomization (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
doi:10.1371/journal.pmed.1001713
PMCID: PMC4165594  PMID: 25226301
6.  Methods for adjusting population structure and familial relatedness in association test for collective effect of multiple rare variants on quantitative traits 
BMC Proceedings  2011;5(Suppl 9):S35.
Because of the low frequency of rare genetic variants in observed data, the statistical power of detecting their associations with target traits is usually low. The collapsing test of collective effect of multiple rare variants is an important and useful strategy to increase the power; in addition, family data may be enriched with causal rare variants and therefore provide extra power. However, when family data are used, both population structure and familial relatedness need to be adjusted for the possible inflation of false positives. Using a unified mixed linear model and family data, we compared six methods to detect the association between multiple rare variants and quantitative traits. Through the analysis of 200 replications of the quantitative trait Q2 from the Genetic Analysis Workshop 17 data set simulated for 697 subjects from 8 extended families, and based on quantile-quantile plots under the null and receiver operating characteristic curves, we compared the false-positive rate and power of these methods. We observed that adjusting for pedigree-based kinship gives the best control for false-positive rate, whereas adjusting for marker-based identity by state slightly outperforms in terms of power. An adjustment based on a principal components analysis slightly improves the false-positive rate and power. Taking into account type-1 error, power, and computational efficiency, we find that adjusting for pedigree-based kinship seems to be a good choice for the collective test of association between multiple rare variants and quantitative traits using family data.
doi:10.1186/1753-6561-5-S9-S35
PMCID: PMC3287871  PMID: 22373066
7.  Self-reported Ethnicity, Genetic Structure and the Impact of Population Stratification in a Multiethnic Study 
Human genetics  2010;128(2):165-177.
It is well-known that population substructure may lead to confounding in case-control association studies. Here, we examined genetic structure in a large racially and ethnically diverse sample consisting of 5 ethnic groups of the Multiethnic Cohort study (African Americans, Japanese Americans, Latinos, European Americans and Native Hawaiians) using 2,509 SNPs distributed across the genome. Principal component analysis on 6,213 study participants, 18 Native Americans and 11 HapMap III populations revealed 4 important principal components (PCs): the first two separated Asians, Europeans and Africans, and the third and fourth corresponded to Native American and Native Hawaiian (Polynesian) ancestry, respectively. Individual ethnic composition derived from self-reported parental information matched well to genetic ancestry for Japanese and European Americans. STRUCTURE-estimated individual ancestral proportions for African Americans and Latinos are consistent with previous reports. We quantified the East Asian (mean 27%), European (mean 27%) and Polynesian (mean 46%) ancestral proportions for the first time, to our knowledge, for Native Hawaiians. Simulations based on realistic settings of case-control studies nested in the Multiethnic Cohort found that the effect of population stratification was modest and readily corrected by adjusting for race/ethnicity or by adjusting for top PCs derived from all SNPs or from ancestry informative markers; the power of these approaches was similar when averaged across causal variants simulated based on allele frequencies of the 2,509 genotyped markers. The bias may be large in case-only analysis of gene by gene interactions but it can be corrected by top PCs derived from all SNPs.
doi:10.1007/s00439-010-0841-4
PMCID: PMC3057055  PMID: 20499252
AIMs; African American; Native Hawaiian; Latino; admixture; principal component analysis
8.  Meta-Analysis of Genome-Wide Association Studies in African Americans Provides Insights into the Genetic Architecture of Type 2 Diabetes 
Ng, Maggie C. Y. | Shriner, Daniel | Chen, Brian H. | Li, Jiang | Chen, Wei-Min | Guo, Xiuqing | Liu, Jiankang | Bielinski, Suzette J. | Yanek, Lisa R. | Nalls, Michael A. | Comeau, Mary E. | Rasmussen-Torvik, Laura J. | Jensen, Richard A. | Evans, Daniel S. | Sun, Yan V. | An, Ping | Patel, Sanjay R. | Lu, Yingchang | Long, Jirong | Armstrong, Loren L. | Wagenknecht, Lynne | Yang, Lingyao | Snively, Beverly M. | Palmer, Nicholette D. | Mudgal, Poorva | Langefeld, Carl D. | Keene, Keith L. | Freedman, Barry I. | Mychaleckyj, Josyf C. | Nayak, Uma | Raffel, Leslie J. | Goodarzi, Mark O. | Chen, Y-D Ida | Taylor, Herman A. | Correa, Adolfo | Sims, Mario | Couper, David | Pankow, James S. | Boerwinkle, Eric | Adeyemo, Adebowale | Doumatey, Ayo | Chen, Guanjie | Mathias, Rasika A. | Vaidya, Dhananjay | Singleton, Andrew B. | Zonderman, Alan B. | Igo, Robert P. | Sedor, John R. | Kabagambe, Edmond K. | Siscovick, David S. | McKnight, Barbara | Rice, Kenneth | Liu, Yongmei | Hsueh, Wen-Chi | Zhao, Wei | Bielak, Lawrence F. | Kraja, Aldi | Province, Michael A. | Bottinger, Erwin P. | Gottesman, Omri | Cai, Qiuyin | Zheng, Wei | Blot, William J. | Lowe, William L. | Pacheco, Jennifer A. | Crawford, Dana C. | Grundberg, Elin | Rich, Stephen S. | Hayes, M. Geoffrey | Shu, Xiao-Ou | Loos, Ruth J. F. | Borecki, Ingrid B. | Peyser, Patricia A. | Cummings, Steven R. | Psaty, Bruce M. | Fornage, Myriam | Iyengar, Sudha K. | Evans, Michele K. | Becker, Diane M. | Kao, W. H. Linda | Wilson, James G. | Rotter, Jerome I. | Sale, Michèle M. | Liu, Simin | Rotimi, Charles N. | Bowden, Donald W.
PLoS Genetics  2014;10(8):e1004517.
Type 2 diabetes (T2D) is more prevalent in African Americans than in Europeans. However, little is known about the genetic risk in African Americans despite the recent identification of more than 70 T2D loci primarily by genome-wide association studies (GWAS) in individuals of European ancestry. In order to investigate the genetic architecture of T2D in African Americans, the MEta-analysis of type 2 DIabetes in African Americans (MEDIA) Consortium examined 17 GWAS on T2D comprising 8,284 cases and 15,543 controls in African Americans in stage 1 analysis. Single nucleotide polymorphism (SNP) association analysis was conducted in each study under the additive model after adjustment for age, sex, study site, and principal components. Meta-analysis of approximately 2.6 million genotyped and imputed SNPs in all studies was conducted using an inverse variance-weighted fixed effect model. Replications were performed to follow up 21 loci in up to 6,061 cases and 5,483 controls in African Americans, and 8,130 cases and 38,987 controls of European ancestry. We identified three known loci (TCF7L2, HMGA2 and KCNQ1) and two novel loci (HLA-B and INS-IGF2) at genome-wide significance (4.15×10−94
Author Summary
Despite the higher prevalence of type 2 diabetes (T2D) in African Americans than in Europeans, recent genome-wide association studies (GWAS) were conducted primarily in individuals of European ancestry. In this study, we performed meta-analysis of 17 GWAS in 8,284 cases and 15,543 controls to explore the genetic architecture of T2D in African Americans. Following replication in an additional 6,061 cases and 5,483 controls in African Americans, and 8,130 cases and 38,987 controls of European ancestry, we identified two novel and three previously reported T2D loci reaching genome-wide significance. We also examined 158 loci previously reported to be associated with T2D or regulating glucose homeostasis. While 56% of these loci were shared between African Americans and the other populations, the strongest associations in African Americans are often found in nearby single nucleotide polymorphisms (SNPs) instead of the original SNPs reported in other populations due to differential genetic architecture across populations. Our results highlight the importance of performing genetic studies in non-European populations to fine map the causal genetic variants.
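The inverse variance-weighted fixed-effect meta-analysis named in the abstract pools per-study effect estimates by weighting each one by the reciprocal of its squared standard error. The following is a minimal sketch of that standard formula for a single SNP, assuming per-study log odds ratios and standard errors are already available; the function name and inputs are hypothetical and this is not the consortium's pipeline.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Pool per-study estimates with inverse-variance weights.

    Returns (pooled_beta, pooled_se) under a fixed-effect model.
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("Provide one standard error per effect estimate.")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se
```

Studies with smaller standard errors, typically the larger ones, dominate the pooled estimate; two equally precise studies simply average their effects while the pooled standard error shrinks by a factor of the square root of two.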
doi:10.1371/journal.pgen.1004517
PMCID: PMC4125087  PMID: 25102180
BMC Proceedings  2011;5(Suppl 9):S66.
Statistical tests on rare variant data may well have type I error rates that differ from their nominal levels. Here, we use the Genetic Analysis Workshop 17 data to estimate type I error rates and powers of three models for identifying rare variants associated with a phenotype: (1) by using the number of minor alleles, age, and smoking status as predictor variables; (2) by using the number of minor alleles, age, smoking status, and the identity of the population of the subject as predictor variables; and (3) by using the number of minor alleles, age, smoking status, and ancestry adjustment using 10 principal component scores. We studied both a quantitative phenotype and a dichotomized phenotype. The model with principal component adjustment has type I error rates that are closer to the nominal level of significance of 0.05 for single-nucleotide polymorphisms (SNPs) in noncausal genes for the selected phenotype than the model directly adjusting for population. The principal component adjustment model type I error rates are also closer to the nominal level of 0.05 for noncausal SNPs located in causal genes for the phenotype. The power for causal SNPs with the principal component adjustment model is comparable to the power of the other methods. The power using the underlying quantitative phenotype is greater than the power using the dichotomized phenotype.
doi:10.1186/1753-6561-5-S9-S66
PMCID: PMC3287905  PMID: 22373457
PLoS ONE  2011;6(7):e21591.
Understanding the role of genetic variation in human diseases remains an important problem to be solved in genomics. An important component of such variation consists of variations at single sites in DNA, or single nucleotide polymorphisms (SNPs). Typically, the problem of associating particular SNPs with phenotypes has been confounded by hidden factors such as the presence of population structure, family structure or cryptic relatedness in the sample of individuals being analyzed. Such confounding factors lead to a large number of spurious associations and missed associations. Various statistical methods have been proposed to account for such confounding factors, such as linear mixed-effect models (LMMs) or methods that adjust data based on a principal components analysis (PCA), but these methods either suffer from low power or cease to be tractable for larger numbers of individuals in the sample. Here we present a statistical model for conducting genome-wide association studies (GWAS) that accounts for such confounding factors. The runtime of our method scales quadratically with the number of individuals being studied, with only a modest loss in statistical power as compared to LMM-based and PCA-based methods when testing on synthetic data that was generated from a generalized LMM. Applying our method to both real and synthetic human genotype/phenotype data, we demonstrate the ability of our model to correct for confounding factors while requiring significantly less runtime relative to LMMs. We have implemented methods for fitting these models, which are available at http://www.microsoft.com/science.
doi:10.1371/journal.pone.0021591
PMCID: PMC3134455  PMID: 21765897
PLoS Medicine  2014;11(10):e1001751.
In this study, Richards and colleagues undertook a Mendelian randomization study to determine whether vitamin D binding protein (DBP) levels have a causal effect on common calcemic and cardiometabolic diseases. They concluded that DBP has no demonstrable causal effect on any of the diseases or traits investigated here, except vitamin D levels.
Please see later in the article for the Editors' Summary
Background
Observational studies have shown that vitamin D binding protein (DBP) levels, a key determinant of 25-hydroxy-vitamin D (25OHD) levels, and 25OHD levels themselves both associate with risk of disease. If 25OHD levels have a causal influence on disease, and DBP lies in this causal pathway, then DBP levels should likewise be causally associated with disease. We undertook a Mendelian randomization study to determine whether DBP levels have causal effects on common calcemic and cardiometabolic disease.
Methods and Findings
We measured DBP and 25OHD levels in 2,254 individuals, followed for up to 10 y, in the Canadian Multicentre Osteoporosis Study (CaMos). Using the single nucleotide polymorphism rs2282679 as an instrumental variable, we applied Mendelian randomization methods to determine the causal effect of DBP on calcemic (osteoporosis and hyperparathyroidism) and cardiometabolic diseases (hypertension, type 2 diabetes, coronary artery disease, and stroke) and related traits, first in CaMos and then in large-scale genome-wide association study consortia. The effect allele was associated with an age- and sex-adjusted decrease in DBP level of 27.4 mg/l (95% CI 24.7, 30.0; n = 2,254). DBP had a strong observational and causal association with 25OHD levels (p = 3.2×10^−19). While DBP levels were observationally associated with calcium and body mass index (BMI), these associations were not supported by causal analyses. Despite well-powered sample sizes from consortia, there were no associations of rs2282679 with any other traits and diseases: fasting glucose (0.00 mmol/l [95% CI −0.01, 0.01]; p = 1.00; n = 46,186); fasting insulin (0.01 pmol/l [95% CI −0.00, 0.01]; p = 0.22; n = 46,186); BMI (0.00 kg/m^2 [95% CI −0.01, 0.01]; p = 0.80; n = 127,587); bone mineral density (0.01 g/cm^2 [95% CI −0.01, 0.03]; p = 0.36; n = 32,961); mean arterial pressure (−0.06 mm Hg [95% CI −0.19, 0.07]; p = 0.36; n = 28,775); ischemic stroke (odds ratio [OR] = 1.00 [95% CI 0.97, 1.04]; p = 0.92; n = 12,389/62,004 cases/controls); coronary artery disease (OR = 1.02 [95% CI 0.99, 1.05]; p = 0.31; n = 22,233/64,762); or type 2 diabetes (OR = 1.01 [95% CI 0.97, 1.05]; p = 0.76; n = 9,580/53,810).
Conclusions
DBP has no demonstrable causal effect on any of the diseases or traits investigated here, except 25OHD levels. It remains to be determined whether 25OHD has a causal effect on these outcomes independent of DBP.
Editors' Summary
Background
Vitamin D deficiency is an increasingly common public health concern. According to some estimates, more than a billion people worldwide may be vitamin D deficient. Indeed, many people living in the US and Europe (in particular, elderly people, breastfed infants, people with dark skin, and obese individuals) have serum (circulating) 25-hydroxy-vitamin D (25OHD) levels below 50 nmol/l, the threshold for vitamin D deficiency. Vitamin D helps the body absorb calcium, a mineral that is essential for healthy bones. Consequently, vitamin D deficiency can lead to calcemic diseases such as rickets (a condition that affects bone development in children), osteomalacia (soft bones in adults), and osteoporosis (a condition in which the bones weaken and become susceptible to fracture). We get most of our vitamin D needs from our skin, which makes vitamin D after exposure to sunlight. Vitamin D is also found naturally in oily fish and eggs, and is added to some other foods, including cereals and milk, but some people need to take vitamin D supplements to avoid vitamin D deficiency.
Why Was This Study Done?
Observational studies have reported that the low levels of serum 25OHD and serum vitamin D binding protein (DBP, a key determinant of serum 25OHD level) are both associated with the risk of several common diseases and traits. Such studies have implicated vitamin D deficiency in cardiometabolic disease (cardiovascular diseases that affect the heart and/or blood vessels and metabolic diseases that affect the cellular chemical reactions needed to sustain life), in some cancers, and in Alzheimer disease. But observational studies cannot prove that vitamin D deficiency or DBP levels actually cause any of these diseases. So, for example, an observational study might report an association between vitamin D deficiency and type 2 diabetes (a metabolic disease), but the individuals who develop type 2 diabetes might share another unknown characteristic that is actually responsible for disease development (a confounding factor). Alternatively, type 2 diabetes might reduce circulating vitamin D levels (reverse causation). Here, the researchers undertake a Mendelian randomization study to determine whether circulating DBP levels have causal effects on calcemic and cardiometabolic diseases. In Mendelian randomization, causality is inferred from associations between genetic variants that mimic the influence of a modifiable environmental exposure and the outcome of interest. Because gene variants are inherited randomly, they are not prone to confounding and are free from reverse causation. So, if low DBP levels lead to low serum 25OHD levels, and vitamin D levels have a causal effect on common diseases, genetic variants associated with low DBP levels should be associated with the development of common diseases.
What Did the Researchers Do and Find?
The researchers analyzed the association between a genetic variant called single nucleotide polymorphism (SNP) rs2282679, which is known to alter DBP levels, and calcemic and cardiometabolic diseases and related traits in 2,254 participants in the Canadian Multicentre Osteoporosis Study (CaMos). The researchers report that there was a strong association between SNP rs2282679 and both serum DBP and 25OHD levels among the CaMos participants. However, there were no significant associations (associations unlikely to have occurred by chance) between SNP rs2282679 and calcium level, osteoporosis, or several cardiometabolic diseases, including heart attacks and diabetes. Moreover, when the researchers examined publicly available genome-wide association study data collected by several international consortia investigating genetic influences on disease, they found no significant associations between rs2282679 and a wide range of calcemic and cardiometabolic diseases.
What Do These Findings Mean?
In this Mendelian randomization study, DBP level had no demonstrable causal effect on any of the calcemic or cardiometabolic diseases or traits investigated, except 25OHD level. Because most of the participants in CaMos and the international consortia were of European descent, these findings are applicable only to people of European ancestry. Moreover, like all Mendelian randomization studies, the reliability of these findings depends on several assumptions made by the researchers. Notably, although this study strongly suggests that DBP level does not have a causal influence on several common diseases, it remains to be determined whether 25OHD has a causal effect on any calcemic or cardiometabolic outcomes independent of DBP level.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001751.
The UK National Health Service Choices website provides information about vitamin D and about how to get vitamin D from sunshine; “Behind the Headlines” articles describe a recent observational study that reported an association between vitamin D deficiency and Alzheimer disease and the media coverage of this study, other health claims made for vitamin D, and a randomized control trial that questioned the role of vitamin D in disease
The US National Institutes of Health Office of Dietary Supplements provides information about vitamin D (in English and Spanish)
The US Centers for Disease Control and Prevention provides information about the vitamin D status of the US population
MedlinePlus has links to further information about vitamin D (in English and Spanish)
Information about the Canadian Multicentre Osteoporosis Study is available
Wikipedia has a page on Mendelian randomization (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
doi:10.1371/journal.pmed.1001751
PMCID: PMC4211663  PMID: 25350643
PLoS Medicine  2011;8(10):e1001112.
Using Mendelian randomization, Roman Pfister and colleagues demonstrate a potentially causal link between low levels of B-type natriuretic peptide (BNP), a hormone released by damaged hearts, and the development of type 2 diabetes.
Background
Genetic and epidemiological evidence suggests an inverse association between B-type natriuretic peptide (BNP) levels in blood and risk of type 2 diabetes (T2D), but the prospective association of BNP with T2D is uncertain, and it is unclear whether the association is confounded.
Methods and Findings
We analysed the association between levels of the N-terminal fragment of pro-BNP (NT-pro-BNP) in blood and risk of incident T2D in a prospective case-cohort study and genotyped the variant rs198389 within the BNP locus in three T2D case-control studies. We combined our results with existing data in a meta-analysis of 11 case-control studies. Using a Mendelian randomization approach, we compared the observed association between rs198389 and T2D to that expected from the NT-pro-BNP level to T2D association and the NT-pro-BNP difference per C allele of rs198389. In participants of our case-cohort study who were free of T2D and cardiovascular disease at baseline, we observed a 21% (95% CI 3%–36%) decreased risk of incident T2D per one standard deviation (SD) higher log-transformed NT-pro-BNP levels in analysis adjusted for age, sex, body mass index, systolic blood pressure, smoking, family history of T2D, history of hypertension, and levels of triglycerides, high-density lipoprotein cholesterol, and low-density lipoprotein cholesterol. The association between rs198389 and T2D observed in case-control studies (odds ratio = 0.94 per C allele, 95% CI 0.91–0.97) was similar to that expected (0.96, 0.93–0.98) based on the pooled estimate for the log-NT-pro-BNP level to T2D association derived from a meta-analysis of our study and published data (hazard ratio = 0.82 per SD, 0.74–0.90) and the difference in NT-pro-BNP levels (0.22 SD, 0.15–0.29) per C allele of rs198389. No significant associations were observed between the rs198389 genotype and potential confounders.
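The "expected" per-allele association quoted above can be approximated from the two reported point estimates. The sketch below is an illustration under stated assumptions, not the authors' analysis: it treats the pooled hazard ratio per SD as the biomarker effect on the odds scale, assumes the effect is log-linear, and ignores uncertainty, so it yields a point estimate only. The function name `expected_odds_ratio` is hypothetical.

```python
import math


def expected_odds_ratio(hr_per_sd: float, sd_per_allele: float) -> float:
    """Per-allele ratio expected if the biomarker effect on T2D is causal.

    Multiplies the log ratio per SD of the biomarker by the
    per-allele difference in the biomarker (in SD units).
    """
    return math.exp(sd_per_allele * math.log(hr_per_sd))


# Point estimates reported in the abstract:
# hazard ratio 0.82 per SD of log NT-pro-BNP, 0.22 SD per C allele.
expected = expected_odds_ratio(hr_per_sd=0.82, sd_per_allele=0.22)
print(round(expected, 2))  # prints 0.96
```

The result agrees with the expected value of 0.96 reported in the abstract, which the authors compare with the observed odds ratio of 0.94 per C allele.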
Conclusions
Our results provide evidence for a potential causal role of the BNP system in the aetiology of T2D. Further studies are needed to investigate the mechanisms underlying this association and possibilities for preventive interventions.
Please see later in the article for the Editors' Summary
Editors' Summary
Background
Worldwide, nearly 250 million people have diabetes, and this number is increasing rapidly. Diabetes is characterized by dangerous amounts of sugar (glucose) in the blood. Blood sugar levels are normally controlled by insulin, a hormone that the pancreas releases after meals (digestion of food produces glucose). In people with type 2 diabetes (the most common form of diabetes), blood sugar control fails because the fat and muscle cells that usually respond to insulin by removing sugar from the blood become insulin resistant. Type 2 diabetes can be controlled with diet and exercise, and with drugs that help the pancreas make more insulin or that make cells more sensitive to insulin. The long-term complications of diabetes, which include kidney failure and an increased risk of cardiovascular problems such as heart disease and stroke, reduce the life expectancy of people with diabetes by about 10 years compared to people without diabetes.
Why Was This Study Done?
Because the causes of type 2 diabetes are poorly understood, it is hard to devise ways to prevent the condition. Recently, B-type natriuretic peptide (BNP, a hormone released by damaged hearts) has been implicated in type 2 diabetes development in cross-sectional studies (investigations in which data are collected at a single time point from a population to look for associations between an illness and potential risk factors). Although these studies suggest that high levels of BNP may protect against type 2 diabetes, they cannot prove a causal link between BNP levels and diabetes because the study participants with low BNP levels may share some other unknown factor (a confounding factor) that is the real cause of both diabetes and altered BNP levels. Here, the researchers use an approach called “Mendelian randomization” to examine whether reduced BNP levels contribute to causing type 2 diabetes. It is known that a common genetic variant (rs198389) within the genome region that encodes BNP is associated with a reduced risk of type 2 diabetes. Because gene variants are inherited randomly, they are not subject to confounding. So, by investigating the association between BNP gene variants that alter NT-pro-BNP (a molecule created when BNP is being produced) levels and the development of type 2 diabetes, the researchers can discover whether BNP is causally involved in this chronic condition.
What Did the Researchers Do and Find?
The researchers analyzed the association between blood levels of NT-pro-BNP at baseline in 440 participants of the EPIC-Norfolk study (a prospective population-based study of lifestyle factors and the risk of chronic diseases) who subsequently developed diabetes and in 740 participants who did not develop diabetes. In this prospective case-cohort study, the risk of developing type 2 diabetes was associated with lower NT-pro-BNP levels. They also genotyped (sequenced) rs198389 in the participants of three case-control studies of type 2 diabetes (studies in which potential risk factors for type 2 diabetes were examined in people with type 2 diabetes and matched controls living in the East of England), and combined these results with those of eight similar published case-control studies. Finally, the researchers showed that the association between rs198389 and type 2 diabetes measured in the case-control studies was similar to the expected association calculated from the association between NT-pro-BNP level and type 2 diabetes obtained from the prospective case-cohort study and the association between rs198389 and BNP levels obtained from the EPIC-Norfolk study and other published studies.
What Do These Findings Mean?
The results of this Mendelian randomization study provide evidence for a causal, protective role of the BNP hormone system in the development of type 2 diabetes. That is, these findings suggest that low levels of BNP are partly responsible for the development of type 2 diabetes. Because the participants in all the individual studies included in this analysis were of European descent, these findings may not be generalizable to other ethnicities. Moreover, they provide no explanation of how alterations in the BNP hormone system might affect the development of type 2 diabetes. Nevertheless, the demonstration of a causal link between the BNP hormone system and type 2 diabetes suggests that BNP may be a potential target for interventions designed to prevent type 2 diabetes, particularly since the feasibility of altering BNP levels with drugs has already been proven in patients with cardiovascular disease.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001112.
The International Diabetes Federation provides information about all aspects of diabetes
The US National Diabetes Information Clearinghouse provides detailed information about diabetes for patients, health-care professionals, and the general public (in English and Spanish)
The UK National Health Service Choices website also provides information for patients and carers about type 2 diabetes and includes people's stories about diabetes
MedlinePlus provides links to further resources and advice about diabetes (in English and Spanish)
Wikipedia has pages on BNP and on Mendelian randomization (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
The charity Healthtalkonline has interviews with people about their experiences of diabetes; the charity Diabetes UK has a further selection of stories from people with diabetes
doi:10.1371/journal.pmed.1001112
PMCID: PMC3201934  PMID: 22039354
Objective
Thrombosis is a serious complication of systemic lupus erythematosus (SLE). Studies that have investigated the genetics of thrombosis in SLE are limited. We undertook this study to assess the association of previously implicated candidate genes, particularly Toll-like receptor (TLR) genes, with pathogenesis of thrombosis.
Methods
We genotyped 3,587 SLE patients from 3 multiethnic populations for 77 single-nucleotide polymorphisms (SNPs) in 10 genes, primarily in TLRs 2, 4, 7, and 9, and we also genotyped 64 ancestry-informative markers (AIMs). We first analyzed association with arterial and venous thrombosis in the combined population via logistic regression, adjusting for top principal components of the AIMs and other covariates. We also subjected an associated SNP, rs893629, to meta-analysis (after stratification by ethnicity and study population) to confirm the association and to test for study population or ethnicity effects.
Results
In the combined analysis, the SNP rs893629 in the KIAA0922/TLR2 region was significantly associated with arterial thrombosis (logistic P = 6.4 × 10^−5, false discovery rate P = 0.0044). Two additional SNPs in TLR2 were also suggestive: rs1816702 (logistic P = 0.002) and rs4235232 (logistic P = 0.009). In the meta-analysis by study population, the odds ratio (OR) for arterial thrombosis with rs893629 was 2.44 (95% confidence interval 1.58–3.76), without evidence for heterogeneity (P = 0.78). By ethnicity, the effect was most significant among African Americans (OR 2.42, P = 3.5 × 10^−4) and European Americans (OR 3.47, P = 0.024).
Conclusion
TLR2 gene variation is associated with thrombosis in SLE, particularly among African Americans and European Americans. There was no evidence of association among Hispanics, and results in Asian Americans were limited due to insufficient sample size. These results may help elucidate the pathogenesis of this important clinical manifestation.
doi:10.1002/art.38520
PMCID: PMC4269184  PMID: 24578102
PLoS Medicine  2007;4(4):e125.
Background
The epidermal growth factor receptor (EGFR) gene is the prototype member of the type I receptor tyrosine kinase (TK) family and plays a pivotal role in cell proliferation and differentiation. There are three well described polymorphisms that are associated with increased protein production in experimental systems: a polymorphic dinucleotide repeat (CA simple sequence repeat 1 [CA-SSR1]) in intron one (lower number of repeats) and two single nucleotide polymorphisms (SNPs) in the promoter region, −216 (G/T or T/T) and −191 (C/A or A/A). The objective of this study was to examine distributions of these three polymorphisms and their relationships to each other and to EGFR gene mutations and allelic imbalance (AI) in non-small cell lung cancers.
Methods and Findings
We examined the frequencies of the three polymorphisms of EGFR in 556 resected lung cancers and corresponding non-malignant lung tissues from 336 East Asians, 213 individuals of Northern European descent, and seven of other ethnicities. We also studied the EGFR gene in 93 corresponding non-malignant lung tissue samples from European-descent patients from Italy and in peripheral blood mononuclear cells from 250 normal healthy US individuals enrolled in epidemiological studies including individuals of European descent, African–Americans, and Mexican–Americans. We sequenced the four exons (18–21) of the TK domain known to harbor activating mutations in tumors and examined the status of the CA-SSR1 alleles (presence of heterozygosity, repeat number of the alleles, and relative amplification of one allele) and allele-specific amplification of mutant tumors as determined by a standardized semiautomated method of microsatellite analysis. Variant forms of SNP −216 (G/T or T/T) and SNP −191 (C/A or A/A) (associated with higher protein production in experimental systems) were less frequent in East Asians than in individuals of other ethnicities (p < 0.001). Both alleles of CA-SSR1 were significantly longer in East Asians than in individuals of other ethnicities (p < 0.001). Expression studies using bronchial epithelial cultures demonstrated a trend towards increased mRNA expression in cultures having the variant SNP −216 G/T or T/T genotypes. Monoallelic amplification of the CA-SSR1 locus was present in 30.6% of the informative cases and occurred more often in individuals of East Asian ethnicity. AI was present in 44.4% (95% confidence interval: 34.1%–54.7%) of mutant tumors compared with 25.9% (20.6%–31.2%) of wild-type tumors (p = 0.002). The shorter allele in tumors with AI in East Asian individuals was selectively amplified (shorter allele dominant) more often in mutant tumors (75.0%, 61.6%–88.4%) than in wild-type tumors (43.5%, 31.8%–55.2%, p = 0.003). 
In addition, there was a strong positive association between AI ratios of CA-SSR1 alleles and AI of mutant alleles.
Conclusions
The three polymorphisms associated with increased EGFR protein production (shorter CA-SSR1 length and variant forms of SNPs −216 and −191) were found to be rare in East Asians as compared to other ethnicities, suggesting that the cells of East Asians may make relatively less intrinsic EGFR protein. Interestingly, especially in tumors from patients of East Asian ethnicity, EGFR mutations were found to favor the shorter allele of CA-SSR1, and selective amplification of the shorter allele of CA-SSR1 occurred frequently in tumors harboring a mutation. These distinct molecular events targeting the same allele would both be predicted to result in greater EGFR protein production and/or activity. Our findings may help to explain some of the ethnic differences observed in mutational frequencies and responses to TK inhibitors.
Masaharu Nomura and colleagues examine the distribution of EGFR polymorphisms in different populations and find differences that might explain different responses to tyrosine kinase inhibitors in lung cancer patients.
Editors' Summary
Background.
Most cases of lung cancer—the leading cause of cancer deaths worldwide—are “non-small cell lung cancer” (NSCLC), which has a very low cure rate. Recently, however, “targeted” therapies have brought new hope to patients with NSCLC. Like all cancers, NSCLC occurs when cells begin to divide uncontrollably because of changes (mutations) in their genetic material. Chemotherapy drugs treat cancer by killing these rapidly dividing cells, but, because some normal tissues are sensitive to these agents, it is hard to kill the cancer completely without causing serious side effects. Targeted therapies specifically attack the changes in cancer cells that allow them to divide uncontrollably, so it might be possible to kill the cancer cells selectively without damaging normal tissues. Epidermal growth factor receptor (EGFR) was one of the first molecules for which a targeted therapy was developed. In normal cells, messenger proteins bind to EGFR and activate its “tyrosine kinase,” an enzyme that sticks phosphate groups on tyrosine (an amino acid) in other proteins. These proteins then tell the cell to divide. Alterations to this signaling system drive the uncontrolled growth of some cancers, including NSCLC.
Why Was This Study Done?
Molecules that inhibit the tyrosine kinase activity of EGFR (for example, gefitinib) dramatically shrink some NSCLCs, particularly those in East Asian patients. Tumors shrunk by tyrosine kinase inhibitors (TKIs) often (but not always) have mutations in EGFR's tyrosine kinase. However, not all tumors with these mutations respond to TKIs, and other genetic changes—for example, amplification (multiple copies) of the EGFR gene—also affect tumor responses to TKIs. It would be useful to know which genetic changes predict these responses when planning treatments for NSCLC and to understand why the frequency of these changes varies between ethnic groups. In this study, the researchers have examined three polymorphisms—differences in DNA sequences that occur between individuals—in the EGFR gene in people with and without NSCLC. In addition, they have looked for associations between these polymorphisms, which are present in every cell of the body, and the EGFR gene mutations and allelic imbalances (genes occur in pairs but amplification or loss of one copy, or allele, often causes allelic imbalance in tumors) that occur in NSCLCs.
What Did the Researchers Do and Find?
The researchers measured how often three EGFR polymorphisms (the length of a repeat sequence called CA-SSR1, and two single nucleotide variations [SNPs])—all of which probably affect how much protein is made from the EGFR gene—occurred in normal tissue and NSCLC tissue from East Asians and individuals of European descent. They also looked for mutations in the EGFR tyrosine kinase and allelic imbalance in the tumors, and then determined which genetic variations and alterations tended to occur together in people with the same ethnicity. Among many associations, the researchers found that shorter alleles of CA-SSR1 and the minor forms of the two SNPs occurred less often in East Asians than in individuals of European descent. They also confirmed that EGFR kinase mutations were more common in NSCLCs in East Asians than in European-descent individuals. Furthermore, mutations occurred more often in tumors with allelic imbalance, and in tumors where there was allelic imbalance and an EGFR mutation, the mutant allele was amplified more often than the wild-type allele.
What Do These Findings Mean?
The researchers use these associations between gene variants and tumor-associated alterations to propose a model to explain the ethnic differences in mutational frequencies and responses to TKIs seen in NSCLC. They suggest that because of the polymorphisms in the EGFR gene commonly seen in East Asians, people from this ethnic group make less EGFR protein than people from other ethnic groups. This would explain why, if a threshold level of EGFR is needed to drive cells towards malignancy, East Asians have a high frequency of amplified EGFR tyrosine kinase mutations in their tumors—mutation followed by amplification would be needed to activate EGFR signaling. This model, though speculative, helps to explain some clinical findings, such as the frequency of EGFR mutations and of TKI sensitivity in NSCLCs in East Asians. Further studies of this type in different ethnic groups and in different tumors, as well as with other genes for which targeted therapies are available, should help oncologists provide personalized cancer therapies for their patients.
Additional Information.
Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.0040125.
US National Cancer Institute information on lung cancer and on cancer treatment for patients and professionals
MedlinePlus encyclopedia entries on NSCLC
Cancer Research UK information for patients about all aspects of lung cancer, including treatment with TKIs
Wikipedia pages on lung cancer, EGFR, and gefitinib (note that Wikipedia is a free online encyclopedia that anyone can edit)
doi:10.1371/journal.pmed.0040125
PMCID: PMC1876407  PMID: 17455987
BMC Proceedings  2009;3(Suppl 7):S108.
Population structure occurs when a sample is composed of individuals with different ancestries and can result in excess type I error in genome-wide association studies. Genome-wide principal-component analysis (PCA) has become a popular method for identifying and adjusting for subtle population structure in association studies. Using the Genetic Analysis Workshop 16 (GAW16) NARAC data, we explore two unresolved issues concerning the use of genome-wide PCA to account for population structure in genetic associations studies: the choice of single-nucleotide polymorphism (SNP) subset and the choice of adjustment model. We computed PCs for subsets of genome-wide SNPs with varying levels of LD. The first two PCs were similar for all subsets and the first three PCs were associated with case status for all subsets. When the PCs associated with case status were included as covariates in an association model, the reduction in genomic inflation factor was similar for all SNP sets. Several models have been proposed to account for structure using PCs, but it is not yet clear whether the different methods will result in substantively different results for association studies with individuals of European descent. We compared genome-wide association p-values and results for two positive-control SNPs previously associated with rheumatoid arthritis using four PC adjustment methods as well as no adjustment and genomic control. We found that in this sample, adjusting for the continuous PCs or adjusting for discrete clusters identified using the PCs adequately accounts for the case-control population structure, but that a recently proposed randomization test performs poorly.
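The genomic inflation factor mentioned in this abstract is commonly computed as the median observed 1-df chi-square statistic divided by the median expected under the null. The sketch below follows that common definition using only the Python standard library. It is an assumption that the authors computed it this way, and the function name and the evenly spaced toy p-values are hypothetical.

```python
from statistics import NormalDist, median
from typing import Sequence

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation_factor(p_values: Sequence[float]) -> float:
    """Median observed 1-df chi-square statistic over its null median.

    Each p-value must satisfy 0 < p <= 1.
    """
    std_normal = NormalDist()
    # A two-sided p-value p corresponds to a 1-df chi-square of z**2,
    # where z is the standard normal quantile at p / 2.
    chi2_stats = [std_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic a well-calibrated scan with no inflation.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
lambda_gc = genomic_inflation_factor(null_like)
print(round(lambda_gc, 2))
```

Values near 1 suggest little inflation. Values well above 1 are the pattern that adjusting for PCs is meant to reduce.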
PMCID: PMC2795879  PMID: 20017972
In genetic association studies, it is necessary to correct for population structure to avoid inference bias. During the past decade, prevailing corrections often only involved adjustments of global ancestry differences between sampled individuals. Nevertheless, population structure may vary across local genomic regions due to the variability of local ancestries associated with natural selection, migration, or random genetic drift. Adjusting for global ancestry alone may be inadequate when local population structure is an important confounding factor. In contrast, adjusting for local ancestry can more effectively prevent false-positives due to local population structure. To more accurately locate disease genes, we recommend adjusting for local ancestries by interrogating local structure. In practice, locus-specific ancestries are usually unknown and cannot be accurately inferred when ancestral population information is not available. For such scenarios, we propose employing local principal components (PC) to represent local ancestries and adjusting for local PCs when testing for genotype–phenotype association. With an acceptable computation burden, the proposed algorithm successfully eliminates the known spurious association between SNPs in the LCT gene and height due to the population structure in European Americans.
doi:10.1007/978-1-61779-555-8_21
PMCID: PMC3589145  PMID: 22307710
Genome-wide association studies; Local ancestries; Local principal components; Migration; Random genetic drift; Natural selection; Genomic inflation factor; Genomic control; Local ancestry principal components correction; Fine mapping
BMC Proceedings  2011;5(Suppl 9):S103.
Identifying rare variants that are responsible for complex disease has been promoted by advances in sequencing technologies. However, statistical methods that can handle the vast amount of data generated and that can interpret the complicated relationship between disease and these variants have lagged. We apply a zero-inflated Poisson regression model to take into account the excess of zeros caused by the extremely low frequency of the 24,487 exonic variants in the Genetic Analysis Workshop 17 data. We grouped the 697 subjects in the data set as Europeans, Asians, and Africans based on principal components analysis and found the total number of rare variants per gene for each individual. We then analyzed these collapsed variants based on the assumption that rare variants are enriched in a group of people affected by a disease compared to a group of unaffected people. We also tested the hypothesis with quantitative traits Q1, Q2, and Q4. Analyses performed on the combined 697 individuals and on each ethnic group yielded different results. For the combined population analysis, we found that UGT1A1, which was not part of the simulation model, was associated with disease liability and that FLT1, which was a causal locus in the simulation model, was associated with Q1. Of the causal loci in the simulation models, FLT1 and KDR were associated with Q1 and VNN1 was correlated with Q2. No significant genes were associated with Q4. These results show the feasibility and capability of our new statistical model to detect multiple rare variants influencing disease risk.
doi:10.1186/1753-6561-5-S9-S103
PMCID: PMC3287826  PMID: 22373445
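To illustrate why a zero-inflated Poisson model suits counts with an excess of zeros, the sketch below writes out the standard zero-inflated Poisson probability mass function and log-likelihood with fixed parameters. It omits covariates, so it is not the regression model fitted in the study; the function names, the parameter names `lam` and `pi`, and the toy counts are assumptions for illustration.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) when a structural zero occurs with probability pi,
    and otherwise Y follows a Poisson distribution with mean lam."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_log_likelihood(counts, lam, pi):
    """Log-likelihood of independent counts under fixed lam and pi."""
    return sum(math.log(zip_pmf(k, lam, pi)) for k in counts)


# Toy per-gene rare-variant counts (assumption): mostly zeros.
counts = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
ll_plain_poisson = zip_log_likelihood(counts, lam=1.5, pi=0.0)
ll_zero_inflated = zip_log_likelihood(counts, lam=1.5, pi=0.7)
```

With `pi = 0` the model reduces to an ordinary Poisson distribution, which fits zero-heavy counts such as these worse than the zero-inflated version.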
Fall, Tove | Hägg, Sara | Mägi, Reedik | Ploner, Alexander | Fischer, Krista | Horikoshi, Momoko | Sarin, Antti-Pekka | Thorleifsson, Gudmar | Ladenvall, Claes | Kals, Mart | Kuningas, Maris | Draisma, Harmen H. M. | Ried, Janina S. | van Zuydam, Natalie R. | Huikari, Ville | Mangino, Massimo | Sonestedt, Emily | Benyamin, Beben | Nelson, Christopher P. | Rivera, Natalia V. | Kristiansson, Kati | Shen, Huei-yi | Havulinna, Aki S. | Dehghan, Abbas | Donnelly, Louise A. | Kaakinen, Marika | Nuotio, Marja-Liisa | Robertson, Neil | de Bruijn, Renée F. A. G. | Ikram, M. Arfan | Amin, Najaf | Balmforth, Anthony J. | Braund, Peter S. | Doney, Alexander S. F. | Döring, Angela | Elliott, Paul | Esko, Tõnu | Franco, Oscar H. | Gretarsdottir, Solveig | Hartikainen, Anna-Liisa | Heikkilä, Kauko | Herzig, Karl-Heinz | Holm, Hilma | Hottenga, Jouke Jan | Hyppönen, Elina | Illig, Thomas | Isaacs, Aaron | Isomaa, Bo | Karssen, Lennart C. | Kettunen, Johannes | Koenig, Wolfgang | Kuulasmaa, Kari | Laatikainen, Tiina | Laitinen, Jaana | Lindgren, Cecilia | Lyssenko, Valeriya | Läärä, Esa | Rayner, Nigel W. | Männistö, Satu | Pouta, Anneli | Rathmann, Wolfgang | Rivadeneira, Fernando | Ruokonen, Aimo | Savolainen, Markku J. | Sijbrands, Eric J. G. | Small, Kerrin S. | Smit, Jan H. | Steinthorsdottir, Valgerdur | Syvänen, Ann-Christine | Taanila, Anja | Tobin, Martin D. | Uitterlinden, Andre G. | Willems, Sara M. | Willemsen, Gonneke | Witteman, Jacqueline | Perola, Markus | Evans, Alun | Ferrières, Jean | Virtamo, Jarmo | Kee, Frank | Tregouet, David-Alexandre | Arveiler, Dominique | Amouyel, Philippe | Ferrario, Marco M. | Brambilla, Paolo | Hall, Alistair S. | Heath, Andrew C. | Madden, Pamela A. F. | Martin, Nicholas G. | Montgomery, Grant W. | Whitfield, John B. | Jula, Antti | Knekt, Paul | Oostra, Ben | van Duijn, Cornelia M. | Penninx, Brenda W. J. H. | Davey Smith, George | Kaprio, Jaakko | Samani, Nilesh J. 
| Gieger, Christian | Peters, Annette | Wichmann, H.-Erich | Boomsma, Dorret I. | de Geus, Eco J. C. | Tuomi, TiinaMaija | Power, Chris | Hammond, Christopher J. | Spector, Tim D. | Lind, Lars | Orho-Melander, Marju | Palmer, Colin Neil Alexander | Morris, Andrew D. | Groop, Leif | Järvelin, Marjo-Riitta | Salomaa, Veikko | Vartiainen, Erkki | Hofman, Albert | Ripatti, Samuli | Metspalu, Andres | Thorsteinsdottir, Unnur | Stefansson, Kari | Pedersen, Nancy L. | McCarthy, Mark I. | Ingelsson, Erik | Prokopenko, Inga
PLoS Medicine  2013;10(6):e1001474.
In this study, Prokopenko and colleagues provide novel evidence for causal relationship between adiposity and heart failure and increased liver enzymes using a Mendelian randomization study design.
Please see later in the article for the Editors' Summary
Background
The association between adiposity and cardiometabolic traits is well known from epidemiological studies. Whilst the causal relationship is clear for some of these traits, for others it is not. We aimed to determine whether adiposity is causally related to various cardiometabolic traits using the Mendelian randomization approach.
Methods and Findings
We used the adiposity-associated variant rs9939609 at the FTO locus as an instrumental variable (IV) for body mass index (BMI) in a Mendelian randomization design. Thirty-six population-based studies of individuals of European descent contributed to the analyses.
Age- and sex-adjusted regression models were fitted to test for association between (i) rs9939609 and BMI (n = 198,502), (ii) rs9939609 and 24 traits, and (iii) BMI and 24 traits. The causal effect of BMI on the outcome measures was quantified by IV estimators. The estimators were compared to the BMI–trait associations derived from the same individuals. In the IV analysis, we demonstrated novel evidence for a causal relationship between adiposity and incident heart failure (hazard ratio, 1.19 per BMI-unit increase; 95% CI, 1.03–1.39) and replicated earlier reports of a causal association with type 2 diabetes, metabolic syndrome, dyslipidemia, and hypertension (odds ratio for IV estimator, 1.1–1.4; all p<0.05). For quantitative traits, our results provide novel evidence for a causal effect of adiposity on the liver enzymes alanine aminotransferase and gamma-glutamyl transferase and confirm previous reports of a causal effect of adiposity on systolic and diastolic blood pressure, fasting insulin, 2-h post-load glucose from the oral glucose tolerance test, C-reactive protein, triglycerides, and high-density lipoprotein cholesterol levels (all p<0.05). The estimated causal effects were in agreement with traditional observational measures in all instances except for type 2 diabetes, where the causal estimate was larger than the observational estimate (p = 0.001).
Conclusions
We provide novel evidence for a causal relationship between adiposity and heart failure as well as between adiposity and increased liver enzymes.
Editors' Summary
Cardiovascular disease (CVD)—disease that affects the heart and/or the blood vessels—is a major cause of illness and death worldwide. In the US, for example, coronary heart disease—a CVD in which narrowing of the heart's blood vessels by fatty deposits slows the blood supply to the heart and may eventually cause a heart attack—is the leading cause of death, and stroke—a CVD in which the brain's blood supply is interrupted—is the fourth leading cause of death. Globally, both the incidence of CVD (the number of new cases in a population every year) and its prevalence (the proportion of the population with CVD) are increasing, particularly in low- and middle-income countries. This increasing burden of CVD is occurring in parallel with a global increase in the incidence and prevalence of obesity—having an unhealthy amount of body fat (adiposity)—and of metabolic diseases—conditions such as diabetes in which metabolism (the processes that the body uses to make energy from food) is disrupted, with resulting high blood sugar and damage to the blood vessels.
Why Was This Study Done?
Epidemiological studies—investigations that record the patterns and causes of disease in populations—have reported an association between adiposity (indicated by an increased body mass index [BMI], which is calculated by dividing body weight in kilograms by height in meters squared) and cardiometabolic traits such as coronary heart disease, stroke, heart failure (a condition in which the heart is incapable of pumping sufficient amounts of blood around the body), diabetes, high blood pressure (hypertension), and high blood cholesterol (dyslipidemia). However, observational studies cannot prove that adiposity causes any particular cardiometabolic trait because overweight individuals may share other characteristics (confounding factors) that are the real causes of both obesity and the cardiometabolic disease. Moreover, it is possible that having CVD or a metabolic disease causes obesity (reverse causation). For example, individuals with heart failure cannot do much exercise, so heart failure may cause obesity rather than vice versa. Here, the researchers use “Mendelian randomization” to examine whether adiposity is causally related to various cardiometabolic traits. Because gene variants are inherited randomly, they are not prone to confounding and are free from reverse causation. It is known that a genetic variant (rs9939609) within the genome region that encodes the fat-mass- and obesity-associated gene (FTO) is associated with increased BMI. Thus, an investigation of the associations between rs9939609 and cardiometabolic traits can indicate whether obesity is causally related to these traits.
What Did the Researchers Do and Find?
The researchers analyzed the association between rs9939609 (the “instrumental variable,” or IV) and BMI, between rs9939609 and 24 cardiometabolic traits, and between BMI and the same traits using genetic and health data collected in 36 population-based studies of nearly 200,000 individuals of European descent. They then quantified the strength of the causal association between BMI and the cardiometabolic traits by calculating “IV estimators.” Higher BMI showed a causal relationship with heart failure, metabolic syndrome (a combination of medical disorders that increases the risk of developing CVD), type 2 diabetes, dyslipidemia, hypertension, increased blood levels of liver enzymes (an indicator of liver damage; some metabolic disorders involve liver damage), and several other cardiometabolic traits. All the IV estimators were similar to the BMI–cardiovascular trait associations (observational estimates) derived from the same individuals, with the exception of diabetes, where the causal estimate was higher than the observational estimate, probably because the observational estimate is based on a single BMI measurement, whereas the causal estimate considers lifetime changes in BMI.
What Do These Findings Mean?
As in all Mendelian randomization studies, the reliability of the causal associations reported here depends on several assumptions made by the researchers. Nevertheless, these findings provide support for many previously suspected and biologically plausible causal relationships, such as that between adiposity and hypertension. They also provide new insights into the causal effect of obesity on liver enzyme levels and on heart failure. In the latter case, these findings suggest that a one-unit increase in BMI might increase the incidence of heart failure by 17%. In the US, this corresponds to 113,000 additional cases of heart failure for every unit increase in BMI at the population level. Although additional studies are needed to confirm and extend these findings, these results suggest that global efforts to reduce the burden of obesity will likely also reduce the occurrence of CVD and metabolic disorders.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001474.
The American Heart Association provides information on all aspects of cardiovascular disease and tips on keeping the heart healthy, including weight management (in several languages); its website includes personal stories about stroke and heart attacks
The US Centers for Disease Control and Prevention has information on heart disease, stroke, and all aspects of overweight and obesity (in English and Spanish)
The UK National Health Service Choices website provides information about cardiovascular disease and obesity, including a personal story about losing weight
The World Health Organization provides information on obesity (in several languages)
The International Obesity Taskforce provides information about the global obesity epidemic
Wikipedia has a page on Mendelian randomization (note: Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
MedlinePlus provides links to other sources of information on heart disease, on vascular disease, on obesity, and on metabolic disorders (in English and Spanish)
The International Association for the Study of Obesity provides maps and information about obesity worldwide
The International Diabetes Federation has a web page that describes types, complications, and risk factors of diabetes
doi:10.1371/journal.pmed.1001474
PMCID: PMC3692470  PMID: 23824655
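The abstract above refers to "IV estimators" without naming a specific one. A common choice for a single genetic instrument is the ratio (Wald) estimator, which divides the variant-outcome slope by the variant-exposure slope. The sketch below shows that estimator on fabricated toy data; it is an illustration under stated assumptions, not the study's analysis, and it omits the age and sex adjustment the study describes. All names and numbers are hypothetical.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def wald_ratio(genotype, exposure, outcome):
    """Ratio IV estimate: (genotype -> outcome slope) / (genotype -> exposure slope)."""
    return ols_slope(genotype, outcome) / ols_slope(genotype, exposure)


# Toy data (assumption): the true causal effect of BMI on the trait is 0.1,
# and an unmeasured confounder raises both BMI and the trait.
genotype = [0, 0, 1, 1, 2, 2]          # risk-allele counts
confounder = [-1, 1, -1, 1, -1, 1]     # uncorrelated with genotype by construction
bmi = [25 + 0.4 * g + u for g, u in zip(genotype, confounder)]
trait = [0.1 * x + 2.0 * u for x, u in zip(bmi, confounder)]

causal_estimate = wald_ratio(genotype, bmi, trait)
observational_estimate = ols_slope(bmi, trait)
```

In this toy setting the ratio estimate recovers the simulated effect of 0.1, while the ordinary regression of the trait on BMI is pulled far from it by the confounder.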
Genetic epidemiology  2011;35(Suppl 1):S56-S60.
As part of Genetic Analysis Workshop 17 (GAW17), our group considered the application of novel and standard approaches to the analysis of genotype-phenotype association in next-generation sequencing data. Our group identified a major issue in the analysis of the GAW17 next-generation sequencing data: type I error and false-positive report probability rates higher than those expected based on empirical type I error levels (as high as 90%). Two main causes emerged: population stratification and long-range correlation (gametic phase disequilibrium) between rare variants. Population stratification was expected because of the diverse sample. Correlation between rare variants was attributable to both random causes (e.g., nearly 10,000 of 25,000 markers were private variants, and the sample size was small [n = 697]) and nonrandom causes (more correlation was observed than was expected by random chance). Principal components analysis was used to control for population structure and helped to minimize type I errors, but this was at the expense of identifying fewer causal variants. A novel multiple regression approach showed promise to handle correlation between markers. Further work is needed, first, to identify best practices for the control of type I errors in the analysis of sequencing data and then to explore and compare the many promising new aggregating approaches for identifying markers associated with disease phenotypes.
doi:10.1002/gepi.20650
PMCID: PMC3249221  PMID: 22128060
population structure; correlated markers; next-generation sequencing
Journal of Heredity  2009;100(Suppl 1):S28-S36.
Until recently, canine genetic research has not focused on population structure within breeds, which may confound the results of case–control studies by introducing spurious correlations between phenotype and genotype that reflect population history. Intrabreed structure may exist when geographical origin or divergent selection regimes influence the choices of potential mates for breeding dogs. We present evidence for intrabreed stratification from a genome-wide marker survey in a sample of unrelated dogs. We genotyped 76 Border Collies, 49 Australian Shepherds, 17 German Shepherd Dogs, and 17 Portuguese Water Dogs for our primary analyses using Affymetrix Canine v2.0 single-nucleotide polymorphism (SNP) arrays. Subsets of autosomal markers were examined using clustering algorithms to facilitate assignment of individuals to populations and estimation of the number of populations represented in the sample. SNPs passing stringent quality control filters were employed for explicitly phylogenetic analyses reconstructing relationships between individuals using maximum parsimony and Bayesian methods. We used simulation studies to explore the possible effects of intrabreed stratification on genome-wide association studies. These analyses demonstrate significant stratification in at least one of our primary breeds of interest, the Border Collie. Demographic and pedigree data suggest that this population substructure may result from geographic isolation or divergent selection regimes practiced by breeders with different breeding program goals. Simulation studies indicate that such stratification could result in false discovery rates significant enough to confound genome-wide association analyses. Intrabreed stratification should be accounted for when designing and interpreting the results of case–control association studies using purebred dogs.
doi:10.1093/jhered/esp012
PMCID: PMC4176315
Bayesian analysis; canine genetics; maximum parsimony; phylogenetics; population stratification; purebred dogs
BMC Proceedings  2009;3(Suppl 7):S107.
Background
To account for population stratification in association studies, principal-components analysis is often performed on single-nucleotide polymorphisms (SNPs) across the genome. Here, we use Framingham Heart Study (FHS) Genetic Analysis Workshop 16 data to compare the performance of local ancestry adjustment for population stratification based on principal components (PCs) estimated from SNPs in a local chromosomal region with global ancestry adjustment based on PCs estimated from genome-wide SNPs.
Methods
Standardized height residuals from unrelated adults from the FHS Offspring Cohort were averaged from longitudinal data. PCs of SNP genotype data were calculated to represent individual's ancestry either 1) globally using all SNPs across the genome or 2) locally using SNPs in adjacent 20-Mbp regions within each chromosome. We assessed the extent to which there were differences in association studies of height depending on whether PCs for global, local, or both global and local ancestry were included as covariates.
Results
The correlations between local and global PCs were low (r < 0.12), suggesting variability between local and global ancestry estimates. Genome-wide association tests without any ancestry adjustment demonstrated an inflated type I error rate that decreased with adjustment for local ancestry, global ancestry, or both. A known spurious association was replicated for SNPs within the lactase gene, and this false-positive association was abolished by adjustment with local or global ancestry PCs.
Conclusion
Population stratification is a potential source of bias in this seemingly homogeneous FHS population. However, local and global PCs derived from SNPs appear to provide adequate information about ancestry.
PMCID: PMC2795878  PMID: 20017971
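The local-ancestry idea in the abstract above (PCs computed from SNPs within 20-Mbp regions) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes a single chromosome, fixed non-overlapping windows, unscaled genotypes, and power iteration to obtain only the first PC. All function names and the toy genotypes are hypothetical.

```python
import math
from collections import defaultdict


def first_pc_scores(rows, iterations=200):
    """First principal-component scores (unit length, arbitrary sign) for
    individuals, via power iteration on the centered Gram matrix."""
    n = len(rows)
    m = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered]
            for ri in centered]
    v = [float(i + 1) for i in range(n)]  # deterministic starting vector
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variation among the SNPs in this window
            return [0.0] * n
        v = [x / norm for x in w]
    return v


def local_first_pcs(positions_bp, genotypes, window_bp=20_000_000):
    """Map each window index to the first-PC scores from SNPs in that window."""
    columns = defaultdict(list)
    for j, pos in enumerate(positions_bp):
        columns[pos // window_bp].append(j)
    return {w: first_pc_scores([[row[j] for j in cols] for row in genotypes])
            for w, cols in columns.items()}


# Toy data (assumption): 4 individuals, 4 SNPs falling in two 20-Mbp windows.
positions_bp = [1_000_000, 5_000_000, 25_000_000, 30_000_000]
genotypes = [
    [0, 0, 0, 2],
    [0, 0, 2, 0],
    [2, 2, 0, 2],
    [2, 2, 2, 0],
]
local_pcs = local_first_pcs(positions_bp, genotypes)
```

In the toy data the first window separates individuals 0 and 1 from 2 and 3, while the second window separates 0 and 2 from 1 and 3, which mirrors the abstract's point that local and global ancestry estimates can differ. In an association model, such scores would enter as covariates alongside or instead of genome-wide PCs.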
PLoS ONE  2008;3(8):e3013.
Background
C-reactive protein (CRP), a marker of systemic inflammation, is associated with risk of coronary events and sub-clinical measures of atherosclerosis. Evidence in support of this link being causal would include an association robust to adjustments for confounders (multivariable standard regression analysis) and the association of CRP gene polymorphisms with atherosclerosis (Mendelian randomization analysis).
Methodology/Principal Findings
We genotyped 3 tag single nucleotide polymorphisms (SNPs) [+1444T>C (rs1130864); +2303G>A (rs1205); and +4899T>G (rs3093077)] in the CRP gene and assessed CRP and carotid intima-media thickness (CIMT), a structural marker of atherosclerosis, in 4941 men and women aged 50–74 (mean 61) years (the Whitehall II Study). The 4 major haplotypes from the SNPs were consistently associated with CRP level, but not with other risk factors that might confound the association between CRP and CIMT. CRP, assessed both at mean age 49 and at mean age 61, was associated both with CIMT in age- and sex-adjusted standard regression analyses and with potential confounding factors. However, the association of CRP with CIMT attenuated to the null with adjustment for confounding factors in both prospective and cross-sectional analyses. When examined using genetic variants as the instrument for serum CRP, there was no inferred association between CRP and CIMT.
Conclusions/Significance
Both multivariable standard regression analysis and Mendelian randomization analysis suggest that the association of CRP with carotid atheroma indexed by CIMT may not be causal.
doi:10.1371/journal.pone.0003013
PMCID: PMC2507732  PMID: 18714381
PLoS ONE  2013;8(12):e82885.
Background
Escalating weight gain among the Malaysian paediatric population necessitates identifying modifiable behaviours in the obesity pathway.
Objectives
This study describes the adaptation and validation of the Children’s Eating Behaviour Questionnaire (CEBQ) as a self-report for adolescents, investigates gender and ethnic differences in eating behaviour and examines associations between eating behaviour and body mass index (BMI) z-scores among multi-ethnic Malaysian adolescents.
Methodology
This two-phase study involved validation of the Malay self-reported CEBQ in Phase 1 (n = 362). Principal Axis Factoring with Promax rotation, confirmatory factor analysis and reliability tests were performed. In Phase 2, adolescents completed the questionnaire (n = 646). Weight and height were measured. Gender and ethnic differences in eating behaviour were investigated. Associations between eating behaviour and BMI z-scores were examined with complex samples general linear model (GLM) analyses, adjusted for gender, ethnicity and maternal educational level.
Results
Exploratory factor analysis revealed a 35-item, 9-factor structure with ‘food fussiness’ scale split into two. In confirmatory factor analysis, a 30-item, 8-factor structure yielded an improved model fit. Reliability estimates of the eight factors were acceptable. Eating behaviours did not differ between genders. Malay adolescents reported higher Food Responsiveness, Enjoyment of Food, Emotional Overeating, Slowness in Eating, Emotional Undereating and Food Fussiness 1 scores (p<0.05) compared to Chinese and Indians. A significant negative association was observed between BMI z-scores and Food Fussiness 1 (‘dislike towards food’) when adjusted for confounders.
Conclusion
Although the CEBQ is a valuable psychometric instrument, adjustments were required due to age and cultural differences in our sample. With the self-report, our findings indicate that gender, ethnicity and weight status influenced eating behaviours. Obese adolescents were found to display a lack of dislike towards food. Future longitudinal and qualitative studies are warranted to further understand behavioural phenotypes of obesity to guide prevention and intervention strategies.
doi:10.1371/journal.pone.0082885
PMCID: PMC3857802  PMID: 24349385
PLoS ONE  2013;8(10):e77720.
Complex human diseases commonly differ in their phenotypic characteristics, e.g., Crohn’s disease (CD) patients are heterogeneous with regard to disease location and disease extent. The genetic susceptibility to Crohn’s disease is widely acknowledged and has been demonstrated by identification of over 100 CD associated genetic loci. However, relating CD subphenotypes to disease susceptibility loci has proven to be a difficult task. In this paper we discuss the use of cluster analysis on genetic markers to identify genetic-based subgroups while taking into account possible confounding by population stratification. We show that it is highly relevant to consider the confounding nature of population stratification so that detected clusters do not reflect population groups instead of disease-specific groups. Therefore, we explain the use of principal components to correct for population stratification while clustering affected individuals into genetic-based subgroups. The principal components are obtained using 30 ancestry informative markers (AIM), and the first two PCs are determined to discriminate between continental origins of the affected individuals. Genotypes on 51 CD associated single nucleotide polymorphisms (SNPs) are used to perform latent class analysis, hierarchical and Partitioning Around Medoids (PAM) cluster analysis within a sample of affected individuals with and without the use of principal components to adjust for population stratification. Without correction, the clusters appear to be influenced by population stratification, whereas with correction the clusters are unrelated to the continental origin of individuals.
doi:10.1371/journal.pone.0077720
PMCID: PMC3798408  PMID: 24147066
PLoS Genetics  2014;10(12):e1004818.
A large fraction of human genes are regulated by genetic variation near the transcribed sequence (cis-eQTL, expression quantitative trait locus), and many cis-eQTLs have implications for human disease. Less is known regarding the effects of genetic variation on expression of distant genes (trans-eQTLs) and their biological mechanisms. In this work, we use genome-wide data on SNPs and array-based expression measures from mononuclear cells obtained from a population-based cohort of 1,799 Bangladeshi individuals to characterize cis- and trans-eQTLs and determine if observed trans-eQTL associations are mediated by expression of transcripts in cis with the SNPs showing trans-association, using Sobel tests of mediation. We observed 434 independent trans-eQTL associations at a false-discovery rate of 0.05, and 189 of these trans-eQTLs were also cis-eQTLs (enrichment P<0.0001). Among these 189 trans-eQTL associations, 39 were significantly attenuated after adjusting for a cis-mediator based on Sobel P<10⁻⁵. We attempted to replicate 21 of these mediation signals in two European cohorts, and while only 7 trans-eQTL associations were present in one or both cohorts, 6 showed evidence of cis-mediation. Analyses of simulated data show that complete mediation will be observed as partial mediation in the presence of mediator measurement error or imperfect LD between measured and causal variants. Our data demonstrate that trans-associations can become significantly stronger or switch directions after adjusting for a potential mediator. Using simulated data, we demonstrate that this phenomenon is expected in the presence of strong cis-trans confounding and when the measured cis-transcript is correlated with the true (unmeasured) mediator. In conclusion, by applying mediation analysis to eQTL data, we show that a substantial fraction of observed trans-eQTL associations can be explained by cis-mediation. Future studies should focus on understanding the mechanisms underlying widespread cis-mediation and their relevance to disease biology, as well as using mediation analysis to improve eQTL discovery.
Author Summary
Expression quantitative trait locus (eQTL) studies have demonstrated that human genes can be regulated by genetic variation residing close to the gene (cis-eQTLs) or in a distant region or on a different chromosome (trans-eQTLs). While cis-eQTL variants are likely to affect transcription factor binding or chromatin structure, our understanding of the mechanisms underlying trans-eQTLs is incomplete. We hypothesize that a substantial fraction of trans-eQTLs influence expression of distant genes through mediation by expression levels of a cis-transcript. In this paper, we use genome-wide SNPs and expression data for 1,799 South Asians to identify cis- and trans-eQTLs and to test our hypothesis using Sobel tests of mediation. Among 189 observed trans-eQTL associations, we provide evidence of cis-mediation for 39, 6 of which show mediation in an independent European cohort. We used simulated data to demonstrate that complete mediation will be observed as partial mediation in the presence of mediator measurement error or imperfect LD between measured and causal variants. We also demonstrate how unobserved confounding variables and incorrect mediator selection can bias mediation estimates. In conclusion, we have identified cis-mediators for many trans-eQTLs and described a mediation analysis approach that can be used to validate, characterize, and enhance discovery of trans-eQTLs.
doi:10.1371/journal.pgen.1004818
PMCID: PMC4256471  PMID: 25474530
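The Sobel test of mediation used in the abstract above has a simple closed form: the product of the SNP-to-mediator coefficient a and the mediator-to-outcome coefficient b, divided by its first-order standard error. The sketch below implements that textbook formula with the Python standard library; the function name and the example coefficients are assumptions, and the study's exact regression models are not reproduced here.

```python
import math
from statistics import NormalDist


def sobel_test(a, se_a, b, se_b):
    """Sobel z statistic and two-sided p-value for the indirect effect a * b.

    a, se_a: effect of the SNP on the candidate cis-mediator and its standard error.
    b, se_b: effect of the mediator on the trans-gene, adjusted for the SNP,
             and its standard error.
    """
    standard_error = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    z = (a * b) / standard_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical coefficients, for illustration only.
z_stat, p_value = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
```

A small p-value is taken as evidence that part of the SNP's effect on the distant gene passes through the cis-transcript, subject to the confounding and measurement-error caveats the abstract describes.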
