Genetic variants in 296 genes in regions identified through admixture mapping of hypertension, BMI, and lipids were assessed for association with hypertension, blood pressure, BMI, and HDL-C.
This study examined coding SNPs, identified from HapMap2 data, that were located in genes on chromosomes 5, 6, 8, and 21, where ancestry association evidence for hypertension, BMI, or HDL-C was reported in previous admixture mapping studies. Genotyping was performed in 1,733 unrelated African-Americans from the National Heart, Lung and Blood Institute’s (NHLBI) Family Blood Pressure Project, and gene-based association analyses were conducted for hypertension, systolic blood pressure (SBP), diastolic blood pressure (DBP), BMI, and HDL-C. A gene score based on the number of minor alleles of each SNP in a gene was created and used for gene-based regression analyses, adjusting for age, age², sex, local marker ancestry, and BMI, as applicable. An individual’s African ancestry, estimated from 2,507 ancestry-informative markers, was also adjusted for to eliminate any confounding due to population stratification.
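As a minimal sketch of the gene score described above, the snippet below sums minor-allele counts across the SNPs of one gene and builds one row of a regression design matrix. The function names, the 0/1/2 genotype coding, and the column order are illustrative assumptions, not the study's actual code.

```python
def gene_score(minor_allele_counts):
    """Sum minor-allele counts (0, 1, or 2 per SNP) across all SNPs in one gene."""
    for count in minor_allele_counts:
        if count not in (0, 1, 2):
            raise ValueError("each genotype must be coded as 0, 1, or 2 minor alleles")
    return sum(minor_allele_counts)


def design_row(score, age, sex, local_ancestry, global_ancestry):
    """Build one regression row: intercept, gene score, age, age squared,
    sex, local marker ancestry, and global African ancestry (assumed order)."""
    return [1.0, float(score), float(age), float(age) ** 2,
            float(sex), float(local_ancestry), float(global_ancestry)]


# Hypothetical individual with four genotyped SNPs in a gene.
score = gene_score([0, 1, 2, 1])
row = design_row(score, age=50, sex=1, local_ancestry=0.5, global_ancestry=0.8)
```

Rows built this way could be passed to any standard linear or logistic regression routine; BMI would be appended as a further column for traits where that adjustment applies.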
CXADR (rs437470) on chromosome 21 was associated with SBP and DBP with or without adjusting for local ancestry (p < 0.0006). F2RL1 (rs631465) on chromosome 5 was associated with BMI (p = 0.0005). Local ancestry in these regions was associated with the respective traits as well.
This study suggests that CXADR and F2RL1 likely play important roles in blood pressure and obesity variation, respectively. Although these findings are consistent with other studies, replication and functional analyses are still necessary.
Blood pressure; Obesity; African Americans; Genetic Association Studies
When dense markers are available, one can interrogate almost every common variant across the genome via imputation and single nucleotide polymorphism (SNP) tests, which has become routine in current genome-wide association studies (GWASs). As a complement, admixture mapping exploits the long-range linkage disequilibrium (LD) generated by admixture between genetically distinct ancestral populations. It is therefore questionable whether admixture mapping analysis is still necessary for detecting disease-associated variants in admixed populations. We argue that admixture mapping can reduce the burden of massive multiple comparisons in GWASs; it can therefore be a powerful tool for locating disease variants with substantial allele frequency differences between ancestral populations. In this report we studied a two-stage approach, in which candidate regions are defined by admixture mapping at stage 1 and single SNP association tests follow at stage 2 within those candidate regions. We first established, by simulations, the genome-wide significance levels corresponding to the criteria used to define the candidate regions at stage 1. We next compared the power of the two-stage approach with that of direct association analysis. Our simulations suggest that the two-stage approach can be more powerful than the standard genome-wide association analysis when the allele frequency difference of a causal variant between ancestral populations is larger than 0.4. Our conclusion is consistent with a theoretical prediction by Risch and Tang (Am J Hum Genet 79:S254). Surprisingly, our study also suggests that power can be improved when we use less strict criteria to define the candidate regions at stage 1.
genome-wide association studies; admixture mapping; permutation based significance thresholds; two-stage approach
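The two-stage approach described above can be sketched as a simple filter. This is an illustration only: the region and SNP identifiers are hypothetical, and the Bonferroni correction at stage 2 is a simplifying assumption standing in for the simulation-based significance levels used in the report.

```python
def two_stage_hits(region_pvalues, snp_results, stage1_alpha, stage2_alpha):
    """Return SNPs that pass stage 2 inside regions that passed stage 1.

    region_pvalues: {region_id: admixture-mapping p-value}
    snp_results:    {snp_id: (region_id, single-SNP association p-value)}
    """
    # Stage 1: keep regions whose admixture-mapping signal meets the criterion.
    candidates = {r for r, p in region_pvalues.items() if p <= stage1_alpha}
    # Stage 2: test only SNPs that fall inside the candidate regions.
    tested = {s: p for s, (r, p) in snp_results.items() if r in candidates}
    if not tested:
        return []
    # Assumed Bonferroni threshold over the reduced set of tests.
    threshold = stage2_alpha / len(tested)
    return sorted(s for s, p in tested.items() if p <= threshold)


regions = {"chr5:r1": 0.0004, "chr6:r2": 0.30}
snps = {
    "snpA": ("chr5:r1", 0.001),
    "snpB": ("chr5:r1", 0.20),
    "snpC": ("chr6:r2", 1e-6),
}
hits = two_stage_hits(regions, snps, stage1_alpha=0.001, stage2_alpha=0.05)
```

In this toy input, `snpC` is never tested because its region fails stage 1. That is the trade-off of the design: fewer tests at stage 2, at the cost of missing variants with little ancestry signal.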
In response to the increased organ shortage, organs derived from donation after cardiac death (DCD) donors are becoming an acceptable option once again for clinical use in transplantation. However, transplant outcomes in cases where DCD organs are used are not as favorable as those from donation after brain death or living donors. Different methods of organ preservation are a key factor that may influence the outcomes of DCD kidney transplantation.
We compared the transplant outcomes in patients receiving DCD kidneys preserved by machine perfusion (MP) or by static cold storage (CS) preservation by conducting a meta-analysis. The MEDLINE, EMBASE and Cochrane Library databases were searched. All studies reporting outcomes for MP versus CS preserved DCD kidneys were further considered for inclusion in this meta-analysis. Odds ratios and 95% confidence intervals (CI) were calculated to compare the pooled data between groups that were transplanted with kidneys that were preserved by MP or CS.
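To make the pooling step concrete, here is a minimal sketch of an inverse-variance, fixed-effect pooled odds ratio with a 95% confidence interval. The abstract does not state which pooling model was used, so the fixed-effect choice is an assumption, and the counts below are hypothetical rather than data from the included trials.

```python
import math


def pooled_odds_ratio(tables, z=1.96):
    """Pool 2x2 tables given as (events_mp, n_mp, events_cs, n_cs).

    Uses inverse-variance weights on the log odds ratio (fixed-effect model).
    Returns (odds_ratio, ci_lower, ci_upper). Assumes no zero cells.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for a, n_mp, c, n_cs in tables:
        b = n_mp - a   # MP recipients without the event
        d = n_cs - c   # CS recipients without the event
        log_or = math.log((a * d) / (b * c))
        variance = 1 / a + 1 / b + 1 / c + 1 / d
        weight = 1 / variance
        weighted_sum += weight * log_or
        total_weight += weight
    pooled = weighted_sum / total_weight
    se = math.sqrt(1 / total_weight)
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)


# Hypothetical trial: 10/40 events with MP versus 20/40 with CS.
odds_ratio, ci_low, ci_high = pooled_odds_ratio([(10, 40, 20, 40)])
```

An odds ratio below 1 whose interval excludes 1 would indicate fewer events in the MP group.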
Four prospective, randomized, controlled trials, involving 175 MP and 176 CS preserved DCD kidney transplant recipients, were included. MP preserved DCD kidney transplant recipients had a decreased incidence of delayed graft function (DGF), with an odds ratio of 0.56 (95% CI = 0.36–0.86, P = 0.008) compared to CS. However, no significant differences were seen between the two technologies in the incidence of primary non-function, one-year graft survival, or one-year patient survival.
MP preservation of DCD kidneys is superior to CS in terms of reducing DGF rate post-transplant. However, primary non-function, one year graft survival, and one year patient survival were not affected by the use of MP or CS for preservation.
In genetic association studies, it is necessary to correct for population structure to avoid inference bias. During the past decade, prevailing corrections often only involved adjustments of global ancestry differences between sampled individuals. Nevertheless, population structure may vary across local genomic regions due to the variability of local ancestries associated with natural selection, migration, or random genetic drift. Adjusting for global ancestry alone may be inadequate when local population structure is an important confounding factor. In contrast, adjusting for local ancestry can more effectively prevent false-positives due to local population structure. To more accurately locate disease genes, we recommend adjusting for local ancestries by interrogating local structure. In practice, locus-specific ancestries are usually unknown and cannot be accurately inferred when ancestral population information is not available. For such scenarios, we propose employing local principal components (PC) to represent local ancestries and adjusting for local PCs when testing for genotype–phenotype association. With an acceptable computation burden, the proposed algorithm successfully eliminates the known spurious association between SNPs in the LCT gene and height due to the population structure in European Americans.
Genome-wide association studies; Local ancestries; Local principal components; Migration; Random genetic drift; Natural selection; Genomic inflation factor; Genomic control; Local ancestry principal components correction; Fine mapping
Incorporating family data in genetic association studies has become increasingly appreciated, especially for its potential value in testing rare variants. We introduce here a variance-component based association test that can test multiple common or rare variants jointly using both family and unrelated samples.
The proposed approach implemented in our R package aggregates or collapses the information across a region based on genetic similarity instead of genotype scores, which avoids the power loss when the effects are in different directions or have different association strengths. The method is also able to effectively leverage the LD information in a region and it can produce a test statistic with an adaptively estimated number of degrees of freedom. Our method can readily allow for the adjustment of non-genetic contributions to the familial similarity, as well as multiple covariates.
We demonstrate through simulations that the proposed method achieves good performance in terms of Type I error control and statistical power. The method is implemented in the R package “fassoc”, which provides a useful tool for data analysis and exploration.
Association studies; Family data; Score test; Multi-marker test
It has been postulated that multiple-marker methods may have added ability, over single-marker methods, to detect genetic variants associated with disease. The Wellcome Trust Case Control Consortium (WTCCC) provided the first successful large genome-wide association studies (GWAS), which included single-marker association analyses for seven common complex diseases. Of those signals detected, only one was associated with coronary artery disease (CAD), and none were identified for hypertension (HTN). Our objective was to find additional genetic associations and pathways for cardiovascular disease by examining the WTCCC data for variants associated with CAD and HTN using two-marker testing methods. We applied two-marker association testing to the WTCCC dataset, which includes ~2,000 affected individuals with each disorder and a shared pool of ~3,000 controls, all genotyped using Affymetrix GeneChip 500 K arrays. For CAD, we detected single nucleotide polymorphism (SNP) pairs in three genes showing genome-wide significance: HFE2, STK32B, and DIPC2. The most notable SNP pairs in a non-protein-coding region were at 9p21, a known major CAD-associated region. For HTN, we detected SNP pairs in five genes: GPR39, XRCC4, MYO6, ZFAT, and MACROD2. Four further associated SNP pair regions were at least 70 kb from any known gene. We have shown that novel, multiple-marker, statistical methods can be of use in finding variants in GWAS. We describe many new, associated variants for both CAD and HTN and describe their known genetic mechanisms.
Populations of ethnic mixtures can be useful in genetic studies. Admixture mapping, or mapping by admixture linkage disequilibrium (MALD), is specially developed for admixed populations and can supplement traditional genome-wide association analyses in the search for genetic variants underlying complex traits. Admixture mapping tests the association between a trait and locus-specific ancestries. The locus-specific ancestries are in linkage disequilibrium (LD) which is generated by the admixture process between genetically distinct ancestral populations. Because of highly correlated locus-specific ancestries, admixture mapping performs many fewer independent tests across the genome than current genome-wide association analysis. Therefore, admixture mapping can be more powerful because of the smaller penalty due to multiple tests. In this chapter, I introduce the theory behind admixture mapping and how we conduct the analysis in practice.
Admixture mapping; Population admixture; Ancestry information marker; Hidden Markov model
Interactions among genomic loci (also known as epistasis) have been suggested as one of the potential sources of missing heritability in single locus analysis of genome-wide association studies (GWAS). The computational burden of searching for interactions is compounded by the extremely low threshold for identifying significant p-values due to multiple hypothesis testing corrections. Utilizing prior biological knowledge to restrict the set of candidate SNP pairs to be tested can alleviate this problem, but systematic studies that investigate the relative merits of integrating different biological frameworks and GWAS data have not been conducted.
We developed four biologically based frameworks to identify pairwise interactions among candidate SNP pairs as follows: (1) for each human protein-coding gene, a set of SNPs associated with that gene was constructed providing a gene-based interaction model, (2) for each known biological pathway, a set of SNPs associated with the genes in the pathway was constructed providing a pathway-based interaction model, (3) a set of SNPs associated with genes in a disease-related subnetwork provides a network-based interaction model, and (4) a framework is based on the function of SNPs. The last approach uses expression SNPs (eSNPs or eQTLs), which are SNPs or loci that have defined effects on the abundance of transcripts of other genes. We constructed pairs of eSNPs and SNPs located in the target genes whose expression is regulated by eSNPs. For all four frameworks the SNP sets were exhaustively tested for pairwise interactions within the sets using a traditional logistic regression model after excluding genes that were previously identified to associate with the trait. Using previously published GWAS data for type 2 diabetes (T2D) and the biologically based pair-wise interaction modeling, we identify twelve genes not seen in the previous single locus analysis.
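The idea of restricting interaction tests to SNP pairs within a biologically defined set can be sketched as follows. The set names and SNP identifiers are hypothetical, and the snippet only enumerates candidate pairs; the logistic regression test applied to each pair is not shown.

```python
from itertools import combinations


def candidate_pairs(snp_sets):
    """Enumerate unordered SNP pairs that co-occur in at least one set.

    snp_sets: {set_name: iterable of SNP ids}, where a set may correspond
    to a gene, a pathway, or a subnetwork.
    """
    pairs = set()
    for snps in snp_sets.values():
        # Sorting makes (a, b) and (b, a) collapse to the same tuple.
        pairs.update(combinations(sorted(set(snps)), 2))
    return pairs


snp_sets = {
    "geneA": ["rs1", "rs2", "rs3"],
    "geneB": ["rs3", "rs4"],
}
pairs = candidate_pairs(snp_sets)

# For comparison: the number of pairs in an exhaustive search over all SNPs.
all_snps = {snp for snps in snp_sets.values() for snp in snps}
exhaustive = len(all_snps) * (len(all_snps) - 1) // 2
```

Here the restricted search yields 4 pairs against 6 for the exhaustive search. On genome-wide data the reduction is far larger, which is what relaxes the multiple-testing threshold.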
We present four approaches to detect interactions associated with complex diseases. The results show our approaches outperform the traditional single locus approaches in detecting genes that previously did not reach significance; the results also provide novel drug targets and biomarkers relevant to the underlying mechanisms of disease.
Although obstructive sleep apnea (OSA) is known to have a strong familial basis, no genetic polymorphisms influencing apnea risk have been identified in cross-cohort analyses. We utilized the National Heart, Lung, and Blood Institute (NHLBI) Candidate Gene Association Resource (CARe) to identify sleep apnea susceptibility loci. Using a panel of 46,449 polymorphisms from roughly 2,100 candidate genes on a customized Illumina iSelect chip, we tested for association with the apnea hypopnea index (AHI) as well as moderate to severe OSA (AHI≥15) in 3,551 participants of the Cleveland Family Study and two cohorts participating in the Sleep Heart Health Study.
Among 647 African-Americans, rs11126184 in the pleckstrin (PLEK) gene was associated with OSA while rs7030789 in the lysophosphatidic acid receptor 1 (LPAR1) gene was associated with AHI using a chip-wide significance threshold of p-value < 2×10⁻⁶. Among 2,904 individuals of European ancestry, rs1409986 in the prostaglandin E2 receptor (PTGER3) gene was significantly associated with OSA. Consistent effects of rs7030789 in LPAR1 and rs1409986 in PTGER3 on apnea phenotypes were observed in independent clinic-based cohorts.
Novel genetic loci for apnea phenotypes were identified through the use of customized gene chips and meta-analyses of cohort data with replication in clinic-based samples. The identified SNPs all lie in genes associated with inflammation suggesting inflammation may play a role in OSA pathogenesis.
Genome-wide genotyping of a cohort using pools rather than individual samples has long been proposed as a cost-saving alternative for performing genome-wide association (GWA) studies. However, successful disease gene mapping using pooled genotyping has thus far been limited to detecting common variants with large effect sizes, which tend not to exist for many complex common diseases or traits. Therefore, for DNA pooling to be a viable strategy for conducting GWA studies, it is important to determine whether commonly used genome-wide SNP array platforms such as the Affymetrix 6.0 array can reliably detect common variants of small effect sizes using pooled DNA. Taking obesity and age at menarche as examples of human complex traits, we assessed the feasibility of genome-wide genotyping of pooled DNA as a single-stage design for phenotype association. By individually genotyping the top associations identified by pooling, we obtained a 14- to 16-fold enrichment of SNPs nominally associated with the phenotype, but we likely missed the top true associations. In addition, we assessed whether genotyping pooled DNA can serve as an inexpensive screen as the second stage of a multi-stage design with a large number of samples by comparing the most cost-effective 3-stage designs with 80% power to detect common variants with genotypic relative risk of 1.1, with and without pooling. Given the current state of the specific technology we employed and the associated genotyping costs, we showed through simulation that a design involving pooling would be 1.07 times more expensive than a design without pooling. Thus, while a significant amount of information exists within the data from pooled DNA, our analysis does not support genotyping pooled DNA as a means to efficiently identify common variants contributing small effects to phenotypes of interest. 
While our conclusions were based on the specific technology and study design we employed, the approach presented here will be useful for evaluating the utility of other or future genome-wide genotyping platforms in pooled DNA studies.
Many complex diseases are influenced by genetic variations in multiple genes, each with only a small marginal effect on disease susceptibility. Pathway analysis, which identifies biological pathways associated with disease outcome, has become increasingly popular for genome-wide association studies (GWAS). In addition to combining weak signals from a number of SNPs in the same pathway, results from pathway analysis also shed light on the biological processes underlying disease. We propose a new pathway-based analysis method for GWAS, the supervised principal component analysis (SPCA) model. In the proposed SPCA model, a selected subset of SNPs most associated with disease outcome is used to estimate the latent variable for a pathway. The estimated latent variable for each pathway is an optimal linear combination of a selected subset of SNPs; therefore, the proposed SPCA model provides the ability to borrow strength across the SNPs in a pathway. In addition to identifying pathways associated with disease outcome, SPCA also carries out additional within-category selection to identify the most important SNPs within each gene set. The proposed model operates in a well-established statistical framework and can handle design information such as covariate adjustment and matching information in GWAS. We compare the proposed method with currently available methods using data with realistic linkage disequilibrium structures and we illustrate the SPCA method using the Wellcome Trust Case-Control Consortium Crohn Disease (CD) dataset.
SNPs; genome-wide association; pathway analysis; principal component analysis
Hyper-phosphorylation at the Y705 residue of signal transducer and activator of transcription 3 (STAT3) is implicated in tumorigenesis of leukemia and some solid tumors. However, its role in the development of colorectal cancer (CRC) is not well defined. To rigorously test the impact of this phosphorylation on colorectal tumorigenesis, we engineered a STAT3 Y705F knock-in to interrupt STAT3 activity in HCT116 and RKO CRC cells. These STAT3 Y705F mutant cells fail to respond to cytokine stimulation and grow slower than parental cells. These mutant cells are also greatly diminished in their abilities to form colonies in culture, to exhibit anchorage-independent growth in soft agar, and to grow as xenografts in nude mice. These observations strongly support the premise that STAT3 Y705 phosphorylation is crucial in colorectal tumorigenesis. Although it is generally believed that STAT3 functions as a transcription factor, recent studies indicate that transcription-independent functions of STAT3 also play an important role in tumorigenesis. We show here that wild-type STAT3, but not STAT3 Y705F mutant protein, associates with PLCγ1. PLCγ1 is a central signal transducer of growth factor and cytokine signaling pathways that are involved in tumorigenesis. In STAT3 Y705F mutant CRC cells, PLCγ1 activity is reduced. Moreover, over-expression of a constitutively active form of PLC γ1 rescues the transformation defect of STAT3 Y705F mutant cells. In aggregate, our study identifies previously unknown cross-talk between STAT3 and the PLCγ signaling pathways that may play a critical role in colorectal tumorigenesis.
STAT3; PLC; colorectal cancer; phosphorylation; PTPRT
It is generally known that risk variants segregate together with a disease within families, but this information has not been used in the existing statistical methods for detecting rare variants. Here we introduce two weighted sum statistics that can be applied to either genome-wide association data or resequencing data for identifying rare disease variants, with weights calculated based on sibpairs and odds ratios, respectively. We evaluated the two methods via extensive simulations under different disease models. We compared the proposed methods with the weighted sum statistic (WSS) proposed by Madsen and Browning, keeping the same genotyping or resequencing cost. Our methods clearly demonstrate more statistical power than the WSS. In addition, we found that using sibpair information can increase power over using only unrelated samples by more than 40%. We applied our methods to the Framingham Heart Study (FHS) and Wellcome Trust Case Control Consortium (WTCCC) hypertension datasets. Although we did not identify any genes as reaching a genome-wide significance level, we found variants in the candidate gene angiotensinogen (AGT) significantly associated with hypertension at P = 6.9×10⁻⁴, whereas the most significant single SNP association evidence is P = 0.063. We further applied the odds ratio weighted method to the IFIH1 gene for type 1 diabetes in the WTCCC data. Our method yielded a P value of 4.82×10⁻⁴, much more significant than that obtained by haplotype-based methods. We demonstrated that family data are extremely informative in searching for rare variants underlying complex traits, and the odds ratio weighted sum statistic is more efficient than currently existing methods.
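For orientation, the baseline WSS weighting used as the comparator above can be sketched as below. This follows the Madsen and Browning weight as commonly described, which is an assumption here, since the abstract does not give the formula. It does not implement the sibpair or odds ratio weights proposed in this work, whose exact form is also not given.

```python
import math


def wss_weight(mutant_alleles_unaffected, n_unaffected, n_total):
    """Variant weight: sqrt(n * q * (1 - q)), with q estimated in unaffected
    individuals and one pseudo-count added to avoid a zero frequency."""
    q = (mutant_alleles_unaffected + 1) / (2 * n_unaffected + 2)
    return math.sqrt(n_total * q * (1 - q))


def wss_score(genotypes, weights):
    """Per-individual score: mutant-allele counts down-weighted by variant
    weight, so rarer variants contribute more."""
    return sum(g / w for g, w in zip(genotypes, weights))


# Hypothetical variant: 1 mutant allele among 99 unaffected, 200 genotyped overall.
w = wss_weight(mutant_alleles_unaffected=1, n_unaffected=99, n_total=200)
score = wss_score([1, 0], [w, w])
```

In the full method, individuals are ranked by score and the rank sum in affected individuals is assessed by permutation; that step is omitted here.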
Admixture mapping based on recently admixed populations is a powerful method to detect disease variants with substantial allele frequency differences in ancestral populations. We performed admixture mapping analysis for systolic blood pressure (SBP) and diastolic blood pressure (DBP), followed by trait-marker association analysis, in 6303 unrelated African-American participants of the Candidate Gene Association Resource (CARe) consortium. We identified five genomic regions (P < 0.001) harboring genetic variants contributing to inter-individual BP variation. In follow-up association analyses, correcting for all tests performed in this study, three loci were significantly associated with SBP and one significantly associated with DBP (P < 10⁻⁵). Further analyses suggested that six independent single-nucleotide polymorphisms (SNPs) contributed to the phenotypic variation observed in the admixture mapping analysis. These six SNPs were examined for replication in multiple, large, independent studies of African-Americans [Women's Health Initiative (WHI), Maywood, Genetic Epidemiology Network of Arteriopathy (GENOA) and Howard University Family Study (HUFS)] as well as one native African sample (Nigerian study), with a total replication sample size of 11 882. Meta-analysis of the replication set identified a novel variant (rs7726475) on chromosome 5 between the SUB1 and NPR3 genes as being associated with SBP and DBP (P < 0.0015 for both); in meta-analyses combining the CARe samples with the replication data, we observed P-values of 4.45 × 10⁻⁷ for SBP and 7.52 × 10⁻⁷ for DBP for rs7726475, which were significant after accounting for all the tests performed. Our study highlights that admixture mapping analysis can help identify genetic variants missed by genome-wide association studies because of the drastically reduced number of tests across the whole genome.
The structure of 3-methyladenine DNA glycosylase I in complex with 3-methyladenine is reported.
The removal of chemically damaged DNA bases such as 3-methyladenine (3-MeA) is an essential process in all living organisms and is catalyzed by the enzyme 3-MeA DNA glycosylase I. A key question is how the enzyme selectively recognizes the alkylated 3-MeA over the much more abundant adenine. The crystal structures of native and Y16F-mutant 3-MeA DNA glycosylase I from Staphylococcus aureus in complex with 3-MeA are reported to 1.8 and 2.2 Å resolution, respectively. Isothermal titration calorimetry shows that protonation of 3-MeA decreases its binding affinity, confirming previous fluorescence studies that show that charge–charge recognition is not critical for the selection of 3-MeA over adenine. It is hypothesized that the hydrogen-bonding pattern of Glu38 and Tyr16 of 3-MeA DNA glycosylase I with a particular tautomer unique to 3-MeA contributes to recognition and selection.
3-methyladenine DNA glycosylase I; fluorescence measurements; ITC; DNA repair; recognition
Recently a fluorination enzyme was identified and isolated from Streptomyces cattleya; it catalyzes the first committed step on the metabolic pathway to the fluorinated metabolites fluoroacetate and 4-fluorothreonine. This enzyme, 5′-fluoro-5′-deoxy adenosine synthetase (FDAS), has been shown to catalyze C-F bond formation by nucleophilic attack of fluoride ion on S-adenosyl-L-methionine (SAM), with the concomitant displacement of L-methionine, to generate 5′-fluoro-5′-deoxy adenosine (5′-FDA). Although the structures of FDAS bound to both SAM and products have been solved, the molecular mechanism remained to be elucidated. We now report site-directed mutagenesis studies, structural analyses and isothermal titration calorimetry (ITC) experiments. The data establish the key residues required for catalysis and the order of substrate binding. Fluoride ion is not readily distinguished from water by protein X-ray crystallography; however, using chloride ion (also a substrate) with mutants of low activity has enabled the halide ion to be located in non-productive co-complexes with SAH and SAM. The kinetic data suggest the positively charged sulfur of SAM is a key requirement in stabilizing the transition state. We propose a molecular mechanism for FDAS in which fluoride weakly associates with the enzyme, exchanging two water molecules for protein ligation. The binding of SAM expels remaining water associated with fluoride ion and traps the ion in a pocket positioned to react with SAM, generating L-methionine and 5′-FDA. L-methionine then dissociates from the enzyme, followed by 5′-FDA.
Inadequate liver regeneration (LR) is still an unsolved problem in major liver resection and small-for-size syndrome post-living donor liver transplantation. A number of microRNAs have been shown to play important roles in cell proliferation. Herein, we investigated the role of miR-26a as a pivotal regulator of hepatocyte proliferation in LR.
Adult male C57BL/6J mice, undergoing 70% partial hepatectomy (PH), were treated with Ad5-anti-miR-26a-LUC or Ad5-miR-26a-LUC or Ad5-LUC vector via portal vein. The animals were subjected to in vivo bioluminescence imaging. Serum and liver samples were collected to test liver function, calculate liver-to-body weight ratio (LBWR), document hepatocyte proliferation (Ki-67 staining), and investigate potential targeted gene expression of miR-26a by quantitative real-time PCR and Western blot. The miR-26a level declined during LR after 70% PH. Down-regulation of miR-26a by anti-miR-26a expression led to enhanced proliferation of hepatocytes, and both LBWR and hepatocyte proliferation (Ki-67+ cells %) showed an increased tendency, while liver damage, indicated by aspartate aminotransferase (AST), alanine aminotransferase (ALT) and total bilirubin (T-Bil), was reduced. Furthermore, CCND2 and CCNE2, as possible targeted genes of miR-26a, were up-regulated. In addition, miR-26a over-expression showed converse results.
MiR-26a plays a crucial role in regulating the proliferative phase of LR, probably by repressing expression of the cell cycle proteins CCND2 and CCNE2. The current study reveals a novel miRNA-mediated regulation pattern during the proliferative phase of LR.
Motivation: Admixed populations offer a unique opportunity for mapping diseases that have large disease allele frequency differences between ancestral populations. However, association analysis in such populations is challenging because population stratification may lead to association with loci unlinked to the disease locus.
Methods and results: We show that local ancestry at a test single nucleotide polymorphism (SNP) may confound with the association signal and ignoring it can lead to spurious association. We demonstrate theoretically that adjustment for local ancestry at the test SNP is sufficient to remove the spurious association regardless of the mechanism of population stratification, whether due to local or global ancestry differences among study subjects; however, global ancestry adjustment procedures may not be effective. We further develop two novel association tests that adjust for local ancestry. Our first test is based on a conditional likelihood framework which models the distribution of the test SNP given disease status and flanking marker genotypes. A key advantage of this test lies in its ability to incorporate different directions of association in the ancestral populations. Our second test, which is computationally simpler, is based on logistic regression, with adjustment for local ancestry proportion. We conducted extensive simulations and found that the Type I error rates of our tests are under control; however, the global adjustment procedures yielded inflated Type I error rates when stratification is due to local ancestry difference.
Supplementary information: Supplementary data are available at Bioinformatics online.
Although recent studies have attempted to dispel the confusion that exists in regard to the definition, analysis and interpretation of interaction in genetics, there still remain aspects that are poorly understood by non-statisticians. After a brief discussion of the definition of gene-gene interaction, the main part of this study addresses the fundamental meaning of statistical interaction and its relationship to measurement scale, disproportionate sample sizes in the cells of a two-way table and gametic phase disequilibrium.
Epistasis; Gametic phase disequilibrium; Interaction; Transformation
Next-generation sequencing technology provides new opportunities and challenges in the search for genetic variants that underlie complex traits. It will also presumably uncover many new rare variants, but exactly how these variants should be incorporated into the data analysis remains a question. Several papers in our group from Genetic Analysis Workshop 17 evaluated different methods of rare variant analysis, including single-variant, gene-based, and pathway-based analyses and analyses that incorporated biological information. Although the performance of some of these methods strongly depends on the underlying disease model, integration of known biological information is helpful in detecting causal genes. Two work groups demonstrated that use of a Bayesian network and a collapsing receiver operating characteristic curve approach improves risk prediction when a disease is caused by many rare variants. Another work group suggested that modeling local rather than global ancestry may be beneficial when controlling the effect of population structure in rare variant association analysis.
rare variant; association analysis; risk prediction model; population structure; biological information; receiver operating characteristic; Bayesian network
Motivation: Adjustment for population structure is necessary to avoid bias in genetic association studies of susceptibility variants for complex diseases. Population structure may differ from one genomic region to another due to the variability of individual ancestry associated with migration, random genetic drift or natural selection. Current association methods for correcting population stratification usually involve adjustment of global ancestry between study subjects.
Results: We suggest interrogating local population structure for fine mapping to more accurately locate true causal genes by better adjusting for the confounding effect due to local ancestry. By extensive simulations on genome-wide datasets, we show that adjusting for global ancestry may lead to false positives when local population structure is an important confounding factor. In contrast, adjusting for local ancestry can effectively prevent false positives due to local population structure and thus can improve fine mapping for disease gene localization. We applied the local and global adjustments to the analysis of datasets from three genome-wide association studies, including European Americans, African Americans and Nigerians. Both European Americans and African Americans demonstrate greater variability in local ancestry than Nigerians. Adjusting for local ancestry successfully eliminated the known spurious association between SNPs in the LCT gene and height due to the population structure that exists in European Americans.
Supplementary information: Supplementary data are available at Bioinformatics online.
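The contrast between global and local ancestry adjustment described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the simulation design used in the study: the function names (`snp_effect`, `simulate`), the Beta(8, 2) distribution of global ancestry, the ancestry-specific allele frequencies (0.8 and 0.2), and the trait model (trait driven only by local ancestry, with no effect of the tested SNP) are all hypothetical choices made for illustration.

```python
import numpy as np

def snp_effect(trait, genotype, ancestry):
    """OLS slope for genotype in: trait ~ intercept + genotype + ancestry."""
    design = np.column_stack([np.ones(len(trait)), genotype, ancestry])
    coef, *_ = np.linalg.lstsq(design, trait, rcond=None)
    return coef[1]

def simulate(seed=0, n=5000):
    """Return the SNP slope under global and under local ancestry adjustment."""
    rng = np.random.default_rng(seed)
    # Assumed genome-wide proportion of ancestry A per individual.
    global_anc = rng.beta(8, 2, size=n)
    # Copies (0, 1 or 2) of ancestry A carried at the tested locus.
    local_anc = rng.binomial(2, global_anc)
    # Allele frequency is 0.8 on ancestry-A chromosomes and 0.2 otherwise.
    genotype = rng.binomial(local_anc, 0.8) + rng.binomial(2 - local_anc, 0.2)
    # The trait depends on local ancestry only; the SNP has no true effect.
    trait = 1.0 * local_anc + rng.normal(size=n)
    return (snp_effect(trait, genotype, global_anc),
            snp_effect(trait, genotype, local_anc))

beta_global, beta_local = simulate()
print(f"SNP slope, global adjustment: {beta_global:.3f}")
print(f"SNP slope, local adjustment:  {beta_local:.3f}")
```

In this setup the SNP slope stays clearly away from zero under global adjustment, because the genotype still tracks local ancestry among individuals with the same global ancestry, whereas it shrinks toward zero once local ancestry is the covariate.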
Because obstructive sleep apnea (OSA) is associated with increased levels of inflammatory cytokines, we examined the relationship between OSA and polymorphisms for interleukin-6 (IL-6).
Six single nucleotide polymorphisms (SNPs) within IL-6 were genotyped in 259 African-Americans from the Cleveland Family Study, with replication conducted in the Cardiovascular Health Study (n=124). OSA was dichotomized as present (apnea hypopnea index [AHI] > 15 or on treatment) vs. absent (AHI < 5). Logistic regression was conducted, adjusting for age and sex in models with and without body mass index (BMI).
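The case definition above can be written as a small classification rule. This is a sketch only: the function name `classify_osa` is hypothetical, and it assumes that untreated participants with an AHI between 5 and 15 fall into neither group and are excluded, which the text implies but does not state.

```python
from typing import Optional

def classify_osa(ahi: float, on_treatment: bool) -> Optional[int]:
    """Return 1 for OSA present, 0 for OSA absent, None for excluded."""
    # Present: AHI above 15, or currently treated for OSA.
    if on_treatment or ahi > 15:
        return 1
    # Absent: untreated with AHI below 5.
    if ahi < 5:
        return 0
    # Assumption: intermediate untreated values (5 to 15) are not analysed.
    return None

print(classify_osa(20.0, False))
print(classify_osa(3.0, True))
print(classify_osa(3.0, False))
print(classify_osa(10.0, False))
```

The resulting 0/1 status would serve as the outcome in the logistic models, with excluded participants dropped before fitting.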
SNP IL6-6021 was associated with a decreased risk of OSA after adjusting for BMI (odds ratio for T allele 0.24; 95% CI [0.09–0.67]; p=0.006; q=0.07) under an additive model. This same allele was associated with increased BMI. The results from the replication sample were consistent in direction though not statistically significant (p=0.23). The SNPs were also studied in European-Americans, although the minor allele frequency of IL6-6021 was too low (4%) for meaningful comparisons.
A synonymous SNP within the IL-6 coding region was protective against OSA in African-Americans, with qualitatively similar findings observed in another cohort. This suggests that variants in IL-6 may influence the risk of OSA through a pathway that is not explained by obesity.
Genetic Analysis Workshop 17 (GAW17) provided a platform for evaluating existing statistical genetic methods and for developing novel methods to analyze rare variants that modulate complex traits. In this article, we present an overview of the 1000 Genomes Project exome data and simulated phenotype data that were distributed to GAW17 participants for analyses, the different issues addressed by the participants, and the process of preparation of manuscripts resulting from the discussions during the workshop.
The Genetic Analysis Workshop 17 data we used comprise 697 unrelated individuals genotyped at 24,487 single-nucleotide polymorphisms (SNPs) from a mini-exome scan, using real sequence data for 3,205 genes annotated by the 1000 Genomes Project and simulated phenotypes. We studied 200 sets of simulated phenotypes of trait Q2. An important feature of this data set is that most SNPs are rare, with 87% of the SNPs having a minor allele frequency less than 0.05. To detect rare SNPs, we performed a least absolute shrinkage and selection operator (LASSO) regression and F tests at the gene level and calculated the generalized degrees of freedom to avoid any selection bias. For comparison, we also carried out linear regression and the collapsing method, which sums the rare SNPs, modified for a quantitative trait and applied with two different allele frequency thresholds. The aim of this paper is to evaluate these four approaches on this mini-exome data set and compare their performance in terms of power and false positive rates. In most situations the LASSO approach is more powerful than the linear regression and collapsing methods. We also note the difficulty in determining the optimal threshold for the collapsing method and the significant role that linkage disequilibrium plays in detecting rare causal SNPs. If a rare causal SNP is in strong linkage disequilibrium with a common marker in the same gene, power will be much improved.
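The collapsing method mentioned above can be sketched for a single gene as follows. This is an illustrative sketch rather than the authors' implementation: the function names (`collapse_rare`, `burden_slope`), the toy genotype matrix, and the thresholds 0.01 and 0.05 are assumptions (the text does not state which two thresholds were used), and genotypes are assumed to be coded as minor-allele counts (0, 1 or 2).

```python
import numpy as np

def collapse_rare(genotypes, maf_threshold):
    """Sum minor-allele counts over SNPs with sample MAF below the threshold.

    genotypes: (individuals x SNPs) array of minor-allele counts for one gene.
    Returns the per-individual burden score and the mask of collapsed SNPs.
    """
    maf = genotypes.mean(axis=0) / 2.0
    rare = (maf > 0) & (maf < maf_threshold)
    return genotypes[:, rare].sum(axis=1), rare

def burden_slope(trait, burden):
    """OLS slope of a quantitative trait on the collapsed burden score."""
    design = np.column_stack([np.ones(len(trait)), burden])
    coef, *_ = np.linalg.lstsq(design, trait, rcond=None)
    return coef[1]

# Toy gene: 20 individuals, two singleton SNPs and one common SNP.
geno = np.zeros((20, 3), dtype=int)
geno[0, 0] = 1    # SNP 0: one minor allele in 40 chromosomes (MAF 0.025)
geno[1, 1] = 1    # SNP 1: one minor allele in 40 chromosomes (MAF 0.025)
geno[:10, 2] = 1  # SNP 2: common (MAF 0.25)

burden_05, rare_05 = collapse_rare(geno, 0.05)
burden_01, rare_01 = collapse_rare(geno, 0.01)
print("SNPs collapsed at 0.05:", int(rare_05.sum()))
print("SNPs collapsed at 0.01:", int(rare_01.sum()))

# Noise-free toy trait that rises by 2 units per collapsed rare allele.
trait = 1.0 + 2.0 * burden_05
print("burden slope:", burden_slope(trait, burden_05))
```

In this toy gene the two thresholds collapse different sets of SNPs, which mirrors the sensitivity to threshold choice noted in the text.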