Admixture mapping is a powerful method of gene mapping for diseases or traits that show differential risk by ancestry. Admixture mapping has been applied most often to African Americans who trace ancestry to Europeans and West Africans. Recent developments in admixture mapping include improvements in methods to take advantage of higher densities of genetic variants as well as extensions to admixed populations with three or more ancestral populations, such as Latino Americans. In this unit, I outline the key concepts of admixture mapping. I describe several approaches for inferring local ancestry and provide strategies for performing admixture mapping depending on the study design. Finally, I compare and contrast linkage analysis, association analysis, and admixture mapping, with an emphasis on integrating admixture mapping and association testing.
admixture; admixture mapping; ancestry
Insulin resistance (IR) is a key determinant of type 2 diabetes (T2D) and other metabolic disorders. This genome-wide association study (GWAS) was designed to shed light on the genetic basis of fasting insulin (FI) and IR in 927 non-diabetic African Americans. 5 396 838 single-nucleotide polymorphisms (SNPs) were tested for associations with FI or IR with adjustments for age, sex, body mass index, hypertension status and first two principal components. Genotyped SNPs (n = 12) with P < 5 × 10−6 in African Americans were carried forward for de novo genotyping in 570 non-diabetic West Africans. We replicated SNPs in or near SC4MOL and TCERG1L in West Africans. The meta-analysis of 1497 African Americans and West Africans yielded genome-wide significant associations for SNPs in the SC4MOL gene: rs17046216 (P = 1.7 × 10−8 and 2.9 × 10−8 for FI and IR, respectively); and near the TCERG1L gene with rs7077836 as the top scoring (P = 7.5 × 10−9 and 4.9 × 10−10 for FI and IR, respectively). In silico replication in the MAGIC study (n = 37 037) showed weak but significant association (adjusted P-value of 0.0097) for rs34602777 in the MYO5A gene. In addition, we replicated previous GWAS findings for IR and FI in Europeans for GCKR, and for variants in four T2D loci (FTO, IRS1, KLF14 and PPARG) which exert their action via IR. In summary, variants in or near SC4MOL and TCERG1L were associated with FI and IR in this cohort of African Americans and were replicated in West Africans. SC4MOL is under-expressed in an animal model of T2D and plays a key role in lipid biosynthesis, with implications for the regulation of energy metabolism, obesity and dyslipidemia. TCERG1L is associated with plasma adiponectin, a key modulator of obesity, inflammation, IR and diabetes.
Central obesity, measured by waist circumference (WC) or waist-hip ratio (WHR), is a marker of body fat distribution. Although obesity disproportionately affects minority populations, few studies have conducted genome-wide association study (GWAS) of fat distribution among those of predominantly African ancestry (AA). We performed GWAS of WC and WHR, adjusted and unadjusted for BMI, in up to 33,591 and 27,350 AA individuals, respectively. We identified loci associated with fat distribution in AA individuals using meta-analyses of GWA results for WC and WHR (stage 1). Overall, 25 SNPs with single genomic control (GC)-corrected p-values<5.0×10−6 were followed-up (stage 2) in AA with WC and with WHR. Additionally, we interrogated genomic regions of previously identified European ancestry (EA) WHR loci among AA. In joint analysis of association results including both Stage 1 and 2 cohorts, 2 SNPs demonstrated association, rs2075064 at LHX2, p = 2.24×10−8 for WC-adjusted-for-BMI, and rs6931262 at RREB1, p = 2.48×10−8 for WHR-adjusted-for-BMI. However, neither signal was genome-wide significant after double GC-correction (LHX2: p = 6.5×10−8; RREB1: p = 5.7×10−8). Six of fourteen previously reported loci for waist in EA populations were significant (p<0.05 divided by the number of independent SNPs within the region) in AA studied here (TBX15-WARS2, GRB14, ADAMTS9, LY86, RSPO3, ITPR2-SSPN). Further, we observed associations with metabolic traits: rs13389219 at GRB14 associated with HDL-cholesterol, triglycerides, and fasting insulin, and rs13060013 at ADAMTS9 with HDL-cholesterol and fasting insulin. Finally, we observed nominal evidence for sexual dimorphism, with stronger results in AA women at the GRB14 locus (p for interaction = 0.02). In conclusion, we identified two suggestive loci associated with fat distribution in AA populations in addition to confirming 6 loci previously identified in populations of EA. 
These findings reinforce the concept that there are fat distribution loci that are independent of generalized adiposity.
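The single genomic control (GC) correction mentioned above can be illustrated with a minimal sketch. It assumes 1-degree-of-freedom chi-square association tests and the common convention of never deflating statistics (inflation factor floored at 1); the function names are hypothetical and the sketch is not the consortium's pipeline. Double GC correction would, under the usual convention, apply the same step once per contributing study and again to the meta-analysis results.

```python
import math
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df, computed from the normal quantile.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

def p_to_chi2(p):
    """Convert a two-sided p-value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(p / 2.0)
    return z * z

def chi2_to_p(x):
    """Upper-tail probability of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(x / 2.0))

def genomic_control(pvalues):
    """Return (lambda, corrected p-values); lambda is floored at 1 (assumed convention)."""
    stats = [p_to_chi2(p) for p in pvalues]
    lam = max(1.0, median(stats) / CHI2_1DF_MEDIAN)
    return lam, [chi2_to_p(s / lam) for s in stats]
```

Dividing each statistic by the inflation factor rescales the observed median statistic to its expectation under the null hypothesis, which is why corrected p-values can only stay the same or become larger.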
Central obesity is a marker of body fat distribution and is known to have a genetic underpinning. Few studies have reported genome-wide association study (GWAS) results among individuals of predominantly African ancestry (AA). We performed a collaborative meta-analysis in order to identify genetic loci associated with body fat distribution in AA individuals using waist circumference (WC) and waist to hip ratio (WHR) as measures of fat distribution, with and without adjustment for body mass index (BMI). We uncovered 2 genetic loci potentially associated with fat distribution: LHX2 in association with WC-adjusted-for-BMI and at RREB1 for WHR-adjusted-for-BMI. Six of fourteen previously reported loci for waist in EA populations were significant in AA studied here (TBX15-WARS2, GRB14, ADAMTS9, LY86, RSPO3, ITPR2-SSPN). These findings reinforce the concept that there are loci for body fat distribution that are independent of generalized adiposity.
C-reactive protein (CRP) is an acute phase reactant protein produced primarily by the liver. Circulating CRP levels are influenced by genetic and non-genetic factors, including infection and obesity. Genome-wide association studies (GWAS) provide an unbiased approach towards identifying loci influencing CRP levels. None of the six GWAS for CRP levels has been conducted in an African ancestry population. The present study aims to: (i) identify genetic variants that influence serum CRP in African Americans (AA) using a genome-wide association approach and replicate these findings in West Africans (WA), (ii) assess transferability of major signals for CRP reported in European ancestry populations (EA) to AA and (iii) use the weak linkage disequilibrium (LD) structure characteristic of African ancestry populations to fine-map the previously reported CRP locus. The discovery cohort comprised 837 unrelated AA, with the replication of significant single-nucleotide polymorphisms (SNPs) assessed in 486 WA. The association analysis was conducted with 2 366 856 genotyped and imputed SNPs under an additive genetic model with adjustment for appropriate covariates. Genome-wide and replication significances were set at P < 5 × 10−8 and P < 0.05, respectively. Ten SNPs in (CRP pseudogene-1) CRPP1 and CRP genes were associated with serum CRP (P = 2.4 × 10−09 to 4.3 × 10−11). All but one of the top-scoring SNPs associated with CRP in AA were successfully replicated in WA. CRP signals previously identified in EA samples were transferable to AAs, and we were able to fine-map this signal, reducing the region of interest from the 25 kb of LD around the locus in the HapMap CEU sample to only 8 kb in our AA sample.
Principal components analysis of genetic data has benefited from advances in random matrix theory. The Tracy-Widom distribution has been identified as the limiting distribution of the lead eigenvalue, enabling formal hypothesis testing of population structure. Additionally, a phase change exists between small and large eigenvalues, such that population divergence below a threshold of FST is impossible to detect and above which it is always detectable. I show that the plug-in estimate of the effective number of markers in the EIGENSOFT software often exceeds the rank of the sample covariance matrix, leading to a systematic overestimation of the number of significant principal components. I describe an alternative plug-in estimate that eliminates the problem. This improvement is not just an asymptotic result but is directly applicable to finite samples. The minimum average partial test, based on minimizing the average squared partial correlation between individuals, can detect population structure at smaller FST values than the corrected test. The minimum average partial test is applicable to both unadmixed and admixed samples, with arbitrary numbers of discrete subpopulations or parental populations, respectively. Application of the minimum average partial test to the 11 HapMap Phase III samples, comprising 8 unadmixed samples and 3 admixed samples, revealed 13 significant principal components.
Admixture; Population stratification; Population structure; Principal components analysis
Interleukins (ILs) are key mediators of the immune response and inflammatory process. Plasma levels of IL-10, IL-1Ra, and IL-6 are associated with metabolic conditions, show large inter-individual variations, and are under strong genetic control. Therefore, elucidation of the genetic variants that influence levels of these ILs provides useful insights into mechanisms of immune response and pathogenesis of diseases. We conducted a genome-wide association study (GWAS) of IL-10, IL-1Ra, and IL-6 levels in 707 non-diabetic African Americans using 5,396,780 imputed and directly genotyped single nucleotide polymorphisms (SNPs) with adjustment for gender, age, and body mass index. IL-10 levels showed genome-wide significant associations (p<5×10−8) with eight SNPs, the most significant of which was rs5743185 in the PMS1 gene (p=2.30×10−10). We tested replication of SNPs that showed genome-wide significance in 425 non-diabetic individuals from West Africa, and successfully replicated SNP rs17365948 in the YWHAZ gene (p=0.02). IL-1Ra levels showed suggestive associations with two SNPs in the ASB3 gene (p=2.55×10−7), 10 SNPs in the IL-1 gene family (IL1F5, IL1F8, IL1F10, and IL1Ra, p=1.04×10−6 to 1.75×10−6), and 23 SNPs near the IL1A gene (p=1.22×10−6 to 1.63×10−6). We also successfully replicated rs4251961 (p=0.009); this SNP was reported to be associated with IL-1Ra levels in a candidate gene study of Europeans. IL-6 levels showed genome-wide significant association with one SNP (RP11-314E23.1; chr6:133397598; p=8.63×10−9). To our knowledge, this is the first GWAS on IL-10, IL-1Ra, and IL-6 levels. Follow-up of these findings may provide valuable insight into the pathobiology of IL actions and dysregulations in inflammation and human diseases.
interleukin; interleukin-10; interleukin-1Ra; interleukin-6; genome-wide association study; African American
Serum urate concentrations are highly heritable and elevated serum urate is a key risk factor for gout. Genome-wide association studies (GWAS) of serum urate in African American (AA) populations are lacking. We conducted a meta-analysis of GWAS of serum urate levels and gout among 5820 AA and a large candidate gene study among 6890 AA and 21 708 participants of European ancestry (EA) within the Candidate Gene Association Resource Consortium. Findings were tested for replication among 1996 independent AA individuals, and evaluated for their association among 28 283 EA participants of the CHARGE Consortium. Functional studies were conducted using 14C-urate transport assays in mammalian Chinese hamster ovary cells. In the discovery GWAS of serum urate, three loci achieved genome-wide significance (P< 5.0 × 10−8): a novel locus near SGK1/SLC2A12 on chromosome 6 (rs9321453, P= 1.0 × 10−9), and two loci previously identified in EA participants, SLC2A9 (P= 3.8 × 10−32) and SLC22A12 (P= 2.1 × 10−10). A novel rare non-synonymous variant of large effect size in SLC22A12, rs12800450 (minor allele frequency 0.01, G65W), was identified and replicated (beta −1.19 mg/dl, P= 2.7 × 10−16). 14C-urate transport assays showed reduced urate transport for the G65W URAT1 mutant. Finally, in analyses of 11 loci previously associated with serum urate in EA individuals, 10 of 11 lead single-nucleotide polymorphisms showed direction-consistent association with urate among AA. In summary, we identified and replicated one novel locus in association with serum urate levels and experimentally characterized the novel G65W variant in URAT1 as a functional allele. Our data support the importance of multi-ethnic GWAS in the identification of novel risk loci as well as functional variants.
A recent, large genome-wide association study (GWAS) of European ancestry individuals has identified multiple genetic variants influencing serum lipids. Studies of the transferability of these associations to African Americans remain few, an important limitation given interethnic differences in serum lipids and the disproportionate burden of lipid-associated metabolic diseases among African Americans.
We attempted to evaluate the transferability of 95 lipid-associated loci recently identified in European ancestry individuals to 887 non-diabetic, unrelated African Americans from a population-based sample in the Washington, DC area. Additionally, we took advantage of the generally reduced linkage disequilibrium among African ancestry populations in comparison to European ancestry populations to fine-map replicated GWAS signals.
We successfully replicated reported associations for 10 loci (CILP2/SF4, STARD3, LPL, CYP7A1, DOCK7/ANGPTL3, APOE, SORT1, IRS1, CETP, and UBASH3B). Through trans-ethnic fine-mapping, we were able to reduce associated regions around 75% of the loci that replicated.
Between this study and previous work in African Americans, 40 of the 95 loci reported in a large GWAS of European ancestry individuals also influence lipid levels in African Americans. While there is now evidence that the lipid-influencing role of a number of genetic variants is observed in both European and African ancestry populations, the still considerable lack of concordance highlights the importance of continued ancestry-specific studies to elucidate the genetic underpinnings of these traits.
Lipids; Genetics; African Americans; Genome-wide association study; Ethnicity
To identify genetic loci that regulate spontaneous arthritis in interleukin-1 receptor antagonist (IL-1ra)-deficient mice, an F2 population was created from a cross between Balb/c IL-1ra-deficient mice and DBA/1 IL-1ra-deficient mice. Spontaneous arthritis in the F2 population was examined and recorded. Genotypes of those F2 mice were determined using microsatellite markers. Quantitative trait locus (QTL) analysis was conducted with R/qtlbim. Functions of genes within QTL chromosomal regions were evaluated using a bioinformatics tool, PGMapper, and microarray analysis. Potential candidate genes were further evaluated using GeneNetwork. A total of 137 microsatellite markers with an average of 12 cM spacing along the whole genome were used for determining the correlation of arthritis phenotypes with genotypes of 191 F2 progenies. By whole-genome mapping, we obtained QTLs on chromosomes 1 and 6 that were above the significance threshold for strong Bayesian evidence. The QTL on chromosome 1 had a peak near D1Mit55 and D1Mit425 at 82.6 cM. It may account for as much as 12% of the phenotypic variation in susceptibility to spontaneous arthritis. The QTL region contained 208 known transcripts. According to their functions, Mr1, Pla2g4a and Fasl are outstanding candidate genes. From microarray analysis, 11 genes were selected as favourable candidates based on their function and expression profiles. Three of those 11 genes, Prg4, Ptgs2 and Mr1, correlated with the IL-1ra pathway. Those genes were considered to be the best candidates.
The incidence of chronic kidney disease varies by ethnic group in the USA, with African Americans displaying a two-fold higher rate than European Americans. One of the two defining variables underlying staging of chronic kidney disease is the glomerular filtration rate. Meta-analysis in individuals of European ancestry has identified 23 genetic loci associated with the estimated glomerular filtration rate (eGFR). We conducted a follow-up study of these 23 genetic loci using a population-based sample of 1,018 unrelated admixed African Americans. We included in our follow-up study two variants in APOL1 associated with end-stage kidney disease discovered by admixture mapping in admixed African Americans. To address confounding due to admixture, we estimated local ancestry at each marker and global ancestry. We performed regression analysis stratified by local ancestry and combined the resulting regression estimates across ancestry strata using an inverse variance-weighted fixed effects model. We found that 11 of the 24 loci were significantly associated with eGFR in our sample. The effect size estimates were not significantly different between the subgroups of individuals with two copies of African ancestry vs. two copies of European ancestry for any of the 11 loci. In contrast, allele frequencies were significantly different at 10 of the 11 loci. Collectively, the 11 loci, including four secondary signals revealed by conditional analyses, explained 14.2% of the phenotypic variance in eGFR, in contrast to the 1.4% explained by the 24 loci in individuals of European ancestry. Our findings provide insight into the genetic basis of variation in renal function among admixed African Americans.
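The step of combining regression estimates across local ancestry strata with an inverse variance-weighted fixed effects model can be sketched as follows. This is a minimal illustration, assuming each stratum supplies an effect estimate and its standard error and that a normal approximation is adequate for the combined test; the function name and example numbers are hypothetical, not values from the study.

```python
import math

def fixed_effects_meta(estimates, std_errors):
    """Inverse variance-weighted fixed effects combination of stratum estimates.

    Returns (combined estimate, combined standard error, z score, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in std_errors]      # precision of each stratum
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))              # two-sided normal p-value
    return beta, se, z, p

# Hypothetical estimates for strata with 0, 1, and 2 copies of African ancestry.
beta, se, z, p = fixed_effects_meta([0.10, 0.12, 0.08], [0.05, 0.04, 0.06])
```

Because strata with smaller standard errors receive larger weights, a sparsely populated ancestry stratum contributes little to the combined estimate.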
Low levels of high-density lipoprotein cholesterol (HDLc) accompany chronic kidney disease, but the association between HDLc and the estimated glomerular filtration rate (eGFR) in the general population is unclear. We investigated the HDLc-eGFR association in nondiabetic Han Chinese (HC, n = 1100), West Africans (WA, n = 1497), and African Americans (AA, n = 1539).
There were significant differences by ancestry: HDLc was positively associated with eGFR in HC (β = 0.13, P < 0.0001), but negatively associated among African ancestry populations (WA: −0.19, P < 0.0001; AA: −0.09, P = 0.02). These differences were also seen in nationally-representative NHANES data (among European Americans: 0.09, P = 0.005; among African Americans −0.14, P = 0.03). To further explore the findings in African ancestry populations, we investigated the role of an African ancestry-specific nephropathy risk variant, rs73885319, in the gene encoding HDL-associated APOL1. Among AA, an inverse HDLc-eGFR association was observed only with the risk genotype (−0.38 versus 0.001; P = 0.03). This interaction was not seen in WA.
In summary, counter to expectation, an inverse HDLc-eGFR association was observed among those of African ancestry. Given the APOL1 × HDLc interaction among AA, genetic factors may contribute to this paradoxical association. Notably, these findings suggest that the unexplained mechanism by which APOL1 affects kidney-disease risk may involve HDLc.
Advances in technology and reduced costs are facilitating large-scale sequencing of genes and exomes as well as entire genomes. Recently, we described an approach based on haplotypes called SCARVA¹ that enables the simultaneous analysis of the association between rare and common variants in disease etiology. Here, we describe an extension of SCARVA that evaluates individual markers instead of haplotypes. This modified method (SCARVAsnp) is implemented in four stages. First, all common variants in a pre-specified region (e.g., gene) are evaluated individually. Second, a union procedure is used to combine all rare variants (RVs) in the index region, and the ratio of the log likelihood with one RV excluded to the log likelihood of a model with all the collapsed RVs is calculated. On the basis of previously reported simulation studies,¹ a likelihood ratio ≥1.3 is considered statistically significant. Third, the direction of the association of the removed RV is determined by evaluating the change in λ values with the inclusion and exclusion of that RV. Lastly, significant common and rare variants, along with covariates, are included in a final regression model to evaluate the association between the trait and variants in that region. We apply the method to simulated and real data sets to show that it is simple to use and computationally efficient, and that it can accurately identify both common and rare risk variants. This method overcomes several limitations of existing methods. For example, SCARVAsnp limits loss of statistical power by not including variants that are not associated with the trait of interest in the final model. Also, SCARVAsnp takes into consideration the direction of association by effectively modelling positively and negatively associated variants.
complex traits; rare and common variants
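The second stage above, collapsing rare variants by a union procedure and then leaving one rare variant out, can be sketched as follows. This is an illustration only: it assumes that "union" means an indicator of carrying at least one rare allele in the region, and the minor allele frequency cutoff of 0.01 and the function name are hypothetical choices not stated in the abstract. The likelihood calculations themselves are not reproduced here.

```python
def collapse_rare_variants(genotypes, mafs, maf_threshold=0.01, exclude=None):
    """Union-collapse rare variants into one carrier indicator per individual.

    genotypes: one row per individual, each a list of minor-allele counts (0, 1, 2).
    mafs: minor allele frequency of each variant (column).
    exclude: optional column index of a rare variant to leave out.
    """
    rare = [j for j, f in enumerate(mafs) if f < maf_threshold and j != exclude]
    return [1 if any(row[j] > 0 for j in rare) else 0 for row in genotypes]

genotypes = [[0, 1, 0], [0, 0, 2], [1, 0, 0], [0, 0, 0]]
mafs = [0.20, 0.005, 0.008]          # the first variant is common, the others rare
all_rare = collapse_rare_variants(genotypes, mafs)
without_second = collapse_rare_variants(genotypes, mafs, exclude=1)
```

Comparing a model fitted with `all_rare` to one fitted with `without_second` isolates the contribution of the excluded variant, which is the comparison the likelihood ratio in the text is built on.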
Recent developments in high-throughput genotyping and whole-genome sequencing will enhance the identification of disease loci in admixed populations. We discuss how a more refined estimation of ancestry benefits both admixture mapping and association mapping, making disease loci identification in admixed populations more powerful.
High-throughput genotyping and sequencing will enable refined estimation of ancestry, thus enhancing disease loci identification in admixed populations
Principal components analysis of genetic data is used to avoid inflation in type I error rates in association testing due to population stratification by covariate adjustment using the top eigenvectors and to estimate cluster or group membership independent of self-reported or ethnic identities. Eigendecomposition transforms correlated variables into an equal number of uncorrelated variables. Numerous stopping rules have been developed to identify which principal components should be retained. Recent developments in random matrix theory have led to a formal hypothesis test of the top eigenvalue, providing another way to achieve dimension reduction. In this study, I compare Velicer’s minimum average partial test to a test based on the Tracy-Widom distribution as implemented in EIGENSOFT, the most widely used implementation of principal components analysis in genome-wide association analysis. By computer simulation of vicariance based on coalescent theory, EIGENSOFT systematically overestimates the number of significant principal components. Furthermore, this overestimation is larger for samples of admixed individuals than for samples of unadmixed individuals. Overestimating the number of significant principal components can potentially lead to a loss of power in association testing by adjusting for unnecessary covariates and may lead to incorrect inferences about group differentiation. Velicer’s minimum average partial test is shown to have both smaller bias and smaller variance, often with a mean squared error of zero, in estimating the number of principal components to retain. Velicer’s minimum average partial test is implemented in R code and is suitable for genome-wide genotype data with or without population labels.
admixture; population stratification; principal components; stopping rule; vicariance
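Velicer's minimum average partial test can be sketched in a few lines. The version below is a minimal illustration rather than the R implementation referred to above: it assumes a full-rank correlation matrix as input (for genotype data this would be the correlation matrix between individuals), uses NumPy, and stops early if any residual variance is numerically exhausted. The function name and the tolerance are hypothetical.

```python
import numpy as np

def minimum_average_partial(R, tol=1e-10):
    """Return (number of components to retain, list of average squared partial correlations).

    R is a p x p correlation matrix with p >= 3. Entry m of the returned list is the
    average squared off-diagonal partial correlation after removing the top m components.
    """
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    if p < 3:
        raise ValueError("need at least 3 variables")
    evals, evecs = np.linalg.eigh(R)
    order = np.argsort(evals)[::-1]                     # largest eigenvalue first
    evals = np.clip(evals[order], 0.0, None)
    loadings = evecs[:, order] * np.sqrt(evals)
    off_diag = ~np.eye(p, dtype=bool)
    denom = p * (p - 1)
    f = [float(np.sum(R[off_diag] ** 2) / denom)]       # m = 0: no components removed
    for m in range(1, p - 1):
        A = loadings[:, :m]
        C = R - A @ A.T                                 # covariance after partialling out
        d = np.diag(C)
        if np.any(d <= tol):                            # residual variance exhausted
            break
        partial = C / np.sqrt(np.outer(d, d))
        f.append(float(np.sum(partial[off_diag] ** 2) / denom))
    return int(np.argmin(f)), f
```

The number of components retained is the value of m at which the average squared partial correlation is smallest; a minimum at m = 0 indicates no detectable structure.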
Total serum bilirubin is associated with several clinical outcomes, including cardiovascular disease, diabetes and drug metabolism. We conducted a genome-wide association study in 619 healthy unrelated African Americans in an attempt to replicate reported findings in Europeans and Asians and to identify novel loci influencing total serum bilirubin levels. We analyzed a dense panel of over two million genotyped and imputed SNPs in additive genetic models adjusting for age, sex, and the first two significant principal components from the sample covariance matrix of genotypes. Thirty-nine SNPs spanning a 78 kb region within UGT1A1 displayed P-values <5 × 10−8. The lowest P-value was 1.7 × 10−22 for SNP rs887829. None of the SNPs in UGT1A1 remained statistically significant in conditional association analyses that adjusted for rs887829. In addition, SNP rs10929302, located in the phenobarbital response enhancer module, was significantly associated with bilirubin level with a P-value of 1.37 × 10−11; this enhancer module is believed to have a critical role in phenobarbital treatment of hyperbilirubinemia. Interestingly, the lead SNP, rs887829, is in strong linkage disequilibrium (LD) (r2≥0.74) with rs10929302. Taking advantage of the lower LD and shorter haplotypes in African-ancestry populations, we identified rs887829 as a more refined proxy for the causative variant influencing bilirubin levels. Also, we replicated the reported association between variants in SEMA3C and bilirubin levels. In summary, UGT1A1 is a major locus influencing bilirubin levels and the results of this study promise to contribute to the understanding of the etiology and treatment of hyperbilirubinemia in African-ancestry populations.
GWAS; replications; bilirubin; African Americans
Association studies are a staple of genotype–phenotype mapping studies, whether they are based on single markers, haplotypes, candidate genes, genome-wide genotypes, or whole genome sequences. Although genetic epidemiological studies typically contain data collected on multiple traits which themselves are often correlated, most analyses have been performed on single traits. Here, I review several methods that have been developed to perform multiple trait analysis. These methods range from traditional multivariate models for systems of equations to recently developed graphical approaches based on network theory. The application of network theory to genetics is termed systems genetics and has the potential to address long-standing questions in genetics about complex processes such as coordinate regulation, homeostasis, and pleiotropy.
multivariate analysis; pleiotropy; systems genetics
African Americans are disproportionately affected by type 2 diabetes (T2DM) yet few studies have examined T2DM using genome-wide association approaches in this ethnicity. The aim of this study was to identify genes associated with T2DM in the African American population. We performed a Genome Wide Association Study (GWAS) using the Affymetrix 6.0 array in 965 African-American cases with T2DM and end-stage renal disease (T2DM-ESRD) and 1029 population-based controls. The most significant SNPs (n = 550 independent loci) were genotyped in a replication cohort and 122 SNPs (n = 98 independent loci) were further tested through genotyping three additional validation cohorts followed by meta-analysis in all five cohorts totaling 3,132 cases and 3,317 controls. Twelve SNPs had evidence of association in the GWAS (P<0.0071), were directionally consistent in the Replication cohort and were associated with T2DM in subjects without nephropathy (P<0.05). Meta-analysis in all cases and controls revealed a single SNP reaching genome-wide significance (P<2.5×10−8). SNP rs7560163 (P = 7.0×10−9, OR (95% CI) = 0.75 (0.67–0.84)) is located intergenically between RND3 and RBM43. Four additional loci (rs7542900, rs4659485, rs2722769 and rs7107217) were associated with T2DM (P<0.05) and reached more nominal levels of significance (P<2.5×10−5) in the overall analysis and may represent novel loci that contribute to T2DM. We have identified novel T2DM-susceptibility variants in the African-American population. Notably, T2DM risk was associated with the major allele and implies an interesting genetic architecture in this population. These results suggest that multiple loci underlie T2DM susceptibility in the African-American population and that these loci are distinct from those identified in other ethnic populations.
For samples of admixed individuals, it is possible to test for both ancestry effects via admixture mapping and genotype effects via association mapping. Here, we describe a joint test called BMIX that combines admixture and association statistics at single markers. We first perform high-density admixture mapping using local ancestry. We then perform association mapping using stratified regression, wherein for each marker genotypes are stratified by local ancestry. In both stages, we use generalized linear models, providing the advantage that the joint test can be used with any phenotype distribution with an appropriate link function. To define the alternative densities for admixture mapping and association mapping, we describe a method based on autocorrelation to empirically estimate the testing burdens of admixture mapping and association mapping. We then describe a joint test that uses the posterior probabilities from admixture mapping as prior probabilities for association mapping, capitalizing on the reduced testing burden of admixture mapping relative to association mapping. By simulation, we show that BMIX is potentially orders-of-magnitude more powerful than the MIX score, which is currently the most powerful frequentist joint test. We illustrate the gain in power through analysis of fasting plasma glucose among 922 unrelated, non-diabetic, admixed African Americans from the Howard University Family Study. We detected loci at 1q24 and 6q26 as genome-wide significant via admixture mapping; both loci have been independently reported from linkage analysis. Using the association data, we resolved the 1q24 signal into two regions. One region, upstream of the gene FAM78B, contains three binding sites for the transcription factor PPARG and two binding sites for HNF1A, both previously implicated in the pathology of type 2 diabetes. 
The fact that both loci showed ancestry effects may provide novel insight into the genetic architecture of fasting plasma glucose in individuals of African ancestry.
Most genome-wide association studies performed to date have focused on individuals with European ancestry. Admixed African Americans tend to have disproportionately higher risk for many common, complex diseases. Disease or trait mapping in admixed individuals can benefit from joint analysis of ancestry and genotype effects. We developed a joint test that is more powerful than either admixture mapping of ancestry effects or association mapping of genotype effects performed separately. Our joint test fully capitalizes on the reduced testing burden of admixture mapping relative to association mapping. The test is based on generalized linear models and can be performed using standard statistical software. We illustrate the increased power of the joint test by detecting two loci for fasting plasma glucose in a sample of unrelated African American individuals, neither of which loci was detected as significant by traditional association analysis.
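The idea of using the posterior probability from admixture mapping as the prior probability for association mapping can be illustrated with a generic sequential Bayesian update. The sketch below is not the BMIX implementation: it assumes that each stage can be summarized by a Bayes factor in favor of the alternative, and that the prior for the admixture stage is the reciprocal of its testing burden. The function name, the burden, and the Bayes factors are all hypothetical.

```python
def posterior_probability(prior, bayes_factor):
    """Posterior probability of the alternative, given a prior in (0, 1) and a Bayes factor."""
    posterior_odds = bayes_factor * prior / (1.0 - prior)
    return posterior_odds / (1.0 + posterior_odds)

# Stage 1: admixture mapping. Hypothetical testing burden of 1,000 independent tests.
prior_admixture = 1.0 / 1000
post_admixture = posterior_probability(prior_admixture, bayes_factor=500.0)

# Stage 2: association mapping, with the stage 1 posterior serving as the prior.
post_joint = posterior_probability(post_admixture, bayes_factor=50.0)
```

Because the testing burden of admixture mapping is much smaller than that of association mapping, the stage 1 prior is far less diffuse than a uniform prior over all association tests would be, and a marker in a region with an ancestry effect therefore enters the association stage with a substantially larger prior probability.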
The genetic architecture of body weight and body composition is complex because these traits are normally influenced by multiple genes and their interactions, even after controlling for the environment. Bayesian methodology provides an efficient way of estimating these interactions.
Subjects and measurements
We used Bayesian model selection techniques to estimate the effect of epistatic interactions on age-related body weight (at 3, 6, and 10 weeks) and body composition (organ weights and fat-related traits) in an F2 sample obtained from a cross between high-growth (M16i) mice and low-growth (L6) mice.
We observed epistatic and main-effect quantitative trait loci (QTL) that controlled both body weight and body composition. Epistatic effects were generally more significant for WK3 and WK6 than WK10. Chromosomes 5 and 13 interacted strongly to control body weight at 3 weeks. A pleiotropic QTL on chromosome 2 was associated with body weight and some body composition phenotypes. Testis weight was regulated by a QTL on chromosome 13 with a significantly large main effect.
By analyzing epistatic interactions, we detected QTL not found in a previous analysis of this mouse population. Hence, the detection of gene-gene interactions may provide new information about the genetic architecture of complex obesity-related traits and may lead to the detection of additional obesity genes.
Bayesian methods; body weight; epistasis; obesity; quantitative trait loci
In mapping of quantitative trait loci (QTLs), performing hypothesis tests of linkage to a phenotype of interest across an entire genome involves multiple comparisons. Furthermore, linkage among loci induces correlation among tests. Under many multiple comparison frameworks, these problems are exacerbated when mapping multiple QTLs. Traditionally, significance thresholds have been subjectively set to control the probability of detecting at least one false positive outcome, although such thresholds are known to result in excessively low power to detect true positive outcomes. Recently, false discovery rate (FDR)-controlling procedures have been developed that yield more power both by relaxing the stringency of the significance threshold and by retaining more power for a given significance threshold. However, these procedures have been shown to perform poorly for mapping QTLs, principally because they ignore recombination fractions between markers. Here, I describe a procedure that accounts for recombination fractions and extends FDR control to include simultaneous control of the false non-discovery rate, i.e. the overall error rate is controlled. This procedure is developed in the Bayesian framework using a direct posterior probability approach. Data-driven significance thresholds are determined by minimizing the expected loss. The procedure is equivalent to jointly maximizing positive and negative predictive values. In the context of mapping QTLs for experimental crosses, the procedure is applicable to mapping main effects, gene–gene interactions and gene–environment interactions.
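The threshold selection described above can be sketched in a simplified form. The following is a minimal illustration, assuming that a posterior probability of linkage is already available for each tested locus and that the loss is the unweighted sum of the expected false discovery rate and the expected false non-discovery rate. Both are assumptions made for illustration. The sketch does not model recombination fractions, which the described procedure does, and `choose_threshold` is a hypothetical name.

```python
def choose_threshold(posteriors):
    """Pick how many top-ranked loci to declare significant.

    posteriors: posterior probability of linkage for each tested locus.
    Returns (k, expected_fdr, expected_fnr) minimizing expected FDR + expected FNR,
    where the k loci with the highest posterior probabilities are declared significant.
    """
    ranked = sorted(posteriors, reverse=True)
    m = len(ranked)
    total = sum(ranked)
    best = None
    cumulative = 0.0
    for k in range(1, m):  # keep both the discovery and non-discovery sets non-empty
        cumulative += ranked[k - 1]
        fdr = (k - cumulative) / k            # mean of (1 - p) over declared loci
        fnr = (total - cumulative) / (m - k)  # mean of p over non-declared loci
        loss = fdr + fnr
        if best is None or loss < best[0]:
            best = (loss, k, fdr, fnr)
    return best[1], best[2], best[3]

# Hypothetical posterior probabilities for eight loci.
example = [0.99, 0.95, 0.90, 0.20, 0.10, 0.05, 0.02, 0.01]
k, fdr, fnr = choose_threshold(example)
```

Because the positive predictive value is 1 minus the FDR and the negative predictive value is 1 minus the FNR, minimizing this loss is the same as maximizing their sum, which matches the equivalence stated in the abstract.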
Chronic kidney disease (CKD) is an increasing global public health concern, particularly among populations of African ancestry. We performed an interrogation of known renal loci, genome-wide association (GWA), and IBC candidate-gene SNP association analyses in African Americans from the CARe Renal Consortium. In up to 8,110 participants, we performed meta-analyses of GWA and IBC array data for estimated glomerular filtration rate (eGFR), CKD (eGFR <60 mL/min/1.73 m²), urinary albumin-to-creatinine ratio (UACR), and microalbuminuria (UACR >30 mg/g) and interrogated the 250 kb flanking region around 24 SNPs previously identified in European ancestry renal GWAS analyses. Findings were replicated in up to 4,358 African Americans. To assess function, individually identified genes were knocked down in zebrafish embryos by morpholino antisense oligonucleotides. Expression of kidney-specific genes was assessed by in situ hybridization, and glomerular filtration was evaluated by dextran clearance. Overall, 23 of 24 previously identified SNPs had direction-consistent associations with eGFR in African Americans, 2 of which achieved nominal significance (UMOD, PIP5K1B). Interrogation of the flanking regions uncovered 24 new index SNPs in African Americans, 12 of which were replicated (UMOD, ANXA9, GCKR, TFDP2, DAB2, VEGFA, ATXN2, GATM, SLC22A2, TMEM60, SLC6A13, and BCAS3). In addition, we identified 3 suggestive loci at DOK6 (p-value = 5.3×10⁻⁷) and FNDC1 (p-value = 3.0×10⁻⁷) for UACR, and KCNQ1 with eGFR (p = 3.6×10⁻⁶). Morpholino knockdown of kcnq1 in the zebrafish resulted in abnormal kidney development and filtration capacity. We identified several SNPs in association with eGFR in African ancestry individuals, as well as 3 suggestive loci for UACR and eGFR. Functional genetic studies support a role for kcnq1 in glomerular development in zebrafish.
Chronic kidney disease (CKD) is an increasing global public health problem and disproportionately affects populations of African ancestry. Many studies have shown that genetic variants are associated with the development of CKD; however, similar studies are lacking in African ancestry populations. The CARe consortium consists of more than 8,000 individuals of African ancestry; genome-wide association analysis for renal-related phenotypes was conducted. In cross-ethnicity analyses, we found that 23 of 24 previously identified SNPs in European ancestry populations have the same effect direction in our samples of African ancestry. We also identified 3 suggestive genetic variants associated with measurement of kidney function. We then tested these genes in zebrafish knockdown models and demonstrated that kcnq1 is involved in kidney development in zebrafish. These results highlight the similarity of genetic variants across ethnicities and show that cross-species modeling in zebrafish is feasible for genes associated with chronic human disease.
Genome-wide association (GWA) studies have identified common variants that are associated with a variety of traits and diseases, but most studies have been performed in European-derived populations. Here, we describe the first genome-wide analyses of imputed genotype and copy number variants (CNVs) for anthropometric measures in African-derived populations: 1188 Nigerians from Igbo-Ora and Ibadan, Nigeria, and 743 African-Americans from Maywood, IL. To improve the reach of our study, we used imputation to estimate genotypes at ∼2.1 million single-nucleotide polymorphisms (SNPs) and also tested CNVs for association. No SNPs or common CNVs reached a genome-wide significance level for association with height or body mass index (BMI), and the best signals from a meta-analysis of the two cohorts did not replicate in ∼3700 African-Americans and Jamaicans. However, several loci previously confirmed in European populations showed evidence of replication in our GWA panel of African-derived populations, including variants near IHH and DLEU7 for height and MC4R for BMI. Analysis of global burden of rare CNVs suggested that lean individuals possess greater total burden of CNVs, but this finding was not supported in an independent European population. Our results suggest that there are not multiple loci with strong effects on anthropometric traits in African-derived populations and that sample sizes comparable to those needed in European GWA studies will be required to identify replicable associations. Meta-analysis of this data set with additional studies in African-ancestry populations will be helpful to improve power to detect novel associations.
The FTO gene is one of the most consistently replicated loci for obesity. However, data from populations of African ancestry are limited. We evaluated genetic variation in the FTO gene and investigated associations with obesity in West Africans and African Americans.
RESEARCH DESIGN AND METHODS
The study samples comprised 968 African Americans (59% female, mean age 49 years, mean BMI 30.8 kg/m²) and 517 West Africans (58% female, mean age 54 years, mean BMI 25.5 kg/m²). FTO genetic variation was evaluated by genotyping 262 tag single nucleotide polymorphisms (SNPs) across the entire gene. Association of each SNP with BMI, waist circumference, and percent fat mass was investigated under an additive model.
As expected, both African-ancestry samples showed weaker linkage disequilibrium (LD) patterns compared with other continental (e.g., European) populations. Several intron 8 SNPs, in addition to intron 1 SNPs, showed significant associations in both study samples. The combined effect size for BMI for the top SNPs from meta-analysis was 0.77 kg/m² (P = 0.009, rs9932411) and 0.70 kg/m² (P = 0.006, rs7191513). Two previously reported associations with intron 1 SNPs (rs1121980 and rs7204609, r² = 0.001) were replicated among the West Africans.
The FTO gene shows significant differences in allele frequency and LD patterns in populations of African ancestry compared with other continental populations. Despite these differences, we observed evidence of associations with obesity in African Americans and West Africans, as well as evidence of heterogeneity in association. More studies of FTO in multiple ethnic groups are needed.
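The abstract above reports combined effect sizes from meta-analysis and evidence of heterogeneity, but it does not state which meta-analysis method was used. As an illustration only, the sketch below shows fixed-effect inverse-variance weighting with Cochran's Q, one common way to combine per-cohort effect estimates. The function name `fixed_effect_meta` and the input values are hypothetical and are not taken from the study.

```python
import math

def fixed_effect_meta(betas, standard_errors):
    """Fixed-effect inverse-variance meta-analysis of per-cohort effect estimates.

    Returns (combined beta, combined standard error, two-sided p-value, Cochran's Q).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    weight_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / weight_sum
    se = math.sqrt(1.0 / weight_sum)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    # Cochran's Q: weighted squared deviations of cohort effects from the combined effect.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, p_value, q

# Hypothetical per-cohort BMI effects (kg/m² per allele) and standard errors.
beta, se, p_value, q = fixed_effect_meta([1.0, 0.5], [0.5, 0.5])
```

Under the null hypothesis of homogeneous effects, Q is compared against a chi-square distribution with (number of cohorts - 1) degrees of freedom; a large Q suggests that a fixed-effect summary may not describe both cohorts well.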
Imputation of genotypes for markers untyped in a study sample has become a standard approach to increase genome coverage in genome-wide association studies at practically zero cost. Most methods for imputing missing genotypes extend previously described algorithms for inferring haplotype phase. These algorithms generally fall into three classes based on the underlying model for estimating the conditional distribution of haplotype frequencies: a cluster-based model, a multinomial model, or a population genetics-based model. We compared BEAGLE, PLINK, and MACH, representing the three classes of models, respectively, with specific attention to measures of imputation success and selection of the reference panel for an admixed study sample of African Americans. Based on analysis of chromosome 22 and after calibration to a fixed level of 90% concordance between experimentally determined and imputed genotypes, MACH yielded the largest absolute number of successfully imputed markers and the largest gain in coverage of the variation captured by HapMap reference panels. Following the common practice of performing imputation once, the Yoruba in Ibadan, Nigeria (YRI) reference panel outperformed other HapMap reference panels, including 1) African ancestry from Southwest USA (ASW) data, 2) an unweighted combination of the Northern and Western Europe (CEU) and YRI data into a single reference panel, and 3) a combination of the CEU and YRI data into a single reference panel with weights matching estimates of admixture proportions. For our admixed study sample, the optimal strategy involved imputing twice with the HapMap CEU and YRI reference panels separately and then merging the data sets.
admixture; African American; coverage; reference panel
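Calibration to a fixed concordance level, as described in the imputation abstract above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: each marker is assumed to carry a generic imputation quality score plus masked experimental calls for comparison, and the cutoff is taken as the lowest score at which the pooled concordance of retained markers reaches the target. The names `concordance` and `calibrate_cutoff` and the data are hypothetical.

```python
def concordance(observed, imputed):
    """Fraction of genotype calls (coded as 0/1/2 allele counts) that agree."""
    pairs = list(zip(observed, imputed))
    return sum(o == i for o, i in pairs) / len(pairs)

def calibrate_cutoff(markers, target=0.90):
    """Find the lowest quality cutoff that reaches the target pooled concordance.

    markers: list of (quality_score, observed_calls, imputed_calls) tuples.
    Returns (cutoff, number_of_markers_kept), or None if no cutoff reaches the target.
    """
    for cutoff in sorted({score for score, _, _ in markers}):
        kept = [(obs, imp) for score, obs, imp in markers if score >= cutoff]
        observed = [g for obs, _ in kept for g in obs]
        imputed = [g for _, imp in kept for g in imp]
        if concordance(observed, imputed) >= target:
            return cutoff, len(kept)
    return None

# Hypothetical markers: (quality score, experimental calls, imputed calls).
markers = [
    (0.30, [0, 1, 2, 1, 0], [1, 1, 0, 1, 2]),
    (0.60, [0, 1, 2, 1, 0], [0, 1, 2, 2, 0]),
    (0.90, [2, 2, 1, 0, 1], [2, 2, 1, 0, 1]),
]
result = calibrate_cutoff(markers)
```

Fixing the concordance level first puts different imputation programs on a common accuracy footing, so that the number of markers each one retains can be compared directly.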
Uric acid is the primary byproduct of purine metabolism. Hyperuricemia is associated with body mass index (BMI), sex, and multiple complex diseases including gout, hypertension (HTN), renal disease, and type 2 diabetes (T2D). Multiple genome-wide association studies (GWAS) in individuals of European ancestry (EA) have reported associations between serum uric acid levels (SUAL) and specific genomic loci. The purposes of this study were: 1) to replicate major signals reported in EA populations; 2) to use the weak LD pattern in African ancestry populations to better localize (fine-map) reported loci; and 3) to explore the identification of novel findings, cognizant of the moderate sample size.
African American (AA) participants (n = 1,017) from the Howard University Family Study were included in this study. Genotyping was performed using the Affymetrix® Genome-wide Human SNP Array 6.0. Imputation was performed using MACH and the HapMap reference panels for CEU and YRI. A total of 2,400,542 single nucleotide polymorphisms (SNPs) were assessed for association with serum uric acid under the additive genetic model with adjustment for age, sex, BMI, glomerular filtration rate, HTN, T2D, and the top two principal components identified in the assessment of admixture and population stratification.
Four variants in the gene SLC2A9 achieved genome-wide significance for association with SUAL (p-values ranging from 8.88 × 10⁻⁹ to 1.38 × 10⁻⁹). Fine-mapping of the SLC2A9 signals identified a 263 kb interval of linkage disequilibrium in the HapMap CEU sample. This interval was reduced to 37 kb in our AA and the HapMap YRI samples.
The most strongly associated locus for SUAL in EA populations was also the most strongly associated locus in this AA sample. This finding provides evidence for the role of SLC2A9 in uric acid metabolism across human populations. Additionally, our findings demonstrate the utility of following up GWAS signals from EA populations in African-ancestry populations with weaker linkage disequilibrium.