Despite evidence of an association between variants at the apolipoprotein L1 gene (APOL1) locus and a spectrum of related kidney diseases, underlying biological mechanisms remain unknown. An earlier preliminary study published by our group showed that an APOL1 variant (rs73885319) modified the association between high-density lipoprotein cholesterol (HDLC) and estimated glomerular filtration rate (eGFR) in African Americans. To further understand this relationship, we evaluated the interaction in two additional large cohorts of African Americans for a total of 3,592 unrelated individuals from the Howard University Family Study (HUFS), the Natural History of APOL1-Associated Nephropathy Study (NHAAN), and the Atherosclerosis Risk in Communities Study (ARIC). The association between HDLC and eGFR was determined using linear mixed models, and the interaction between rs73885319 genotype and HDLC was evaluated using a multiplicative term.
Among individuals homozygous for the risk genotype, a strong inverse HDLC-eGFR association was observed, with a positive association in others (p for the interaction of the rs73885319 × HDLC =0.0001). The interaction was similar in HUFS and NHAAN, and attenuated in ARIC. Given that ARIC participants were older, we investigated an age effect; age was a significant modifier of the observed interaction. When older individuals were excluded, the interaction in ARIC was similar to that in the other studies.
Based on these findings, it is clear that the relationship between HDLC and eGFR is strongly influenced by the APOL1 rs73885319 kidney risk genotype. Moreover, the degree to which this variant modifies the association may depend on the age of the individual. More detailed physiological studies are warranted to understand how rs73885319 may affect the relationship between HDLC and eGFR in individuals with and without disease and across the lifespan.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1645-7) contains supplementary material, which is available to authorized users.
Apolipoprotein L1; High-density lipoprotein cholesterol; African ancestry; Glomerular filtration rate
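The multiplicative interaction term described in this abstract can be illustrated with a minimal sketch. It assumes ordinary least squares in place of the linear mixed models used in the study (no random effects for relatedness or cohort), and it runs on synthetic data only. The function name `fit_interaction`, the 13% carrier frequency, and the simulated effect sizes are hypothetical choices for illustration, not values from the study.

```python
import numpy as np

def fit_interaction(egfr, hdlc, risk):
    """Least-squares fit of egfr ~ 1 + hdlc + risk + hdlc*risk.

    risk is coded 1 for the risk genotype and 0 otherwise.
    Simplification: plain OLS, not the linear mixed model of the study.
    """
    X = np.column_stack([np.ones_like(hdlc), hdlc, risk, hdlc * risk])
    beta, *_ = np.linalg.lstsq(X, egfr, rcond=None)
    return dict(zip(["intercept", "hdlc", "risk", "hdlc_x_risk"], beta))

# Synthetic data (hypothetical values, not study data).
rng = np.random.default_rng(0)
n = 20_000
hdlc = rng.normal(size=n)                      # standardized HDLC
risk = (rng.random(n) < 0.13).astype(float)    # assumed carrier frequency
egfr = 0.1 * hdlc - 0.5 * hdlc * risk + rng.normal(size=n)

coef = fit_interaction(egfr, hdlc, risk)
slope_non_risk = coef["hdlc"]                      # HDLC slope without risk genotype
slope_risk = coef["hdlc"] + coef["hdlc_x_risk"]    # HDLC slope with risk genotype
print(coef, slope_non_risk, slope_risk)
```

In this parameterization the interaction coefficient is the difference between the HDLC-eGFR slopes in the two genotype groups, so a slope that changes sign between groups shows up as an interaction coefficient larger in magnitude than the main HDLC effect.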
Type 2 diabetes (T2D) is more prevalent in African Americans than in Europeans. However, little is known about the genetic risk in African Americans despite the recent identification of more than 70 T2D loci primarily by genome-wide association studies (GWAS) in individuals of European ancestry. In order to investigate the genetic architecture of T2D in African Americans, the MEta-analysis of type 2 DIabetes in African Americans (MEDIA) Consortium examined 17 GWAS on T2D comprising 8,284 cases and 15,543 controls in African Americans in stage 1 analysis. Single nucleotide polymorphism (SNP) association analysis was conducted in each study under the additive model after adjustment for age, sex, study site, and principal components. Meta-analysis of approximately 2.6 million genotyped and imputed SNPs in all studies was conducted using an inverse variance-weighted fixed effect model. Replications were performed to follow up 21 loci in up to 6,061 cases and 5,483 controls in African Americans, and 8,130 cases and 38,987 controls of European ancestry. We identified three known loci (TCF7L2, HMGA2 and KCNQ1) and two novel loci (HLA-B and INS-IGF2) at genome-wide significance (4.15×10−94
Despite the higher prevalence of type 2 diabetes (T2D) in African Americans than in Europeans, recent genome-wide association studies (GWAS) have been conducted primarily in individuals of European ancestry. In this study, we performed a meta-analysis of 17 GWAS in 8,284 cases and 15,543 controls to explore the genetic architecture of T2D in African Americans. Following replication in an additional 6,061 cases and 5,483 controls in African Americans, and 8,130 cases and 38,987 controls of European ancestry, we identified two novel and three previously reported T2D loci reaching genome-wide significance. We also examined 158 loci previously reported to be associated with T2D or regulating glucose homeostasis. While 56% of these loci were shared between African Americans and the other populations, the strongest associations in African Americans are often found in nearby single nucleotide polymorphisms (SNPs) instead of the original SNPs reported in other populations due to differential genetic architecture across populations. Our results highlight the importance of performing genetic studies in non-European populations to fine map the causal genetic variants.
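The inverse variance-weighted fixed effect model mentioned above combines per-study effect estimates by weighting each with the reciprocal of its squared standard error. The following is a minimal sketch of that calculation for a single SNP. The function name `fixed_effect_meta` and the example inputs are hypothetical and are not taken from the MEDIA analysis.

```python
import math

def fixed_effect_meta(betas, ses):
    """Inverse variance-weighted fixed-effect meta-analysis for one SNP.

    betas: per-study effect estimates (e.g., log odds ratios)
    ses:   per-study standard errors
    Returns the pooled estimate, its standard error, z score, and
    two-sided p-value under a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return beta, se, z, p

# Hypothetical inputs: two studies with equal precision.
beta, se, z, p = fixed_effect_meta([0.10, 0.20], [0.05, 0.05])
print(beta, se, z, p)
```

With equal standard errors the pooled estimate is the simple average of the study estimates, and the pooled standard error shrinks by a factor of the square root of the number of studies.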
Although a considerable proportion of serum lipids loci identified in European ancestry individuals (EA) replicate in African Americans (AA), interethnic differences in the distribution of serum lipids suggest that some genetic determinants differ by ethnicity. We conducted a comprehensive evaluation of five lipid candidate genes to identify variants with ethnicity-specific effects. We sequenced ABCA1, LCAT, LPL, PON1, and SERPINE1 in 48 AA individuals with extreme serum lipid concentrations (high HDLC/low TG or low HDLC/high TG). Identified variants were genotyped in the full population-based sample of AA (n = 1694) and tested for an association with serum lipids. rs328 (LPL) and correlated variants were associated with higher HDLC and lower TG. Interestingly, a stronger effect was observed on a “European” vs. “African” genetic background at this locus. To investigate this effect, we evaluated the region among West Africans (WA). For TG, the effect size among WA was the same in AA with only African local ancestry (2–3% lower TG), while the larger association among AA with local European ancestry matched previous reports in EA (10%). For HDLC, there was no association with rs328 in AA with only African local ancestry or in WA, while the association among AA with European local ancestry was much greater than what has been observed for EA (15 vs. ∼5 mg/dl), suggesting an interaction with an environmental or genetic factor that differs by ethnicity. Beyond this ancestry effect, the importance of African ancestry-focused, sequence-based work was also highlighted by serum lipid associations of variants that were in higher frequency (or present only) among those of African ancestry. By beginning our study with the sequence variation present in AA individuals, investigating local ancestry effects, and seeking replication in WA, we were able to comprehensively evaluate the role of a set of candidate genes in serum lipids in AA.
Most of the work on the genetic epidemiology of serum lipids in African Americans (AA) has focused on replicating findings that were identified in European ancestry individuals. While this can be very informative about the generalizability of lipids loci across populations, African ancestry-specific variation will be missed using this approach. Our aim was to comprehensively evaluate five lipid candidate genes in an AA population, from the identification of variants of interest to population-level analysis of high-density lipoprotein cholesterol (HDLC) and triglycerides (TG). We sequenced five genes in individuals with extreme lipids (n = 48) drawn from a population-based study of AA. The variants identified were genotyped in 1,694 AA and analyzed. Notable among the findings was the observation of ancestry-specific effects for several variants in the LPL gene among these admixed individuals, with a greater effect observed among those with European ancestry in this region. These associations were further elucidated by replication in West Africans. By beginning with the sequence variation present among AA, investigating ancestry effects, and seeking replication in West Africans, we were able to comprehensively evaluate these candidate genes with a focus on African ancestry individuals.
Admixture mapping is a powerful method of gene mapping for diseases or traits that show differential risk by ancestry. Admixture mapping has been applied most often to African Americans who trace ancestry to Europeans and West Africans. Recent developments in admixture mapping include improvements in methods to take advantage of higher densities of genetic variants as well as extensions to admixed populations with three or more ancestral populations, such as Latino Americans. In this unit, I outline the key concepts of admixture mapping. I describe several approaches for inferring local ancestry and provide strategies for performing admixture mapping depending on the study design. Finally, I compare and contrast linkage analysis, association analysis, and admixture mapping, with an emphasis on integrating admixture mapping and association testing.
admixture; admixture mapping; ancestry
Genome-wide association studies (GWAS) have identified 36 loci associated with body mass index (BMI), predominantly in populations of European ancestry. We conducted a meta-analysis to examine the association of >3.2 million SNPs with BMI in 39,144 men and women of African ancestry, and followed up the most significant associations in an additional 32,268 individuals of African ancestry. We identified one novel locus at 5q33 (GALNT10, rs7708584, p=3.4×10−11) and another at 7p15 when combined with data from the GIANT consortium (MIR148A/NFE2L3, rs10261878, p=1.2×10−10). We also found suggestive evidence of an association at a third locus at 6q16 in the African ancestry sample (KLHL32, rs974417, p=6.9×10−8). Thirty-two of the 36 previously established BMI variants displayed directionally consistent effect estimates in our GWAS (binomial p=9.7×10−7), of which five reached genome-wide significance. These findings provide strong support for shared BMI loci across populations as well as for the utility of studying ancestrally diverse populations.
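The binomial test of directional consistency can be worked through directly. The sketch below assumes a one-sided exact binomial test against a null probability of 0.5 (each variant equally likely to agree or disagree in direction by chance); under that assumption, 32 or more consistent variants out of 36 gives a tail probability of about 9.7×10−7, in line with the value reported above. The function name `binomial_upper_tail` is hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 32 of 36 variants with directionally consistent effects,
# tested against chance agreement (p = 0.5), one-sided.
p_value = binomial_upper_tail(32, 36)
print(p_value)
```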
Insulin resistance (IR) is a key determinant of type 2 diabetes (T2D) and other metabolic disorders. This genome-wide association study (GWAS) was designed to shed light on the genetic basis of fasting insulin (FI) and IR in 927 non-diabetic African Americans. 5 396 838 single-nucleotide polymorphisms (SNPs) were tested for associations with FI or IR with adjustments for age, sex, body mass index, hypertension status and first two principal components. Genotyped SNPs (n = 12) with P < 5 × 10−6 in African Americans were carried forward for de novo genotyping in 570 non-diabetic West Africans. We replicated SNPs in or near SC4MOL and TCERG1L in West Africans. The meta-analysis of 1497 African Americans and West Africans yielded genome-wide significant associations for SNPs in the SC4MOL gene: rs17046216 (P = 1.7 × 10−8 and 2.9 × 10−8 for FI and IR, respectively); and near the TCERG1L gene with rs7077836 as the top scoring (P = 7.5 × 10−9 and 4.9 × 10−10 for FI and IR, respectively). In silico replication in the MAGIC study (n = 37 037) showed weak but significant association (adjusted P-value of 0.0097) for rs34602777 in the MYO5A gene. In addition, we replicated previous GWAS findings for IR and FI in Europeans for GCKR, and for variants in four T2D loci (FTO, IRS1, KLF14 and PPARG) which exert their action via IR. In summary, variants in/near SC4MOL, and TCERG1L were associated with FI and IR in this cohort of African Americans and were replicated in West Africans. SC4MOL is under-expressed in an animal model of T2D and plays a key role in lipid biosynthesis, with implications for the regulation of energy metabolism, obesity and dyslipidemia. TCERG1L is associated with plasma adiponectin, a key modulator of obesity, inflammation, IR and diabetes.
Central obesity, measured by waist circumference (WC) or waist-hip ratio (WHR), is a marker of body fat distribution. Although obesity disproportionately affects minority populations, few studies have conducted a genome-wide association study (GWAS) of fat distribution among those of predominantly African ancestry (AA). We performed GWAS of WC and WHR, adjusted and unadjusted for BMI, in up to 33,591 and 27,350 AA individuals, respectively. We identified loci associated with fat distribution in AA individuals using meta-analyses of GWA results for WC and WHR (stage 1). Overall, 25 SNPs with single genomic control (GC)-corrected p-values<5.0×10−6 were followed up (stage 2) in AA with WC and with WHR. Additionally, we interrogated genomic regions of previously identified European ancestry (EA) WHR loci among AA. In joint analysis of association results including both Stage 1 and 2 cohorts, 2 SNPs demonstrated association, rs2075064 at LHX2, p = 2.24×10−8 for WC-adjusted-for-BMI, and rs6931262 at RREB1, p = 2.48×10−8 for WHR-adjusted-for-BMI. However, neither signal was genome-wide significant after double GC-correction (LHX2: p = 6.5×10−8; RREB1: p = 5.7×10−8). Six of fourteen previously reported loci for waist in EA populations were significant (p<0.05 divided by the number of independent SNPs within the region) in AA studied here (TBX15-WARS2, GRB14, ADAMTS9, LY86, RSPO3, ITPR2-SSPN). Further, we observed associations with metabolic traits: rs13389219 at GRB14 associated with HDL-cholesterol, triglycerides, and fasting insulin, and rs13060013 at ADAMTS9 with HDL-cholesterol and fasting insulin. Finally, we observed nominal evidence for sexual dimorphism, with stronger results in AA women at the GRB14 locus (p for interaction = 0.02). In conclusion, we identified two suggestive loci associated with fat distribution in AA populations in addition to confirming 6 loci previously identified in populations of EA. These findings reinforce the concept that there are fat distribution loci that are independent of generalized adiposity.
Central obesity is a marker of body fat distribution and is known to have a genetic underpinning. Few studies have reported genome-wide association study (GWAS) results among individuals of predominantly African ancestry (AA). We performed a collaborative meta-analysis in order to identify genetic loci associated with body fat distribution in AA individuals using waist circumference (WC) and waist to hip ratio (WHR) as measures of fat distribution, with and without adjustment for body mass index (BMI). We uncovered 2 genetic loci potentially associated with fat distribution: LHX2 in association with WC-adjusted-for-BMI and at RREB1 for WHR-adjusted-for-BMI. Six of fourteen previously reported loci for waist in EA populations were significant in AA studied here (TBX15-WARS2, GRB14, ADAMTS9, LY86, RSPO3, ITPR2-SSPN). These findings reinforce the concept that there are loci for body fat distribution that are independent of generalized adiposity.
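Genomic control (GC) correction, referred to above, rescales association test statistics by an inflation factor estimated from their median. The following is a minimal sketch of a single round of correction for 1-degree-of-freedom chi-square statistics. It assumes the common convention of not deflating when the inflation factor is below 1, and it reads "double" correction as applying the same step once within each study and once more to the meta-analysis results, which is an assumption about the procedure rather than a detail stated in the abstract. The function name and example statistics are hypothetical.

```python
import math
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_control(chi2_stats):
    """Apply one round of genomic control to 1-df chi-square statistics.

    Returns the inflation factor lambda, the corrected statistics,
    and the corresponding p-values.
    """
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # assumed convention: never inflate the statistics
    corrected = [x / lam for x in chi2_stats]
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_values = [math.erfc(math.sqrt(x / 2.0)) for x in corrected]
    return lam, corrected, p_values

# Hypothetical statistics whose median is twice the null median.
lam, corrected, p_values = genomic_control([0.2, 0.9098728, 30.0])
print(lam, corrected, p_values)
```

Because every statistic is divided by the same factor, a signal that sits just below the genome-wide threshold before correction can move above it afterwards, which is the pattern described for the LHX2 and RREB1 signals.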
C-reactive protein (CRP) is an acute phase reactant protein produced primarily by the liver. Circulating CRP levels are influenced by genetic and non-genetic factors, including infection and obesity. Genome-wide association studies (GWAS) provide an unbiased approach towards identifying loci influencing CRP levels. None of the six GWAS for CRP levels has been conducted in an African ancestry population. The present study aims to: (i) identify genetic variants that influence serum CRP in African Americans (AA) using a genome-wide association approach and replicate these findings in West Africans (WA), (ii) assess transferability of major signals for CRP reported in European ancestry populations (EA) to AA and (iii) use the weak linkage disequilibrium (LD) structure characteristic of African ancestry populations to fine-map the previously reported CRP locus. The discovery cohort comprised 837 unrelated AA, with the replication of significant single-nucleotide polymorphisms (SNPs) assessed in 486 WA. The association analysis was conducted with 2 366 856 genotyped and imputed SNPs under an additive genetic model with adjustment for appropriate covariates. Genome-wide and replication significances were set at P < 5 × 10−8 and P < 0.05, respectively. Ten SNPs in (CRP pseudogene-1) CRPP1 and CRP genes were associated with serum CRP (P = 2.4 × 10−09 to 4.3 × 10−11). All but one of the top-scoring SNPs associated with CRP in AA were successfully replicated in WA. CRP signals previously identified in EA samples were transferable to AAs, and we were able to fine-map this signal, reducing the region of interest from the 25 kb of LD around the locus in the HapMap CEU sample to only 8 kb in our AA sample.
Principal components analysis of genetic data has benefited from advances in random matrix theory. The Tracy-Widom distribution has been identified as the limiting distribution of the lead eigenvalue, enabling formal hypothesis testing of population structure. Additionally, a phase change exists between small and large eigenvalues, such that population divergence below a threshold of FST is impossible to detect and above which it is always detectable. I show that the plug-in estimate of the effective number of markers in the EIGENSOFT software often exceeds the rank of the sample covariance matrix, leading to a systematic overestimation of the number of significant principal components. I describe an alternative plug-in estimate that eliminates the problem. This improvement is not just an asymptotic result but is directly applicable to finite samples. The minimum average partial test, based on minimizing the average squared partial correlation between individuals, can detect population structure at smaller FST values than the corrected test. The minimum average partial test is applicable to both unadmixed and admixed samples, with arbitrary numbers of discrete subpopulations or parental populations, respectively. Application of the minimum average partial test to the 11 HapMap Phase III samples, comprising 8 unadmixed samples and 3 admixed samples, revealed 13 significant principal components.
Admixture; Population stratification; Population structure; Principal components analysis
Interleukins (ILs) are key mediators of the immune response and inflammatory process. Plasma levels of IL-10, IL-1Ra, and IL-6 are associated with metabolic conditions, show large inter-individual variations, and are under strong genetic control. Therefore, elucidation of the genetic variants that influence levels of these ILs provides useful insights into mechanisms of immune response and pathogenesis of diseases. We conducted a genome-wide association study (GWAS) of IL-10, IL-1Ra, and IL-6 levels in 707 non-diabetic African Americans using 5,396,780 imputed and directly genotyped single nucleotide polymorphisms (SNPs) with adjustment for gender, age, and body mass index. IL-10 levels showed genome-wide significant associations (p<5×10−8) with eight SNPs, the most significant of which was rs5743185 in the PMS1 gene (p=2.30×10−10). We tested replication of SNPs that showed genome-wide significance in 425 non-diabetic individuals from West Africa, and successfully replicated SNP rs17365948 in the YWHAZ gene (p=0.02). IL-1Ra levels showed suggestive associations with two SNPs in the ASB3 gene (p=2.55×10−7), 10 SNPs in the IL-1 gene family (IL1F5, IL1F8, IL1F10, and IL1Ra, p=1.04×10−6 to 1.75×10−6), and 23 SNPs near the IL1A gene (p=1.22×10−6 to 1.63×10−6). We also successfully replicated rs4251961 (p=0.009); this SNP was reported to be associated with IL-1Ra levels in a candidate gene study of Europeans. IL-6 levels showed genome-wide significant association with one SNP (RP11-314E23.1; chr6:133397598; p=8.63×10−9). To our knowledge, this is the first GWAS on IL-10, IL-1Ra, and IL-6 levels. Follow-up of these findings may provide valuable insight into the pathobiology of IL actions and dysregulations in inflammation and human diseases.
interleukin; interleukin-10; interleukin-1Ra; interleukin-6; genome-wide association study; African American
Serum urate concentrations are highly heritable and elevated serum urate is a key risk factor for gout. Genome-wide association studies (GWAS) of serum urate in African American (AA) populations are lacking. We conducted a meta-analysis of GWAS of serum urate levels and gout among 5820 AA and a large candidate gene study among 6890 AA and 21 708 participants of European ancestry (EA) within the Candidate Gene Association Resource Consortium. Findings were tested for replication among 1996 independent AA individuals, and evaluated for their association among 28 283 EA participants of the CHARGE Consortium. Functional studies were conducted using 14C-urate transport assays in mammalian Chinese hamster ovary cells. In the discovery GWAS of serum urate, three loci achieved genome-wide significance (P< 5.0 × 10−8): a novel locus near SGK1/SLC2A12 on chromosome 6 (rs9321453, P= 1.0 × 10−9), and two loci previously identified in EA participants, SLC2A9 (P= 3.8 × 10−32) and SLC22A12 (P= 2.1 × 10−10). A novel rare non-synonymous variant of large effect size in SLC22A12, rs12800450 (minor allele frequency 0.01, G65W), was identified and replicated (beta −1.19 mg/dl, P= 2.7 × 10−16). 14C-urate transport assays showed reduced urate transport for the G65W URAT1 mutant. Finally, in analyses of 11 loci previously associated with serum urate in EA individuals, 10 of 11 lead single-nucleotide polymorphisms showed direction-consistent association with urate among AA. In summary, we identified and replicated one novel locus in association with serum urate levels and experimentally characterize the novel G65W variant in URAT1 as a functional allele. Our data support the importance of multi-ethnic GWAS in the identification of novel risk loci as well as functional variants.
A recent, large genome-wide association study (GWAS) of European ancestry individuals has identified multiple genetic variants influencing serum lipids. Studies of the transferability of these associations to African Americans remain few, an important limitation given interethnic differences in serum lipids and the disproportionate burden of lipid-associated metabolic diseases among African Americans.
We attempted to evaluate the transferability of 95 lipid-associated loci recently identified in European ancestry individuals to 887 non-diabetic, unrelated African Americans from a population-based sample in the Washington, DC area. Additionally, we took advantage of the generally reduced linkage disequilibrium among African ancestry populations in comparison to European ancestry populations to fine-map replicated GWAS signals.
We successfully replicated reported associations for 10 loci (CILP2/SF4, STARD3, LPL, CYP7A1, DOCK7/ANGPTL3, APOE, SORT1, IRS1, CETP, and UBASH3B). Through trans-ethnic fine-mapping, we were able to reduce associated regions around 75% of the loci that replicated.
Between this study and previous work in African Americans, 40 of the 95 loci reported in a large GWAS of European ancestry individuals also influence lipid levels in African Americans. While there is now evidence that the lipid-influencing role of a number of genetic variants is observed in both European and African ancestry populations, the still considerable lack of concordance highlights the importance of continued ancestry-specific studies to elucidate the genetic underpinnings of these traits.
Lipids; Genetics; African Americans; Genome-wide association study; Ethnicity
To identify genetic loci that regulate spontaneous arthritis in interleukin-1 receptor antagonist (IL-1ra)-deficient mice, an F2 population was created from a cross between Balb/c IL-1ra-deficient mice and DBA/1 IL-1ra-deficient mice. Spontaneous arthritis in the F2 population was examined and recorded. Genotypes of those F2 mice were determined using microsatellite markers. Quantitative trait locus (QTL) analysis was conducted with R/qtlbim. Functions of genes within QTL chromosomal regions were evaluated using a bioinformatics tool, PGMapper, and microarray analysis. Potential candidate genes were further evaluated using GeneNetwork. A total of 137 microsatellite markers with an average of 12 cM spacing along the whole genome were used for determining the correlation of arthritis phenotypes with genotypes of 191 F2 progenies. By whole-genome mapping, we obtained QTLs on chromosomes 1 and 6 that were above the significance threshold for strong Bayesian evidence. The QTL on chromosome 1 had a peak near D1Mit55 and D1Mit425 at 82.6 cM. It may account for as much as 12% of the phenotypic variation in susceptibility to spontaneous arthritis. The QTL region contained 208 known transcripts. According to their functions, Mr1, Pla2g4a and Fasl are outstanding candidate genes. From microarray analysis, 11 genes were selected as favourable candidates based on their function and expression profiles. Three of those 11 genes, Prg4, Ptgs2 and Mr1, correlated with the IL-1ra pathway. Those genes were considered to be the best candidates.
The incidence of chronic kidney disease varies by ethnic group in the USA, with African Americans displaying a two-fold higher rate than European Americans. One of the two defining variables underlying staging of chronic kidney disease is the glomerular filtration rate. Meta-analysis in individuals of European ancestry has identified 23 genetic loci associated with the estimated glomerular filtration rate (eGFR). We conducted a follow-up study of these 23 genetic loci using a population-based sample of 1,018 unrelated admixed African Americans. We included in our follow-up study two variants in APOL1 associated with end-stage kidney disease discovered by admixture mapping in admixed African Americans. To address confounding due to admixture, we estimated local ancestry at each marker and global ancestry. We performed regression analysis stratified by local ancestry and combined the resulting regression estimates across ancestry strata using an inverse variance-weighted fixed effects model. We found that 11 of the 24 loci were significantly associated with eGFR in our sample. The effect size estimates were not significantly different between the subgroups of individuals with two copies of African ancestry vs. two copies of European ancestry for any of the 11 loci. In contrast, allele frequencies were significantly different at 10 of the 11 loci. Collectively, the 11 loci, including four secondary signals revealed by conditional analyses, explained 14.2% of the phenotypic variance in eGFR, in contrast to the 1.4% explained by the 24 loci in individuals of European ancestry. Our findings provide insight into the genetic basis of variation in renal function among admixed African Americans.
Low levels of high-density lipoprotein cholesterol (HDLc) accompany chronic kidney disease, but the association between HDLc and the estimated glomerular filtration rate (eGFR) in the general population is unclear. We investigated the HDLc-eGFR association in nondiabetic Han Chinese (HC, n = 1100), West Africans (WA, n = 1497), and African Americans (AA, n = 1539).
There were significant differences by ancestry: HDLc was positively associated with eGFR in HC (β = 0.13, P < 0.0001), but negatively associated among African ancestry populations (WA: −0.19, P < 0.0001; AA: −0.09, P = 0.02). These differences were also seen in nationally-representative NHANES data (among European Americans: 0.09, P = 0.005; among African Americans −0.14, P = 0.03). To further explore the findings in African ancestry populations, we investigated the role of an African ancestry-specific nephropathy risk variant, rs73885319, in the gene encoding HDL-associated APOL1. Among AA, an inverse HDLc-eGFR association was observed only with the risk genotype (−0.38 versus 0.001; P = 0.03). This interaction was not seen in WA.
In summary, counter to expectation, an inverse HDLc-eGFR association was observed among those of African ancestry. Given the APOL1 × HDLc interaction among AA, genetic factors may contribute to this paradoxical association. Notably, these findings suggest that the unexplained mechanism by which APOL1 affects kidney-disease risk may involve HDLc.
Advances in technology and reduced costs are facilitating large-scale sequencing of genes and exomes as well as entire genomes. Recently, we described an approach based on haplotypes called SCARVA [1] that enables the simultaneous analysis of the association between rare and common variants in disease etiology. Here, we describe an extension of SCARVA that evaluates individual markers instead of haplotypes. This modified method (SCARVAsnp) is implemented in four stages. First, all common variants in a pre-specified region (e.g., gene) are evaluated individually. Second, a union procedure is used to combine all rare variants (RVs) in the index region, and the ratio of the log likelihood with one RV excluded to the log likelihood of a model with all the collapsed RVs is calculated. On the basis of previously reported simulation studies [1], a likelihood ratio ≥1.3 is considered statistically significant. Third, the direction of the association of the removed RV is determined by evaluating the change in λ values with the inclusion and exclusion of that RV. Lastly, significant common and rare variants, along with covariates, are included in a final regression model to evaluate the association between the trait and variants in that region. We apply simulated and real data sets to show that the method is simple to use, computationally efficient, and that it can accurately identify both common and rare risk variants. This method overcomes several limitations of existing methods. For example, SCARVAsnp limits loss of statistical power by not including variants that are not associated with the trait of interest in the final model. Also, SCARVAsnp takes into consideration the direction of association by effectively modelling positively and negatively associated variants.
complex traits; rare and common variants
Recent developments in high-throughput genotyping and whole-genome sequencing will enhance the identification of disease loci in admixed populations. We discuss how a more refined estimation of ancestry benefits both admixture mapping and association mapping, making disease loci identification in admixed populations more powerful.
High-throughput genotyping and sequencing will enable refined estimation of ancestry, thus enhancing disease loci identification in admixed populations
Principal components analysis of genetic data is used to avoid inflation in type I error rates in association testing due to population stratification by covariate adjustment using the top eigenvectors and to estimate cluster or group membership independent of self-reported or ethnic identities. Eigendecomposition transforms correlated variables into an equal number of uncorrelated variables. Numerous stopping rules have been developed to identify which principal components should be retained. Recent developments in random matrix theory have led to a formal hypothesis test of the top eigenvalue, providing another way to achieve dimension reduction. In this study, I compare Velicer’s minimum average partial test to a test based on the Tracy-Widom distribution as implemented in EIGENSOFT, the most widely used implementation of principal components analysis in genome-wide association analysis. By computer simulation of vicariance based on coalescent theory, EIGENSOFT systematically overestimates the number of significant principal components. Furthermore, this overestimation is larger for samples of admixed individuals than for samples of unadmixed individuals. Overestimating the number of significant principal components can potentially lead to a loss of power in association testing by adjusting for unnecessary covariates and may lead to incorrect inferences about group differentiation. Velicer’s minimum average partial test is shown to have both smaller bias and smaller variance, often with a mean squared error of zero, in estimating the number of principal components to retain. Velicer’s minimum average partial test is implemented in R code and is suitable for genome-wide genotype data with or without population labels.
admixture; population stratification; principal components; stopping rule; vicariance
Total serum bilirubin is associated with several clinical outcomes, including cardiovascular disease, diabetes and drug metabolism. We conducted a genome-wide association study in 619 healthy unrelated African Americans in an attempt to replicate reported findings in Europeans and Asians and to identify novel loci influencing total serum bilirubin levels. We analyzed a dense panel of over two million genotyped and imputed SNPs in additive genetic models adjusting for age, sex, and the first two significant principal components from the sample covariance matrix of genotypes. Thirty-nine SNPs spanning a 78 kb region within UGT1A1 displayed P-values <5 × 10−8. The lowest P-value was 1.7 × 10−22 for SNP rs887829. None of the SNPs in UGT1A1 remained statistically significant in conditional association analyses that adjusted for rs887829. In addition, SNP rs10929302, located in the phenobarbital response enhancer module, was significantly associated with bilirubin level with a P-value of 1.37 × 10−11; this enhancer module is believed to have a critical role in phenobarbital treatment of hyperbilirubinemia. Interestingly, the lead SNP, rs887829, is in strong linkage disequilibrium (LD) (r2≥0.74) with rs10929302. Taking advantage of the lower LD and shorter haplotypes in African-ancestry populations, we identified rs887829 as a more refined proxy for the causative variant influencing bilirubin levels. Also, we replicated the reported association between variants in SEMA3C and bilirubin levels. In summary, UGT1A1 is a major locus influencing bilirubin levels and the results of this study promise to contribute to understanding of the etiology and treatment of hyperbilirubinemia in African-ancestry populations.
GWAS; replications; bilirubin; African Americans
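The additive model and the conditional analysis described in the abstract above can be sketched as ordinary least squares with the genotype coded as an allele count (0, 1, or 2). This is a minimal illustration on simulated data, not the software used in the study: the function names, the simulated effect sizes, and the normal approximation for the p-value are assumptions made for this example.

```python
import math
import numpy as np


def additive_test(y, genotype, covariates):
    """OLS test of an additive (0/1/2 allele count) genotype effect.

    Returns (beta, se, p) for the genotype term; p is a two-sided
    p-value from a normal approximation.
    """
    X = np.column_stack([np.ones(len(y)), genotype, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    se = math.sqrt(cov_beta[1, 1])
    z = beta[1] / se
    return beta[1], se, math.erfc(abs(z) / math.sqrt(2))


def simulate(n=2000, seed=1):
    """Hypothetical data: a causal lead SNP and a correlated proxy SNP."""
    rng = np.random.default_rng(seed)
    covars = np.column_stack([
        rng.uniform(20, 70, n),        # age
        rng.integers(0, 2, n),         # sex
        rng.standard_normal((n, 2)),   # two principal components
    ])
    lead = rng.binomial(2, 0.3, n)
    same = rng.random(n) < 0.85        # proxy copies the lead SNP 85% of the time
    proxy = np.where(same, lead, rng.binomial(2, 0.3, n))
    y = 0.5 * lead + 0.01 * covars[:, 0] + rng.standard_normal(n)
    return y, lead, proxy, covars


y, lead, proxy, covars = simulate()
# Marginal test of the proxy SNP, then the same test conditioned on the lead SNP.
_, _, p_marginal = additive_test(y, proxy, covars)
_, _, p_conditional = additive_test(y, proxy, np.column_stack([covars, lead]))
```

In this toy setting the proxy is associated with the trait only through its correlation with the lead SNP, so adding the lead SNP as a covariate should remove most of the proxy's signal. That is the logic behind conditioning on a lead variant.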
Association studies are a staple of genotype–phenotype mapping studies, whether they are based on single markers, haplotypes, candidate genes, genome-wide genotypes, or whole genome sequences. Although genetic epidemiological studies typically contain data collected on multiple traits which themselves are often correlated, most analyses have been performed on single traits. Here, I review several methods that have been developed to perform multiple trait analysis. These methods range from traditional multivariate models for systems of equations to recently developed graphical approaches based on network theory. The application of network theory to genetics is termed systems genetics and has the potential to address long-standing questions in genetics about complex processes such as coordinate regulation, homeostasis, and pleiotropy.
multivariate analysis; pleiotropy; systems genetics
African Americans are disproportionately affected by type 2 diabetes (T2DM), yet few studies have examined T2DM using genome-wide association approaches in this population. The aim of this study was to identify genes associated with T2DM in the African American population. We performed a genome-wide association study (GWAS) using the Affymetrix 6.0 array in 965 African-American cases with T2DM and end-stage renal disease (T2DM-ESRD) and 1029 population-based controls. The most significant SNPs (n = 550 independent loci) were genotyped in a replication cohort, and 122 SNPs (n = 98 independent loci) were further tested by genotyping three additional validation cohorts, followed by meta-analysis in all five cohorts totaling 3,132 cases and 3,317 controls. Twelve SNPs had evidence of association in the GWAS (P < 0.0071), were directionally consistent in the replication cohort and were associated with T2DM in subjects without nephropathy (P < 0.05). Meta-analysis in all cases and controls revealed a single SNP reaching genome-wide significance (P < 2.5 × 10^−8). SNP rs7560163 (P = 7.0 × 10^−9, OR (95% CI) = 0.75 (0.67–0.84)) is located intergenically between RND3 and RBM43. Four additional loci (rs7542900, rs4659485, rs2722769 and rs7107217) were associated with T2DM (P < 0.05) and reached more nominal levels of significance (P < 2.5 × 10^−5) in the overall analysis and may represent novel loci that contribute to T2DM. We have identified novel T2DM-susceptibility variants in the African-American population. Notably, T2DM risk was associated with the major allele, which implies an interesting genetic architecture in this population. These results suggest that multiple loci underlie T2DM susceptibility in the African-American population and that these loci are distinct from those identified in other ethnic populations.
For samples of admixed individuals, it is possible to test for both ancestry effects via admixture mapping and genotype effects via association mapping. Here, we describe a joint test called BMIX that combines admixture and association statistics at single markers. We first perform high-density admixture mapping using local ancestry. We then perform association mapping using stratified regression, wherein for each marker genotypes are stratified by local ancestry. In both stages, we use generalized linear models, providing the advantage that the joint test can be used with any phenotype distribution with an appropriate link function. To define the alternative densities for admixture mapping and association mapping, we describe a method based on autocorrelation to empirically estimate the testing burdens of admixture mapping and association mapping. We then describe a joint test that uses the posterior probabilities from admixture mapping as prior probabilities for association mapping, capitalizing on the reduced testing burden of admixture mapping relative to association mapping. By simulation, we show that BMIX is potentially orders-of-magnitude more powerful than the MIX score, which is currently the most powerful frequentist joint test. We illustrate the gain in power through analysis of fasting plasma glucose among 922 unrelated, non-diabetic, admixed African Americans from the Howard University Family Study. We detected loci at 1q24 and 6q26 as genome-wide significant via admixture mapping; both loci have been independently reported from linkage analysis. Using the association data, we resolved the 1q24 signal into two regions. One region, upstream of the gene FAM78B, contains three binding sites for the transcription factor PPARG and two binding sites for HNF1A, both previously implicated in the pathology of type 2 diabetes. The fact that both loci showed ancestry effects may provide novel insight into the genetic architecture of fasting plasma glucose in individuals of African ancestry.
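The idea of carrying a posterior probability from admixture mapping forward as the prior for association mapping can be illustrated with Bayes' rule on the odds scale. The abstract does not give the BMIX formulas, so the sketch below is a generic two-stage update and not the published method: the function name `joint_posterior` and the use of a Bayes factor to summarize the association evidence are assumptions made for this example.

```python
def joint_posterior(admixture_posterior, association_bayes_factor):
    """Generic two-stage Bayesian update (illustrative, not BMIX itself).

    admixture_posterior: posterior probability from the first stage,
        reused as the prior for the second stage; must satisfy 0 <= p < 1.
    association_bayes_factor: hypothetical Bayes factor (alternative vs. null)
        summarizing the association evidence at the same marker.
    """
    if not 0.0 <= admixture_posterior < 1.0:
        raise ValueError("prior probability must be in [0, 1)")
    if association_bayes_factor < 0.0:
        raise ValueError("Bayes factor must be non-negative")
    prior_odds = admixture_posterior / (1.0 - admixture_posterior)
    posterior_odds = prior_odds * association_bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


# The same association evidence counts for more at a marker that
# already showed an ancestry effect in the first stage.
weak_prior = joint_posterior(0.001, 100.0)
strong_prior = joint_posterior(0.5, 100.0)
```

The design point is that a marker inside an admixture peak starts the association stage with a larger prior, so it needs less association evidence to reach the same posterior probability.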
Most genome-wide association studies performed to date have focused on individuals of European ancestry. Admixed African Americans tend to have disproportionately higher risk for many common, complex diseases. Disease or trait mapping in admixed individuals can benefit from joint analysis of ancestry and genotype effects. We developed a joint test that is more powerful than either admixture mapping of ancestry effects or association mapping of genotype effects performed separately. Our joint test fully capitalizes on the reduced testing burden of admixture mapping relative to association mapping. The test is based on generalized linear models and can be performed using standard statistical software. We illustrate the increased power of the joint test by detecting two loci for fasting plasma glucose in a sample of unrelated African American individuals, neither of which was detected as significant by traditional association analysis.