Serum urate concentrations are highly heritable and elevated serum urate is a key risk factor for gout. Genome-wide association studies (GWAS) of serum urate in African American (AA) populations are lacking. We conducted a meta-analysis of GWAS of serum urate levels and gout among 5820 AA and a large candidate gene study among 6890 AA and 21 708 participants of European ancestry (EA) within the Candidate Gene Association Resource Consortium. Findings were tested for replication among 1996 independent AA individuals, and evaluated for their association among 28 283 EA participants of the CHARGE Consortium. Functional studies were conducted using 14C-urate transport assays in mammalian Chinese hamster ovary cells. In the discovery GWAS of serum urate, three loci achieved genome-wide significance (P < 5.0 × 10−8): a novel locus near SGK1/SLC2A12 on chromosome 6 (rs9321453, P = 1.0 × 10−9), and two loci previously identified in EA participants, SLC2A9 (P = 3.8 × 10−32) and SLC22A12 (P = 2.1 × 10−10). A novel rare non-synonymous variant of large effect size in SLC22A12, rs12800450 (minor allele frequency 0.01, G65W), was identified and replicated (beta −1.19 mg/dl, P = 2.7 × 10−16). 14C-urate transport assays showed reduced urate transport for the G65W URAT1 mutant. Finally, in analyses of 11 loci previously associated with serum urate in EA individuals, 10 of 11 lead single-nucleotide polymorphisms showed direction-consistent association with urate among AA. In summary, we identified and replicated one novel locus in association with serum urate levels and experimentally characterized the novel G65W variant in URAT1 as a functional allele. Our data support the importance of multi-ethnic GWAS in the identification of novel risk loci as well as functional variants.
A recent, large genome-wide association study (GWAS) of European ancestry individuals has identified multiple genetic variants influencing serum lipids. Studies of the transferability of these associations to African Americans remain few, an important limitation given interethnic differences in serum lipids and the disproportionate burden of lipid-associated metabolic diseases among African Americans.
We attempted to evaluate the transferability of 95 lipid-associated loci recently identified in European ancestry individuals to 887 non-diabetic, unrelated African Americans from a population-based sample in the Washington, DC area. Additionally, we took advantage of the generally reduced linkage disequilibrium among African ancestry populations in comparison to European ancestry populations to fine-map replicated GWAS signals.
We successfully replicated reported associations for 10 loci (CILP2/SF4, STARD3, LPL, CYP7A1, DOCK7/ANGPTL3, APOE, SORT1, IRS1, CETP, and UBASH3B). Through trans-ethnic fine-mapping, we were able to reduce associated regions around 75% of the loci that replicated.
Between this study and previous work in African Americans, 40 of the 95 loci reported in a large GWAS of European ancestry individuals also influence lipid levels in African Americans. While there is now evidence that the lipid-influencing role of a number of genetic variants is observed in both European and African ancestry populations, the still considerable lack of concordance highlights the importance of continued ancestry-specific studies to elucidate the genetic underpinnings of these traits.
Lipids; Genetics; African Americans; Genome-wide association study; Ethnicity
To identify genetic loci that regulate spontaneous arthritis in interleukin-1 receptor antagonist (IL-1ra)-deficient mice, an F2 population was created from a cross between Balb/c IL-1ra-deficient mice and DBA/1 IL-1ra-deficient mice. Spontaneous arthritis in the F2 population was examined and recorded. Genotypes of those F2 mice were determined using microsatellite markers. Quantitative trait locus (QTL) analysis was conducted with R/qtlbim. Functions of genes within QTL chromosomal regions were evaluated using a bioinformatics tool, PGMapper, and microarray analysis. Potential candidate genes were further evaluated using GeneNetwork. A total of 137 microsatellite markers with an average of 12 cM spacing along the whole genome were used for determining the correlation of arthritis phenotypes with genotypes of 191 F2 progenies. By whole-genome mapping, we obtained QTLs on chromosomes 1 and 6 that were above the significance threshold for strong Bayesian evidence. The QTL on chromosome 1 had a peak near D1Mit55 and D1Mit425 at 82·6 cM. It may account for as much as 12% of the phenotypic variation in susceptibility to spontaneous arthritis. The QTL region contained 208 known transcripts. According to their functions, Mr1, Pla2g4a and Fasl are outstanding candidate genes. From microarray analysis, 11 genes were selected as favourable candidates based on their function and expression profiles. Three of those 11 genes, Prg4, Ptgs2 and Mr1, correlated with the IL-1ra pathway. Those genes were considered to be the best candidates.
The incidence of chronic kidney disease varies by ethnic group in the USA, with African Americans displaying a two-fold higher rate than European Americans. One of the two defining variables underlying staging of chronic kidney disease is the glomerular filtration rate. Meta-analysis in individuals of European ancestry has identified 23 genetic loci associated with the estimated glomerular filtration rate (eGFR). We conducted a follow-up study of these 23 genetic loci using a population-based sample of 1,018 unrelated admixed African Americans. We included in our follow-up study two variants in APOL1 associated with end-stage kidney disease discovered by admixture mapping in admixed African Americans. To address confounding due to admixture, we estimated local ancestry at each marker and global ancestry. We performed regression analysis stratified by local ancestry and combined the resulting regression estimates across ancestry strata using an inverse variance-weighted fixed effects model. We found that 11 of the 24 loci were significantly associated with eGFR in our sample. The effect size estimates were not significantly different between the subgroups of individuals with two copies of African ancestry vs. two copies of European ancestry for any of the 11 loci. In contrast, allele frequencies were significantly different at 10 of the 11 loci. Collectively, the 11 loci, including four secondary signals revealed by conditional analyses, explained 14.2% of the phenotypic variance in eGFR, in contrast to the 1.4% explained by the 24 loci in individuals of European ancestry. Our findings provide insight into the genetic basis of variation in renal function among admixed African Americans.
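The inverse variance-weighted fixed effects combination described above can be sketched as follows. This is a minimal illustration, not the study's code: the function name `fixed_effects_ivw` and the per-stratum estimates are hypothetical, and the sketch assumes each local-ancestry stratum supplies a regression coefficient and its standard error.

```python
import math


def fixed_effects_ivw(betas, ses):
    """Combine per-stratum estimates using inverse-variance weights.

    betas: regression estimates, one per stratum
    ses:   matching standard errors
    Returns the combined estimate and its standard error.
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    # Each stratum is weighted by the reciprocal of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    combined_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    combined_se = math.sqrt(1.0 / total_weight)
    return combined_beta, combined_se


# Hypothetical estimates for strata with 0, 1, and 2 copies of African ancestry.
stratum_betas = [0.10, 0.20, 0.30]
stratum_ses = [0.10, 0.10, 0.10]
combined_beta, combined_se = fixed_effects_ivw(stratum_betas, stratum_ses)
print(combined_beta, combined_se)
```

With equal standard errors the weights are equal, so the combined estimate reduces to the simple mean of the stratum estimates; strata with smaller standard errors would otherwise dominate.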
Low levels of high-density lipoprotein cholesterol (HDLc) accompany chronic kidney disease, but the association between HDLc and the estimated glomerular filtration rate (eGFR) in the general population is unclear. We investigated the HDLc-eGFR association in nondiabetic Han Chinese (HC, n = 1100), West Africans (WA, n = 1497), and African Americans (AA, n = 1539).
There were significant differences by ancestry: HDLc was positively associated with eGFR in HC (β = 0.13, P < 0.0001), but negatively associated among African ancestry populations (WA: −0.19, P < 0.0001; AA: −0.09, P = 0.02). These differences were also seen in nationally-representative NHANES data (among European Americans: 0.09, P = 0.005; among African Americans −0.14, P = 0.03). To further explore the findings in African ancestry populations, we investigated the role of an African ancestry-specific nephropathy risk variant, rs73885319, in the gene encoding HDL-associated APOL1. Among AA, an inverse HDLc-eGFR association was observed only with the risk genotype (−0.38 versus 0.001; P = 0.03). This interaction was not seen in WA.
In summary, counter to expectation, an inverse HDLc-eGFR association was observed among those of African ancestry. Given the APOL1 × HDLc interaction among AA, genetic factors may contribute to this paradoxical association. Notably, these findings suggest that the unexplained mechanism by which APOL1 affects kidney-disease risk may involve HDLc.
Advances in technology and reduced costs are facilitating large-scale sequencing of genes and exomes as well as entire genomes. Recently, we described an approach based on haplotypes called SCARVA [1] that enables the simultaneous analysis of the association between rare and common variants in disease etiology. Here, we describe an extension of SCARVA that evaluates individual markers instead of haplotypes. This modified method (SCARVAsnp) is implemented in four stages. First, all common variants in a pre-specified region (e.g., gene) are evaluated individually. Second, a union procedure is used to combine all rare variants (RVs) in the index region, and the ratio of the log likelihood with one RV excluded to the log likelihood of a model with all the collapsed RVs is calculated. On the basis of previously reported simulation studies [1], a likelihood ratio ≥1.3 is considered statistically significant. Third, the direction of the association of the removed RV is determined by evaluating the change in λ values with the inclusion and exclusion of that RV. Lastly, significant common and rare variants, along with covariates, are included in a final regression model to evaluate the association between the trait and variants in that region. We use simulated and real data sets to show that the method is simple to use, computationally efficient, and that it can accurately identify both common and rare risk variants. This method overcomes several limitations of existing methods. For example, SCARVAsnp limits loss of statistical power by not including variants that are not associated with the trait of interest in the final model. Also, SCARVAsnp takes into consideration the direction of association by effectively modelling positively and negatively associated variants.
complex traits; rare and common variants
Recent developments in high-throughput genotyping and whole-genome sequencing will enhance the identification of disease loci in admixed populations. We discuss how a more refined estimation of ancestry benefits both admixture mapping and association mapping, making disease loci identification in admixed populations more powerful.
High-throughput genotyping and sequencing will enable refined estimation of ancestry, thus enhancing disease loci identification in admixed populations
Principal components analysis of genetic data is used to avoid inflation in type I error rates in association testing due to population stratification by covariate adjustment using the top eigenvectors and to estimate cluster or group membership independent of self-reported or ethnic identities. Eigendecomposition transforms correlated variables into an equal number of uncorrelated variables. Numerous stopping rules have been developed to identify which principal components should be retained. Recent developments in random matrix theory have led to a formal hypothesis test of the top eigenvalue, providing another way to achieve dimension reduction. In this study, I compare Velicer’s minimum average partial test to a test based on the Tracy-Widom distribution as implemented in EIGENSOFT, the most widely used implementation of principal components analysis in genome-wide association analysis. By computer simulation of vicariance based on coalescent theory, I show that EIGENSOFT systematically overestimates the number of significant principal components. Furthermore, this overestimation is larger for samples of admixed individuals than for samples of unadmixed individuals. Overestimating the number of significant principal components can potentially lead to a loss of power in association testing by adjusting for unnecessary covariates and may lead to incorrect inferences about group differentiation. Velicer’s minimum average partial test is shown to have both smaller bias and smaller variance, often with a mean squared error of zero, in estimating the number of principal components to retain. Velicer’s minimum average partial test is implemented in R code and is suitable for genome-wide genotype data with or without population labels.
admixture; population stratification; principal components; stopping rule; vicariance
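For readers unfamiliar with Velicer's minimum average partial test, a minimal sketch in Python with NumPy follows. It is not the R implementation referred to above; the function name `velicer_map`, the simulated two-factor data, and the choice to treat columns as variables are assumptions made for illustration. The idea is to remove the first m principal components from the correlation matrix, average the squared partial correlations that remain, and retain the m that minimizes that average.

```python
import numpy as np


def velicer_map(data):
    """Return (number of components to retain, average squared partial correlations).

    data: 2-D array with observations in rows and variables in columns.
    """
    corr = np.corrcoef(data, rowvar=False)
    p = corr.shape[0]
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    eigvals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    loadings = eigvecs[:, order] * np.sqrt(eigvals)
    off_diag = ~np.eye(p, dtype=bool)

    # m = 0: average squared correlation with no components removed.
    avg_sq = [float(np.mean(corr[off_diag] ** 2))]
    for m in range(1, p - 1):
        a = loadings[:, :m]
        partial_cov = corr - a @ a.T              # residual after removing m components
        d = np.sqrt(np.diag(partial_cov))
        partial_corr = partial_cov / np.outer(d, d)
        avg_sq.append(float(np.mean(partial_corr[off_diag] ** 2)))
    return int(np.argmin(avg_sq)), avg_sq


# Simulated example (an assumption): eight variables driven by two latent factors.
rng = np.random.default_rng(0)
factors = rng.standard_normal((500, 2))
pattern = np.zeros((2, 8))
pattern[0, :4] = 0.8   # variables 1-4 load on factor 1
pattern[1, 4:] = 0.8   # variables 5-8 load on factor 2
simulated = factors @ pattern + 0.6 * rng.standard_normal((500, 8))
n_components, avg_sq_partials = velicer_map(simulated)
print(n_components)
```

The sketch stops at p − 2 removed components so that the residual matrix keeps a positive diagonal; genome-wide data with more markers than individuals would need additional care.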
Total serum bilirubin is associated with several clinical outcomes, including cardiovascular disease, diabetes and drug metabolism. We conducted a genome-wide association study in 619 healthy unrelated African Americans in an attempt to replicate reported findings in Europeans and Asians and to identify novel loci influencing total serum bilirubin levels. We analyzed a dense panel of over two million genotyped and imputed SNPs in additive genetic models adjusting for age, sex, and the first two significant principal components from the sample covariance matrix of genotypes. Thirty-nine SNPs spanning a 78 kb region within the UGT1A1 gene displayed P-values <5 × 10−8. The lowest P-value was 1.7 × 10−22 for SNP rs887829. None of the SNPs in UGT1A1 remained statistically significant in conditional association analyses that adjusted for rs887829. In addition, SNP rs10929302, located in the phenobarbital response enhancer module, was significantly associated with bilirubin level with a P-value of 1.37 × 10−11; this enhancer module is believed to have a critical role in phenobarbital treatment of hyperbilirubinemia. Interestingly, the lead SNP, rs887829, is in strong linkage disequilibrium (LD) (r2 ≥ 0.74) with rs10929302. Taking advantage of the lower LD and shorter haplotypes in African-ancestry populations, we identified rs887829 as a more refined proxy for the causative variant influencing bilirubin levels. Also, we replicated the reported association between variants in SEMA3C and bilirubin levels. In summary, UGT1A1 is a major locus influencing bilirubin levels and the results of this study promise to contribute to the understanding of the etiology and treatment of hyperbilirubinemia in African-ancestry populations.
GWAS; replications; bilirubin; African Americans
Association studies are a staple of genotype–phenotype mapping studies, whether they are based on single markers, haplotypes, candidate genes, genome-wide genotypes, or whole genome sequences. Although genetic epidemiological studies typically contain data collected on multiple traits which themselves are often correlated, most analyses have been performed on single traits. Here, I review several methods that have been developed to perform multiple trait analysis. These methods range from traditional multivariate models for systems of equations to recently developed graphical approaches based on network theory. The application of network theory to genetics is termed systems genetics and has the potential to address long-standing questions in genetics about complex processes such as coordinate regulation, homeostasis, and pleiotropy.
multivariate analysis; pleiotropy; systems genetics
African Americans are disproportionately affected by type 2 diabetes (T2DM) yet few studies have examined T2DM using genome-wide association approaches in this ethnicity. The aim of this study was to identify genes associated with T2DM in the African American population. We performed a Genome Wide Association Study (GWAS) using the Affymetrix 6.0 array in 965 African-American cases with T2DM and end-stage renal disease (T2DM-ESRD) and 1029 population-based controls. The most significant SNPs (n = 550 independent loci) were genotyped in a replication cohort and 122 SNPs (n = 98 independent loci) were further tested through genotyping three additional validation cohorts followed by meta-analysis in all five cohorts totaling 3,132 cases and 3,317 controls. Twelve SNPs had evidence of association in the GWAS (P<0.0071), were directionally consistent in the Replication cohort and were associated with T2DM in subjects without nephropathy (P<0.05). Meta-analysis in all cases and controls revealed a single SNP reaching genome-wide significance (P<2.5×10−8). SNP rs7560163 (P = 7.0×10−9, OR (95% CI) = 0.75 (0.67–0.84)) is located intergenically between RND3 and RBM43. Four additional loci (rs7542900, rs4659485, rs2722769 and rs7107217) were associated with T2DM (P<0.05) and reached more nominal levels of significance (P<2.5×10−5) in the overall analysis and may represent novel loci that contribute to T2DM. We have identified novel T2DM-susceptibility variants in the African-American population. Notably, T2DM risk was associated with the major allele and implies an interesting genetic architecture in this population. These results suggest that multiple loci underlie T2DM susceptibility in the African-American population and that these loci are distinct from those identified in other ethnic populations.
For samples of admixed individuals, it is possible to test for both ancestry effects via admixture mapping and genotype effects via association mapping. Here, we describe a joint test called BMIX that combines admixture and association statistics at single markers. We first perform high-density admixture mapping using local ancestry. We then perform association mapping using stratified regression, wherein for each marker genotypes are stratified by local ancestry. In both stages, we use generalized linear models, providing the advantage that the joint test can be used with any phenotype distribution with an appropriate link function. To define the alternative densities for admixture mapping and association mapping, we describe a method based on autocorrelation to empirically estimate the testing burdens of admixture mapping and association mapping. We then describe a joint test that uses the posterior probabilities from admixture mapping as prior probabilities for association mapping, capitalizing on the reduced testing burden of admixture mapping relative to association mapping. By simulation, we show that BMIX is potentially orders-of-magnitude more powerful than the MIX score, which is currently the most powerful frequentist joint test. We illustrate the gain in power through analysis of fasting plasma glucose among 922 unrelated, non-diabetic, admixed African Americans from the Howard University Family Study. We detected loci at 1q24 and 6q26 as genome-wide significant via admixture mapping; both loci have been independently reported from linkage analysis. Using the association data, we resolved the 1q24 signal into two regions. One region, upstream of the gene FAM78B, contains three binding sites for the transcription factor PPARG and two binding sites for HNF1A, both previously implicated in the pathology of type 2 diabetes. 
The fact that both loci showed ancestry effects may provide novel insight into the genetic architecture of fasting plasma glucose in individuals of African ancestry.
Most genome-wide association studies performed to date have focused on individuals with European ancestry. Admixed African Americans tend to have disproportionately higher risk for many common, complex diseases. Disease or trait mapping in admixed individuals can benefit from joint analysis of ancestry and genotype effects. We developed a joint test that is more powerful than either admixture mapping of ancestry effects or association mapping of genotype effects performed separately. Our joint test fully capitalizes on the reduced testing burden of admixture mapping relative to association mapping. The test is based on generalized linear models and can be performed using standard statistical software. We illustrate the increased power of the joint test by detecting two loci for fasting plasma glucose in a sample of unrelated African American individuals, neither of which loci was detected as significant by traditional association analysis.
The genetic architecture of body weight and body composition is complex because these traits are normally influenced by multiple genes and their interactions, even after controlling for the environment. Bayesian methodology provides an efficient way of estimating these interactions.
Subjects and measurements
We used Bayesian model selection techniques to estimate the effect of epistatic interactions on age-related body weight (at 3, 6, and 10 weeks) and body composition (organ weights and fat-related traits) in an F2 sample obtained from a cross between high-growth (M16i) mice and low-growth (L6) mice.
We observed epistatic and main-effect quantitative trait loci (QTL) that controlled both body weight and body composition. Epistatic effects were generally more significant for WK3 and WK6 than WK10. Chromosomes 5 and 13 interacted strongly to control body weight at 3 weeks. A pleiotropic QTL on chromosome 2 was associated with body weight and some body composition phenotypes. Testis weight was regulated by a QTL on chromosome 13 with a significantly large main effect.
By analyzing epistatic interactions, we detected QTL not found in a previous analysis of this mouse population. Hence, the detection of gene-gene interactions may provide new information about the genetic architecture of complex obesity-related traits and may lead to the detection of additional obesity genes.
Bayesian methods; body weight; epistasis; obesity; quantitative trait loci
In mapping of quantitative trait loci (QTLs), performing hypothesis tests of linkage to a phenotype of interest across an entire genome involves multiple comparisons. Furthermore, linkage among loci induces correlation among tests. Under many multiple comparison frameworks, these problems are exacerbated when mapping multiple QTLs. Traditionally, significance thresholds have been subjectively set to control the probability of detecting at least one false positive outcome, although such thresholds are known to result in excessively low power to detect true positive outcomes. Recently, false discovery rate (FDR)-controlling procedures have been developed that yield more power both by relaxing the stringency of the significance threshold and by retaining more power for a given significance threshold. However, these procedures have been shown to perform poorly for mapping QTLs, principally because they ignore recombination fractions between markers. Here, I describe a procedure that accounts for recombination fractions and extends FDR control to include simultaneous control of the false non-discovery rate, i.e. the overall error rate is controlled. This procedure is developed in the Bayesian framework using a direct posterior probability approach. Data-driven significance thresholds are determined by minimizing the expected loss. The procedure is equivalent to jointly maximizing positive and negative predictive values. In the context of mapping QTLs for experimental crosses, the procedure is applicable to mapping main effects, gene–gene interactions and gene–environment interactions.
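The direct posterior probability idea can be illustrated with a short sketch. It assumes that each tested locus already has a posterior probability of linkage and that the expected loss is the unweighted sum of the estimated false discovery rate and false non-discovery rate; both the loss and the function names are assumptions for illustration, and the sketch ignores the recombination-fraction adjustment that the procedure itself incorporates.

```python
def error_rates(posteriors, threshold):
    """Estimate (false discovery rate, false non-discovery rate) at a threshold.

    A locus is declared significant when its posterior probability of
    linkage is at least `threshold`.
    """
    called = [p for p in posteriors if p >= threshold]
    not_called = [p for p in posteriors if p < threshold]
    # Expected fraction of declared loci that are false positives.
    fdr = sum(1.0 - p for p in called) / len(called) if called else 0.0
    # Expected fraction of undeclared loci that are true signals.
    fndr = sum(not_called) / len(not_called) if not_called else 0.0
    return fdr, fndr


def best_threshold(posteriors):
    """Pick the observed posterior value that minimizes FDR + FNDR (assumed loss)."""
    candidates = sorted(set(posteriors))
    return min(candidates, key=lambda t: sum(error_rates(posteriors, t)))


# Hypothetical posterior probabilities of linkage for six loci.
example_posteriors = [0.99, 0.95, 0.60, 0.10, 0.05, 0.01]
chosen = best_threshold(example_posteriors)
chosen_fdr, chosen_fndr = error_rates(example_posteriors, chosen)
print(chosen, chosen_fdr, chosen_fndr)
```

Because the positive predictive value is one minus the false discovery rate and the negative predictive value is one minus the false non-discovery rate, minimizing this sum is the same as maximizing the sum of the two predictive values.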
Chronic kidney disease (CKD) is an increasing global public health concern, particularly among populations of African ancestry. We performed an interrogation of known renal loci, genome-wide association (GWA), and IBC candidate-gene SNP association analyses in African Americans from the CARe Renal Consortium. In up to 8,110 participants, we performed meta-analyses of GWA and IBC array data for estimated glomerular filtration rate (eGFR), CKD (eGFR <60 mL/min/1.73 m2), urinary albumin-to-creatinine ratio (UACR), and microalbuminuria (UACR >30 mg/g) and interrogated the 250 kb flanking region around 24 SNPs previously identified in European Ancestry renal GWAS analyses. Findings were replicated in up to 4,358 African Americans. To assess function, individually identified genes were knocked down in zebrafish embryos by morpholino antisense oligonucleotides. Expression of kidney-specific genes was assessed by in situ hybridization, and glomerular filtration was evaluated by dextran clearance. Overall, 23 of 24 previously identified SNPs had direction-consistent associations with eGFR in African Americans, 2 of which achieved nominal significance (UMOD, PIP5K1B). Interrogation of the flanking regions uncovered 24 new index SNPs in African Americans, 12 of which were replicated (UMOD, ANXA9, GCKR, TFDP2, DAB2, VEGFA, ATXN2, GATM, SLC22A2, TMEM60, SLC6A13, and BCAS3). In addition, we identified 3 suggestive loci at DOK6 (p-value = 5.3×10−7) and FNDC1 (p-value = 3.0×10−7) for UACR, and KCNQ1 with eGFR (p = 3.6×10−6). Morpholino knockdown of kcnq1 in the zebrafish resulted in abnormal kidney development and filtration capacity. We identified several SNPs in association with eGFR in African Ancestry individuals, as well as 3 suggestive loci for UACR and eGFR. Functional genetic studies support a role for kcnq1 in glomerular development in zebrafish.
Chronic kidney disease (CKD) is an increasing global public health problem and disproportionately affects populations of African ancestry. Many studies have shown that genetic variants are associated with the development of CKD; however, similar studies are lacking in African ancestry populations. The CARe consortium consists of more than 8,000 individuals of African ancestry; genome-wide association analysis for renal-related phenotypes was conducted. In cross-ethnicity analyses, we found that 23 of 24 previously identified SNPs in European ancestry populations have the same effect direction in our samples of African ancestry. We also identified 3 suggestive genetic variants associated with measurement of kidney function. We then tested these genes in zebrafish knockdown models and demonstrated that kcnq1 is involved in kidney development in zebrafish. These results highlight the similarity of genetic variants across ethnicities and show that cross-species modeling in zebrafish is feasible for genes associated with chronic human disease.
Genome-wide association (GWA) studies have identified common variants that are associated with a variety of traits and diseases, but most studies have been performed in European-derived populations. Here, we describe the first genome-wide analyses of imputed genotype and copy number variants (CNVs) for anthropometric measures in African-derived populations: 1188 Nigerians from Igbo-Ora and Ibadan, Nigeria, and 743 African-Americans from Maywood, IL. To improve the reach of our study, we used imputation to estimate genotypes at ∼2.1 million single-nucleotide polymorphisms (SNPs) and also tested CNVs for association. No SNPs or common CNVs reached a genome-wide significance level for association with height or body mass index (BMI), and the best signals from a meta-analysis of the two cohorts did not replicate in ∼3700 African-Americans and Jamaicans. However, several loci previously confirmed in European populations showed evidence of replication in our GWA panel of African-derived populations, including variants near IHH and DLEU7 for height and MC4R for BMI. Analysis of global burden of rare CNVs suggested that lean individuals possess greater total burden of CNVs, but this finding was not supported in an independent European population. Our results suggest that there are not multiple loci with strong effects on anthropometric traits in African-derived populations and that sample sizes comparable to those needed in European GWA studies will be required to identify replicable associations. Meta-analysis of this data set with additional studies in African-ancestry populations will be helpful to improve power to detect novel associations.
The FTO gene is one of the most consistently replicated loci for obesity. However, data from populations of African ancestry are limited. We evaluated genetic variation in the FTO gene and investigated associations with obesity in West Africans and African Americans.
RESEARCH DESIGN AND METHODS
The study samples comprised 968 African Americans (59% female, mean age 49 years, mean BMI 30.8 kg/m2) and 517 West Africans (58% female, mean age 54 years, mean BMI 25.5 kg/m2). FTO genetic variation was evaluated by genotyping 262 tag single nucleotide polymorphisms (SNPs) across the entire gene. Association of each SNP with BMI, waist circumference, and percent fat mass was investigated under an additive model.
As expected, both African-ancestry samples showed weaker linkage disequilibrium (LD) patterns compared with other continental (e.g., European) populations. Several intron 8 SNPs, in addition to intron 1 SNPs, showed significant associations in both study samples. The combined effect size for BMI for the top SNPs from meta-analysis was 0.77 kg/m2 (P = 0.009, rs9932411) and 0.70 kg/m2 (P = 0.006, rs7191513). Two previously reported associations with intron 1 SNPs (rs1121980 and rs7204609, r2 = 0.001) were replicated among the West Africans.
The FTO gene shows significant differences in allele frequency and LD patterns in populations of African ancestry compared with other continental populations. Despite these differences, we observed evidence of associations with obesity in African Americans and West Africans, as well as evidence of heterogeneity in association. More studies of FTO in multiple ethnic groups are needed.
Imputation of genotypes for markers untyped in a study sample has become a standard approach to increase genome coverage in genome-wide association studies at practically zero cost. Most methods for imputing missing genotypes extend previously described algorithms for inferring haplotype phase. These algorithms generally fall into three classes based on the underlying model for estimating the conditional distribution of haplotype frequencies: a cluster-based model, a multinomial model, or a population genetics-based model. We compared BEAGLE, PLINK, and MACH, representing the three classes of models, respectively, with specific attention to measures of imputation success and selection of the reference panel for an admixed study sample of African Americans. Based on analysis of chromosome 22 and after calibration to a fixed level of 90% concordance between experimentally determined and imputed genotypes, MACH yielded the largest absolute number of successfully imputed markers and the largest gain in coverage of the variation captured by HapMap reference panels. Following the common practice of performing imputation once, the Yoruba in Ibadan, Nigeria (YRI) reference panel outperformed other HapMap reference panels, including 1) African ancestry from Southwest USA (ASW) data, 2) an unweighted combination of the Northern and Western Europe (CEU) and YRI data into a single reference panel, and 3) a combination of the CEU and YRI data into a single reference panel with weights matching estimates of admixture proportions. For our admixed study sample, the optimal strategy involved imputing twice with the HapMap CEU and YRI reference panels separately and then merging the data sets.
admixture; African American; coverage; reference panel
Uric acid is the primary byproduct of purine metabolism. Hyperuricemia is associated with body mass index (BMI), sex, and multiple complex diseases including gout, hypertension (HTN), renal disease, and type 2 diabetes (T2D). Multiple genome-wide association studies (GWAS) in individuals of European ancestry (EA) have reported associations between serum uric acid levels (SUAL) and specific genomic loci. The purposes of this study were: 1) to replicate major signals reported in EA populations; 2) to use the weak LD pattern in African ancestry populations to better localize (fine-map) reported loci; and 3) to explore the identification of novel findings, cognizant of the moderate sample size.
African American (AA) participants (n = 1,017) from the Howard University Family Study were included in this study. Genotyping was performed using the Affymetrix® Genome-wide Human SNP Array 6.0. Imputation was performed using MACH and the HapMap reference panels for CEU and YRI. A total of 2,400,542 single nucleotide polymorphisms (SNPs) were assessed for association with serum uric acid under the additive genetic model with adjustment for age, sex, BMI, glomerular filtration rate, HTN, T2D, and the top two principal components identified in the assessment of admixture and population stratification.
Four variants in the gene SLC2A9 achieved genome-wide significance for association with SUAL (p-values ranging from 8.88 × 10⁻⁹ to 1.38 × 10⁻⁹). Fine-mapping of the SLC2A9 signals identified a 263 kb interval of linkage disequilibrium in the HapMap CEU sample. This interval was reduced to 37 kb in our AA and the HapMap YRI samples.
The most strongly associated locus for SUAL in EA populations was also the most strongly associated locus in this AA sample. This finding provides evidence for the role of SLC2A9 in uric acid metabolism across human populations. Additionally, our findings demonstrate the utility of following up GWAS signals from EA populations in African-ancestry populations, which have weaker linkage disequilibrium.
Common, complex diseases are hypothesized to result from a combination of common and rare genetic variants. We developed a unified framework for the joint association testing of both types of variants. Within the framework, we developed a union-intersection test suitable for genome-wide analysis of single nucleotide polymorphisms (SNPs), candidate gene data, as well as medical sequencing data. The union-intersection test is a composite test of association of genotype frequencies and differential correlation among markers.
We demonstrated by computer simulation that the false positive error rate was controlled at the expected level. We also demonstrated scenarios in which the multi-locus test was more powerful than traditional single marker analysis. To illustrate use of the union-intersection test with real data, we analyzed a publicly available data set of 319,813 autosomal SNPs genotyped for 938 cases of Parkinson disease and 863 neurologically normal controls for which no genome-wide significant results were found by traditional single marker analysis. We also analyzed an independent follow-up sample of 183 cases and 248 controls for replication.
We identified a single risk haplotype with a directionally consistent effect in both samples in the gene GAK, which is involved in clathrin-mediated membrane trafficking. We also found suggestive evidence that directionally inconsistent marginal effects from single marker analysis appeared to result from risk being driven by different haplotypes in the two samples for the genes SYN3 and NGLY1, which are involved in neurotransmitter release and proteasomal degradation, respectively. These results illustrate the utility of our unified framework for genome-wide association analysis of common, complex diseases.
Admixture mapping is a powerful approach for identifying genetic variants involved in human disease that exploits the unique genomic structure in recently admixed populations. To use existing published panels of ancestry-informative markers (AIMs) for admixture mapping, markers have to be genotyped de novo for each admixed study sample and samples representing the ancestral parental populations. The increased availability of dense marker data on commercial chips has made it feasible to develop panels wherein the markers need not be predetermined.
We developed two panels of AIMs (~2,000 markers each) based on the Affymetrix Genome-Wide Human SNP Array 6.0 for admixture mapping with African American samples. These two AIM panels had good map power that was higher than that of a denser panel of ~20,000 random markers as well as other published panels of AIMs. As a test case, we applied the panels in an admixture mapping study of hypertension in African Americans in the Washington, D.C. metropolitan area.
Developing marker panels for admixture mapping from existing genome-wide genotype data offers two major advantages: (1) no de novo genotyping needs to be done, thereby saving costs, and (2) markers can be filtered for various quality measures and replacement markers (to minimize gaps) can be selected at no additional cost. Panels of carefully selected AIMs have two major advantages over panels of random markers: (1) the map power from sparser panels of AIMs is higher than that of ~10-fold denser panels of random markers, and (2) clusters can be labeled based on information from the parental populations. With current technology, chip-based genome-wide genotyping is less expensive than genotyping ~20,000 random markers. The major advantage of using random markers is the absence of ascertainment effects resulting from the process of selecting markers. The ability to develop marker panels informative for ancestry from SNP chip genotype data provides a fresh opportunity to conduct admixture mapping for disease genes in admixed populations when genome-wide association data exist or are planned.
Mapping multiple quantitative trait loci (QTL) is commonly viewed as a problem of model selection. Various model selection criteria have been proposed, primarily in the non-Bayesian framework. The deviance information criterion (DIC) is the most popular criterion for Bayesian model selection and model comparison but has not been applied to Bayesian multiple QTL mapping. A derivation of the DIC is presented for multiple interacting QTL models and calculation of the DIC is demonstrated using posterior samples generated by Markov chain Monte Carlo (MCMC) algorithms. The DIC measures posterior predictive error by penalizing the fit of a model (deviance) by its complexity, determined by the effective number of parameters. The effective number of parameters simultaneously accounts for the sample size, the cross design, the number and lengths of chromosomes, covariates, the number of QTL, the type of QTL effects, and QTL effect sizes. The DIC provides a computationally efficient way to perform sensitivity analysis and can be used to quantitatively evaluate if including environmental effects, gene-gene interactions, and/or gene-environment interactions in the prior specification is worth the extra parameterization. The DIC has been implemented in the freely available package R/qtlbim, which greatly facilitates the general usage of Bayesian methodology for genome-wide interacting QTL analysis.
complex trait; deviance; DIC; model selection and comparison; quantitative trait loci
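The general form of the DIC can be sketched from posterior samples as follows. With deviance D(θ) = −2 log L(θ), the effective number of parameters is p_D = mean[D(θ)] − D(mean[θ]), and DIC = mean[D(θ)] + p_D. The example uses a deliberately simple model (normal data with known variance and an unknown mean) and a fixed list standing in for MCMC output; it is not the multiple-QTL derivation described above, and it does not use the R/qtlbim interface. All names and values are hypothetical.

```python
import math


def deviance(mu, data, sigma=1.0):
    """-2 * log-likelihood of normal data with mean mu and known sigma."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return n * math.log(2 * math.pi * sigma ** 2) + squared_error / sigma ** 2


def dic(samples, data):
    """Return (DIC, p_D) computed from posterior samples of the mean."""
    mean_deviance = sum(deviance(m, data) for m in samples) / len(samples)
    posterior_mean = sum(samples) / len(samples)
    deviance_at_mean = deviance(posterior_mean, data)
    p_d = mean_deviance - deviance_at_mean  # effective number of parameters
    return mean_deviance + p_d, p_d


# Hypothetical observations and stand-in posterior draws of the mean.
data = [4.8, 5.1, 5.3, 4.9, 5.4]
samples = [4.7, 4.9, 5.0, 5.1, 5.2, 5.3, 5.5]
dic_value, p_d = dic(samples, data)
```

When comparing candidate models fitted to the same data, the model with the smaller DIC is preferred; p_D shows how much of the difference comes from complexity rather than fit.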
Delineating the genetic basis of body composition is important to agriculture and medicine. In addition, the incorporation of gene-gene interactions in the statistical model provides further insight into the genetic factors that underlie body composition traits. We used Bayesian model selection to comprehensively map main, epistatic and sex-specific QTL in an F2 reciprocal intercross between two chicken lines divergently selected for high or low growth rate.
We identified 17 QTL with main effects across 13 chromosomes and several sex-specific and sex-antagonistic QTL for breast meat yield, thigh + drumstick yield and abdominal fatness. Different sets of QTL were found for each of the two breast muscles [Pectoralis (P) major and P. minor], which suggests that they could be controlled by different regulatory mechanisms. Significant interactions of QTL by sex allowed detection of sex-specific and sex-antagonistic QTL for body composition and abdominal fat. We found several female-specific P. major QTL and sex-antagonistic P. minor and abdominal fatness QTL. Also, several QTL on different chromosomes interact with each other to affect body composition and abdominal fatness.
The detection of main effects, epistasis and sex-dimorphic QTL suggests complex genetic regulation of somatic growth. An understanding of such regulatory mechanisms is key to mapping specific genes that underlie QTL controlling somatic growth in an avian model.
Human height is the prototypical polygenic quantitative trait. Recently, several genetic variants influencing adult height were identified, primarily in individuals of East Asian (Chinese Han or Korean) or European ancestry. Here, we examined 152 genetic variants representing 107 independent loci previously associated with adult height for transferability in a well-powered sample of 1,016 unrelated African Americans. When we tested just the reported variants originally identified as associated with adult height in individuals of East Asian or European ancestry, only 8.3% of these loci transferred (p-values ≤ 0.05 under an additive genetic model with directionally consistent effects) to our African American sample. However, when we comprehensively evaluated all HapMap variants in linkage disequilibrium (r² ≥ 0.3) with the reported variants, the transferability rate increased to 54.1%. The transferability rate was 70.8% for associations originally reported as genome-wide significant and 38.0% for associations originally reported as suggestive. An additional 23 loci were significantly associated but failed to transfer because of directionally inconsistent effects. Six loci were associated with adult height in all three groups. Using differences in linkage disequilibrium patterns between HapMap CEU or CHB reference data and our African American sample, we fine-mapped these six loci, improving both the localization and the annotation of these transferable associations.
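The transferability criterion stated above (a p-value of at most 0.05 together with an effect in the same direction as originally reported) can be written as a short check. This is a sketch only: the function name, the locus labels, and every p-value and effect size below are made up for illustration, and the handling of proxy variants in linkage disequilibrium is not modeled.

```python
def transfers(p_value, beta_new, beta_reported, alpha=0.05):
    """True if the association is nominally significant in the new sample
    and its effect has the same sign as the originally reported effect."""
    same_direction = beta_new * beta_reported > 0
    return p_value <= alpha and same_direction


# Hypothetical loci: (label, p-value, effect in new sample, reported effect).
loci = [
    ("locus_A", 0.01, 0.30, 0.25),    # significant, same direction
    ("locus_B", 0.20, 0.10, 0.25),    # same direction, not significant
    ("locus_C", 0.03, -0.20, 0.15),   # significant, opposite direction
    ("locus_D", 0.05, -0.12, -0.30),  # significant at the boundary, same direction
]
transferred = [name for name, p, b_new, b_rep in loci if transfers(p, b_new, b_rep)]
transfer_rate = len(transferred) / len(loci)
```

A locus such as `locus_C` corresponds to the category described above of loci that were significantly associated but failed to transfer because of directionally inconsistent effects.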
CD8+ cytotoxic T lymphocytes (CTL) are strong mediators of human immunodeficiency virus type 1 (HIV-1) control, yet HIV-1 frequently mutates to escape CTL recognition. In an analysis of sequences in the Los Alamos HIV-1 database, we show that emerging CTL escape mutations were more often present at lower frequencies than the amino acid(s) that they replaced. Furthermore, epitopes that underwent escape contained amino acid sites of high variability, whereas epitopes persisting at high frequencies lacked highly variable sites. We therefore infer that escape mutations are likely to be associated with weak functional constraints on the viral protein. This was supported by an extensive analysis of one subject for whom all escape mutations within defined CTL epitopes were studied and by an analysis of all reported escape mutations of defined CTL epitopes in the HIV Immunology Database. In one of these defined epitopes, escape mutations involving the substitution of amino acids with lower database frequencies occurred, and the epitope soon reverted back to the sensitive form. We further show that this escape mutation substantially diminished viral fitness in in vitro competition assays. Coincident with the reversion in vivo, we observed the fixation of a mutation 3 amino acids C terminal to the epitope, coincident with the ablation of the corresponding CTL response. The C-terminal mutation did not restore replication fitness reduced by the escape mutation in the epitope and by itself had little effect on replication fitness. Therefore, this C-terminal mutation presumably impaired the processing and presentation of the epitope. Finally, for one persistent epitope, CTL cross-reactivity to a mutant form may have suppressed the mutant to undetected levels, whereas for two other persistent epitopes, each of two mutants showed poor cross-reactivity and appeared in the subject at later time points. 
Thus, a viral dynamic exists between the advantage of immune escape, peptide cross-reactivity, and the disadvantage of lost replication fitness, with the balance playing an important role in determining whether a CTL epitope will persist or decline during infection.