Genome-wide association studies of asthma have implicated many genetic risk factors, with well-replicated associations at approximately 10 loci that account for only a small proportion of the genetic risk.
We aimed to identify additional asthma risk loci by performing an extensive replication study of the results from the EVE Consortium meta-analysis.
We selected 3,186 SNPs for replication based on the p-values from the EVE Consortium meta-analysis. These SNPs were genotyped in ethnically diverse replication samples from nine different studies, totaling 7,202 cases, 6,426 controls, and 507 case-parent trios. Association analyses were conducted within each participating study, and the resulting test statistics were combined in a meta-analysis.
Two novel associations were replicated in European Americans: rs1061477 in the KLK3 gene on chromosome 19 (combined OR = 1.18; 95% CI 1.10–1.25) and rs9570077 on chromosome 13q21 (combined OR = 1.20; 95% CI 1.12–1.29). We could not replicate any additional associations in the African American or Latino individuals.
This extended replication study identified two additional asthma risk loci in populations of European descent. The absence of additional loci for African Americans and Latino individuals highlights the difficulty in replicating associations in admixed populations.
Asthma is a common and complex disease, affecting approximately 300 million people worldwide (1). There is strong evidence for a genetic contribution to asthma risk, with heritability estimates ranging between 35% and 95% (2, 3) and many genetic associations reported (reviewed in (4)). Two large meta-analyses by the GABRIEL (5) and EVE (6) Consortia and many genome-wide association studies (7–13) of asthma have been reported in the last five years. The GABRIEL Consortium meta-analysis focused on populations of European ancestry and included twenty-three studies from across Europe and Canada. Its findings included associations in or near IL1RL1/IL18R1, HLA-DQ, IL33, SMAD3, and GSDMA/GSDMB (17q21). Most of these loci had previously been implicated in asthma or asthma-related phenotypes (e.g., 17q21 (7), HLA-DQ (14–16), IL33 (17), IL1RL1 (17, 18)). Most of these associations were also observed in the EVE Consortium meta-analysis, which included both primary and replication individuals from European American, African American, and Latino populations. The EVE Consortium study also reported associations with single nucleotide polymorphisms (SNPs) near the TSLP gene, which had also been reported in a GABRIEL sub-analysis and in previous studies (12, 19), and identified a novel association with SNPs in the PYHIN1 gene among African American individuals.
In the original EVE meta-analysis, we attempted replication of 13 SNPs from separate regions. The selected SNPs showed evidence of association (p-value < 1 × 10⁻⁶) in at least one of the three ethnic groups or in the combined sample. For the current study, we hypothesized that some SNPs with larger (less significant) association p-values might also represent true associations. To address this possibility, we performed a deeper replication study of the EVE meta-analysis results, genotyping a panel of 3,186 candidate SNPs in replication samples that included 7,709 European American, African American, and Latino subjects with asthma.
The detailed methods for the EVE Consortium meta-analysis were previously published (6). Briefly, the EVE meta-analysis comprised 4,867 European Americans (1,486 cases, 1,539 controls, and 620 trios), 4,644 Latinos (606 cases, 792 controls, and 1,082 trios), and 3,447 African Americans (1,154 cases, 1,054 controls, and 413 trios). Two million HapMap SNPs were either genotyped or imputed (20) and analyzed for association with asthma. The meta-analysis test statistic was a linear combination of normally distributed test statistics weighted by the square root of the study sample size. Significance of the meta-analysis test statistic was assessed using standard normal approximations. This analysis will henceforth be referred to as the primary analysis.
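The sample-size-weighted combination of per-study Z statistics described above can be sketched in Python as follows. This is an illustration only, not the consortium's code (the analyses were run in R). It assumes each study supplies a signed Z statistic oriented to the same risk allele, that the weighted sum is rescaled by the square root of the sum of squared weights so the combined statistic is standard normal under the null, and that p-values are two-sided. The function name `weighted_z_meta` and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def weighted_z_meta(z_scores, sample_sizes):
    """Combine per-study Z statistics using weights w_i = sqrt(N_i).

    The weighted sum is divided by sqrt(sum(w_i^2)) so that the combined
    statistic is standard normal when there is no association (assumption).
    """
    if len(z_scores) != len(sample_sizes):
        raise ValueError("z_scores and sample_sizes must have the same length")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    z_meta = numerator / denominator
    # Two-sided p-value from the standard normal approximation.
    p_meta = 2.0 * NormalDist().cdf(-abs(z_meta))
    return z_meta, p_meta


# Hypothetical Z statistics for three studies; the sample sizes are invented
# for illustration.
z, p = weighted_z_meta([3.1, 2.4, 1.8], [3025, 2208, 1398])
print(f"combined Z = {z:.2f}, p = {p:.2e}")
```

Because each weight is the square root of N, larger studies contribute more, but only in proportion to the square root of their size, which matches how the standard error of a test statistic shrinks with sample size.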
We selected SNPs for additional genotyping based on the results of the original EVE meta-analysis (referred to here as the primary study). These included all SNPs with a p-value < 1 × 10⁻⁴ in any one group (i.e., European American, African American, or Latino American individuals) or a p-value < 1 × 10⁻³ in the meta-analysis of all groups combined. A less stringent p-value cut-off was used for the combined meta-analysis than for the ethnicity-specific analyses because the combined analysis was more powerful and yielded a larger number of small p-values. We also genotyped an additional 407 SNPs that were informative for determining genetic ancestry (i.e., ancestry informative markers [AIMs]).
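The selection rule above can be expressed as a small function. This sketch assumes the primary p-values are already available per ethnic group and for the combined meta-analysis, and it uses the strict inequalities stated in the text; the function name and dictionary layout are hypothetical.

```python
def selection_criteria_met(group_pvalues, combined_p,
                           group_threshold=1e-4, combined_threshold=1e-3):
    """Return the labels of the selection criteria a SNP satisfies.

    group_pvalues maps an ethnic-group label to that group's primary p-value;
    combined_p is the p-value from the meta-analysis of all groups combined.
    An empty list means the SNP would not be selected.
    """
    met = [group for group, p in group_pvalues.items() if p < group_threshold]
    if combined_p < combined_threshold:
        met.append("combined")
    return met


# Hypothetical primary-analysis p-values for one SNP.
snp = {"European American": 5e-5, "African American": 0.21, "Latino": 0.47}
print(selection_criteria_met(snp, combined_p=2e-3))  # ['European American']
```

Returning the list of criteria rather than a single boolean makes it easy to record which analysis nominated each SNP, which is how the SNPs were later routed to their respective replication samples.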
Asthma cases, non-asthma controls, and asthma case-parent trios were recruited from locations across North and Central America by the same investigators contributing samples to the primary analysis. Descriptions of the replication samples and ascertainment schemes were previously reported (Supplemental Table 4 and Supplemental Note in reference (6)). Genotyping was performed using a custom Illumina iSELECT array at the Southern California Genotyping Consortium (SCGC) at the University of California – Los Angeles.
Each EVE center performed quality control and association analyses on its own data sets, using protocols identical to those used in the primary analysis (6). Each center's quality control and statistical analyses, including covariates, are described in the supplemental methods. A shared summary file for each replication study population included the SNP name, risk allele, alternative allele, risk allele frequency in cases, risk allele frequency in controls, normal or chi-square distributed test statistic, p-value, odds ratio, and standard error of the log odds ratio. Test statistics were converted to a standard normal distribution, and all results were checked for consistent allelic directionality across all study populations. The meta-analysis test statistics were calculated as a linear, weighted combination of the individual study population test statistics. The weights (w) were a function of sample size (N), proportion of cases (v), and allele frequency (p), as defined in (21). P-values were obtained using standard normal approximations. The combined odds ratios were calculated as a linear combination of log odds ratios with inverse-variance weights (i.e., weights inversely proportional to the squared standard errors of the log odds ratios). All statistical analyses were completed using the statistical software R.
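The allele-harmonization and odds-ratio pooling steps above can be sketched in Python as follows. This is an illustration, not the consortium's R code. It assumes biallelic SNPs reported on the same strand in every study, a 1-df chi-square statistic whose sign is taken from the aligned odds ratio, and standard fixed-effect inverse-variance weighting for the pooled odds ratio. The N-, v-, and p-dependent weights of reference (21) are not reproduced. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def align_odds_ratio(odds_ratio, risk_allele, reference_allele):
    """Express an odds ratio relative to the reference allele.

    Assumes a biallelic SNP on the same strand in every study, so the odds
    ratio for the other allele is the reciprocal. The standard error of the
    log odds ratio is unchanged by this flip (log(1/OR) = -log(OR)).
    """
    return odds_ratio if risk_allele == reference_allele else 1.0 / odds_ratio


def signed_z_from_chisq(chisq_1df, aligned_odds_ratio):
    """Convert a 1-df chi-square statistic to a signed standard normal Z.

    The sign is positive when the aligned odds ratio is at least 1.
    """
    z = math.sqrt(chisq_1df)
    return z if aligned_odds_ratio >= 1.0 else -z


def inverse_variance_or(odds_ratios, std_errors, level=0.95):
    """Fixed-effect pooled odds ratio with a confidence interval.

    Log odds ratios are weighted by 1 / SE^2 (inverse-variance weighting).
    """
    log_ors = [math.log(o) for o in odds_ratios]
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return (math.exp(pooled),
            math.exp(pooled - z_crit * pooled_se),
            math.exp(pooled + z_crit * pooled_se))


# Hypothetical summaries from two studies; the second reports the other allele.
or_a = align_odds_ratio(1.22, "A", "A")
or_b = align_odds_ratio(0.87, "G", "A")
print(signed_z_from_chisq(6.3, or_b))
print(inverse_variance_or([or_a, or_b], [0.06, 0.08]))
```

Aligning every study to one reference allele before combining is what the directionality check protects: without it, a study reporting the other allele would contribute a Z statistic with the wrong sign.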
We investigated whether there was an enrichment of small p-values at the 0.01 level in each of the four meta-analyses: one each in the European American, African American, Latino, and combined samples. To do this, we first constructed a set of p-values that were not correlated through linkage disequilibrium (LD). For each analysis, we grouped SNPs according to their physical location in the genome. Then, for each group of SNPs, we constructed a region-based p-value by taking the smallest p-value in the group and multiplying it by the number of SNPs in the group. The resulting region-based p-values have two properties: 1) each is a conservative, multiple-testing-corrected p-value for all of the SNPs within its genomic region, and 2) the region-based p-values are independent of each other. A binomial test was used to assess enrichment of small (< 0.01), independent, region-based p-values. The null hypothesis is that the region-based p-values are uniformly distributed and that there are no true replications, whereas the alternative hypothesis is that there is an excess of small region-based p-values. The binomial test assumes that all of the region-based p-values are independent of each other; this assumption would be violated if the test were applied to the unadjusted SNP-level p-values, which are correlated because of LD.
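The region-based p-values and the binomial enrichment test can be sketched as follows. This is a minimal sketch, not the authors' implementation. The assignment of SNPs to regions is taken as given (each SNP carries a hypothetical region label), region p-values are capped at 1, and the binomial tail probability is computed exactly with the standard library.

```python
import math
from collections import defaultdict


def region_based_pvalues(snp_results):
    """Collapse SNP p-values to one conservative p-value per genomic region.

    snp_results is an iterable of (region_id, p_value) pairs; how SNPs are
    assigned to regions is left to the caller. The region p-value is the
    smallest SNP p-value times the number of SNPs in the region (a Bonferroni
    correction), capped at 1.
    """
    by_region = defaultdict(list)
    for region, p in snp_results:
        by_region[region].append(p)
    return {region: min(1.0, min(ps) * len(ps)) for region, ps in by_region.items()}


def binomial_upper_tail(k, n, prob):
    """P(X >= k) for X ~ Binomial(n, prob)."""
    return sum(math.comb(n, i) * prob ** i * (1 - prob) ** (n - i)
               for i in range(k, n + 1))


def enrichment_test(region_pvalues, alpha=0.01):
    """One-sided binomial test for an excess of region p-values below alpha.

    Returns (number below alpha, number of regions, binomial tail p-value).
    """
    values = list(region_pvalues.values())
    k = sum(p < alpha for p in values)
    return k, len(values), binomial_upper_tail(k, len(values), alpha)


# Hypothetical SNP-level results in three regions.
snps = [("r1", 0.0004), ("r1", 0.03), ("r1", 0.2),
        ("r2", 0.04), ("r2", 0.6),
        ("r3", 0.002)]
regions = region_based_pvalues(snps)
print(regions)
print(enrichment_test(regions))
```

Because each region contributes a single Bonferroni-corrected value, the count of small values can be compared against a Binomial(number of regions, 0.01) reference without double-counting SNPs that are in LD with each other.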
Of the 5,560 SNPs selected for replication, 3,186 passed the Illumina array design and were successfully genotyped in 7,202 cases, 6,426 controls, and 507 case-parent trios (Table I). The 3,186 SNPs included 177 SNPs from the five genomic regions that were replicated in our earlier study (6). The remaining 3,009 SNPs (listed in Table EI in the Online Repository) were in 690 genomic regions that showed a signal of association in the primary analysis: 128 SNPs in 68 regions in European Americans, 340 SNPs in 124 regions in African Americans, 258 SNPs in 74 regions in Latinos, and 2,450 SNPs in 595 regions in the combined sample. The SNPs and regions are not entirely mutually exclusive among the ethnic groups; 167 SNPs overlapped between two or more of the samples. The average power to replicate these SNPs was highest among African Americans and lowest among Latinos (see Table EII in the Online Repository). The Q-Q plots for the 177 SNPs from the five previously replicated regions show substantial inflation of small p-values in the European American, Latino American, and combined samples, but not in the African American samples (Figure E1A–D in the Online Repository). In contrast, the Q-Q plots (Figure 1A–D) of p-values for the ethnicity-specific replications revealed an inflation of small p-values only in the European Americans, and not in the African American, Latino American, or combined replication samples.
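The expected and observed quantiles behind a p-value Q-Q plot can be computed as in the sketch below. The (rank − 0.5)/n plotting position is one common convention and is an assumption here; the paper does not state which convention was used. The function name and example p-values are hypothetical.

```python
import math


def qq_points(pvalues):
    """Expected vs observed -log10(p) pairs for a Q-Q plot against Uniform(0, 1).

    Expected quantiles use (rank - 0.5) / n. Points lying above the diagonal
    at the small-p end indicate an excess of small p-values.
    """
    observed = sorted(pvalues)
    n = len(observed)
    return [(-math.log10((rank - 0.5) / n), -math.log10(p))
            for rank, p in enumerate(observed, start=1)]


# Hypothetical replication p-values.
pvals = [0.0004, 0.003, 0.02, 0.15, 0.4, 0.55, 0.7, 0.9]
for expected, observed in qq_points(pvals)[:3]:
    print(f"expected {expected:.2f}  observed {observed:.2f}")
```

The resulting pairs can be passed to any plotting library; the visual "inflation" described in the text corresponds to the smallest observed p-values sitting well above their expected positions.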
A summary of the replication results is shown in Table II. SNPs selected for replication based on a group-specific result in the primary meta-analysis were analyzed in their respective replication samples (e.g., SNPs selected on the basis of primary results in European American individuals were analyzed in a European American replication population). In European American individuals, we expected fewer than one of the 128 SNPs (128 × 0.001 ≈ 0.13) to have a p-value < 0.001 by chance. We observed 2 SNPs with a p-value < 0.001 in these samples, which is a significant enrichment after accounting for linkage disequilibrium (LD) among SNPs (p-value = 0.031). Only one SNP had a p-value < 0.001 in each of the African American and combined samples, which does not represent a significant enrichment of low p-values in either group. Finally, none of the 258 SNPs had a p-value < 0.001 in the Latino American samples.
The SNPs with the most significant combined (primary + replication) p-values are shown in Table III. The two most significant associations were in the European American sample, both with replication p-values < 0.001 and combined p-values approaching genome-wide significance (2 × 10⁻⁸). The first replicated association is with rs1061477, with a combined p-value of 2.3 × 10⁻⁸ and a combined odds ratio of 1.18 (95% CI 1.10–1.25) (Figure 2A). This SNP is in an intron of the KLK3 gene (encoding kallikrein-related serine peptidase 3) on chromosome 19q13 (Figure 2B). The second replicated association is with rs9570077, with a combined p-value of 9.9 × 10⁻⁸ and a combined odds ratio of 1.20 (95% CI 1.12–1.29) (Figure 3A). This SNP is located on chromosome 13q21, approximately 420 kb downstream of the DIAPH3 gene, which encodes diaphanous homolog 3 (Figure 3B). Finally, seven SNPs showed small p-values in both the primary sample (p-value < 1 × 10⁻⁴) and the replication sample (p-value < 0.01) but large combined p-values, indicating that the associations were in opposite directions. One such SNP is rs9891949, with a replication p-value of 0.005 and a primary p-value of 7.5 × 10⁻⁷ in European Americans. The risk allele differed between the primary and replication studies (combined OR = 1.00, 95% CI 0.94–1.07; Figure 4). This SNP is located on chromosome 17p13, 1 kb from the AURKB gene, which encodes serine/threonine-protein kinase aurora-B.
We report here two additional replicated associations with asthma in European Americans. Despite using a conservative test for enrichment, we observed a significant excess of small p-values in the European American replication samples. Two of the three SNPs match the direction of association seen in the primary analysis.
One of the replicated SNPs, rs1061477, lies in the KLK3 gene on chromosome 19q13. This region includes other genes in the kallikrein family and lies within a region that was previously implicated in asthma and related phenotypes through linkage studies in European and European American families (14, 22, 23). Although KLK3 had not previously been considered an asthma candidate gene in this gene-rich region of 19q13, the fine-scale resolution of GWAS localized a strong association signal specifically to KLK3. KLK3 encodes PSA (prostate-specific antigen), which can elicit an immune response by inducing IFN-γ (interferon gamma) secretion (24). IFN-γ is a key immunomodulatory molecule in asthma pathogenesis (25). KLK3 expression is highest in the prostate but is also detectable at lower levels in other tissues (26), including bronchial epithelium from individuals with asthma (27).
The second replicated association, rs9570077, is located at 13q21 and resides between two pseudogenes, RPP40P2 and POLR3KP1. SNPs in this region show very little LD (r2 ≤ 0.075) with SNPs around the closest gene, DIAPH3. The 13q21 region has also been implicated in asthma through linkage analysis (28). It is possible that the associated region encodes a long-range enhancer of DIAPH3 or another gene on chromosome 13q21.
In addition to the two replicated associations, seven SNPs had small p-values in both the primary (p-value < 1 × 10⁻⁴) and replication (p-value < 0.01) studies, but the associations were in opposite directions. The most dramatic example was the association with rs9891949, located 1 kb from the AURKB gene on chromosome 17p13. The association signals were strong in both the primary and replication studies; the combined p-value ignoring the direction of association was 5.8 × 10⁻⁸ in European Americans. However, the combined p-value accounting for direction was 0.316 in European Americans (Figure 4). This pattern, with opposite alleles associated in the primary and replication samples, has been reported in other studies of asthma, where it has been attributed to a genotype-environment interaction with maternal asthma status (15) or to differences in LD between population groups (13). In our study, it is difficult to attribute this pattern to either explanation because participants in the Children's Health Study (CHS) in Los Angeles were included in both the primary and replication samples, and opposite association patterns were observed within this one study. There were no substantial differences between the CHS children included in the primary vs. replication study; in particular, we ruled out differences in sex ratios, ages of asthma onset, and indicators of atopy between the European American primary and replication samples that showed association with one allele vs. the other (Table EIII in the Online Repository). Finally, it is possible that these seven associations are chance occurrences resulting from random variation in association signals. Thus, it remains to be determined whether they represent true associations whose direction of effect depends on an as yet unknown variable, or simply the result of random variation.
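The contrast between the two combined p-values for rs9891949 can be illustrated with hypothetical Z statistics. The paper does not state how the direction-ignoring p-value was computed; this sketch assumes it combines absolute Z values with the same square-root-of-N weights as the signed analysis. Combining absolute values is not a calibrated test and is shown only to illustrate why opposite-direction signals cancel when direction is respected. The function name, sample sizes, and Z values are invented.

```python
import math
from statistics import NormalDist


def combine_two(z_primary, z_replication, n_primary, n_replication,
                use_direction=True):
    """sqrt(N)-weighted combination of a primary and a replication Z statistic.

    With use_direction=False the absolute values are combined, which ignores
    whether the same allele increased risk in both samples (illustration only).
    Returns (combined Z, two-sided p-value).
    """
    if not use_direction:
        z_primary, z_replication = abs(z_primary), abs(z_replication)
    w1, w2 = math.sqrt(n_primary), math.sqrt(n_replication)
    z = (w1 * z_primary + w2 * z_replication) / math.sqrt(w1 ** 2 + w2 ** 2)
    return z, 2.0 * NormalDist().cdf(-abs(z))


# Hypothetical values: a strong primary signal and a moderate replication
# signal for the opposite allele.
for flag in (False, True):
    z, p = combine_two(4.95, -2.81, 3000, 3000, use_direction=flag)
    print(f"use_direction={flag}: Z = {z:.2f}, p = {p:.1e}")
```

When direction is ignored, the two signals reinforce each other; when direction is respected, the negative replication Z offsets most of the primary signal, which is the pattern the text describes.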
Notably, none of the associations were replicated in African Americans or Latinos. This observation is surprising because the power to replicate the primary association signals was high in all populations, with the greatest power in African Americans and the least in Latinos (Table EII). However, there was a wider range of power estimates in the Latinos and African Americans than in the European Americans. The combination of the lowest average power and the wider range of power in the Latino sample may have contributed to the lack of replication in that sample, but is less likely to explain the lack of replication in the African Americans, because power was highest in that group.
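To show how replication power depends on effect size, allele frequency, sample size, and significance threshold, the sketch below approximates the power of a two-sided allelic test for unrelated cases and controls. It is a simplified large-sample approximation (Woolf variance of the log odds ratio from allele counts, Hardy-Weinberg equilibrium, the per-allele odds ratio treated as the allelic odds ratio, trios ignored) and is not the method used to produce Table EII. The function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def allelic_test_power(odds_ratio, control_freq, n_cases, n_controls, alpha):
    """Approximate power of a two-sided allelic (allele-count 2 x 2) test.

    Assumes unrelated cases and controls, Hardy-Weinberg equilibrium, and that
    the per-allele odds ratio equals the allelic odds ratio.
    """
    nd = NormalDist()
    # Risk-allele frequency in cases implied by the odds ratio.
    odds_controls = control_freq / (1.0 - control_freq)
    odds_cases = odds_ratio * odds_controls
    case_freq = odds_cases / (1.0 + odds_cases)
    # Woolf variance of the log odds ratio from allele counts (2 alleles per person).
    var_log_or = (1.0 / (2 * n_cases * case_freq * (1.0 - case_freq))
                  + 1.0 / (2 * n_controls * control_freq * (1.0 - control_freq)))
    shift = abs(math.log(odds_ratio)) / math.sqrt(var_log_or)
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    # Probability that |Z| exceeds the critical value under the alternative.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


# Hypothetical inputs: OR 1.18, risk-allele frequency 0.30 in controls.
print(allelic_test_power(1.18, 0.30, 1500, 1500, alpha=0.01))
```

Evaluating this over the range of odds ratios and allele frequencies seen among selected SNPs would give a distribution of power estimates, which is the kind of "range of power" contrasted across populations in the text.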
It is possible that the primary association signals in the admixed individuals were more likely to be false positives because of reduced imputation accuracy in these populations. In the primary analysis, we included samples genotyped on different platforms (Affymetrix or Illumina) and on arrays with different numbers of SNPs. To enable analyses with the most inclusive set of SNPs, each participating center imputed all SNPs that were not present on its platform but were present on at least one of the other platforms. Thus, different SNPs were imputed in different samples, although all imputations used HapMap release 21 samples as the reference. Whole genome sequences were available for a subset of the European American (n = 21), African American (n = 32), and Latino (n = 24) individuals from the primary samples, allowing us to determine concordance rates between imputed and true genotypes. We observed similar concordance rates (98%) in all three populations, suggesting that our ability to replicate associations was not influenced by ethnicity-based differences in imputation accuracy.
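The concordance rate described above is the fraction of genotypes on which imputed and sequence-derived calls agree. A minimal sketch, assuming both data sets are already coded as hard-called allele counts (0, 1, 2) for the same individuals and SNPs, with None marking a missing call; the function name is hypothetical.

```python
def genotype_concordance(imputed_calls, sequenced_calls):
    """Fraction of matching genotype calls, skipping missing calls (None).

    Genotypes are coded as allele counts (0, 1, 2); imputed dosages are assumed
    to have been converted to hard calls beforehand.
    """
    if len(imputed_calls) != len(sequenced_calls):
        raise ValueError("genotype vectors must have the same length")
    pairs = [(a, b) for a, b in zip(imputed_calls, sequenced_calls)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no genotypes called in both data sets")
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical calls at five SNPs for one individual.
print(genotype_concordance([0, 1, 2, None, 1], [0, 1, 1, 2, 1]))  # 0.75
```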
To address the possibility that signal heterogeneity across the primary studies limited our ability to replicate, we examined a measure of technical heterogeneity: the proportion of variation in odds ratios attributable to heterogeneity rather than random chance (I²). In all populations, the heterogeneity (I²) of the odds ratios was very low (<25%) (29), indicating that most of the variation in odds ratio estimates reflects random variation rather than study-based differences in effect sizes. Fewer than five percent of SNPs had I² values considered highly heterogeneous (>75%). Therefore, heterogeneity of the primary association signals does not explain the lack of replication in the admixed samples.
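I² is conventionally computed from Cochran's Q as I² = max(0, (Q − df)/Q), where Q is the inverse-variance-weighted sum of squared deviations of the study log odds ratios from their pooled value and df is the number of studies minus one. The sketch below follows that standard definition; the labels mirror the 25% and 75% cut-offs mentioned in the text, with "moderate" used for values in between. Function names and example inputs are hypothetical.

```python
import math


def cochran_q_and_i2(log_odds_ratios, std_errors):
    """Cochran's Q and I^2 = max(0, (Q - df) / Q) for a set of study estimates."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, log_odds_ratios)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_odds_ratios))
    df = len(log_odds_ratios) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


def describe_i2(i2):
    """Label I^2 (as a fraction) using the 25% and 75% cut-offs from the text."""
    if i2 < 0.25:
        return "low"
    if i2 > 0.75:
        return "high"
    return "moderate"


# Hypothetical odds ratios and standard errors of the log odds ratios.
ors = [1.15, 1.21, 1.09]
ses = [0.05, 0.07, 0.06]
q, i2 = cochran_q_and_i2([math.log(o) for o in ors], ses)
print(f"Q = {q:.2f}, I^2 = {100 * i2:.0f}% ({describe_i2(i2)})")
```

Because Q has expectation df under homogeneity, I² near zero means the spread of study estimates is about what sampling error alone would produce.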
Alternatively, the lack of replication in the non-European, admixed populations may have been due to differences between the primary and replication samples. In particular, differences in ancestry or patterns of LD may have limited replication in these samples. For example, asthma prevalence is well known to differ among Latino subgroups (30), with the highest prevalence among Puerto Ricans and the lowest among Mexicans (31). Although both the primary and replication studies included Mexican Americans and Puerto Rican Americans from the U.S., the primary study also included subjects from Mexico City, whereas the replication study included subjects from Costa Rica and Honduras. Likewise, our ‘African ancestry’ group included individuals from Barbados in the primary analysis, whereas only African Americans were included in the replication studies. Differences in global and local ancestry between these samples could have affected our ability to replicate associations, even after adjusting for global ancestry (32). In addition, patterns of LD in admixed populations may be an important factor, particularly if the original association signals at typed or imputed SNPs are due to LD with causal variants. Additional studies in these populations may shed light on these possibilities.
In summary, we report two additional asthma risk loci in European Americans that represent previously unknown genes or regions and possibly novel pathways of pathogenesis. While the effect of genotype at each locus on risk is small, we provide evidence for two new asthma candidate regions where family-based studies previously showed evidence of linkage. Further investigations are required to identify the specific causal variants and the mechanisms underlying the associations at KLK3 and the 13q21 locus. The lack of replication in the African American and Latino populations suggests that additional strategies are warranted when studying these groups (33). Careful attention to sample composition and recruitment strategies, environmental risk factors, and admixture corrections may improve the identification and replication of genetic risk factors for asthma in admixed populations.
The authors acknowledge Rebecca Anderson, David Witonsky, Duanny Alva, Gaby Ayala-Rodriguez, Ulysses Burley, Lisa Caine, Elizabeth Castellanos, Grace Y. Chiu, Jaime Colon, Denise DeJesus, Iliana Flexas, Dana B. Hancock, Blanca Lopez, Brenda Lopez, Louis Martos, Vivian Medina, Juana Olivo, Mario Peralta, Esther Pomares, Jihan Quraishi, Blanca del Rio Navarro, Elizabeth Nguyen, Johanna Rodriguez, Shahdad Saeedi, Sandra Salazar, Min Shi, Juan J. Sierra-Monge, Dean Soto, Ana Taveras for data collection, management, and analysis. The authors also acknowledge the support from J. Kiley, S. Banks-Schlegel, and W. Gan at the National Heart, Lung, and Blood Institute, all of the patients and families for their participation in these studies, and the numerous health care providers and community clinics for their support.
Sources of Support
This work was supported by grants from the Office of the Director, NIH to C.O. and D.L.N. and the National Heart, Lung, and Blood Institute (HL101651 to C.O. and D.L.N.; HL087665 to D.L.N.; HL070831, HL072414 and HL049596 to C.O.; HL064307 and HL064313 to F.D.M.; HL075419, HL65899, HL083069, HL066289, HL087680, HL101543 and HL101651 to S.T.W.; HL079055 to L.K.W.; HL087699, HL49612, HL075417, HL04266 and HL072433 to K.C.B.; HL061768 and HL076647 to F.D.G.; HL087680 to W.J.G.; HL078885 and HL088133 to E.G.B.; HL87665 to D.A.M.); the National Institute of Allergy and Infectious Diseases (AI070503 to C.O.; AI079139 and AI061774 to L.K.W.; AI50024, AI44840 and AI41040 to K.C.B.; and AI077439 to E.G.B.), the National Institute of Diabetes and Digestive and Kidney Diseases to L.K.W. (DK064695); the National Institute of Environmental Health Sciences (ES09606, ES018176 and ES015903 to K.C.B.; ES007048, ES009581, R826708, RD831861 and ES011627 to F.D.G.; ES015794 to E.G.B.; and the Division of Intramural Research, Z01 ES049019, to S.J.L.); the National Center for Research Resources (RR03048 to K.C.B.), the Environmental Protection Agency (83213901 and R-826724 to K.C.B.), the American Asthma Foundation and the Fund for Henry Ford Hospital (to L.K.W.), Mary Beryl Patch Turnbull Scholar Program (to K.C.B.); and the Flight Attendant Medical Research Institute (FAMRI), Robert Wood Johnson Foundation (RWJF) Amos Medical Faculty Development Award, the American Asthma Foundation, and the Sandler Foundation (to E.G.B.).
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.