Increasing evidence suggests that single nucleotide polymorphisms (SNPs) associated with complex traits are more likely to be expression quantitative trait loci (eQTLs). Incorporating eQTL information hence has potential to increase power of genome-wide association studies (GWAS). In this paper, we propose using eQTL weights as prior information in SNP based association tests to improve test power while maintaining control of the family-wise error rate (FWER) or the false discovery rate (FDR). We apply the proposed methods to the analysis of a GWAS for childhood asthma consisting of 1296 unrelated individuals with German ancestry. The results confirm that eQTLs are enriched for previously reported asthma SNPs. We also find that some SNPs are insignificant using procedures without eQTL weighting, but become significant using eQTL-weighted Bonferroni or Benjamini–Hochberg procedures, while controlling the same FWER or FDR level. Some of these SNPs have been reported by independent studies in recent literature. The results suggest that the eQTL-weighted procedures provide a promising approach for improving power of GWAS. We also report the results of our methods applied to the large-scale European GABRIEL consortium data.
Asthma is a disorder characterized by inflamed mucosa of the small airways of the lung, causing wheezing and shortness of breath (Moffatt et al., 2010). Among the most common chronic diseases of childhood, asthma has been reported to affect more than 10% of children in many westernized societies (Cookson, 2004). It is caused by a combination of genetic and environmental factors (Cookson, 2004; Moffatt et al., 2007), and several genome-wide association studies (GWAS) have been conducted to study the genetic basis underlying this complex disorder. More than 50 single nucleotide polymorphisms (SNPs) have been reported to be associated with asthma, according to the GWAS catalog (www.genome.gov/gwastudies, accessed on January 15, 2013). Remarkably, the recent report (Moffatt et al., 2010) from the GABRIEL (A Multidisciplinary Study to Identify the Genetic and Environmental Causes of Asthma in the European Community) consortium identified several SNPs reaching genome-wide significance through a large-scale meta-analysis.
Prior biological information, often available in practice, has the potential to increase the power of GWAS. The common practice of GWAS, “agnostic” in some sense, assumes no prior information about any of the SNPs under investigation, meaning that all the SNPs have an equal likelihood of being causal. Some recent studies have taken advantage of information from linkage analysis (Roeder et al., 2006) and gene expression (Xiong et al., 2012) in genome-wide association scans. In genetic studies of the etiology of asthma, we are particularly interested in employing similar approaches and exploring the potential power gain in identifying asthma-associated SNPs by incorporating expression quantitative trait loci (eQTL) information.
Catalogs of eQTLs in multiple tissues have been made publicly available, resulting from recent efforts of GWAS of gene expressions (Stranger et al., 2005, 2007, 2012; Dixon et al., 2007; Dimas et al., 2009; Yang et al., 2010). eQTLs provide insight into biology of transcription regulation. It has been shown that eQTLs are enriched for SNPs associated with complex diseases and traits using GWAS (Cookson et al., 2009; Nicolae et al., 2010). eQTL results can be used to provide functional interpretation for findings from GWAS (Moffatt et al., 2007; Heid et al., 2010; Hsu et al., 2010; Lango Allen et al., 2010; Speliotes et al., 2010; Chu et al., 2011; Wu et al., 2012) and prioritize genes in an association region for carrying out functional experiments using animal models (Teslovich et al., 2010). Focusing on eQTLs may also be useful to identify genetic pathways associated with the risk of complex diseases and traits, such as basal cell carcinoma in a skin cancer GWAS (Zhang et al., 2012) and type 2 diabetes (Zhong et al., 2010). Other results show that many cis eQTLs are shared across tissues (Ding et al., 2010) and that a comprehensive eQTL catalog in one tissue might be used to increase the power of capturing relevant transcripts for other diseases (including those that are only weakly or incidentally expressed in tissues where eQTL information was collected).
As single-SNP analysis still remains the most popular in GWAS, we focus on those methods designed for this type of analysis. Single-SNP analysis tests one SNP at a time for association by scanning across the whole genome, and hence involves a large number of hypotheses. To correct for multiple comparisons, statistical methods have been proposed and applied to control for the family-wise error rate (FWER) (Bonferroni, 1936; Holm, 1979) or the false discovery rate (FDR) (Benjamini and Hochberg, 1995; Storey and Tibshirani, 2003). Recent advances in statistical methodology make it possible to incorporate prior information through weighted hypothesis testing. In several of such methods (Genovese et al., 2006; Roeder et al., 2006, 2007), hypotheses are up-weighted or down-weighted based on prior likelihood of association with the trait of interest. While keeping the FWER or FDR under control, the procedures can improve power with informative weights and suffer small loss in power with uninformative weights (Genovese et al., 2006; Roeder and Wasserman, 2009). This feature is appealing as compared to prescreening SNPs based on prior information (e.g., to consider only eQTLs for association testing). In this paper, we propose to use eQTLs as prior information, and apply these weighted hypothesis testing methods to reanalyze the MAGICS (Multicentre Asthma Genetics in Childhood Study) data of asthma GWAS (Moffatt et al., 2007) as well as the GABRIEL meta-study of asthma (Moffatt et al., 2010).
We extracted published asthma associations from the GWAS catalog maintained by the National Human Genome Research Institute. As of January 15, 2013, 52 distinct reference SNPs in or near more than 40 genes have been reported to be associated with asthma (Table A1). According to the eQTL database (described in Materials and Methods), 20 of these 52 SNPs (38.5%) are eQTLs. Using the proxy SNP search tool SNAP (Johnson et al., 2008), we then obtained an extended list of 506 SNPs that were either in the GWAS catalog or in strong linkage disequilibrium (LD) with the 52 SNPs (r² ≥ 0.8). We called all these 506 SNPs the extended set of asthma-associated SNPs.
We calculated an eQTL enrichment p-value (Hosack et al., 2003) using the MAGICS data. There are 300,821 SNPs that passed quality control in the MAGICS data. Among these SNPs, 29 SNPs are in the GWAS catalog, and 64 SNPs are among the 506 extended asthma-associated SNPs defined previously. To account for the LD between SNPs in the calculation of enrichment p-value, we conducted LD pruning with the r² threshold of 0.8 on the 300,821 SNPs. This resulted in 251,826 SNPs and 38 of them are extended asthma-associated SNPs according to the GWAS catalog. According to the eQTL database, 22,922 SNPs (9.1% of 251,826 SNPs) are eQTLs, and 13 asthma associated SNPs (34.2% of 38 SNPs) are eQTLs. The corresponding enrichment p-value is 6.78 × 10⁻⁵, suggesting the asthma associations are enriched with eQTLs in the MAGICS data. Note that other analyses considered all the SNPs rather than the pruned set of SNPs.
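The enrichment calculation can be illustrated with a plain hypergeometric upper-tail test on the counts quoted above. This is a minimal sketch, not the authors' code: the method of Hosack et al. (2003) is a more conservative variant of this test, so the number printed here is not expected to reproduce 6.78 × 10⁻⁵ exactly. The function name `hypergeom_upper_tail` is our own.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) when n items are drawn without replacement
    from a population of N items, K of which are 'successes'."""
    total = comb(N, n)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)
    )
    # Python divides arbitrarily large integers exactly before rounding.
    return favourable / total


# Counts quoted in the text (LD-pruned MAGICS SNPs).
N = 251_826  # SNPs remaining after LD pruning
K = 22_922   # of which are eQTLs
n = 38       # extended asthma-associated SNPs
k = 13       # asthma-associated SNPs that are also eQTLs

p_enrich = hypergeom_upper_tail(N, K, n, k)
print(f"background eQTL rate: {K / N:.3f}; rate among asthma SNPs: {k / n:.3f}")
print(f"hypergeometric upper-tail p-value: {p_enrich:.2e}")
```

The comparison of the two rates (about 9% versus about 34%) is what drives the small p-value; the exact figure depends on which variant of the test is used.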
These results are in line with the previous findings (Nicolae et al., 2010), which studied the eQTLs in lymphoblastoid cell lines (LCL) from the HapMap samples and the GWAS catalog. Their results suggest that SNPs associated with complex traits are more likely to be eQTLs.
We calculated two kinds of weights using eQTL information for the MAGICS data. All the 300,821 SNPs passing quality control were considered. First, we defined a SNP as an eQTL SNP if it was labeled as an eQTL in the eQTL database (see details in Materials and Methods). There are 31,781 cis eQTL SNPs (10.6% of 300,821 SNPs) according to the definition, and for each of them, we retrieved an eQTL p-value peQTL. Next, we considered two choices of weights, the general weight and the binary weight. The general weight wg is a function of the eQTL p-value peQTL for an eQTL SNP, and wg = 1 otherwise. The binary weight takes only two possible values, wb = 3.70 for any eQTL SNP and wb = 0.68 otherwise. The two values of the binary weight were chosen to maximize the minimum power while keeping at least 10.6% (also the percentage of eQTL SNPs) of all the hypotheses with a power of 60%. The parameters for calculating the binary weight are ϵ = 0.106, α = 0.05, and β = 0.4 (see details in Materials and Methods). Last, both weights were normalized to have a mean equal to 1, which is necessary for the weighted hypothesis testing methods to maintain the correct FWER or FDR (Genovese et al., 2006). After normalization, the general weight wg has a mean of 2.44 and a median of 2.21 among eQTL SNPs, while the binary weight wb is 3.70 for all eQTL SNPs (Figure 1).
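As a quick consistency check (our own arithmetic, using the rounded values quoted above), the two binary weight values average to approximately 1 when a fraction ϵ = 0.106 of the SNPs are eQTLs:

```latex
\bar{w}_b
  = \epsilon \, w_{\mathrm{eQTL}} + (1-\epsilon)\, w_{\mathrm{non\text{-}eQTL}}
  = 0.106 \times 3.70 + 0.894 \times 0.68
  \approx 0.392 + 0.608
  = 1.00
```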
We applied the weighted hypothesis testing methods (Genovese et al., 2006; Roeder and Wasserman, 2009) using the general weight wg and the binary weight wb to the MAGICS data. For each of the 300,821 SNPs, we calculated the trait association p-value, p, from the single-SNP association test on the phenotypes of asthma status, as well as the weighted p-values Qg = p/wg and Qb = p/wb. Multiple testing adjustments were done for both the original p-values (p) and the weighted p-values (Qg and Qb). Bonferroni (1936) and Holm's (1979) methods were considered to control for FWER, and Benjamini and Hochberg's (1995) method was used to control for FDR.
We first ranked the SNPs using their p-values in ascending order, and compared the ranks based on the weighted p-values with those based on the original p-values (Figure 2). Since only 10.6% of SNPs are eQTL SNPs according to the eQTL database, and hypotheses for eQTL SNPs are up-weighted, eQTLs generally have higher ranks after weighting, and non-eQTLs' ranks are lower but the magnitude of changes is small. This is true for both the general and binary weights. When restricting to the 29 asthma-associated SNPs reported in the GWAS catalog, we also observed similar behaviors, suggesting that weighting hypotheses may improve power using informative weights, and sacrifice a little power using uninformative weights (Roeder and Wasserman, 2009).
We then looked at the Q–Q plots of the original and weighted p-values. For p-values greater than 0.0001, the Q–Q curves (Figure 3) are similar between the original and weighted p-values, regardless of the weights used. For those p-values less than 0.0001, some weighted p-values are smaller than original ones, and the difference is larger using the binary weight. We also observed that 3 asthma-associated SNPs in the GWAS catalog are among the top SNPs with original p-values less than 10⁻⁶. The weighted p-values for all the 3 asthma-associated SNPs are smaller than original ones.
Next, we applied the methods to control for the FWER. An effective ratio of 0.791 (Li et al., 2012) was used to calculate the effective number of SNPs (300,821 × 0.791). Controlling for an FWER level of 0.05, we obtained significant SNPs using both the original and weighted p-values. Both Bonferroni and Holm's methods gave the same results, and both weights (binary and general) also gave the same results (Tables A2, A3). The unweighted hypothesis testing claimed 6 SNPs to be significant, all on chromosome 17, including 2 asthma-associated SNPs (rs3894194 with GSDMA, and rs7216389 with ORMDL3) that have been reported previously (Moffatt et al., 2007, 2010). After applying the weighted hypothesis testing, we obtained 9 significant SNPs including all the 6 SNPs identified by the unweighted method, although the ranks are not exactly the same. The 3 SNPs additionally identified by eQTL weighting were rs3902025, rs4795405, and rs2305480. The SNP rs2305480, a missense SNP in the gene GSDMB, was not reported in the previous GWAS study (Moffatt et al., 2007) but has been reported as an asthma-associated SNP in a later larger scale study by the GABRIEL consortium (including the MAGICS data, Moffatt et al., 2010) and was found to be strongly interacting with exposure to tobacco smoke in early life (Bouzigon et al., 2008). We found that rs2305480 is actually in LD (r² = 0.702, D′ = 0.926) with rs7216389 that was identified by the unweighted methods, suggesting that rs2305480 may not represent a new association. Using a stringent r² threshold of 0.4, we found that the other two SNPs, rs3902025 and rs4795405, are also in LD with at least one SNP identified by the unweighted methods. So there is no new association identified by the weighted methods in this particular analysis.
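To make the thresholds concrete, the following is our own back-of-the-envelope arithmetic. It assumes that the effective number of SNPs (written m_eff here, a label not used in the text) simply replaces m in the weighted Bonferroni rule, and it uses the rounded binary weights 3.70 and 0.68:

```latex
m_{\mathrm{eff}} = 300{,}821 \times 0.791 \approx 237{,}949,
\qquad
\frac{\alpha}{m_{\mathrm{eff}}} = \frac{0.05}{237{,}949} \approx 2.10 \times 10^{-7}

\text{eQTL SNP: } p \le 3.70 \times 2.10 \times 10^{-7} \approx 7.8 \times 10^{-7},
\qquad
\text{non-eQTL SNP: } p \le 0.68 \times 2.10 \times 10^{-7} \approx 1.4 \times 10^{-7}
```

Under these assumptions an eQTL SNP can reach significance with a raw p-value roughly 3.7 times larger than in the unweighted analysis, at the cost of a somewhat stricter cutoff for non-eQTL SNPs.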
Besides controlling for FWER, we also used Benjamini and Hochberg's (BH) procedure (Benjamini and Hochberg, 1995) to control for an FDR level of 0.05 (Table A4). Based on the original p-values without weighting, the BH procedure gave 11 positive results (SNPs). The weighted BH procedures based on the general weight and the binary weight resulted in 7 and 8 additional positive results (SNPs), respectively. Using a stringent r² threshold of 0.4, we found that 5 SNPs (Table 1) are not in LD with any of the SNPs identified without weighting. Although none of the 5 SNPs is, or is in LD with, any asthma-associated SNP according to the GWAS catalog, there are some SNPs that seem interesting. Some of the SNPs are in or close to the genes PGAP3 and STARD3 on chromosome 17, and interestingly, rs2941504 has been reported in a recent independent study (Anantharaman et al., 2011) to be associated with asthma, although it does not meet the criteria for inclusion in the GWAS catalog. This suggests that the reanalysis using the eQTL weighting approaches is promising and potentially useful.
As another application, we reanalyzed the GABRIEL data using the eQTL-weighted approaches. Since only the p-values are necessary for the use of eQTL weighting, we took the p-values of the meta-analysis of 37 studies that were calculated based on imputed data. In total, there are 2,473,850 SNPs and their p-values available in the GABRIEL study, which include 267,350 out of 268,204 eQTL SNPs in the eQTL database. The weights based on eQTL information were calculated in a similar way to the MAGICS data analysis.
We applied Bonferroni and Holm's methods with an FWER level of 0.05, as well as the BH procedure with an FDR level of 0.05. An effective ratio of 0.30 (Li et al., 2012) was used to calculate the effective number of SNPs (2,473,850 × 0.30). After obtaining the lists of significant SNPs using different methods, we report any SNPs identified by eQTL weighting that are not in LD with any SNPs identified by unweighted methods using an r² threshold of 0.4. Such SNPs may be informative and suggest new associations. Tables 2 and 3 show the SNPs that were identified based on the general weight and the binary weight, respectively.
We conducted simulations using 5000 permutations based on the MAGICS data, and calculated the percentage of having at least one false positive claimed by Bonferroni and Holm's methods (α = 0.05, with an effective ratio of 0.791). In fact, any SNPs claimed significant using the two methods would be a false positive. The calculated percentages (Table 4) provide estimates of the FWER. Bonferroni and Holm's methods give the same results. The results suggest that, under the null hypothesis, the FWER level is controlled for the methods based on both the original (unweighted) and the weighted p-values. The simulations confirm the validity of the weighted hypothesis method (Genovese et al., 2006).
It is of substantial interest to enhance the power for identifying associations in the post-GWAS era. Besides meta-analysis, which has proved successful in gaining power (Moffatt et al., 2010), incorporating prior information has also received increasing attention. Such information can be obtained from various sources and levels, such as linkage analysis (Roeder et al., 2006), gene expression (Yang et al., 2010), and annotation information of variants (Adzhubei et al., 2010), genes (Saccone et al., 2007), and pathways (Wang et al., 2007). The so-called “agnostic” GWAS may benefit from incorporating useful prior information. In our study of asthma, gene expression information is of particular interest, as a recent study (Moffatt et al., 2007) identified several eQTLs associated with asthma.
In the reanalysis of the MAGICS data (Moffatt et al., 2007), we applied recently developed statistical methods that can improve power by weighting hypotheses (Genovese et al., 2006; Roeder and Wasserman, 2009). Using eQTL information obtained from an independent dataset, we employed weighted procedures that up-weighted eQTL SNPs and down-weighted non-eQTL SNPs while controlling for the FWER or the FDR. It has been proved (Genovese et al., 2006) that any set of nonnegative weights can guarantee substantial power gain given informative weights and little power loss for uninformative weights. This property implies that the weighted procedures are robust to the informativeness of the weights and to the uneven coverage of genes and expression targets on the genome. We took advantage of this robustness and applied the procedures to an asthma study. We found additional SNPs that were significantly associated with asthma according to the weighted hypothesis testing methods. Some of them were interesting after we accounted for LD and compared them to the literature. Our analysis was the first application of this approach to asthma GWAS, and the results successfully illustrated the use of eQTL weighting in the context of asthma studies. As another application, we also reanalyzed the GABRIEL meta-analysis p-values and reported the corresponding results.
It is noted that the weighted procedures can utilize eQTL information from a reference database. Multiple choices of eQTL databases have already been made available (e.g., Yang et al., 2010; Liang et al., 2013), and future efforts may provide even better reference of eQTL information. For example, the eQTL information considered in our reanalysis was obtained through a single platform (Affymetrix HG-U133 Plus 2.0), and better coverage of gene expression profiling may be achieved through RNA-Seq technologies or by combining information from various platforms.
Besides the weighted procedures, an alternative method of using eQTL information is to simply test association between eQTL SNPs and the trait of interest. Such a method is not recommended in GWAS as it excludes non-eQTL completely and relies on the prior information too heavily. By contrast, weighted procedures make it possible to consider eQTLs and non-eQTLs simultaneously. More importantly, they can possibly increase power if the prior information is useful and are able to maintain the type I error under the null.
Applying the weighted procedures in our reanalysis requires only the p-values of eQTL SNPs. This flexibility means that such analyses can be applied to any existing GWAS data, even if they do not have accompanying gene expression data. Although gene expression may have tissue-specific patterns, a substantial fraction of eQTLs may be shared across tissues (Ding et al., 2010). Hence eQTLs derived from tissues that are not directly relevant to the outcome of interest, such as those from publicly available eQTL databases based on LCL, can be used to improve the power of GWAS. It is possible that using eQTL information from relevant tissues may result in even more power gain, if such information is available.
Besides the particular weighted hypothesis testing method (Genovese et al., 2006) we adopted, Bayesian methods are potential alternative strategies for incorporating eQTL information. Bayes factors have been applied to genetic association studies (Wellcome Trust Case Control Consortium, 2007; Stephens and Balding, 2009). In single SNP analysis, a prior is assumed for each SNP effect [e.g., N(0, 0.2²) under a model of association in Wellcome Trust Case Control Consortium (2007)]. eQTL information can be naturally incorporated into the prior, although it may be challenging to choose a realistic yet tractable alternative model and to assess error rates (Hoggart et al., 2008), especially with the eQTL weight. One possible choice is through modifying the variance of the prior, for example assuming a prior N(0, 0.2²w), where w is a weight of eQTL signal. Another possible choice is to keep the variance the same and increase the probability of association for eQTLs a priori. It is of interest in future research to explore these possibilities and consider the extension of Bayesian methods to incorporate eQTL information.
In our analysis, we took into account the LD between SNPs by considering the effective number of SNPs (Li et al., 2012). As an alternative, testing SNP sets for association has potential of improving power and reducing the correlation between tests. Since the focus of this paper is to demonstrate the use of eQTL information in association testing, we will consider the weighted correlated hypothesis in future research.
Two choices of weights were applied in our analysis, a binary weight and a weight based on the strength of eQTLs, and the results using the two weights were similar. Theoretical results exist (Roeder and Wasserman, 2009) for the optimal binary weight, which provide guidance in choosing the values of the weight. The weight taking advantage of the eQTL strength may provide more useful information, and the best choice of weights remains an open research question.
Through an application to an asthma GWAS, we demonstrated the usefulness of eQTL weights in GWAS. Although results may vary depending on the traits of interest and the underlying biological mechanism, the potentials of increasing power and little investment required for reanalysis make the eQTL-weighted procedures desirable for reanalysis of existing GWAS data and useful for design and analysis of future studies.
The MAGICS (Multicentre Asthma Genetics in Childhood Study) study data (Moffatt et al., 2007), part of the GABRIEL consortium, were reanalyzed by incorporating eQTL information. Quality control procedures were conducted similarly to a published protocol (Anderson et al., 2010). Individuals with missing phenotypes, elevated missing rates (≥ 5%), or outlying heterozygosity rate were removed. Markers with an excessive missing rate (≥ 5%), low MAF (< 5%), or failing in the HWE test (p-value < 10⁻⁵) were all excluded as well. The remaining dataset contains 1296 individuals (647 affected and 649 unaffected) genotyped across 300,821 SNPs.
To account for possible divergent ancestry and population stratification, principal component analysis (PCA) was conducted using EIGENSOFT 4.2 (Patterson et al., 2006; Price et al., 2006). The genotype data were pruned for LD prior to the PCA. The PCA result (Figure A1) suggests that no obvious stratification exists, and the signal of the first principal component is very weak. In the subsequent analysis, we still included the first principal component as a covariate.
LD pruning was considered only in the calculation of enrichment p-value. It was conducted using PLINK (v1.07, downloaded from http://pngu.mgh.harvard.edu/purcell/plink/) (Purcell et al., 2007). A moving window with a width of 50 SNPs and a step size of 5 SNPs was considered; pairwise LD was calculated within each window and SNPs were pruned if r² > 0.8 (corresponding PLINK arguments: “--indep-pairwise 50 5 0.8”).
Association testing results, including SNP ID and p-values, were obtained from a reanalysis of the GABRIEL consortium data using imputed SNPs (Bouzigon et al., personal communication). The meta-analysis considered imputation of SNP genotypes using the HapMap 2 reference data for 37 studies, and calculated a meta-analysis p-value for each SNP using available data. Imputed SNPs were kept for analysis if their imputation scores (Rsq) were ≥ 0.5 and if their minor allele frequencies were ≥ 1%. In total there were 2,473,850 SNPs that passed the quality control. Only the SNP ID and the p-values of these SNPs were obtained and used for the reanalysis described in this paper.
An eQTL database (http://www.hsph.harvard.edu/liming-liang/software/eqtl/) resulting from an independent dataset was used as prior information to be incorporated in the GWAS. The sample contains 405 siblings from a panel of families of British descent (MRC-A) (Dixon et al., 2007). Global gene expression in LCLs was measured using Affymetrix HG-U133 Plus 2.0 chips. All siblings were genotyped using the Illumina Sentrix HumanHap300 BeadChip (ILMN300K) and/or the Illumina Sentrix Human-1 Genotyping BeadChip (ILMN100K). The SNP genotype data were further imputed using the MaCH program, and each SNP was tested for association with probes in the gene expression data. Restricting to cis eQTLs (1 Mb region) and controlling for the FDR of 1%, there are 515,947 tests with logarithm of odds (LOD) scores greater than 3.172, corresponding to 268,204 unique SNPs. In case a SNP has multiple p-values reported for associations with different probes, the minimal p-value was used for that SNP. These 268,204 SNPs are considered as eQTLs, and the database contains information of their physical positions, LOD scores, p-values, and residing or nearby genes. Details of the database are described by Liang et al. (2013).
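The rule of keeping the minimal p-value when a SNP is associated with several probes can be sketched as follows. The function name, the row layout, and the SNP and probe identifiers are all hypothetical and chosen only for illustration; they do not reflect the actual format of the database.

```python
def min_pvalue_per_snp(records):
    """Collapse (snp_id, probe_id, p_value) rows to the smallest p-value per SNP."""
    best = {}
    for snp_id, _probe_id, p_value in records:
        if snp_id not in best or p_value < best[snp_id]:
            best[snp_id] = p_value
    return best


# Made-up rows: one SNP tested against two probes, another against one.
records = [
    ("rsA", "probe_1", 1e-6),
    ("rsA", "probe_2", 3e-9),
    ("rsB", "probe_3", 2e-5),
]
eqtl_p = min_pvalue_per_snp(records)
print(eqtl_p)
```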
Genetic association analysis of the MAGICS data was conducted in PLINK. Logistic regression was used to test for disease-trait SNP association while adjusting for gender and the first principal component. Meta-analysis on GABRIEL data was carried out by combining association results from 37 studies using a random effect model, and all computations were done using Stata software.
Consider m hypotheses $H_1, \ldots, H_m$ and their test statistic p-values, $P_1, \ldots, P_m$. Suppose there are weights $W_1, \ldots, W_m$ available for the m tests, respectively, satisfying $W_i > 0$ and $\sum_{i=1}^{m} W_i = m$. Define $Q_i = P_i / W_i$ and let $Q_{(1)} \le \ldots \le Q_{(m)}$ be the sorted values. Let $P_{(1)}, \ldots, P_{(m)}$ and $W_{(1)}, \ldots, W_{(m)}$ be the values in the corresponding order. $Q_i$ is sometimes referred to as a “weighted p-value” (e.g., Roeder and Wasserman, 2009), although it is not a p-value.
The weighted Bonferroni procedure rejects any hypothesis $H_j$ ($1 \le j \le m$) that satisfies $Q_j \le \alpha/m$, where α is the desired level of FWER. Genovese et al. (2006) showed that this procedure controls FWER at level no greater than α.
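The weighted Bonferroni rule can be sketched in a few lines. This is a minimal illustration and not the R functions distributed by the authors; the function name and the toy p-values and weights are our own.

```python
def weighted_bonferroni(p_values, weights, alpha=0.05):
    """Return the indices j whose weighted value Q_j = P_j / W_j is <= alpha / m."""
    m = len(p_values)
    if len(weights) != m:
        raise ValueError("p_values and weights must have the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    if abs(sum(weights) - m) > 1e-6 * m:
        raise ValueError("weights must average to 1")
    threshold = alpha / m
    return [j for j, (p, w) in enumerate(zip(p_values, weights)) if p / w <= threshold]


# Toy data: four tests, the first one up-weighted.
p = [0.02, 0.004, 0.03, 0.5]
w = [2.0, 0.5, 1.0, 0.5]

unweighted = weighted_bonferroni(p, [1.0] * 4)
weighted = weighted_bonferroni(p, w)
print(unweighted, weighted)
```

In this toy case the threshold is 0.05/4 = 0.0125. Test 0 has p = 0.02, which misses the threshold on its own but passes once divided by its weight of 2.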
Holm's weighted procedure (1979) is carried out as follows: given the desired α level of FWER, if $Q_{(1)} > \alpha/m$, no hypothesis is rejected; otherwise, find the largest j that satisfies $Q_{(i)} \le \alpha / \sum_{k=i}^{m} W_{(k)}$ for all $i \le j$, and reject the hypotheses corresponding to the j smallest values of $Q$. Genovese et al. (2006) also prove that this procedure can work for a general setting of weights.
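The step-down rule above can be sketched as follows, assuming the weights are positive and sum to m. The function name and toy inputs are our own; this is an illustration rather than the authors' implementation.

```python
def weighted_holm(p_values, weights, alpha=0.05):
    """Step-down weighted Holm: walk through the tests in increasing order of
    Q = P / W and stop at the first one that fails its threshold."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j] / weights[j])
    remaining = float(sum(weights))  # sum of W_(k) over k >= current step
    rejected = []
    for j in order:
        if p_values[j] / weights[j] <= alpha / remaining:
            rejected.append(j)
            remaining -= weights[j]
        else:
            break
    return sorted(rejected)


p = [0.02, 0.004, 0.03, 0.5]
w = [2.0, 0.5, 1.0, 0.5]
holm_rejections = weighted_holm(p, w)
print(holm_rejections)
```

With these toy inputs the successive thresholds are 0.05/4, 0.05/3.5, and 0.05/1.5, so the third-smallest weighted value (0.03) is also rejected even though it exceeds the single-step Bonferroni cutoff of 0.0125.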
We also consider Benjamini and Hochberg's procedure (1995) for controlling FDR. Given the desired level α, find the largest j such that $Q_{(j)} \le \alpha \cdot j/m$, and reject the hypotheses corresponding to the j smallest values of $Q$. Genovese et al. (2006) prove that this procedure controls FDR at level α.
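A minimal sketch of the weighted step-up rule is given below, again with a function name and toy inputs of our own choosing. Passing unit weights recovers the ordinary Benjamini–Hochberg procedure.

```python
def weighted_bh(p_values, weights, alpha=0.05):
    """Step-up weighted BH: find the largest rank j with Q_(j) <= alpha * j / m
    and reject the j tests with the smallest Q = P / W."""
    m = len(p_values)
    q = [p / w for p, w in zip(p_values, weights)]
    order = sorted(range(m), key=lambda j: q[j])
    largest = 0
    for rank, j in enumerate(order, start=1):
        if q[j] <= alpha * rank / m:
            largest = rank  # keep scanning: a later rank may still qualify
    return sorted(order[:largest])


p = [0.03, 0.004, 0.036, 0.5]
plain = weighted_bh(p, [1.0] * 4)
weighted = weighted_bh(p, [2.0, 0.5, 1.0, 0.5])
print(plain, weighted)
```

The unweighted call shows the step-up behaviour: the second-smallest p-value (0.03) exceeds its own threshold of 0.025, but it is still rejected because the third-smallest (0.036) falls below 0.0375.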
The eQTL p-values were used to construct weights for the SNPs in the asthma GWAS reanalysis. We considered two kinds of weights, the binary weight $w_b$ and the general weight $w_g$. The binary weight takes only two possible predefined values, denoted by $w_{\mathrm{eQTL}}$ and $w_{\mathrm{non\text{-}eQTL}}$. For m hypotheses, a binary weight is defined as $w_b = (w_{b,1}, \ldots, w_{b,m})$ where $w_{b,j} = w_{\mathrm{eQTL}}$ if the jth SNP is an eQTL SNP, and $w_{b,j} = w_{\mathrm{non\text{-}eQTL}}$ if it is not an eQTL SNP. Given the values of α, β, and ϵ, the optimal values of $w_{\mathrm{eQTL}}$ and $w_{\mathrm{non\text{-}eQTL}}$ were chosen (Roeder and Wasserman, 2009) to maximize the minimum power among all the hypotheses while having at least a fraction ϵ with high power 1 − β. Here α is either the level of FWER or FDR. We also considered a general weight $w_g = (w_{g,1}, \ldots, w_{g,m})$, where $w_{g,j}$ is determined by the eQTL p-value $p_{\mathrm{eQTL}}$ if the jth SNP is an eQTL SNP, and $w_{g,j} = 1$ otherwise. The particular form was chosen on intuitive grounds prior to the reanalysis of the GWAS data, with the aim of avoiding up-weighting top eQTL SNPs too much. Both $w_b$ and $w_g$ were then normalized so that their means equal 1, i.e., $\frac{1}{m}\sum_{j=1}^{m} w_{b,j} = 1$ and $\frac{1}{m}\sum_{j=1}^{m} w_{g,j} = 1$.
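Constructing and normalizing the binary weight might look like the sketch below. The default values 3.70 and 0.68 are the rounded figures reported for the MAGICS analysis; the function names and the toy eQTL indicator are our own. The general weight is not reproduced in this sketch, but any vector of positive weights can be passed through the same normalization.

```python
def normalize_to_mean_one(weights):
    """Rescale positive weights so that they average to 1 (i.e., sum to m)."""
    mean = sum(weights) / len(weights)
    return [w / mean for w in weights]


def binary_weights(is_eqtl, w_eqtl=3.70, w_non_eqtl=0.68):
    """Assign one of two predefined values per SNP, then normalize."""
    raw = [w_eqtl if flag else w_non_eqtl for flag in is_eqtl]
    return normalize_to_mean_one(raw)


# Toy indicator: 1 eQTL SNP among 10 (the paper's fraction was 10.6%).
is_eqtl = [True] + [False] * 9
w_b = binary_weights(is_eqtl)
print([round(w, 3) for w in w_b])
```

Because the toy eQTL fraction (10%) differs slightly from 10.6%, the normalization step shifts both values a little; with the paper's fraction the two rounded values already average to about 1.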
Asthma-associated SNPs and genes reported in publications were retrieved from the online catalog of published GWAS on January 15, 2013. The catalog limits the associations to those with p-values less than 1.0 × 10⁻⁵ and records only one SNP within a gene or region of high LD unless there was evidence of independent association. The reported associations were compared against the findings in the asthma GWAS data we reanalyzed.
To account for LD between SNPs, LD information based on HapMap 2 was obtained. The SNAP proxy search tool (http://www.broadinstitute.org/mpg/snap/ldsearch.php) was used to obtain the information, based on the HapMap 2 (rel22) reference and a distance limit of 500 kb.
Besides analyzing the MAGICS asthma GWAS data, we also conducted size simulations by permuting the disease status in the data. Logistic regression was considered where the dependent variable was the disease status (affected or unaffected) and the independent variables included a single SNP effect, gender, and the first principal component. The regression was applied to all the ~300,000 SNPs across the whole genome. Five thousand permutations were done by permuting the disease status among all the individuals, and then the model was refitted for each SNP. At the end of the simulations, 5000 permutation p-values were obtained for each of the ~300,000 SNPs.
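Given a matrix of null p-values (one row per permutation), the FWER estimate is the fraction of rows with at least one rejection. The sketch below uses independent Uniform(0, 1) draws as a stand-in for real permutation p-values and hypothetical weights of 3.7 and 0.7; it therefore ignores LD and the effective-number adjustment used in the actual analysis.

```python
import random


def estimate_fwer(null_p_matrix, weights, alpha=0.05):
    """Fraction of null replicates with at least one weighted-Bonferroni rejection."""
    m = len(weights)
    threshold = alpha / m
    false_alarms = sum(
        any(p / w <= threshold for p, w in zip(row, weights))
        for row in null_p_matrix
    )
    return false_alarms / len(null_p_matrix)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
m, n_perm = 200, 2000

# Hypothetical weights: 10% of tests up-weighted, chosen so the mean is 1.
weights = [3.7 if j < 20 else 0.7 for j in range(m)]

# Stand-in for permutation p-values: independent uniform draws under the null.
null_p = [[rng.random() for _ in range(m)] for _ in range(n_perm)]

fwer_hat = estimate_fwer(null_p, weights)
print(f"estimated FWER: {fwer_hat:.3f}")
```

With independent tests the estimate should land close to, and on average slightly below, the nominal 0.05, which mirrors the qualitative conclusion drawn from the permutation results.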
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Lin Li and Xihong Lin's research was supported by grants from National Cancer Institute (R37 CA076404 and P01 CA134294). Martin Farrall is supported by the British Heart Foundation Centre for Research Excellence in Oxford and the Wellcome Trust core award [090532/Z/09/Z]. We thank the members of the GABRIEL consortium groups for providing data and summary results (a full list of the GABRIEL groups can be found in the Supplement of Moffatt et al., 2010).
|6p21.32||6||rs987870||Intron; nearGene-5||HLA, DPB1||Y|
“Y” in the eQTL column means the corresponding SNP is an eQTL SNP according to the eQTL database described in Materials and Methods. “nearGene-5” is an NCBI dbSNP function code, meaning that the SNP is located 5′ to a gene, within 2 kb of it.
|Rank||Unweighted method||General weight||Binary weight|
Bonferroni's correction is used with an FWER level of 0.05.
|Rank||Unweighted method||General weight||Binary weight|
Holm's method is used with an FWER level of 0.05.
|Rank||Unweighted method||General weight||Binary weight|
The BH procedure is used with an FDR level of 0.05.
R functions have been made available for users' convenience to compute the eQTL weighted p-values in order to conduct weighted procedures such as Bonferroni, Holm's and Benjamini–Hochberg procedures. The functions can be found at http://www.hsph.harvard.edu/liming-liang/eqtl-weighted-gwas.