Genome-wide association studies (GWAS) have led to the discovery of multiple SNPs that are associated with prostate cancer (PCa) risk. These SNPs may potentially be used for risk prediction. To date, there are no stable estimates of their effect on PCa risk or of their contribution to genetic variation, both of which are important for future risk prediction.
A literature review was conducted to identify SNPs associated with PCa risk using the following criteria: (1) GWAS in the Caucasian population; (2) SNPs with p-value < 1.0×10⁻⁶; and (3) one SNP from each independent LD block. A meta-analysis was performed to estimate the combined odds ratio (OR) and its 95% confidence interval (CI) for the identified SNPs. The proportion of total genetic variance attributable to each of these SNPs was also estimated.
Thirty PCa risk-associated SNPs were identified. These SNPs had OR estimates between 1.12 and 1.47, except for marker rs16901979 (OR = 1.80). Significant heterogeneity in OR estimates was found among different studies for 13 SNPs. The proportion of total genetic variance attributed to each SNP ranged from 0.2% to 0.9%. These 30 SNPs explained ~13.5% of the total genetic variance of PCa risk in the Caucasian population.
This study provides more stable OR estimates for PCa risk-associated SNPs, which is an important baseline for the effect of these SNPs in risk prediction. These SNPs explain a considerable proportion of genetic variance; however, the majority of genetic variance has yet to be explained.
Prostate cancer (PCa) is the most common solid organ malignancy and the second leading cause of cancer mortality in males in the United States. Using new, high-throughput technologies, GWAS and follow-up fine-mapping studies have successfully identified over two dozen genetic variants, or single nucleotide polymorphisms (SNPs), that are associated with PCa risk. The genetic variants detected in this manner have been found to have a moderate effect on PCa risk, with ORs between 1.0 and 1.5. However, combinations of these genetic variants can have a greater cumulative effect on PCa risk and may play an important role in PCa risk prediction in addition to established risk factors such as age, ethnicity, and family history.
PCa risk prediction using genetic information relies heavily on the effect size (e.g., the OR of each genetic variant). However, some genetic variants have shown inconsistent associations among different study populations, possibly due to the small magnitude of effect of the SNPs and/or limited statistical power in these studies. The impact of these inconsistencies can be magnified by the approach employed to examine the cumulative impact, namely, cumulative effect analysis, absolute risk analysis, and proportion of total genetic variance [3–6]. Herein, a meta-analysis was conducted to examine the degree of impact on PCa risk of SNPs identified in GWAS and fine-mapping studies, taking advantage of the high accuracy of effect size estimates and the increased statistical power that come from these high-quality studies performed with large populations [7–9]. Due to increased power and validation, meta-analyses provide more robust risk estimates to distinguish men with high risk of PCa from those with low risk.
Although advances have been made in the discovery of genetic markers, it is important to determine what impact the markers identified to date have on the overall genetic contribution to PCa risk. We estimate the proportion of overall genetic risk that the genetic variants selected in this meta-analysis contribute by using the method of Pharoah et al. 
To identify PCa risk SNPs from published GWAS, in September 2009 we searched the national GWAS database maintained by the National Human Genome Research Institute of the NIH (http://www.genome.gov/gwastudies) for the term "prostate cancer". References in the retrieved articles were also reviewed.
According to the search results of the GWAS database, over two dozen genetic variants associated with PCa risk were reported between 2007 and September 2009. The selection of eligible genetic variants was restricted to those meeting the following criteria: (1) GWAS with follow-up fine-mapping studies using a case-control study design; (2) studies restricted to individuals of European ancestry; (3) SNPs associated with PCa risk with a p-value cut-off of 1.0×10⁻⁶; and (4) a randomly selected single SNP in each independent LD block. If the same or overlapping data were used for one SNP in multiple research studies, we selected the most recently published study and/or the one with the largest sample size. Using these criteria, we identified 30 SNPs, all of which were from independent LD blocks. Table 1 presents the main characteristics of the SNPs, including locations, related genes, references, p-values of the association test, the number of populations studied for each SNP, and the total number of case and control subjects. The average number of independent populations studied for each SNP was approximately 6. For the eight SNPs in Eeles et al., we included the combined results of 18 European populations from the PRACTICAL Consortium.
A meta-analysis was performed to obtain the pooled estimate of the OR and its CI for the 30 SNPs identified by GWAS as being associated with PCa risk. The OR was used to evaluate the extent of this association across study populations. If the raw data (e.g., allele counts of cases and controls) were available, we used them to calculate the OR and its standard error for each study population. Otherwise, we calculated these estimates using the reported OR and 95% CI. The results from both approaches are statistically comparable.
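The two routes to a per-study estimate described above can be sketched in Python as follows. This is an illustration only, not the authors' R code: the text does not state which formulas were used, so Woolf's standard error for a 2×2 allele-count table and a normal-approximation back-calculation from the 95% CI are assumptions, as are the function names and the example allele counts.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def log_or_from_counts(case_risk, case_other, ctrl_risk, ctrl_other):
    """Return (log OR, SE of log OR) from a 2x2 table of allele counts.

    Uses Woolf's formula for the standard error (an assumption; the
    paper does not name the formula it used).
    """
    log_or = math.log((case_risk * ctrl_other) / (case_other * ctrl_risk))
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    return log_or, se


def log_or_from_ci(odds_ratio, ci_lower, ci_upper):
    """Return (log OR, SE of log OR) recovered from a reported OR and 95% CI.

    Assumes the CI was built as exp(log OR +/- 1.96 * SE).
    """
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * Z_95)
    return math.log(odds_ratio), se


# Hypothetical allele counts, not taken from any study in Table 1.
log_or, se = log_or_from_counts(600, 1400, 500, 1500)

# An OR and 95% CI in the form quoted in the text: 1.80 (1.57 - 2.06).
log_or_ci, se_ci = log_or_from_ci(1.80, 1.57, 2.06)
```

Working on the log scale keeps the sampling distribution approximately normal, which is why both helpers return the log OR rather than the OR itself.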
We assessed heterogeneity among study populations with the p-value of the Q-statistic for the test of heterogeneity and with the I² statistic, which measures the proportion of total variance in estimated ORs that is due to heterogeneity. The I² statistic quantifies the degree of heterogeneity, whereas the Q-statistic only indicates its presence or absence. An I² value greater than 50% indicates a high degree of heterogeneity in estimated ORs across study populations.
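The heterogeneity measures described above can be sketched as follows, assuming the usual definitions: Cochran's Q computed around the inverse-variance weighted mean of the log ORs, and I² = max(0, (Q − df)/Q). The function name and the input values are hypothetical.

```python
import math


def heterogeneity(log_ors, ses):
    """Return Cochran's Q, its degrees of freedom, and I^2 as a percentage."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I^2 is truncated at zero when Q falls below its expected value (df).
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


# Hypothetical per-study estimates (ORs of 1.1, 1.3, and 1.6 with equal SEs).
log_ors = [math.log(1.1), math.log(1.3), math.log(1.6)]
ses = [0.05, 0.05, 0.05]
q, df, i2 = heterogeneity(log_ors, ses)

# The p-value of Q comes from a chi-square distribution with df degrees of
# freedom, e.g. scipy.stats.chi2.sf(q, df) if SciPy is available.
```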
In this meta-analysis, the fixed effects method was used to calculate the OR and 95% CI estimates by weighting each study by the inverse of the variance of the logarithm of the OR, while the random effects method additionally incorporated the between-study variance in that weight. Although the random effects method is preferred for meta-analyses because it can capture the heterogeneity among research studies, the random effects method for GWAS is likely to suffer from the winner's curse problem, in the sense that it may be too conservative due to additional variability compared with the fixed effects method. Nonetheless, if there was a high degree of heterogeneity in OR estimates, we considered the results of the random effects method. We obtained the pooled estimate of the OR and its 95% CI for each SNP using both fixed and random effects methods. To evaluate the statistical significance of the pooled estimates of the OR, we used the two-sided Z-test with the corresponding p-value of 2(1−Φ(|Z|)), where Φ(·) is the standard normal cumulative distribution function. All meta-analyses and forest plots in this paper were produced using R software; the forest plots show the OR and 95% CI of each study population and the pooled estimate of the OR and its 95% CI for each SNP.
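A minimal sketch of the pooling step follows. The fixed effects weights and the p-value 2(1−Φ(|Z|)) are as described in the text; the between-study variance is estimated here with the DerSimonian-Laird moment estimator, which is an assumption because the text does not name the estimator used. The function name and inputs are hypothetical, and this is not the authors' R code.

```python
import math
from statistics import NormalDist


def pool(log_ors, ses, random_effects=False):
    """Return (pooled OR, (CI lower, CI upper), two-sided p-value)."""
    variances = [se ** 2 for se in ses]
    tau2 = 0.0
    if random_effects and len(log_ors) > 1:
        # DerSimonian-Laird estimate of between-study variance (assumed).
        w = [1.0 / v for v in variances]
        fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
        q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (len(log_ors) - 1)) / c)
    # Inverse-variance weights; tau2 = 0 reduces to the fixed effects method.
    weights = [1.0 / (v + tau2) for v in variances]
    est = sum(wi * y for wi, y in zip(weights, log_ors)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = est / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    half_width = NormalDist().inv_cdf(0.975) * se
    ci = (math.exp(est - half_width), math.exp(est + half_width))
    return math.exp(est), ci, p_value


# Hypothetical per-study estimates with visible heterogeneity.
log_ors = [math.log(1.1), math.log(1.3), math.log(1.6)]
ses = [0.05, 0.05, 0.05]
or_fixed, ci_fixed, p_fixed = pool(log_ors, ses)
or_random, ci_random, p_random = pool(log_ors, ses, random_effects=True)
```

When the estimated between-study variance is zero, the two calls return the same result; otherwise the random effects CI is wider, which is the extra conservativeness discussed above.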
As an application of the results of this meta-analysis, we estimated the proportion of the total genetic variance (V) explained by each single SNP and by the combination of selected SNPs. An advantage of using the results of the meta-analysis is that the pooled estimate of the OR provides more robust results in the calculation of genetic variance than the OR of a single study population. Assuming the selected SNPs were independent of each other, the proportion of variance explained by the combination of selected SNPs was calculated by summing the proportions of the individual SNPs. To estimate the proportion of the total genetic variance explained by each SNP, we first estimated the variance due to the risk allele for each SNP (Vi) using the risk allele frequency in the HapMap CEU population and the pooled estimate of the OR from the meta-analysis [Table 2]. If the p-value of the Q-statistic was greater than 0.05, the fixed effects results were used. Otherwise, the random effects results were used to account for the heterogeneity in the estimates. The total genetic variance for all genetic factors was taken to be 2.5 for PCa risk. The proportion of the total genetic variance explained by each SNP was calculated as Vi divided by V.
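The calculation of Vi/V and its sum over independent SNPs can be sketched as below. The text does not reproduce the formula for Vi, so this sketch assumes Hardy-Weinberg genotype frequencies and a log-additive (multiplicative) allelic effect, under which the variance of log relative risk contributed by one SNP is 2p(1−p)(ln OR)². That formula, the function names, and the second SNP in the example are assumptions; only V = 2.5 and the first (frequency, OR) pair come from the text.

```python
import math

TOTAL_GENETIC_VARIANCE = 2.5  # V for PCa risk, as stated in the text


def snp_variance(risk_allele_freq, odds_ratio):
    """Variance of log relative risk due to one SNP (Vi).

    Assumes Hardy-Weinberg proportions and a log-additive effect, so the
    risk allele count is Binomial(2, p) and Vi = 2p(1-p) * (ln OR)^2.
    """
    p = risk_allele_freq
    return 2 * p * (1 - p) * math.log(odds_ratio) ** 2


def proportion_explained(snps, total_variance=TOTAL_GENETIC_VARIANCE):
    """Sum Vi / V over SNPs, assuming the SNPs are independent."""
    return sum(snp_variance(p, odds_ratio) / total_variance
               for p, odds_ratio in snps)


# (risk allele frequency, pooled OR) pairs. The first pair uses the rounded
# values quoted in the text for rs16901979; the second is hypothetical.
snps = [(0.03, 1.80), (0.40, 1.20)]
single = proportion_explained(snps[:1])
combined = proportion_explained(snps)
```

Because the inputs here are rounded and the variance formula is assumed, the output should not be expected to reproduce the percentages in Table 2 exactly.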
The fixed and random effects meta-analyses for the 30 SNPs identified by searching the National Human Genome Research Institute database as being associated with PCa risk are shown in Table 2. The pooled estimates of the OR and its CI were calculated in terms of the risk allele. All meta-analysis results were statistically significant by both the fixed and random effects methods. There was no significant difference in pooled estimates of the OR between the two methods. We thus reported the p-values for the random effects method in Table 2. The marker rs7127900 on 11p15.5, which was selected with the smallest p-value of 3.0×10⁻³³, achieved the lowest p-value from the random effects meta-analysis. The markers rs2928679 on 8p21 and rs2660753 on 3p12 were only marginally significant at the 5% significance level compared with the other markers. These markers were identified using populations with relatively small numbers of study subjects and thus show a substantially high degree of heterogeneity, exceeding 80%.
There were 13 SNPs with a Q-statistic p-value less than 0.05 and an I² statistic greater than 50%. For these SNPs, we considered only the results of the random effects method, although the OR estimates from both the fixed and random effects methods were significant. The heterogeneity pattern of these SNPs is illustrated in Figure 1a–b. This result implies that the Q-statistic and the I² statistic give compatible assessments of heterogeneity among study populations in meta-analysis. In addition, as the I² statistic approaches zero, the random effects results become similar to the fixed effects results. When the I² statistic is zero, the results of the two methods are the same [Table 2].
Most SNPs identified in this study had a moderate effect on PCa risk, with ORs between 1.0 and 1.5. However, marker rs16901979 exhibited the highest pooled OR (95% CI) of 1.80 (1.57 – 2.06). Overall, the CIs of the pooled ORs were narrower than those of the individual study populations [Figure 1a–b], which indicates that the meta-analysis achieved a more robust result. The marker rs16901979 has a low minor allele frequency of 0.03 in the HapMap CEU sample [Table 2] and a small number of case subjects compared with controls, 2936 vs. 37848 [Table 1]; as a result, its estimated OR is higher and its CI wider than those of the other SNPs. The second highest estimate of the OR (95% CI) is 1.47 (1.33 – 1.62) by the random effects method for the marker rs1447295. These two highest estimates of the OR are from locus 8q24, a region known for its association with PCa susceptibility [14 and references therein]. The functional relevance of SNPs within this region is not well understood; however, the MYC oncogene is located >250 kilobases from this region.
The results of the meta-analysis for the 30 identified SNPs were applied to the estimation of the proportion of total genetic variance. The proportion contributed by each SNP to the total genetic effect on PCa risk is reported in Table 2. The proportion of total genetic variance attributed to each SNP ranges from 0.2% to 0.9%. Markers rs16901979 and rs10993994 have the highest proportion, 0.9%. The proportion of the total variance depends on both the OR and the risk allele frequency. The combination of the 30 selected SNPs in this meta-analysis explains 13.5% of the total genetic variance associated with PCa risk in the population of European ancestry. The 5 SNPs and the 10 SNPs with the highest proportions explain 4.1% and 7.1% of the total genetic variance, respectively.
Herein we identify 30 genetic variants in independent loci consistently associated with PCa risk in multiple Caucasian populations, drawing on published studies found by searching the National Human Genome Research Institute database. We calculated the pooled estimate of the OR for each SNP, which was statistically significant by both the fixed and random effects methods. These pooled estimates can be used in future risk prediction models to predict PCa risk in individuals of Caucasian ancestry. Such predictions should be more accurate because the OR estimates of this meta-analysis are more robust than those from an individual study.
There are two main approaches to combining the results of multiple studies: meta-analysis and mega-analysis. The former uses the summary result of each study, and the latter uses the raw data of each study to calculate an effect size. Meta-analysis has been shown to be as valid as mega-analysis in the sense that the two approaches have approximately the same variance. For example, the results for markers rs10934853, rs16902094, and rs8102476 in Table 2 are almost the same as the results in Gudmundsson et al., which used the mega-analysis approach. Thus, meta-analysis is especially useful when it is not possible to access the raw data set of an individual study.
The meta-analysis in this paper has a few limitations. Some SNPs were identified using study populations with relatively small numbers of study subjects. The numbers of case and control subjects are unbalanced for some SNPs. These factors may lead to bias and inefficiency in this meta-analysis. Since we only used the SNPs reported in the GWAS database, we might have missed some SNPs that have not been reported or that were not captured in the database. In addition, publication bias, which is an intrinsic problem of meta-analysis, could not be addressed because the reported ORs of the selected SNPs showed similar effects, e.g., either greater than or less than 1 [Figure 1a–b].
The contributions of single and multiple SNPs to the total genetic variance of PCa risk were not very high in this analysis. The assessment of genetic variation incorporating additional SNPs discovered from ongoing GWAS and fine-mapping studies may provide beneficial information for future PCa risk prediction studies.
These results offer several opportunities to capitalize on the stable estimates of the OR for the PCa-associated SNPs. First, this meta-analysis can be extended to reported SNPs that have p-values greater than 1.0×10⁻⁶, because they could potentially be associated with PCa risk. The best combination of potential SNPs is crucial to risk prediction analyses using multiple genetic variants. Thus, more research is needed to construct the best set of SNPs associated with PCa risk. Second, this meta-analysis can be repeated using ORs and CIs adjusted for other covariates such as age, family history, sub-ethnicity, and geographical factors. These two approaches combined will allow researchers to conduct a more precise meta-analysis and an improved risk prediction analysis.
Grants: National Cancer Institute (CA129684, CA140262, CA148463 to J.X.) and Department of Defense (W81XWH-09-1-0488 to J.S.)