We have identified statistically significant joint effects between genotypes known to be involved in prostate cancer etiology and macroenvironment-level effects on biochemical failure in White men. These results cannot be interpreted as evidence of direct biological effects of the macroenvironment. Just as race or gender are used as surrogates for differences in socio-economic status, health care access, exposures, and other factors, we interpret our statistically significant interactions as reflecting the surrogate effects of causal factors in the environment that are measured by census tract-level variables. Thus, the inferences made here are not necessarily biological in nature, but may provide improved understanding of the contextual relationship of genotype effects in a given macroenvironment setting. The goal of these analyses is instead to identify whether the macroenvironment in which an individual lives provides information that is predictive of prostate cancer outcomes. Because we were able to identify significant macroenvironment effects as well as genotype by macroenvironment interactions after we considered individual factors, our data support the hypothesis that macroenvironment variables contain information that is not captured at the individual level. By providing additional surrogate metrics of factors that may be correlated with disease risk, outcomes, and disparities, the present results may move research in these areas away from more misclassified variables (e.g., race) and toward variables that may both be less misclassified and point toward specific areas in which targeted interventions may be developed to reduce disparities.
Although some studies have reported associations with disease aggressiveness (19, 31), most of the loci or combinations of loci studied to date have not been associated with disease aggressiveness or outcomes (31-34). Therefore, we attempted to take a novel approach that identified contextual factors that may be associated with prostate cancer outcomes and considered information beyond genotype alone. We identified two statistically significant genotype by macroenvironment interactions, one involving HNF1B/TCF2 and one involving MSMB. HNF1B is the hepatocyte nuclear factor 1 homeobox B, also known as transcription factor 2 (TCF2). HNF1B/TCF2 is a member of the transcription factor superfamily that interacts with HNF1A (TCF1), HNF4A, CDH16, ONECUT1, and NR2F2. The HNF1B/TCF2 protein is involved in the metabolism of glucose, cholesterol, and uric acid, and is expressed not only in hepatocytes but also in prostate and other tissues. Genotypes at HNF1B/TCF2 have been implicated in prostate cancer risk (20, 22), diabetes risk (20, 35), male infertility (36), and other traits. Therefore, there is evidence that this protein is involved in a wide variety of metabolic processes that reflect potential hormonal, cardiovascular, and diabetes risk factors. In our data, we found that time to BF increased with census tract per capita income in prostate cancer cases with TT genotype at rs4430796, while time to BF decreased with census tract per capita income in prostate cancer cases with CC genotype at rs4430796. Since these factors have been associated with adverse cardiovascular, diabetes, and obesity phenotypes, we also evaluated whether additional adjustment for body mass index (BMI) might in part explain our observed associations. After adjusting these analyses for obesity (i.e., BMI < 30 vs. BMI ≥ 30), there was no substantial difference in the HR effects or interaction inferences (results not shown). Therefore, if there is a relationship between HNF1B, obesity, and time to BF, it is not explained by confounding in our data.
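The opposite directions of the income effect in TT and CC cases can be written as a worked equation. The sketch below assumes a Cox proportional hazards model with an additive genotype coding and a single product term; the coding, the symbols, and the covariate vector X are illustrative assumptions and are not taken from the analysis itself.

```latex
% Assumed form of the interaction model (not stated in this section):
% G = number of C alleles at rs4430796 (0 = TT, 1 = CT, 2 = CC),
% M = census tract per capita income, X = individual-level covariates.
h(t \mid G, M, X) = h_0(t)\,\exp\!\left(\beta_G G + \beta_M M + \beta_{GM}\, G M + \gamma^{\top} X\right)

% Log hazard ratio per unit increase in M at a fixed genotype G:
\frac{\partial \log h}{\partial M} = \beta_M + \beta_{GM}\, G

% Pattern described in the text:
% TT (G = 0): \beta_M < 0                 (lower hazard, longer time to BF, as income rises)
% CC (G = 2): \beta_M + 2\beta_{GM} > 0   (higher hazard, shorter time to BF, as income rises)
```

Under these assumptions, the reported pattern requires a positive interaction coefficient that is large enough to reverse the sign of the income effect between the two homozygous genotypes.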
MSMB (microseminoprotein-β) encodes PSP94 (prostate secretory protein of 94 amino acids), which is found in semen and has been proposed as a prostate cancer screening and prognosis biomarker (37, 38). A SNP in MSMB, rs10993994, has been reported in multiple GWAS to be associated with prostate cancer etiology (15, 19, 39, 40). This SNP is located 57 bp upstream of the MSMB transcription start site and has been suggested to regulate PSP94 expression. We found that men with TT genotype at rs10993994 were at increased risk of BF if they lived in census tracts with a higher percentage of older single heads of household. This suggests that the effect of MSMB may be correlated with factors related to age or social isolation. This effect was present even after adjusting for age at diagnosis and tumor aggressiveness.
The multilevel molecular epidemiology approach is novel, yet its evaluation here is limited. First, our data included only White men who were seen at a tertiary referral hospital and underwent prostatectomy as their primary treatment. The advantage of this sample selection is that it involves a narrowly defined patient and treatment population within a single hospital, which avoids some extraneous variability that may cloud the results. However, these patients are not representative of the general population, nor of patients who are not treated by prostatectomy. Therefore, these results may not reflect the same effects as might be seen in men of other ethnicities, those who receive treatment other than prostatectomy, or those who are diagnosed and treated in community hospital or clinic settings. We would expect the distributions of individual risk factors and macroenvironment-level variables in such populations to differ from those observed in the present study population. Similarly, genotype frequencies in this study population of White men are also likely to differ from those in non-White populations. Therefore, while we find provocative associations of time to BF with individual-level and macroenvironment-level factors and with genotype interacting with macroenvironment-level factors, these may be quite different in other populations. In particular, future studies should include African American populations that suffer from the greatest prostate cancer disparities.
Second, the present study does not fully explore the relationship of susceptibility genotypes, individual-level risk factors, and macroenvironment-level factors. Additional research may be needed to understand the relationship of these factors and the optimal approach toward their modeling. Other variables not studied here that may influence prostate cancer outcomes should be considered, including individual insurance or other metrics of health care access as well as prostate cancer screening history. Furthermore, we have used a dataset that includes a very narrowly defined sample set (i.e., White men undergoing prostatectomy within a single hospital) to limit the heterogeneity that might mask the effects seen here. However, these sample restrictions also limit the inferences because of the relatively narrow spectrum of the population being studied. Thus, broader sample definitions should be considered in the future to more fully address questions of prostate cancer disparities by race or other factors. The multilevel molecular epidemiology approach discussed here is conceptually tied to that of Mendelian Randomization (41), which uses observational study designs to evaluate genetic effects indirectly via exposures of interest. As such, some of the analytical approaches and pitfalls of the Mendelian Randomization framework may be applied in the future to the type of studies proposed here. In addition, we have not fully explored whether macroenvironment-level factors are a better measure of disadvantage than individual-level variables. Since the various macroenvironment-level variables are correlated with one another, and presumably with individual-level factors (many of which remain unmeasured here), it is likely that the effects of these variables do not represent independent associations. Therefore, the associations reported here may reflect similar or even identical phenomena measured through different analytical variables. Additional exploration of how correlated macroenvironment variables measure related phenomena that influence prostate cancer outcomes is required.
Third, our study uses a relatively small sample of 444 White men, residing in 342 census tracts, who were followed prospectively from the time of prostate cancer diagnosis. While the factors studied here reflect macroenvironment-level effects, the small number of men in a single census tract limits the “multilevel” nature of the analysis. Despite the limited sample size, we were able to detect the statistically significant effects reported here. The MME approach used here involved continuously distributed macroenvironment variables, which generally provide greater power than discrete variables. We also limited our genotype analyses to those variants with 10% or greater allele frequency to ensure reasonable statistical power, as specified a priori in our study design, and to those SNPs with a sample size of 100 or more cases. Also, we have limited our sample set to include only a narrow range of men (i.e., White men from a single hospital who have undergone prostatectomy) to minimize the potential for unmeasured confounding that may influence our results. We have taken this approach to demonstrate the MME approach. However, it is also likely that the effects of GWAS genotypes, individual-level factors, and macroenvironment-level factors on time to BF are small in magnitude. For example, it is possible that the sample size used here was too small to detect effects of genotypes on time to BF if they are of a similar magnitude to those seen in etiology studies. In addition, while we have identified statistically significant interactions between genotypes and macroenvironment-level context, larger studies will be required to confirm these results and to extend them to other populations. Furthermore, studies with greater statistical power and longer follow-up will be required to assess macroenvironment and/or genotype effects on other outcomes, including disease recurrence or death.
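The two inclusion criteria for SNPs described above (allele frequency of 10% or greater, and 100 or more genotyped cases) can be illustrated with a short sketch. It assumes biallelic SNPs stored as two-letter genotype strings and interprets "allele frequency" as the frequency of the less common allele; the function names, SNP labels, and genotype counts are hypothetical and invented for illustration only.

```python
from collections import Counter

# Thresholds taken from the text; the interpretation as *minor* allele
# frequency is an assumption of this sketch.
MIN_ALLELE_FREQ = 0.10
MIN_CASES = 100


def minor_allele_frequency(genotypes):
    """Return (minor allele frequency, number of genotyped cases).

    `genotypes` is a list of two-letter strings such as "CT";
    None marks a case with a missing genotype call.
    """
    called = [g for g in genotypes if g is not None]
    counts = Counter(allele for g in called for allele in g)
    total = sum(counts.values())
    if len(counts) < 2:
        # No calls at all, or only one allele observed (monomorphic).
        return 0.0, len(called)
    return min(counts.values()) / total, len(called)


def passes_filters(genotypes):
    """Apply both inclusion criteria to one SNP."""
    maf, n_cases = minor_allele_frequency(genotypes)
    return maf >= MIN_ALLELE_FREQ and n_cases >= MIN_CASES


# Made-up genotype counts, not data from the study.
example_snps = {
    "snp_common": ["TT"] * 60 + ["CT"] * 50 + ["CC"] * 10,  # MAF ~0.29, n = 120
    "snp_rare": ["GG"] * 118 + ["AG"] * 2,                  # MAF ~0.008, n = 120
    "snp_small_n": ["TT"] * 30 + ["CT"] * 30 + ["CC"] * 20 + [None] * 40,  # n = 80
}
retained = [name for name, g in example_snps.items() if passes_filters(g)]
```

In this made-up example only `snp_common` is retained: `snp_rare` fails the allele frequency criterion and `snp_small_n` fails the sample size criterion.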
Finally, macroenvironment-level variables are generally derived from administrative databases found through the US census or other community surveys, and linked back to the individual by geocoding the person to their place of residence (42). However, because macroenvironment-level effects are broad and clearly represent surrogates for both differences and inequities, the approach presented here is valid for prediction of risk or outcomes, but not necessarily as a means of identifying underlying etiology. We also considered a limited range of metrics, and only continuously distributed macroenvironment variables. Future studies should consider the optimal coding of these variables.
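The linkage step described above, in which each individual is assigned the census tract-level variables for their geocoded place of residence, can be sketched as a simple keyed join. The sketch assumes that geocoding has already produced a tract identifier for each patient; the field names, identifiers, and values below are hypothetical and are not drawn from the study data.

```python
def link_tract_variables(patients, tract_table):
    """Attach tract-level variables to each patient record by tract identifier.

    Returns the linked records and the IDs of patients whose tract
    was not found in the tract-level table.
    """
    linked, unmatched = [], []
    for patient in patients:
        tract_vars = tract_table.get(patient["tract_id"])
        if tract_vars is None:
            unmatched.append(patient["patient_id"])
            continue
        # Merge individual-level and tract-level fields into one record.
        linked.append({**patient, **tract_vars})
    return linked, unmatched


# Hypothetical tract-level table keyed by a made-up tract identifier.
tract_table = {
    "T001": {"per_capita_income": 31000, "pct_older_single_head": 8.5},
    "T002": {"per_capita_income": 52000, "pct_older_single_head": 4.1},
}

# Hypothetical patient records after geocoding.
patients = [
    {"patient_id": "P1", "tract_id": "T001", "genotype": "TT"},
    {"patient_id": "P2", "tract_id": "T002", "genotype": "CC"},
    {"patient_id": "P3", "tract_id": "T999", "genotype": "CT"},
]

linked, unmatched = link_tract_variables(patients, tract_table)
```

Tracking unmatched records separately, rather than silently dropping them, makes it possible to check whether geocoding failures are related to the outcome or to the macroenvironment variables themselves.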
Using a multilevel molecular epidemiology approach, we have identified associations of candidate prostate cancer loci that are dependent on the context in which these genotype effects may be acting to predict prostate cancer outcomes. This approach could provide useful information in studies of cancer outcomes and disparities. Use of macroenvironment-level variables rather than (or in addition to) other surrogates such as age, gender, or race may provide better indices of disadvantage. Research related to cancer disparities that uses this approach may benefit from having measures other than race to compare groups that may differ in ways relevant to health disparities. These groups may represent target populations in which interventions can be designed and implemented around potentially modifiable factors. These macroenvironment-level factors may also identify novel genotype-environment interactions. However, because macroenvironment-level effects are broad and clearly represent surrogates for both differences and inequities, the approach presented here may be valid for prediction of risk or outcomes, but not necessarily as a means of identifying underlying etiology. Thus, the multilevel molecular epidemiology approach defined here may provide new avenues for research in cancer health disparities.