Purpose: Tamoxifen was approved for breast cancer risk reduction in high-risk women based on the National Surgical Adjuvant Breast and Bowel Project's Breast Cancer Prevention Trial (P-1:BCPT), which showed 50% fewer breast cancers with tamoxifen versus placebo, supporting tamoxifen's efficacy in preventing breast cancer. Poor metabolizing CYP2D6 variants are currently the subject of intensive scrutiny regarding their impact on clinical outcomes in the adjuvant setting. Our study extends to variants in a wider spectrum of tamoxifen-metabolizing genes and applies to the prevention setting. Methods: Our case-only study, nested within P-1:BCPT, explored associations of polymorphisms in estrogen/tamoxifen-metabolizing genes with responsiveness to preventive tamoxifen. Thirty-nine candidate polymorphisms in 17 candidate genes were genotyped in 249 P-1:BCPT cases. Results: CYP2D6_C1111T, individually and within a CYP2D6 haplotype, showed borderline significant association with treatment arm. Path analysis of the entire tamoxifen pathway gene network showed that the tamoxifen pathway model was consistent with the pattern of observed genotype variability within the placebo-arm dataset. However, correlation of variations in genes in the tamoxifen arm differed significantly from the predictions of the tamoxifen pathway model. Strong correlations between allelic variation in the tamoxifen pathway at CYP1A1-CYP3A4, CYP3A4-CYP2C9, and CYP2C9-SULT1A2, in addition to CYP2D6 and its adjacent genes, were seen in the placebo-arm but not the tamoxifen-arm. In conclusion, beyond reinforcing a role for CYP2D6 in tamoxifen response, our pathway analysis strongly suggests that specific combinations of allelic variants in other genes make major contributions to the tamoxifen-resistance phenotype.
Tamoxifen was the first drug to be approved by the FDA to reduce breast cancer incidence in women at increased risk of breast cancer according to Gail Model criteria, based on data generated from the National Surgical Adjuvant Breast and Bowel Project (NSABP)'s Breast Cancer Prevention Trial (P-1:BCPT). Among the 13,388 eligible increased-risk women in P-1, the incidence of invasive breast cancer in those randomized to tamoxifen was significantly reduced compared with those randomized to placebo, with a risk ratio [RR] of 0.51 (95% CI 0.39-0.66) at a mean follow-up of 47.7 months.
In a subsequent study nested within P-1:BCPT, BRCA1 and BRCA2 were resequenced in constitutional DNA from 288 breast cancer cases that developed while on study. Only 19 participants with breast cancer carried deleterious (cancer-associated) mutations in BRCA1, BRCA2, or both. Despite this small sample size, a tendency for BRCA2-mutation but not BRCA1-mutation carriers to benefit from tamoxifen was suggested in this BRCA1/2-carrier case population.
The toxicities of tamoxifen, especially endometrial cancer and thromboembolism, have limited its acceptance as a breast cancer risk-reducing agent. One strategy for addressing this resistance might be to develop tools that identify women who will experience the best balance between tamoxifen's benefits and risks. If we could identify genetic modifiers of tamoxifen-related benefits and risks, stratification of potential candidates for preventive tamoxifen into those more or less likely to achieve a net benefit from treatment would become a real possibility.
Developing a pharmacogenetic strategy for targeting women with the optimal benefit-risk ratio was the goal of our study, NSABP P-1G3. The primary P-1 endpoint of invasive breast cancer allows assessment of associations between this clinical outcome and pre-selected genetic variants, in an approach analogous to that used in the case-only study of BRCA1/2 mutations. Candidate genes for P-1G3 were chosen based on published relationships to breast carcinogenesis or tamoxifen response: genes involved in estrogen synthesis and metabolism (estrogen/E pathway) or tamoxifen metabolism (tamoxifen/TAM pathway); genes encoding the estrogen and progesterone receptors; and genes regulating the generation of nitrogen oxide free radicals. Selection of polymorphisms relied on prior epidemiologic or laboratory associations with breast cancer risk and/or estrogen/tamoxifen metabolism or action. In this manner, we used the available literature to identify 39 loci in 17 candidate genes for analysis in DNA from the breast cancer cases that developed while on study.
We hypothesized that one or more of the genetic polymorphisms selected for examination in this study, alone or in combination with others, would alter estrogen and tamoxifen metabolism and/or action in a manner expected to decrease (or increase) the clinical response to tamoxifen. CYP2D6 variants with decreased metabolic activity have been reported as conferring resistance to tamoxifen [4-7]. Yet, CYP2D6 is only one of the genes that we were interested in studying, given that many other genes have potential to impact response to tamoxifen. Accordingly, the goal of our research has been to expand beyond the exclusive focus on this one gene and to carry out an analysis of multiple genes taken together as a network. The pursuit of such an alternative, systems approach to pharmacogenomic analysis of tamoxifen response assumes increased importance in view of the inconsistency of recent findings regarding the impact of polymorphisms in CYP2D6 analyzed as a single gene on clinical outcomes in breast cancer patients [8-12]. Our pathway analytic approach allowed us to investigate whether polymorphic alleles of additional enzymes within either the tamoxifen or estrogen metabolic pathways were associated with lack of response to this agent. Even if the expected small effect of polymorphisms at the individual level does not translate into observable pharmacogenetic interactions, pathway analysis incorporating all genes in the E and TAM pathways might reveal genotypic differences between cases occurring in the presence versus absence of tamoxifen. To address the usefulness of this pathway approach using the limited number of case samples (n=249) available to us from P-1:BCPT, we implemented the current pilot study. 
Our results suggest that elucidation of pharmacogenomic associations at the level of an entire metabolic network should contribute to refinement of criteria for selecting women at increased risk who will experience the best benefit-risk balance from preventive tamoxifen use.
This genomic investigation is an ancillary study to NSABP's P-1:BCPT trial; the details of the methodology and the demographic characteristics of the study cohort in P-1 have been previously described. The overall study was approved by the institutional review boards (IRBs) of the participating institutions. Women enrolled in P-1 provided written informed consent. With one exception, the IRBs of these institutions allowed informed consent forms to include the collection of blood samples that could be used in future studies for genomic analyses.
For purposes of this genomic analysis, cases were defined as all incident invasive breast cancers occurring before the P-1 participants were informed of their assigned treatment on April 1, 1998, as identified from the NSABP summary file with a data lock of September 30, 1999. Non-invasive breast cancers were not included. The current study utilized lymphocyte-derived DNA samples isolated from blood collected from P-1 breast cancer cases only. The actual samples consisted of DNA aliquots remaining from the DNA utilized in P-1G, a prior genetic study of BRCA1 and BRCA2 mutations in P-1 cases. As described in that study, the DNA samples from the breast cancer cases were anonymized prior to delivery to study investigators.
We employed a candidate gene strategy in selecting genes for this study, reasoning that genes encoding proteins involved in estrogen and tamoxifen synthesis, metabolism, and function would be biologically plausible candidates as genetic modifiers of breast cancer risk or response to preventive tamoxifen (Figure 1). Table 1 lists the selected genes and markers. An extensive epidemiologic literature suggesting associations between breast cancer risk and specific polymorphisms in estrogen-synthesizing/metabolizing genes and estrogen receptor genes encouraged this approach [13-26].
Buffy coat DNA samples isolated from whole blood collected from participants on entry to the P-1 trial were obtained from NSABP and stored at −20 °C until used for quality assessment and genotyping. Once a sample was thawed for use, it was stored at 4 °C until the completion of the project. After quality control procedures were carried out (details are available in DunnBK_DataSupplementAndLegends.pdf at sftp://caftps.nci.nih.gov), a total of 249 DNA samples were available to be genotyped for the project.
SNP genotypes were determined using homogeneous MassEXTEND™ assays with primer extension (hME™; Sequenom, San Diego, CA; http://www.sequenom.com). The hME™ is a high-throughput, multiplex SNP genotyping chemistry, which employs matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) to distinguish between the allele-specific extension products [28, 29]. The MALDI-TOF-based technology was chosen because, at the time of study inception, it was the best genotyping technology to enable optimization of the number of loci interrogated while keeping sample use at a minimum; using this technology, each SNP locus was assayed with 2-4 ng of DNA, enabling genotyping for all the polymorphisms listed in Table 1 with the sample quantity we obtained.
Of note, five CYP2D6 polymorphisms were selected for analysis from the literature that existed at the project inception (Table 1). Although the subsequent literature contains numerous reports of additional genomic variants that modify CYP2D6 function, we did not have access to this information at the time the study was planned. Furthermore, the technology available at that time posed challenges to the assay design due to the complexity inherent in the CYP2D6 gene [30-33]. Newer technology subsequently opened the possibility of assaying for additional polymorphisms. Although we considered using the newer methods as a follow-up to our original MALDI-TOF analysis, DNA from the tested P-1 case subjects was no longer available to us. In part this was due to the limited amount of DNA remaining after MALDI-TOF testing from the aliquots originally given to us. Importantly, however, the DNA that we received for analysis had been anonymized before transfer to our laboratory so that even if sufficient sample DNA were remaining for a subset of P-1 cases, these samples could not be linked to the samples used in our earlier analyses. Hence, the statistical analyses described below were applied to MALDI-TOF results for the SNP variants listed in Table 1. Details of the 35 SNP assay designs can be found in DunnBK_TableS1_SNPAssaylnfo35_pub.xls at sftp://caftps.nci.nih.gov. Of note, Schroth et al. recently reported using the MALDI-TOF assay system for accurate genotyping of CYP2D6 when there was concern for limited sample quantity or quality.
PCR amplification of short tandem repeat polymorphic fragments (STRPs) and variable number tandem repeat (VNTR) markers was performed using custom primers (Applied Biosystems, Foster City, CA) (details in Table S2 in DunnBK_DataSupplementAndLegends.pdf at sftp://caftps.nci.nih.gov). The PCR products were separated and detected with an ABI 3730 DNA Analyzer (Applied Biosystems, Foster City, CA), and allele sizing and calling were carried out using GeneMapper 3.0 software (Applied Biosystems, Foster City, CA).
The primary efficacy endpoint in this P-1G3 sub-study of P-1:BCPT is the relative risk (RR) of having a given genotypic state while being a case, i.e. while having developed breast cancer, in the presence versus absence of tamoxifen. Genotypic states are measured in a series of progressively more complex levels, as described below.
All of our P-1G3 analyses included data from 249 of the individuals enrolled in P-1 who developed breast cancer during the course of follow-up in the study. Genetic assessment was performed for 39 variants of 17 genes (Table 1). The variants comprised 35 SNPs and 4 STRP/VNTRs. Each SNP was coded into three possible genetic states representing no variant, one variant copy, and two variant copies. The STRP/VNTRs were coded into all possible states depending on the length in base pairs of each of the alleles; treatment RRs were also measured for individual STRP/VNTRs collapsed into a biallelic format that is consistent with the literature (Table 2).
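As a minimal illustration of the three-state SNP coding described here, the following Python sketch counts variant copies in a genotype call. The two-letter genotype string format (for example "CT") and the function name `code_snp` are assumptions made for illustration; they are not the study's actual data format or software.

```python
def code_snp(genotype: str, variant_allele: str) -> int:
    """Return the number of variant copies (0, 1, or 2) in a two-allele genotype.

    Hypothetical helper: assumes genotypes are given as two-letter strings
    such as "CC", "CT", or "TT".
    """
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
    variant = variant_allele.upper()
    # Count how many of the two alleles match the variant allele.
    return sum(1 for allele in genotype.upper() if allele == variant)


# Illustrative calls for a C>T polymorphism (made-up genotypes).
states = [code_snp(g, "T") for g in ("CC", "CT", "TT")]
```

Here `states` holds the three genetic states (no variant, one variant copy, two variant copies) in that order.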
The P-1G3 study design was that of a case-only analysis, in which each incident invasive breast cancer patient was studied by assessing her genotype in relation to whether or not she had been exposed to tamoxifen. This assessment of tamoxifen effect among those within a particular genetic state of SNP or STRP/VNTR was based on the testing of the treatment relative risk comparing the cases randomized to the tamoxifen group to those randomized to the placebo group. Based on Bayes' theorem, and the fact that treatment was randomly assigned in the P-1 study, the expected treatment relative risk for cases in any genetic state would be the same as the overall effect observed in the P-1 trial, specifically 0.5. The confidence intervals for these relative risks and the corresponding tests against the expected population average were calculated exactly under the corresponding binomial distributions. In addition to the assessment of tamoxifen effect within a specific genetic state, testing was performed to determine if there were tamoxifen interactions with any of the SNPs. The Mantel-Armitage test  was used for this analysis.
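The logic of the case-only test can be sketched in Python under stated assumptions. If the two arms are of equal size and the treatment RR is 0.5, the probability that a randomly chosen case comes from the tamoxifen arm is RR/(1+RR) = 1/3, and the observed count of tamoxifen-arm cases within a genetic state can be compared with that expectation using an exact binomial test. The function names and the particular two-sided convention (summing the probabilities of all outcomes no more likely than the one observed) are illustrative choices, not necessarily those used in the original analysis.

```python
from math import comb


def tamoxifen_case_probability(rr: float) -> float:
    """Expected share of cases from the tamoxifen arm, assuming equal arm sizes."""
    return rr / (1.0 + rr)


def observed_rr(tam_cases: int, total_cases: int) -> float:
    """Case-only RR estimate: tamoxifen-arm cases divided by placebo-arm cases."""
    placebo_cases = total_cases - tam_cases
    return float("inf") if placebo_cases == 0 else tam_cases / placebo_cases


def exact_binomial_p(k: int, n: int, p0: float) -> float:
    """Two-sided exact binomial p-value for k successes in n trials under p0.

    Sums the probabilities of every outcome that is no more likely than
    the observed one (one common two-sided convention).
    """
    pmf = [comb(n, i) * p0**i * (1.0 - p0) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1.0 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(p for p in pmf if p <= threshold))


# Illustrative numbers only: 12 tamoxifen-arm cases among 20 carriers.
p0 = tamoxifen_case_probability(0.5)
p_value = exact_binomial_p(12, 20, p0)
```

A small p-value indicates that, within the genetic state, the split of cases between arms departs from the split expected under the overall trial effect.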
To control for multiple testing and to reduce false discovery, the polymorphisms were grouped into six families according to the functions of the genes to which they were related (Table 3). Two sets of false discovery rate (FDR) analyses were performed on each family, one for the test of the treatment relative risk within each separate genetic state of the polymorphisms, and another for the tests of tamoxifen interaction. Thus, a total of 12 FDR procedures were conducted. The FDR procedure of Benjamini and Hochberg  was applied to adjust for the multiple hypotheses with a threshold q=0.2. In addition, FDR-adjusted p-values were determined based on the re-sampling method of Yekutieli and Benjamini . Additional analyses of individual markers (primary end-point) and pairs of markers (secondary end-point) were carried out using the chi-square test and Fisher's exact test, performed using R-project software, version 2.5.0.
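The Benjamini and Hochberg step-up procedure named above can be sketched as follows. This is a generic textbook implementation applied to one family of p-values at q=0.2; the function name and the example p-values are hypothetical, and the re-sampling adjustment of Yekutieli and Benjamini is not reproduced here.

```python
def benjamini_hochberg(p_values, q=0.2):
    """Return a list of booleans marking the hypotheses rejected at FDR level q."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Made-up p-values for one family of polymorphisms.
family_p = [0.01, 0.04, 0.03, 0.5]
flags = benjamini_hochberg(family_p, q=0.2)
```

Running the procedure separately on each family, once for the within-state RR tests and once for the interaction tests, corresponds to the 12 FDR procedures described in the text.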
As a secondary analysis, we created haplotypes for the eight genes in which multiple markers were tested, using the same individual variants used in the Primary Analysis with the STRP/VNTR results converted to biallelic format according to Table 2. Following the construction of haplotypes, the 25 individual variants within the genes characterized were reduced to eight multiallelic loci, shown in Table 4.
For the work described here, haplotypes were constructed using the PHASE software, version 2.1.1 . PHASE uses an expectation maximization (EM) algorithm to calculate maximum likelihood estimates of haplotype frequencies given genotype measurements which do not specify phase. Unconditional logistic regression analysis was used to calculate a chi-square statistic and the corresponding p-value for the individual locus haplotypes as a measure of association for primary and secondary endpoints described in the Primary Analysis. Logistic regression analyses were conducted using SAS version 8.2 (SAS Institute, Cary, North Carolina). Statistical tests based on haplotype counts, including the chi-square test and Fisher's exact test, were carried out using R-project software, version 2.5.0.
Gene architecture, i.e. the physical linkage of variants within a haplotype, represents only one source of lack of independence of genetic information. Lack of independence can also derive from the interactions of genes within biologic pathways, such as the tamoxifen metabolic pathway. Thus, the complex networks of biologic pathways represent an alternative to physical linkage as a framework by which the interactions of genes can be modeled. In our application of network analysis, we constructed a pathway model based on genes involved in tamoxifen metabolism and used it to test: 1) whether variations in pairs of genes are non-randomly associated in a manner dependent on their relative positions in the pathway; and 2) whether such associations differ in the two treatment arms. To accomplish this, we employed pathway analysis, a type of multiple regression analysis in which the user proposes a set of structured “causal” correlations to measure the strength of relationships between variables of interest, adjusted for other factors in the model [39-42]. In our case, the “variables of interest” were allelic variations in genes in the tamoxifen pathway. We chose to implement pathway analysis by means of Structural Equation Models (SEMs). In developing a model from biological pathways, it is essential to incorporate the adjacent nature of each set of two sequential genes, the strength of correlation of variations in these two genes, and the directionality of this sequential relationship. Influences of a single gene on multiple dependent variables can be measured simultaneously.
Given a structure of correlations and genotype data, pathway analysis returns measures of the overall fit of the model, estimates of the magnitude of correlations, the proportion of variance (R²) accounted for by the model for each gene that is a dependent variable, and the significance of hypothesis tests to determine whether the pathway coefficients are statistically different from a given value (i.e. zero, or the value measured in another data classification). SEMs were used to calculate the above statistics for the primary (individual markers) and secondary (pairs of markers) endpoints described in the Primary Analysis. The fit of the pathway model was assessed for each arm, tamoxifen (T-arm) or placebo (P-arm). The parameter estimates for models derived for alternative endpoint states (in our case, the two alternative treatment arms) were tested for significant differences. SEM analyses were conducted using SAS version 8.2 (SAS Institute, Cary, North Carolina). Our application of pathway analysis is described below.
Two alternative approaches were utilized to construct the pathway models used in this study. In the first approach, pathways were constructed based on information compiled from the physiological/pharmacological literature relating to the genes that encode enzymes involved in tamoxifen metabolism (Figure 1). In the second approach, network information derived from the Pathways Interactions Database (http://pid.nci.nih.gov/) compiled by the NCI Center for Biomedical Informatics and Information Technology (NCI CBIIT) and the cancer Biomedical Informatics Grid (caBIG®) was also incorporated into the pathway models. The pathway structures derived from each of these approaches were combined to develop a pathway structure for the tamoxifen pathway. The resulting tamoxifen pathway structure (a priori tamoxifen pathway model) was then used to explicitly model interactions of the individual genotypes observed within the case population in each of the two arms (Figure 2).
The individual or haplotype-based genotypic state of each locus was projected onto the pathway structure (Figure 2) as a basis for implementing our pathway analysis. For this pathway model based on the tamoxifen pathway, the interactions of the observed genotypes were modeled separately for each drug exposure arm (P or T). If the observed correlations between variation in genes in an arm are consistent with the predictions of the pathway model, the statistical analysis will reveal a “good fit”. The actual measurement that reflects how accurately the pathway model predicts the gene variants is the chi-square value for goodness of fit. A “good fit” will be associated with a non-significant chi-square p-value. Conversely, if the pathway model does not predict the correlations between variation in genes in a given drug arm, the analysis should reveal a “poor fit”. The chi-square p-value in this latter situation should be low; if the chi-square p-value <0.01, this p-value indicates a statistically significant difference between the model's prediction and the actual observations.
Genotyping assays were successfully carried out for 39 candidate loci (Table 1). Three loci, CYP2D6_EX4, CYP2C19_G636A and ESR1_K303R, did not have a minor allele represented in this P-1 case population and thus were not included in the subsequent statistical analyses. The full set of genotype data can be obtained at sftp://caftps.nci.nih.gov (DunnBK_TableSR1_SNPFinalDataAIISNP35_pub.xls and DunnBK_TableSR2_STRPdata_pub.xls).
When individual polymorphisms were analyzed for association with the P- and T-arms, none of the observed associations reached statistical significance after adjustment for multiple comparisons (Table 1, p-value and OR (CI); additional data available in Tables SR3-SR6 at sftp://caftps.nci.nih.gov). CYP2D6_C1111T showed a borderline significant association with the T-arm (unadjusted p-value=0.038; Table 1). Similarly, no pairwise combination of polymorphic markers showed statistically significant association with either arm (Table SR6 at sftp://caftps.nci.nih.gov).
For each of the eight genes that contain more than one polymorphism, we constructed haplotypes. The only gene with a haplotype showing significant association with either treatment arm was CYP2D6 (p-value=0.045 in the 2×k chi-square analysis) (Table 4). This borderline 2×k result shows that at least one of the CYP2D6 haplotypes is associated with a drug arm but does not reveal which of the five observed haplotypes is implicated. The 2×2 chi-square analyses of CYP2D6 haplotypes showed that only the haplotype containing the variant CYP2D6_C1111T was associated with the drug arm, and this association was only of borderline significance (unadjusted p-value=0.068; Table 5).
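A 2×k chi-square test of the kind referred to here compares haplotype counts across the two arms. The Python sketch below is a generic Pearson chi-square implementation using only the standard library; the haplotype counts are invented for illustration and are not the values in Table 4, and the function names are hypothetical. With sparse counts, as for rare haplotypes, an exact test would usually be preferred.

```python
from math import erfc, exp, gamma, sqrt


def chi_square_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    y = x / 2.0
    # Base cases: Q(1/2, y) = erfc(sqrt(y)) and Q(1, y) = exp(-y).
    if df % 2 == 1:
        a, q = 0.5, erfc(sqrt(y))
    else:
        a, q = 1.0, exp(-y)
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y**a * exp(-y) / gamma(a + 1.0)
        a += 1.0
    return q


def chi_square_2xk(row_a, row_b):
    """Pearson chi-square for a 2 x k table; every column total must be non-zero."""
    total_a, total_b = sum(row_a), sum(row_b)
    n = total_a + total_b
    stat = 0.0
    for a, b in zip(row_a, row_b):
        column_total = a + b
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * column_total / n
            stat += (observed - expected) ** 2 / expected
    df = len(row_a) - 1
    return stat, df, chi_square_sf(stat, df)


# Invented haplotype counts (chromosomes) for the placebo and tamoxifen arms.
placebo_counts = [150, 60, 40, 20, 10]
tamoxifen_counts = [70, 45, 30, 10, 15]
stat, df, p_value = chi_square_2xk(placebo_counts, tamoxifen_counts)
```

A significant 2×k result only indicates that the haplotype distribution differs between arms; identifying the responsible haplotype requires collapsing to 2×2 tables (one haplotype versus all others), as described in the text.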
Pathway analysis was performed by applying polymorphism data from each arm to the model for the TAM pathway (Figure 1), as described in Materials and Methods. The outcomes of the pathway analyses for the TAM pathway in relation to the P- and T-arms appear in the pathway projections in Figure 2. Lines connect genes that encode enzymes that catalyze sequential reactions in the TAM pathway. The number next to each line is a “path coefficient”, similar to a correlation coefficient [40, 41], which indicates how strongly the variations in the two adjacent genes correlate with each other according to the pathway model (Figure 2; Table 6; Table SR7 available at sftp://caftps.nci.nih.gov). A zero would indicate no (i.e., random) association between the variation patterns in two adjacent genes; thus, the number next to the line reflects the degree of deviation from zero, either positive or negative, according to the model (a red number indicates that the deviation from 0 is statistically significant). Of note, the CYP2D6 gene is the hub of several strong (red numbers) correlations in the P-arm. In the P-arm population a strong correlation is observed between CYP2D6 and CYP3A4, CYP2C9, and SULT1A2. Importantly, additional strong P-arm correlations were observed between adjacent genes that do not involve CYP2D6: between CYP1A1 and CYP3A4, CYP3A4 and CYP2C9, as well as CYP2C9 and SULT1A2.
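To give a feel for what a path coefficient measures, the sketch below computes the Pearson correlation between variant-allele counts (0, 1, or 2) at two adjacent genes. In the special case of a gene with a single upstream predictor, the standardized path coefficient equals this correlation. A full SEM estimates all paths jointly and adjusts each for the others, which this simplified example does not attempt; the gene labels and genotype values are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a locus with no variation has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Invented variant-allele counts for eight cases at two adjacent genes.
gene_upstream = [0, 1, 2, 0, 1, 0, 2, 1]
gene_downstream = [0, 1, 1, 0, 2, 0, 2, 1]
path_estimate = pearson_r(gene_upstream, gene_downstream)
```

A value near zero corresponds to the random association described above, while values far from zero in either direction indicate that variation at the two loci tends to co-occur.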
Finally, pathway analysis integrates these associations between pairs of adjacent genes (represented by the path coefficients in Figure 2) in order to elucidate the patterns of variation at the level of the entire network. The observed correlation values for an arm are then compared to the correlation coefficients predicted by the model. When the observations, taken together as a pathway, are successfully predicted by the model, then these observations are said to show a “good fit” to the model, i.e. the data fit the predictions of the model. In our dataset, pathway analysis demonstrated that our TAM pathway model is consistent with the observed genotype dataset for the P-arm cases. This consistency is reflected in a “good fit”, i.e. there is not a significant difference between the pathway model predictions and the P-arm dataset (chi-square p-value=0.4279; Table 7, Figure 2A). In essence, the TAM pathway model that we constructed successfully predicted the composite set of genetic variations in cases that had taken placebo. This contrasts with results for pathway analysis of the data from the T-arm in relation to the TAM pathway model (Figure 2B). Here, the pathway model predictions do not fit the observed patterns for allelic variations in the T-arm dataset, i.e. the goodness-of-fit is poor, and there is a significant difference between the model and the observations (chi-square p-value=0.0090; Table 7).
In our pharmacogenomic analysis, we investigated in a step-wise, increasingly complex genetic fashion, the interactions between candidate polymorphisms and tamoxifen exposure among P-1 breast cancer cases. This case-only analysis revealed that only one of our genomic variants (CYP2D6_C1111T) was associated (marginal uncorrected statistical significance) with treatment arm at either the individual or the haplotype level. Thus, none of the polymorphisms or haplotypes, when tested individually or in pairs, showed a significant association with treatment arm. This limited association can be explained by: 1) In most cases, our candidate polymorphisms were selected from epidemiologic studies which involved estimates of population-based breast cancer risk, as opposed to the impact of pharmacogenomic interactions on risk. Our assumption was that genes shown to affect cancer risk would also influence the effect of gene-environment (e.g. drug) interactions on breast cancer risk; 2) The small P-1 case sample size (288), of which only 249 were available for analysis, also contributed to the lack of detection of statistically significant interaction at these levels of analysis; 3) Individually the tested polymorphisms were expected to have minor impact on phenotypic outcomes (risk or drug interaction), which likely contributed to inconsistencies in detecting risk associations in prior studies.
In contrast to the primary (SNP) and secondary (haplotype) analyses, pathway analysis is an additional approach which could potentially amplify the small risk associations occurring at the level of individual genetic variants and haplotypes by elevating the analysis to the level of a network. Multiple polymorphisms are expected to influence each other's effects and/or to show interaction with environmental factors (i.e., drug exposure) in relation to a specific shared outcome (breast cancer occurrence). The genetic polymorphisms are related to each other through a global “causal” network, or pathway. Since all the genes occupy specific places in a pathway, the biological effect of each genetic factor is not independent of the others; the genes are epistatically related to each other. An advantage of the pathway method is that it reduces the statistical complexity by limiting statistical measurements to gene-gene interactions specified by the network. Our application of pathway analysis, using SEMs [39-42], incorporated constitutional genomic variations. These candidate genomic variants had previously been shown to have a strong positive, i.e. “causal”, relationship with increased breast cancer risk, presumably via differences in the activity of their encoded enzyme products. One way to view these genotypic variants is as “perturbers” of the functioning of a metabolic pathway network. We investigated how these genomic variations perturbed responsiveness to tamoxifen at both the individual gene level and the pathway network, or systems, level. The pathway approach enabled us to analyze the simultaneous perturbation of tamoxifen response by all the co-existing genetic variants in the study. For example, tamoxifen was shown to reduce the risk of breast cancer in P-1:BCPT high-risk women exposed to this drug. However, it is possible for some individuals to carry minor alleles that perturb responsiveness to tamoxifen in a manner that prevents an effective response.
These women could develop breast cancer despite receiving tamoxifen.
A priori, given that there was a preventive effect of tamoxifen in the overall P-1 trial, we would expect the cases in the P-arm to more closely resemble the overall P-1 population than the T-arm cases with regard to the allelic state of genes encoding tamoxifen-metabolizing enzymes. According to the results of our pathway analysis, the composite genotype dataset in the P-arm is consistent with the TAM pathway model, i.e. the observed data are not significantly different from the predictions of the model (chi-square p-value=0.4279). In contrast, the pathway is disrupted in women with breast cancer in the presence of tamoxifen such that the T-arm cases have different patterns of genotypes than the P-arm cases. Because P-1:BCPT was a randomized trial, we expect the P-arm and T-arm populations in the overall trial (i.e. cases and controls) to have the same distribution of genotypes associated with tamoxifen responsiveness and resistance. Thus, some of the alleles seen in P-arm cases would have conferred responsiveness to tamoxifen, had these women received the drug. Tamoxifen recipients who have such tamoxifen-responsive alleles, i.e. the common alleles that encode the normal metabolizing enzymes that successfully activate this pro-drug, were more likely to benefit from tamoxifen exposure, i.e., to have experienced prevention of breast cancer; they would have been selected out of the case population, leaving them under-represented among all P-1:BCPT participants who developed breast cancer, and thus, eliminated from the T-arm in this case-only study. The allelic combinations that appear in the T-arm cases would be enriched for genomic variants that encode enzymes that conjointly confer a decreased ability to metabolize tamoxifen; the T-arm cases represent a tamoxifen-resistant population. 
This could explain why the TAM pathway allelic observations in this tamoxifen-resistant population do not fit the model described by the TAM pathway; in fact, there is a statistically significant difference at this network level between the observations and the predictions of the TAM pathway model for the tamoxifen-treated arm (chi-square p-value=0.0090).
The TAM pathway gene that has emerged as encoding the most important single enzyme involved in tamoxifen activation is CYP2D6, a highly polymorphic gene. An extensive pharmacogenetic literature documents the many CYP2D6 genotypes encoding isozymes with reduced activity [30-33]. Decreased activity results in poor conversion of tamoxifen to its most active metabolite, endoxifen (4-hydroxy-N-desmethyl tamoxifen) [45-47]. Women with low or absent CYP2D6 activity (due to either perturbation by an inherited CYP2D6 variant or concomitant treatment with a drug that suppresses CYP2D6 activity [32-33, 48]) exhibit lower levels of endoxifen which, in turn, might be expected to impact clinical outcomes. Such an effect of CYP2D6 has been observed in several studies [4, 7, 9, 49-51]. For example, in these studies homozygosity for poor metabolizing CYP2D6 genotypes was shown to correlate with worse outcomes such as higher risk of relapse, shorter time to recurrence, and worse relapse-free survival in women with breast cancer treated with adjuvant tamoxifen [4, 7, 9, 50, 52]. As a result of such accumulating data supporting associations between CYP2D6 activity and clinical outcome, CYP2D6 genotype is beginning to be considered by some clinicians in making therapeutic decisions regarding tamoxifen use in the clinical treatment setting. However, in contrast to the findings of positive associations, a number of other studies have shown no correlation or even lower recurrence risk in tamoxifen-treated patients with poor metabolizing CYP2D6 variant status [53-57]. Among these, the updated results of an ongoing multi-center study conducted by the International Tamoxifen Pharmacogenomics Consortium, presented at the 2009 San Antonio Breast Cancer Symposium, showed no difference in clinical outcome in relation to CYP2D6 metabolizer phenotype [58, 59].
The clinical significance of CYP2D6 variants is therefore still a topic of debate among oncologists who prescribe tamoxifen in the treatment setting [8-12].
Given these contradictory findings about the impact, if any, of CYP2D6 allele status on clinical outcome, the role for factoring CYP2D6 genotype into therapeutic decision-making regarding tamoxifen requires further clarification in large prospective randomized clinical trials. The NSABP P-1:BCPT offers a cohort of cases nested within such a large prospective randomized trial (13,388 participants), in this case among high-risk women. Furthermore, the inconsistency of the data regarding the impact of polymorphisms in CYP2D6 analyzed as a single gene on clinical outcomes in breast cancer patients [8-12] provides a rationale for evaluating alternative analytic strategies. Our systems approach, by incorporating multiple relevant genes taken together as a network, offers a meaningful alternative to the pharmacogenomic analysis of tamoxifen in relation to breast cancer at the single gene level. The inconsistency among study outcomes is undoubtedly due to multiple factors, both exogenous and endogenous, particularly genetic factors, that obscure the effect of variation in the single gene, CYP2D6. The systems approach to genomic analysis that we present in this paper, in providing an alternative to single gene pharmacogenetic analysis, has the potential to override the limitations inherent in focusing on one gene, even when that gene, like CYP2D6, encodes the dominant enzyme in the tamoxifen pathway. Pathway analysis offers a global analytic approach that recruits the small but additive effects of variants in multiple contributory genes that metabolize tamoxifen.
Our search for relevant CYP2D6 variants predated the current pharmacogenetic literature and was based on the then-available reports that focused on association of polymorphisms with breast cancer risk. In our candidate gene approach, five such CYP2D6 polymorphisms (Table 1) emerged from this literature search, and we tested all of them in our MALDI-TOF assay. One of the tested variants, CYP2D6_EX4, was not observed in our P-1G3 case population and was excluded from our statistical analyses. Of the four CYP2D6 variants that were subjected to statistical analyses, CYP2D6_C1111T was the only one found to exhibit significant associations with study arm. These associations were apparent when CYP2D6_C1111T was tested as a single variant or within a CYP2D6 haplotype defined for all four tested polymorphic sites (Table 4). The C1111T variant is found in haplotypes with decreased enzymatic activity and thus may be associated with resistance to tamoxifen through its failure to convert tamoxifen efficiently to endoxifen (http://www.cypalleles.ki.se/cyp2d6.htm, last accessed June 22, 2010). This mechanism could explain our statistical observations regarding CYP2D6_C1111T. Thus, the three observed C1111T alleles all occurred in the T-arm subjects, where their negative impact on endoxifen production would be expected to allow breast carcinogenesis despite the presence of the pro-drug tamoxifen.
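As a minimal illustration of the kind of case-only comparison described above, the following Python sketch applies a two-sided Fisher exact test to variant-allele counts by study arm. Only the observation that all three C1111T alleles fell in the T-arm comes from the text. The arm sizes (100 and 150 cases), the function name, and the choice of Fisher's exact test are assumptions made for illustration; they are not the study's data and not necessarily the test the study used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are study arms; columns are variant / non-variant allele counts.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k variant alleles in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum every table at least as unlikely as the observed one.
    total = sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


# HYPOTHETICAL arm sizes (placeholders, not the P-1:BCPT case counts).
t_arm_cases = 100
p_arm_cases = 150

# From the text: all three observed C1111T alleles were in T-arm subjects.
t_variant, p_variant = 3, 0

# Each case contributes two alleles.
p_value = fisher_exact_two_sided(
    t_variant, 2 * t_arm_cases - t_variant,
    p_variant, 2 * p_arm_cases - p_variant,
)
print(f"two-sided Fisher exact p = {p_value:.4f}")
```

With so few variant alleles, the p-value depends strongly on the relative sizes of the two arms, which is why the placeholder counts above should not be read as a reanalysis of the study.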
The recent literature on CYP2D6 genotyping in relation to enzymatic function and clinical outcome reports numerous additional genomic variants that were not in the epidemiologic literature used to plan this study. Based on these updated reports we attempted to obtain additional P-1:BCPT case DNA for more complete CYP2D6 genotyping. This proved impossible because the quantity of DNA remaining in the original aliquots after our genotyping was insufficient for additional testing, and the sample anonymization required for human research subject protection prevented our matching new samples to the residual DNA.
In our analysis of the P-arm in relation to the TAM pathway, we observed that allelic variation in CYP2D6 had nonrandom associations with alleles in multiple TAM pathway genes. The CYP2D6 gene was a hub at the center of strong associations between genotypic variations only in this arm (Figure 2A). The tight correlations between allelic variation in CYP2D6 and allelic variation in adjacent pathway genes (CYP3A4, CYP2C9, and SULT1A2) are depicted as red numbers in Figure 2A. However, this cluster of nonrandom associations centered on CYP2D6 is only seen in TAM pathway analysis of the P-arm. This hub of associations disappears in the T-arm, implying a random association of CYP2D6 with specific variants in its neighboring genes (Figure 2B). This difference between the two arms with respect to the central “hub” nature of CYP2D6 suggests that the tight allelic correlations seen only in the P-arm are a major contributor to the goodness-of-fit, i.e. agreement, that the P-arm shows with the TAM pathway model. The emergence of CYP2D6 as a hub of activity in the P-arm but not the T-arm in our TAM pathway model is consistent with the key role played by this gene in tamoxifen metabolism. In this respect, our findings at the pathway, but not the individual gene, level concur with data from studies supporting an impact of CYP2D6 genotype on clinical outcomes [4-5, 7, 9, 50, 52]. The ability of pathway analysis to override the null associations seen at the individual gene level offers a possible reconciliation of the contradictory findings from other studies, all of which incorporated CYP2D6 variation in a single-gene approach.
Beyond the CYP2D6 hub, path analysis has elucidated other key foci of genetic variation. The correlations between variation in three additional pairs of adjacent genes, CYP1A1-CYP3A4, CYP3A4-CYP2C9, and CYP2C9-SULT1A2, were also significant. None of these associations involves CYP2D6, suggesting that these other pathway enzymes are meaningful contributors to tamoxifen activation. Again, these non-CYP2D6 correlations may partially explain the negative results in some of the studies that focused solely on the effect of slow-metabolizing variants of CYP2D6 on clinical outcome.
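The notion of correlated allelic variation between adjacent pathway genes can be sketched as follows. This is a simplified illustration, not the path analysis used in the study: it assumes each genotype is coded as a minor-allele count (0, 1, or 2) and uses a plain Pearson correlation. The genotype vectors and the names `pearson`, `pairwise_correlations`, and `toy_arm` are hypothetical; only the gene pairs are taken from the text.

```python
from math import sqrt

# Adjacent gene pairs named in the text.
ADJACENT_PAIRS = [
    ("CYP1A1", "CYP3A4"),
    ("CYP3A4", "CYP2C9"),
    ("CYP2C9", "SULT1A2"),
]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A site with no variation (for example, a variant never observed)
        # has no defined correlation.
        return float("nan")
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(genotypes, pairs):
    """Correlate minor-allele counts for each gene pair within one arm."""
    return {
        (g1, g2): pearson(genotypes[g1], genotypes[g2]) for g1, g2 in pairs
    }


# TOY genotypes for six hypothetical subjects in a single arm (not study data).
toy_arm = {
    "CYP1A1": [0, 1, 2, 0, 1, 2],
    "CYP3A4": [0, 1, 2, 0, 1, 2],
    "CYP2C9": [0, 1, 2, 0, 1, 1],
    "SULT1A2": [0, 0, 2, 0, 1, 1],
}

for pair, r in pairwise_correlations(toy_arm, ADJACENT_PAIRS).items():
    print(pair, round(r, 3))
```

Running the same function separately on P-arm and T-arm genotypes would allow the two sets of pairwise correlations to be compared side by side, which is the contrast the discussion draws between Figure 2A and Figure 2B.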
In conclusion, we have evaluated correlations between variants in multiple genes simultaneously, viewing them as a composite system. We have shown that by treating the entire system as a marker of drug response, the limitations inherent in evaluating individual genetic variants, each conferring a minor effect on tamoxifen response, may be overcome. Our underlying candidate gene approach was critical to this process, with variants identified from preexisting biological literature offering a qualitative framework on which to quantitatively model actual data from the P-1 cases. Our application of pathway analysis to the genetic network underlying tamoxifen metabolism (TAM pathway) has shown the ability to discern differences between the cases in the two study arms (tamoxifen versus placebo), despite the evidence for non-significant or only marginally significant distinctions based on less complex statistical genetic comparisons. We have also demonstrated that CYP2D6 plays a key role in tamoxifen activation and clinical response at the level of pathway analysis, observations that fit with the extensive recent literature documenting its importance in metabolizing tamoxifen to its active form. Beyond reinforcing the centrality of CYP2D6 to tamoxifen response, our pathway analysis strongly suggests that specific combinations of allelic variants in other genes, including CYP1A1 with CYP3A4, CYP3A4 with CYP2C9, and CYP2C9 with SULT1A2, also make major contributions to the tamoxifen-resistance phenotype.
Our observations suggest directions for pursuing more detailed analysis of pharmacogenomic interactions involving the TAM pathway. Thus, four critical foci in the TAM pathway (at CYP2D6, CYP1A1-CYP3A4, CYP3A4-CYP2C9, and CYP2C9-SULT1A2) offer a starting point for future investigations to dissect the specific allelic interactions that underlie these correlations. This should allow us to build more reliable genetic classifiers of tamoxifen response, which could be used to offer an individualized approach to tamoxifen chemoprevention by identifying and treating only those women who are most likely to benefit from its administration.
The authors would like to thank Leslie Ford for facilitating the collaboration with NSABP; Soon-myung Paik (NSABP) for providing the samples; Alan Hoofring for graphical contributions; Carl Schaefer, Jinghui Zhang, and Sol Efroni for excellent discussions and critical reading of the manuscript; and David Flockhart for input regarding CYP2D6.
This study was supported in part by Public Health Service Grants No. U10-CA-37377, U10-CA-69974, U10-CA-12027, U10-CA-69651, and U24-CA-114732 from the National Cancer Institute, National Institutes of Health, Department of Health and Human Services.