We have designed a novel SNP panel containing 3404 genetic variations associated with 983 genes. These genes are involved in a variety of cellular functions that could contribute to population variation in tumor progression and treatment response (Table and [28]). This approach is distinct from using genome-wide arrays of 500,000 SNPs. The Affymetrix 500K SNP panel is based on restriction enzyme cleavage sites and representative spacing along the chromosomes. Although the whole-genome array has substantially greater content, over 90% of its SNPs are intragenic, and the chip is most often analyzed for linkage associations. Its multiple-comparison false-positive error rate is large, and the technology is considerably more expensive. Indeed, of the 3404 gene-associated SNPs on the BOAC SNP panel, only 401 are contained on the 500K SNP panel.
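To make the multiple-comparison burden concrete, here is a rough worked comparison. It assumes, for illustration only, that all tests are independent and null, with a per-test significance level of α = 0.05. Under those assumptions, the expected number of false positives is mα for m tests, and a Bonferroni-corrected threshold is α/m:

$$
\mathbb{E}[\text{false positives}] = m\alpha:\qquad 500{,}000 \times 0.05 = 25{,}000 \quad\text{vs.}\quad 3404 \times 0.05 \approx 170
$$

$$
\frac{\alpha}{m}:\qquad \frac{0.05}{500{,}000} = 1 \times 10^{-7} \quad\text{vs.}\quad \frac{0.05}{3404} \approx 1.5 \times 10^{-5}
$$

Real SNPs are correlated through linkage disequilibrium, so these figures only indicate the scale of the difference.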
The BOAC SNP panel has limitations as well. The public and Affymetrix databases used to construct the chip content are constantly being updated, so gaps in content may become apparent. We targeted SNPs in non-synonymous coding sequences or highly conserved regulatory sequences, but the functional effects of many of these SNPs have not yet been documented. SNP associations identified with the BOAC panel therefore represent a first step in exploring the genome for clinically relevant genetic variations. They will require both extensive validation and functional assays to confirm their effects.
We made a considerable effort to ensure that quality controls were in place. The Affymetrix platform provided a high call rate (96%) and very high concordance between replicate samples, even for replicates run at different facilities. Concordance also extended to the 786 SNPs on the panel with documented genotypes for the Coriell cell lines we included [10]. All of the samples we analyzed had high-quality DNA (A260/280 ratios > 1.7 and little degradation). In subsequent unrelated studies, we found that even highly degraded DNA yields robust, high call rates and reproducibility (data not shown). This is probably because the initial amplifications span only 100–150 bp of DNA. The most likely source of quality control error is sample misidentification or misplacement in multi-well plates. To control for this, we routinely incorporate randomly positioned controls and replicates.
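The quality-control measures described above reduce to simple calculations, shown here as a minimal sketch. The genotype encoding ("AA"/"AB"/"BB" strings, `None` for a no-call), the function names, and the plate size are all hypothetical illustrations. They are not the Affymetrix software or the BOAC pipeline.

```python
import random
from typing import List, Optional

# Hypothetical encoding: one genotype string per SNP, None for a no-call.
Call = Optional[str]


def call_rate(calls: List[Call]) -> float:
    """Fraction of SNPs that received a genotype call."""
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)


def concordance(rep1: List[Call], rep2: List[Call]) -> float:
    """Fraction of SNPs called in both replicates whose genotypes agree.

    SNPs missing in either replicate are left out of the denominator.
    """
    if len(rep1) != len(rep2):
        raise ValueError("replicates must cover the same SNPs")
    both = [(a, b) for a, b in zip(rep1, rep2) if a is not None and b is not None]
    if not both:
        return float("nan")
    return sum(a == b for a, b in both) / len(both)


def plate_layout(samples: List[str], controls: List[str], n_wells: int = 96,
                 seed: Optional[int] = None) -> List[Optional[str]]:
    """Place controls in randomly chosen wells and fill the rest with samples.

    Wells left over after all samples are placed stay empty (None).
    """
    if len(samples) + len(controls) > n_wells:
        raise ValueError("too many samples and controls for one plate")
    rng = random.Random(seed)
    control_wells = set(rng.sample(range(n_wells), len(controls)))
    ctrl_iter, sample_iter = iter(controls), iter(samples)
    return [next(ctrl_iter) if well in control_wells else next(sample_iter, None)
            for well in range(n_wells)]


if __name__ == "__main__":
    site_a = ["AA", "AB", None, "BB", "AB"]
    site_b = ["AA", "AB", "AB", "BB", "AA"]
    print(f"call rate, site A: {call_rate(site_a):.2f}")     # 0.80
    print(f"concordance: {concordance(site_a, site_b):.2f}")  # 0.75
```

The design choice is to exclude no-calls from the concordance denominator. This keeps the two metrics separate, so a low call rate does not masquerade as discordance.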
The Coriell cell line panel includes a range of racial groups, and allelic frequencies differ strikingly between the African American and Caucasian groups. Allelic variation is likely to be resolved more finely by geographically based lineages [33], since racial definitions are somewhat subjective and often self-reported. Importantly, as the BOAC database grows, multiple comparisons can be performed with appropriate correction for differences in allele frequency among racial groups. It will be important to include the full spectrum of patients as the database expands.
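Differences in allele frequency between groups can confound a pooled association test. One standard way to adjust for this is a Cochran-Mantel-Haenszel (CMH) test stratified by group, sketched minimally below. The choice of CMH, the 2x2 allele-count layout, and the function name are illustrative assumptions. The text does not specify which correction the BOAC analyses use.

```python
import math
from typing import List, Tuple

# One 2x2 table of allele counts per stratum (e.g. per self-reported group):
#   ((minor alleles, short PFS), (major alleles, short PFS)),
#   ((minor alleles, long PFS),  (major alleles, long PFS))
Table = Tuple[Tuple[int, int], Tuple[int, int]]


def cmh_test(strata: List[Table]) -> Tuple[float, float]:
    """CMH chi-square (1 df, no continuity correction) and its p-value."""
    sum_diff = 0.0
    sum_var = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small contributes no information
        expected_a = (a + b) * (a + c) / n
        var_a = (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        sum_diff += a - expected_a
        sum_var += var_a
    if sum_var == 0:
        raise ValueError("no informative strata")
    stat = sum_diff ** 2 / sum_var
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical counts: allele frequency differs between the two strata,
    # but within each stratum the allele is unrelated to PFS group.
    strata = [((30, 30), (10, 10)), ((2, 18), (6, 54))]
    print(cmh_test(strata))  # (0.0, 1.0)
```

In this toy example, pooling the strata would suggest an association because the strata differ in both allele frequency and group balance. Stratifying removes that artifact.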
Disease progression, treatment response and survival vary widely among patients. A number of studies have examined variation in tumor cell chromosomal abnormalities [5] and gene expression profiles [6-8]. The evidence strongly suggests that these tumor cell variations affect patient outcomes. However, patient populations also show considerable germline variation that could influence the microenvironment, immune status, and drug metabolism or transport. For example, two of the authors (DJ, GM) have presented evidence that germline variations in GSTP1 alter melphalan metabolism. These variations have also been associated with different outcomes in patients receiving high-dose melphalan therapy [34]. Numerous variations in drug metabolism, transport, and DNA repair have been documented, with emerging associations with therapeutic outcomes.
Our approach was to provide a more global germline analysis, driven by bioinformatic searches for potentially relevant variations across multiple genes and gene functions. This remains an exploratory approach. It aims to identify functional variations that may affect therapeutic response and disease progression, and thereby produce differences in survival. Rather than treating survival as a continuum, we chose to examine the two extreme ends of the PFS spectrum, to maximize the chance of identifying potential functional variations in this first step. Patients were stratified into short (< 1 year) and longer (> 3 years) PFS groups. Survival, however, is likely a complex endpoint. It results from both tumor progression and therapeutic failure, which may involve multiple organ systems. Moreover, we recognized three complications. First, tumor variation among patients may have dominant effects associated with survival. Second, the trials we examined used multi-drug regimens, and the response to each drug may be affected by complex genetic variation in transport, metabolism, and export. Third, sample size still limits statistical power. Thus, our initial aim in this study was to determine whether germline variations had any measurable influence on survival.
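The extreme-group design can be sketched as a small classification rule, under stated assumptions. The 1-year and 3-year cut-offs come from the text. The handling of censoring is an assumption: a patient counts as short only if progression was observed before 1 year. Patients censored before 3 years without progression are excluded because their group is unknown. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

SHORT_MAX_YEARS = 1.0  # short PFS group: progression within 1 year (from the text)
LONG_MIN_YEARS = 3.0   # long PFS group: progression-free beyond 3 years (from the text)


def pfs_group(pfs_years: float, progressed: bool) -> Optional[str]:
    """Return "short", "long", or None for patients outside both extremes.

    Assumption: censored patients are classified only when their follow-up
    already determines group membership.
    """
    if progressed and pfs_years < SHORT_MAX_YEARS:
        return "short"
    if pfs_years > LONG_MIN_YEARS:
        return "long"
    return None  # intermediate PFS, or censored too early to classify


def stratify(patients: Dict[str, Tuple[float, bool]]) -> Dict[str, List[str]]:
    """Group hypothetical patient IDs by PFS extreme, dropping the middle band."""
    groups: Dict[str, List[str]] = {"short": [], "long": []}
    for patient_id, (years, progressed) in patients.items():
        group = pfs_group(years, progressed)
        if group is not None:
            groups[group].append(patient_id)
    return groups
```

Discarding the intermediate band trades sample size for contrast. That trade-off matters given the limited statistical power noted above.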
We felt it was important first to determine whether the complete SNP profile could truly discriminate between the two PFS groups. We used a variety of methods, each tested against analyses in which samples were randomly mixed between groups. With these methods, we found that the SNP panel carried a true signal discriminating short from long progression-free survivors. Its accuracy, however, did not reach the level of prediction required for clinical application. Notably, a smaller subset of SNPs increased the predictive power. Importantly, no individual genetic variation provided a strong, independent prediction of survival. This likely reflects the fact that individual germline variations may affect response without being solely responsible for it. Outcome probably results from complex interactions among such variations. Indeed, genetic variation in the tumor cell may play a dominant role in response and survival. Patient responses are therefore likely to involve interactions affecting multiple functions within the tumor cell, as well as external factors affecting tumor progression and drug response. Nevertheless, our analysis of the SNP panel as a whole suggests that germline variations likely affect patient survival and deserve further attention.
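Testing a whole-panel classifier against randomly mixed samples is essentially a label-permutation test. The minimal sketch below uses a leave-one-out nearest-centroid classifier as a stand-in. The text does not name the methods used, so the classifier, the 0/1/2 genotype coding, and all function names are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical encoding: rows are patients, columns are SNPs coded as 0/1/2
# copies of the minor allele; labels are 0 (short PFS) or 1 (long PFS).
Genotypes = List[List[int]]


def _centroid(rows: Sequence[Sequence[int]]) -> List[float]:
    """Mean genotype per SNP across the given patients."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def _sq_dist(x: Sequence[float], c: Sequence[float]) -> float:
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def loo_accuracy(X: Genotypes, y: List[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    if min(y.count(0), y.count(1)) < 2:
        raise ValueError("each group needs at least two patients")
    correct = 0
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        c0 = _centroid([x for x, lab in train if lab == 0])
        c1 = _centroid([x for x, lab in train if lab == 1])
        pred = 0 if _sq_dist(X[i], c0) <= _sq_dist(X[i], c1) else 1
        correct += int(pred == y[i])
    return correct / len(X)


def permutation_p_value(X: Genotypes, y: List[int], n_perm: int = 200,
                        seed: int = 0) -> Tuple[float, float]:
    """Compare the observed accuracy with accuracies after shuffling labels."""
    observed = loo_accuracy(X, y)
    rng = random.Random(seed)
    labels = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # group sizes are kept; only membership changes
        if loo_accuracy(X, labels) >= observed:
            hits += 1
    # Counting the observed labelling as one permutation keeps p above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

A true signal shows up as observed accuracy that few shuffled labellings match. Good accuracy alone is not enough, since chance alone yields roughly 50% with balanced groups.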
Recognizing the limited statistical power to detect single SNPs associated with PFS, we nonetheless performed a univariate analysis. It ranked the SNPs that individually best discriminated the groups in the two similar phase III clinical trials. We did not correct for multiple comparisons. Correction would certainly reduce the significance of the p-values but would not alter their rank order. This approach also assumes that associations are detectable at the level of individual SNPs, whereas complex multi-SNP groupings more likely influence response. Nevertheless, the top SNP variations in both trials included several groups of genes:

- Genes associated with drug metabolism, detoxification and transport: CYP genes, multiple variants of GSTA4, SLCO, UGT1, NAT2 and ABCB genes.
- Genes affecting cellular response: BMP2 (inducing myeloma apoptosis), cathepsin B (inducing IL-8-dependent cellular migration and angiogenesis [31,32]) and XRCC5 (DNA repair).
- Genes associated with proliferative responses: PCNA, MAPK and cyclin kinase.

The association of multiple alleles of GSTA4 is particularly compelling, suggesting a consistent effect across several variant alleles. In addition, several alleles in linkage disequilibrium appear as a cluster in the list. This provides an internal quality control, since linked alleles would be expected to show the same association. Surprisingly absent from the SNP association lists are cytokines, growth factors and receptors that might be expected to cause variation in disease progression and resistance. The one exception is IL-10, which has been reported in previous studies [35].
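The rank-order argument can be checked directly. In the minimal sketch below, each SNP gets an allelic chi-square p-value, and SNPs are sorted by it. A Bonferroni adjustment, min(1, m·p), is then applied. Because that adjustment is monotone, it can only tie values at 1.0, never reorder them. The allele-count layout, the SNP identifiers, and the choice of the allelic chi-square test are illustrative assumptions, not the study's documented method.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical per-SNP allele counts:
# (minor_short, major_short, minor_long, major_long)
AlleleCounts = Tuple[int, int, int, int]


def allelic_chi2_p(counts: AlleleCounts) -> float:
    """p-value of the 2x2 allelic chi-square test (1 df, no continuity correction)."""
    a, b, c, d = counts
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # monomorphic or empty table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail with 1 df


def rank_snps(table: Dict[str, AlleleCounts]) -> List[Tuple[str, float, float]]:
    """Return (snp, raw p, Bonferroni-adjusted p), most significant first."""
    m = len(table)
    raw = sorted(((snp, allelic_chi2_p(c)) for snp, c in table.items()),
                 key=lambda item: item[1])
    return [(snp, p, min(1.0, p * m)) for snp, p in raw]


if __name__ == "__main__":
    demo = {"snp_A": (30, 10, 10, 30), "snp_B": (20, 20, 20, 20), "snp_C": (25, 15, 15, 25)}
    for snp, p, p_adj in rank_snps(demo):
        print(f"{snp}: p={p:.2e}, Bonferroni={p_adj:.2e}")
```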
While still exploratory, the SNP pairs identified by recursive partitioning in each trial have some intriguing possible connections to PFS. COMT (catechol-O-methyltransferase) metabolizes catechol drugs and has been linked to breast cancer risk and survival [36]. GHRL has been shown to stimulate angiogenesis [37] and to regulate bone formation through osteoblasts [38,39]. FDFT encodes farnesyl-diphosphate farnesyltransferase, which may regulate important signaling (e.g., Ras) [40,41]. ABCC belongs to a class of transporters that may contribute to multidrug resistance [42]. Note, however, that an association that was strong in one trial was markedly weaker in the validation trial. Nevertheless, the functional impact of these genetic variations may warrant further investigation.
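Recursive partitioning of this kind can yield SNP pairs when the tree is grown to depth two. The minimal sketch below uses Gini impurity and two genotype splits per SNP: 0 vs 1/2, and 0/1 vs 2. The splitting criterion, depth limit, encoding and function names are all assumptions for illustration. The software and settings actually used in the study are not stated here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: genotypes 0/1/2 (minor-allele copies); label 1 = long PFS.
Split = Tuple[int, int, float]  # (snp index, threshold, weighted Gini)


def gini(labels: List[int]) -> float:
    """Gini impurity of a binary label list."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(X: List[List[int]], y: List[int], rows: List[int]) -> Optional[Split]:
    """Best split of `rows`: genotype <= threshold goes left.

    Ties keep the first SNP examined.
    """
    best: Optional[Split] = None
    for j in range(len(X[0])):
        for t in (0, 1):  # 0 vs 1/2 (dominant) and 0/1 vs 2 (recessive)
            left = [y[i] for i in rows if X[i][j] <= t]
            right = [y[i] for i in rows if X[i][j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


def grow(X: List[List[int]], y: List[int], rows: Optional[List[int]] = None,
         depth: int = 0, max_depth: int = 2, min_size: int = 2) -> Dict:
    """Grow a tree as nested dicts; leaves store the fraction of long-PFS patients."""
    if rows is None:
        rows = list(range(len(X)))
    labels = [y[i] for i in rows]
    leaf = {"p_long": sum(labels) / len(labels), "n": len(rows)}
    if depth >= max_depth or len(rows) < min_size or gini(labels) == 0:
        return leaf
    split = best_split(X, y, rows)
    if split is None or split[2] >= gini(labels):
        return leaf  # no split improves purity
    j, t, _ = split
    left = [i for i in rows if X[i][j] <= t]
    right = [i for i in rows if X[i][j] > t]
    return {"snp": j, "threshold": t,
            "left": grow(X, y, left, depth + 1, max_depth, min_size),
            "right": grow(X, y, right, depth + 1, max_depth, min_size)}
```

A tree grown on one trial and re-evaluated on another illustrates the caveat above. Greedy splits fitted to one sample often lose strength in an independent validation set.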