The initial genome-wide scan was performed for 307 260 SNPs that passed strict quality control measures. In the discovery phase, we observed 31 751 SNPs that met our selection criteria (P < .05; among them, 20 with P and one with P) for the analysis focused on stage III and IV NSCLC patients. When the analysis was restricted to only stage IIIB (wet) and IV patients, 30 258 SNPs (P < .05; among them, 15 with P and three with P) were identified. As a sensitivity analysis, we performed two separate multivariable Cox regression analyses (one with clusters from PLINK and the other with eigenvectors from EIGENSTRAT) to assess the potential effect of population substructure. The results indicated that the patient population is relatively ethnically homogeneous and that the observed associations were not driven by potential population substructure (Supplementary Table 2, available online).
Figure 1 Genome-wide association study results for overall survival by chromosome in the MD Anderson discovery population. Associations are expressed as −log₁₀(P). P values were from multivariable Cox proportional hazards models and were two-sided. A) …
To determine which associations identified in the discovery phase were robust, we selected 60 SNPs (Supplementary Table 1, available online) for a fast-track validation study in an independent NSCLC patient cohort from the Mayo Clinic that followed the same eligibility criteria as the discovery population. Under the additive model, rs1878022 was statistically significantly associated with poor overall survival in the MD Anderson discovery population (hazard ratio [HR] of death = 1.59, 95% CI = 1.32 to 1.92, P = 1.42 × 10⁻⁶), and the difference in survival time depended on the number of variant alleles carried by the patient (P = .001). This association did not reach statistical significance in the Mayo Clinic validation population (HR of death = 1.16, 95% CI = 0.95 to 1.41, P = .15), but survival time differed among patients with different genotypes (homozygous common genotype [TT] vs heterozygous genotype [TC] vs homozygous variant genotype [CC], P = .04).
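As a rough consistency check on hazard ratios such as those reported above, an HR and its 95% confidence interval can be converted back into an approximate two-sided Wald P value. The sketch below assumes the interval is symmetric on the log scale with a normal critical value of 1.96. The function name `wald_p_from_hr` is illustrative and is not part of the study's analysis code, and because the published values are rounded, the result only approximates the reported P values.

```python
import math

def wald_p_from_hr(hr, ci_low, ci_high, z_crit=1.96):
    """Approximate (SE, z, two-sided P) from a hazard ratio and its 95% CI.

    Assumes the CI was built as exp(log(HR) +/- z_crit * SE).
    """
    log_hr = math.log(hr)
    # Half the CI width on the log scale equals z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_hr / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# rs1878022, additive model, values as reported in the text.
mda = wald_p_from_hr(1.59, 1.32, 1.92)   # MD Anderson discovery set
mayo = wald_p_from_hr(1.16, 0.95, 1.41)  # Mayo Clinic validation set
```

Under these assumptions the Mayo Clinic estimate yields a P value above .05, which is what an interval spanning 1 implies.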
Characteristics of study populations
Summary results for rs1878022 analysis*
Figure 2 Kaplan–Meier curves of overall survival in non–small cell lung cancer patients who received platinum-based chemotherapy by rs1878022 genotype. Overall survival is shown for A) the MD Anderson discovery set, B) the Mayo Clinic validation …
Another candidate SNP, rs10937823, was associated with statistically significantly poorer survival in the MD Anderson discovery dataset (HR of death = 2.40, 95% CI = 1.67 to 3.43, P = 1.80 × 10⁻⁶), the Mayo Clinic validation dataset (HR of death = 1.45, 95% CI = 1.02 to 2.08, P = .04), and the combined MD Anderson and Mayo Clinic dataset (HR of death = 1.82, 95% CI = 1.42 to 2.33, P = 1.73 × 10⁻⁶). In the combined dataset, the median survival time for individuals with the common homozygous genotype (16.05 months) was statistically significantly longer than the median survival time for those carrying the variant-containing genotypes (10.72 months, P = 6.76 × 10⁻⁵).
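Median survival times like those quoted above are typically read off a Kaplan–Meier curve as the first time point at which the estimated survival probability falls to 0.5 or below. The following is a minimal sketch of that calculation. The follow-up times and event indicators are hypothetical toy values, not study data, and the function names are illustrative.

```python
def kaplan_meier(times, events):
    """Return [(time, S(time))] at each distinct time with at least one death.

    times  : follow-up times (e.g. months)
    events : 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored subjects leave the risk set
    return curve

def median_survival(curve):
    """First time at which estimated survival is <= 0.5, or None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical toy cohort (months of follow-up; 1 = died, 0 = censored).
toy_times = [6, 7, 10, 15, 19, 25]
toy_events = [1, 0, 1, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
toy_median = median_survival(toy_curve)
```

Comparing the medians of two genotype groups would then be a matter of running the same estimator on each group separately; the log-rank test used for the P values is not shown here.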
In a sensitivity analysis, we repeated the analysis for rs1878022, restricting it to patients who had chemotherapy treatment before 2004, 2005, and 2006 (ie, those patients who had at least 5, 4, and 3 years of follow-up). The associations between overall survival and rs1878022 were very similar to those in the overall analysis, and for the Mayo study, we observed a stronger association with poor overall survival when the follow-up was longer. We also assessed the effect of rs1878022 on overall survival in patients who received radiotherapy as part of their treatment regimen. In both the MD Anderson and Mayo Clinic populations, the results were similar between the two treatment groups (MD Anderson: chemotherapy and radiotherapy, HR of death = 1.50, 95% CI = 1.16 to 1.94, vs chemotherapy only, HR of death = 1.64, 95% CI = 1.22 to 2.20; Mayo Clinic: chemotherapy and radiotherapy, HR of death = 1.17, 95% CI = 0.90 to 1.52, vs chemotherapy only, HR of death = 1.23, 95% CI = 0.89 to 1.69) with overlapping 95% confidence intervals.
To provide further evidence that these genetic loci are associated with poor overall survival in patients receiving platinum-based chemotherapy, we analyzed rs1878022 and rs10937823 in patients enrolled in the PLATAX clinical trial. Under the dominant model, rs10937823 was not statistically significantly associated with poor overall survival in the PLATAX validation population (HR of death = 0.96, 95% CI = 0.69 to 1.35, P = .84). In contrast, rs1878022 was validated in the PLATAX population (HR of death = 1.23, 95% CI = 1.00 to 1.51, P = .05). In addition, a statistically significant difference in median survival times (log-rank P = .04) was evident based on the number of variant alleles. Patients with the common genotype had a median survival time of 10.76 months compared with 8.03 and 8.91 months for patients carrying the heterozygous genotype and the variant genotype, respectively. The statistical evidence for this association was stronger in a pooled analysis of the two validation datasets (HR of death = 1.22, 95% CI = 1.06 to 1.40, P = .005), and the association was highly statistically significant overall using data from all three studies (HR of death = 1.33, 95% CI = 1.19 to 1.48, P = 5.13 × 10⁻⁷).
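This section does not state how the pooled estimates were computed. One common approach, shown here purely as an assumption and not as the authors' method, is fixed-effect inverse-variance pooling of the study-level log hazard ratios. The sketch recovers each standard error from the published 95% CI (assuming a normal critical value of 1.96) and combines the three rs1878022 estimates. The function names are illustrative.

```python
import math

def log_hr_and_se(hr, ci_low, ci_high, z_crit=1.96):
    """Log hazard ratio and its SE recovered from a published HR and 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return math.log(hr), se

def pool_fixed_effect(studies, z_crit=1.96):
    """Inverse-variance fixed-effect pooling of (hr, ci_low, ci_high) tuples.

    Returns (pooled HR, lower 95% limit, upper 95% limit).
    """
    weights, weighted = 0.0, 0.0
    for hr, lo, hi in studies:
        b, se = log_hr_and_se(hr, lo, hi, z_crit)
        w = 1.0 / se ** 2          # weight = inverse of the variance
        weights += w
        weighted += w * b
    pooled = weighted / weights
    pooled_se = math.sqrt(1.0 / weights)
    return (math.exp(pooled),
            math.exp(pooled - z_crit * pooled_se),
            math.exp(pooled + z_crit * pooled_se))

# rs1878022 estimates as reported in the text.
rs1878022 = [
    (1.59, 1.32, 1.92),  # MD Anderson discovery
    (1.16, 0.95, 1.41),  # Mayo Clinic validation
    (1.23, 1.00, 1.51),  # PLATAX validation
]
pooled_hr, pooled_lo, pooled_hi = pool_fixed_effect(rs1878022)
```

With these inputs the pooled hazard ratio and interval land close to the reported three-study result. Exact agreement is not expected, because an analysis of pooled individual-level data with covariate adjustment differs from a meta-analysis of rounded summary statistics.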
SNP rs1878022 is located on chromosome 12q23.3 within an intron of the chemokine-like receptor 1 gene (CMKLR1). Haplotype analysis of the genomic region surrounding rs1878022 indicated that rs1878022 was not in high linkage disequilibrium with any neighboring SNPs; thus, rs1878022 is not located in any major haplotype block within this region of the chromosome. Imputation of HapMap SNPs within this genomic region confirmed these findings, and no other SNPs demonstrated a statistically significant association with poor overall survival in NSCLC patients as calculated using this method.
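Linkage disequilibrium between two biallelic SNPs is often summarized by r², computed from the haplotype frequency and the two allele frequencies. The sketch below illustrates that standard formula. The frequencies passed in are hypothetical and are not taken from the study or from HapMap.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """r^2 between two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A and allele B
    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom

# Hypothetical frequencies for illustration only.
r2_perfect = ld_r_squared(0.5, 0.5, 0.5)       # alleles always travel together
r2_independent = ld_r_squared(0.25, 0.5, 0.5)  # alleles assort independently
```

A SNP that shows low r² with all of its neighbors, as described for rs1878022, would not be assigned to a haplotype block by block-finding methods that rely on strong pairwise disequilibrium.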
Figure 3 Linkage disequilibrium structure and association of observed and imputed single-nucleotide polymorphisms (SNPs) surrounding rs1878022 on chromosome 12. The linkage disequilibrium structure was created with the GOLD heat map Haploview 4.0 color scheme …
We created two prediction models based on 1-year survival for the pooled dataset to investigate the clinical relevance of rs1878022. The area under the receiver operating characteristic curve (AUC) based on the clinical and epidemiological variables (age, sex, clinical stage, and pretreatment performance status) was 69.1%, demonstrating reasonable predictive power. The addition of the single SNP to the model increased the AUC to 70.5%. Results based on 1000 bootstrap samples showed that the distribution of the difference between the AUCs has a 95% bias-corrected confidence interval of 0.4% to 3.4%, indicating a statistically significant improvement in prediction of decreased overall survival in NSCLC patients after adding rs1878022 to the model.
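The comparison of the two models can be illustrated with a bias-corrected bootstrap interval for the difference in AUC. The sketch below is one plausible reading of that procedure, not the authors' code. It assumes each model has already produced a risk score per patient, resamples patients with replacement, and applies the standard bias-correction (without acceleration) to the percentile limits. All function and argument names are illustrative.

```python
import random
from statistics import NormalDist

def auc(scores, labels):
    """AUC as the share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bc_bootstrap_auc_diff(base, full, labels, n_boot=1000, alpha=0.05, seed=0):
    """Observed AUC(full) - AUC(base) with a bias-corrected bootstrap CI.

    base, full : risk scores from the model without and with the SNP
    labels     : 1 if the patient died within 1 year, else 0
    """
    rng = random.Random(seed)
    n = len(labels)
    observed = auc(full, labels) - auc(base, labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # AUC needs both outcome classes in the resample
            diffs.append(auc([full[i] for i in idx], y)
                         - auc([base[i] for i in idx], y))
    diffs.sort()
    nd = NormalDist()
    # Bias correction: how far the bootstrap distribution sits from the estimate.
    frac = sum(d < observed for d in diffs) / n_boot
    frac = min(max(frac, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = nd.inv_cdf(frac)
    lo_q = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
    hi_q = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))

    def pick(q):
        return diffs[min(n_boot - 1, max(0, int(q * n_boot)))]

    return observed, pick(lo_q), pick(hi_q)
```

An interval whose lower limit lies above zero, as reported for the 0.4% to 3.4% interval, is what supports calling the improvement statistically significant.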