We first conducted quality control analysis for all GWAS SNPs included in this study. Quantile-quantile plots of the negative logarithm of the genome-wide P values and genomic control (GC) λ values computed from all GWAS SNPs indicated no global variance inflation (GC λ between 1 and 1.02 for all analyses; Supplementary Fig. 1), excluding the possibility of inflated type-I error rates in the pathway analyses.
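As a rough illustration of the genomic control check described above, the sketch below estimates λ from a list of two-sided P-values. It assumes the common median-based estimator (median observed 1-df chi-square statistic divided by the expected median under the null); the text does not state which estimator was used, and the function name `genomic_control_lambda` is hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Expected median of a chi-square distribution with 1 degree of freedom,
# obtained as the square of the 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_control_lambda(p_values):
    """Estimate genomic-control lambda from two-sided association P-values.

    Assumption (not stated in the source): the median-based estimator.
    """
    chi2_stats = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("P-values must lie in the interval (0, 1]")
        # A two-sided P-value maps to |z| = Phi^-1(1 - p/2); z^2 is a 1-df chi-square.
        z = _STD_NORMAL.inv_cdf(1.0 - p / 2.0)
        chi2_stats.append(z * z)
    return median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Evenly spaced P-values mimic a perfectly null (uniform) distribution.
    n = 1001
    null_p = [(i + 0.5) / n for i in range(n)]
    print(round(genomic_control_lambda(null_p), 3))
```

Under this estimator, a value close to 1 (such as the 1 to 1.02 range reported here) indicates that the test statistics follow the null distribution for most SNPs, whereas values well above 1 would point to population stratification or other systematic inflation.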
Among the 591,928 SNPs from the original GWAS, we analyzed 19,082 SNPs mapping to 917 genes from the HuGE Navigator list (13). Given the large number of genes and their variable degree of involvement in inflammation-related functions, we used HuGE Literature Finder to assign each gene a score reflecting the strength of evidence for its association with inflammation (Supplementary Table 1 and Supplementary Methods), and tested the pathway association with this weighting applied to the genes. The list of the 917 genes and the corresponding SNP P-values are reported in Supplementary Table 1. After applying a Bonferroni correction, we found a strong association of the pathway for SQ cases ( and Supplementary Table 2; P = 0.0004, ever smokers), suggesting the existence of SNPs truly associated with risk in this histologic subtype. An analysis without the HuGE-based weighting provided similar results, although with weaker associations (SQ P = 0.001, Supplementary Table 2). No statistically significant association was found for AD or SC risk with either the weighted or the unweighted approach ( and Supplementary Table 2). Therefore, we restricted further analyses to the SQ subtype.
Pathway-level associations in the NCI GWAS by histology and smoking status
We chose the 55 SNPs in the 55 genes with the strongest evidence for association in SQ (gene-wise P-value < 0.05) and performed replication in two independent studies of European ancestry: UK1 (8), with 592 SQ cases and 2,699 controls from the 1958 Birth Cohort (WTCCC) (14), genotyped using Illumina HumanHap550 arrays; and Texas (5), with 306 SQ cases and 1,137 controls, genotyped using Illumina HumanHap300 arrays.
Among the 55 SNPs, only rs6489769 was consistently replicated in both studies, with a combined P
for the association with SQ risk (). The SNP marker, rs6489769, maps to chromosome 12p13.33 (943,226 bp). The pathway analysis of the NCI SQ data after exclusion of the SNPs at 12p13.33 showed a pathway-level P = 0.0008. We then tested rs6489769 in a third independent sample, UK2 (15), with 1,038 SQ cases and 933 controls genotyped using Illumina Infinium custom arrays. The association was also confirmed in this third sample (). Although these case-control series were smaller than the discovery dataset, each had the statistical power to replicate the signal of the NCI discovery set (OR = 1.23) at one-sided P < 0.05 (statistical power: UK1 = 0.92, UK2 = 0.93, Texas = 0.70). Combining data from all four studies, the association was statistically significant on a genome-wide basis with P
, two orders of magnitude below the Bonferroni-corrected P-value threshold for 19,082 SNPs (0.05/19,082 SNPs, P ≈ 2.6 × 10⁻⁶) and odds ratio = 1.20 (95% confidence interval = 1.12–1.28; Phet = 0.89, I2
Replication and meta-analysis of 55 top SNPs for squamous cell lung carcinoma in NCI, UK1 and Texas.
Summary data for the 12p13.33 SNP rs6489769 associated with squamous cell lung carcinoma risk
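The Bonferroni threshold and the combined odds ratio with heterogeneity statistics reported above can be sketched as follows. This is a minimal illustration assuming a fixed-effect, inverse-variance-weighted meta-analysis of per-study log odds ratios, which the text does not confirm as the method used. The function names are hypothetical, and the per-study inputs in the usage section are made-up placeholders, not estimates from NCI, UK1, UK2, or Texas.

```python
import math
from statistics import NormalDist


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def fixed_effect_meta(log_ors, std_errs, conf_level=0.95):
    """Inverse-variance fixed-effect meta-analysis of log odds ratios.

    Returns the pooled OR, its confidence interval, Cochran's Q, and I^2 (%).
    """
    if len(log_ors) != len(std_errs) or not log_ors:
        raise ValueError("need one standard error per log odds ratio")
    weights = [1.0 / se ** 2 for se in std_errs]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z_crit = NormalDist().inv_cdf(0.5 + conf_level / 2.0)
    # Cochran's Q: weighted squared deviations of study estimates from the pooled one.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I^2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {
        "or": math.exp(pooled),
        "ci_low": math.exp(pooled - z_crit * pooled_se),
        "ci_high": math.exp(pooled + z_crit * pooled_se),
        "q": q,
        "i2_percent": i2,
    }


if __name__ == "__main__":
    # 0.05 / 19,082 SNPs, as in the text: roughly 2.6e-06.
    print(bonferroni_threshold(0.05, 19082))

    # Placeholder inputs for illustration only (NOT the studies' results).
    example_log_ors = [0.20, 0.18, 0.15, 0.22]
    example_std_errs = [0.06, 0.07, 0.10, 0.06]
    print(fixed_effect_meta(example_log_ors, example_std_errs))
```

In this framing, a low I² together with a non-significant heterogeneity P-value (as reported here, Phet = 0.89) suggests that the per-study effect estimates are consistent with a single shared odds ratio, which is the situation in which a fixed-effect model is usually considered reasonable.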
We tested whether the association of this SNP with SQ risk was modified by pack-years of tobacco smoking in the NCI GWAS, but found very similar results across smoking strata (Supplementary Table 3). We also investigated in EAGLE (441 SQ cases and 1,319 controls) (16) whether the association between rs6489769 and SQ was confounded by chronic obstructive pulmonary disease (COPD) status, but found no major changes in the adjusted data (data not shown). The SNP rs6489769 was not significantly associated with COPD in lung cancer cases (P = 0.67, OR = 0.97). Since only 131 of the controls had documented COPD, a larger study of cancer-free COPD patients is required to robustly examine the impact of this SNP on COPD risk.
To explore the 12p13.33 region further, we imputed unobserved genotypes in SQ cases and controls in the NCI SQ data using HapMap Phase III and 1000 Genomes Project data, but did not identify any association at 12p13.33 stronger than that provided by rs6489769. This locus harbors the RAD52 gene, which is involved in homologous recombination (HR). We examined whether other genes in the HR pathway or the overall DNA repair pathway influence SQ risk. None of the other 17 HR genes (involving 142 SNPs) showed an association with gene-wise P < 0.05 (Supplementary Table 4). In the analysis of the overall DNA repair pathway, including 1,410 SNPs mapping to 136 genes (Supplementary Table 5), we observed a modest pathway-level association in the NCI SQ data either including (P = 0.006) or excluding (P = 0.04) the RAD52 SNPs, and the only SNP with P-value < 0.001 was rs6489769.