Non-Hodgkin lymphoma (NHL) is a heterogeneous group of B- and T-cell neoplasms that vary in their causes and molecular profiles¹. NHL has the fifth highest incidence of all cancers in the U.S., and its annual incidence has doubled since the 1970s. With increasing evidence supporting the importance of genetic determinants in lymphomagenesis², there is a strong impetus to identify genetic risk factors. Epidemiological and biological evidence suggests that environmental and genetic risk factors differ among the common NHL subtypes: follicular lymphoma (FL), diffuse large B-cell lymphoma (DLBCL) and chronic lymphocytic leukemia (CLL)/small lymphocytic lymphoma (SLL)¹. We therefore conducted genome-wide association studies (GWAS) using separate DNA pools from 189 FL, 221 DLBCL and 148 CLL/SLL cases and 592 controls (NC1 sample set), drawn from a larger San Francisco Bay Area NHL case-control study³, to identify subtype-specific NHL susceptibility genes (for study design see Supplementary Fig. 1; for a description of study populations see Supplementary Table 1). To reduce potential population stratification, we restricted genotyping to DNA from individuals of European ancestry, as determined by genotyping ancestry-informative markers (AIMs)⁴. Self-reported ethnicity and ancestry data were highly correlated (95%) and were used to construct homogeneous DNA pools of participants of European descent.
In the first phase, pools were hybridized to Human Hap550v.3 BeadChips (Illumina, San Diego, CA), and SNPs were ranked after adjusting for pooling error⁵. The top 30 ranked SNPs for each NHL subtype were then genotyped individually across the NC1 sample set to confirm the allele frequency differences estimated from the pooled data. Of the resulting raw allelic p-values, 87% were <0.05 (Supplementary Tables 2a–c), and genotype frequencies did not differ significantly from Hardy-Weinberg equilibrium. Thirty-two SNPs with subtype-specific allelic q-values (corrected p-values) <0.05 were subsequently genotyped in an independent set of 89 FL, 159 DLBCL, 135 CLL/SLL and 363 other NHL cases and 820 controls from the same study population as NC1 (NC2 sample set). Joint analyses of the SNP data on all study participants revealed five SNPs associated with FL (rs6457327, rs2517448, rs13286028, rs11158098, rs16940565) and two associated with DLBCL (rs9936269, rs29605) at q<0.05 (Supplementary Table 3); no CLL/SLL SNPs were significant at q<0.05. Because rs6457327 and rs2517448 were in complete LD, rs2517448 was excluded from further testing.
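The text does not state which false-discovery-rate procedure produced the q-values or which Hardy-Weinberg test was applied. As a hedged illustration only, the sketch below uses two textbook stand-ins: Benjamini-Hochberg adjusted p-values and a 1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium. The function names and example inputs are hypothetical and are not study data.

```python
import math


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (one common way to obtain FDR q-values).

    Assumption: a stand-in procedure; the study's exact q-value method is not stated.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def hwe_chisq_p(n_AA, n_Aa, n_aa):
    """1-df chi-square goodness-of-fit p-value for Hardy-Weinberg equilibrium.

    Assumes a polymorphic SNP (both alleles observed), so no expected count is zero.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_AA, n_Aa, n_aa]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square(1): P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical inputs for illustration.
print(bh_adjust([0.001, 0.01, 0.03, 0.04, 0.2]))
print(hwe_chisq_p(25, 50, 25), hwe_chisq_p(50, 0, 50))
```

Storey's q-value method, which estimates the proportion of true nulls, would give somewhat smaller values than Benjamini-Hochberg on the same p-values; for small genotype counts an exact Hardy-Weinberg test is usually preferred over the chi-square approximation.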
In the second phase, we performed validation genotyping of these six SNPs in a German case-control study⁶ (G1) comprising 87 FL, 152 DLBCL, 102 CLL/SLL and 153 other NHL cases and 669 controls, in which rs6457327 was associated with decreased FL risk (OR=0.28, p=0.01). No other associations were validated. In the third phase, rs6457327 was validated in two additional independent NHL case-control studies: 108 FL cases and 685 controls from the San Francisco Bay Area (NC3)⁷, and 172 FL cases and 611 controls from Canada (C1)⁸. The combined p-value for rs6457327 and FL risk across all four studies was 4.7×10⁻¹¹ under the allelic model and 1.9×10⁻¹⁰ for the Cochran-Armitage trend test. Both p-values fall below the genome-wide significance threshold of 8.3×10⁻⁹, obtained by Bonferroni correction of α=0.05 for 500,000 SNPs × 4 genetic models × 3 disease outcomes. Because some evidence suggests shared genetic factors among NHL subtypes², we also evaluated whether rs6457327 was associated with CLL/SLL or DLBCL in the combined sample from all study populations. No association was found for CLL/SLL, although a modest association was observed with DLBCL (allelic p = …, trend p = …).
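The paragraph above reports an allelic odds ratio, allelic and Cochran-Armitage trend p-values, and a Bonferroni threshold. The sketch below shows textbook versions of these calculations so the quantities can be reproduced in principle; the software actually used is not stated, the function names are hypothetical, and the genotype counts are made up rather than taken from the study.

```python
import math


def chisq1_sf(x):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


def allelic_test(case, ctrl):
    """Allelic odds ratio and 1-df Pearson chi-square p-value (no continuity correction).

    case, ctrl: genotype counts (n_AA, n_Aa, n_aa); allele 'a' is the counted allele.
    """
    a1, A1 = 2 * case[2] + case[1], 2 * case[0] + case[1]  # allele counts in cases
    a0, A0 = 2 * ctrl[2] + ctrl[1], 2 * ctrl[0] + ctrl[1]  # allele counts in controls
    odds_ratio = (a1 * A0) / (A1 * a0)
    n = a1 + A1 + a0 + A0
    stat = n * (a1 * A0 - A1 * a0) ** 2 / ((a1 + A1) * (a0 + A0) * (a1 + a0) * (A1 + A0))
    return odds_ratio, chisq1_sf(stat)


def trend_test(case, ctrl, weights=(0, 1, 2)):
    """Cochran-Armitage trend test p-value; weights (0, 1, 2) give an additive model."""
    R, S = sum(case), sum(ctrl)
    N = R + S
    col = [r + s for r, s in zip(case, ctrl)]  # genotype totals across cases and controls
    T = sum(w * (r * S - s * R) for w, r, s in zip(weights, case, ctrl))
    var = (R * S / N) * (
        sum(w * w * c * (N - c) for w, c in zip(weights, col))
        - 2 * sum(weights[i] * weights[j] * col[i] * col[j]
                  for i in range(3) for j in range(i + 1, 3))
    )
    return chisq1_sf(T * T / var)


# Bonferroni threshold from the text: 0.05 / (500,000 SNPs x 4 models x 3 outcomes).
bonferroni = 0.05 / (500_000 * 4 * 3)

# Hypothetical genotype counts (not study data): cases carry fewer 'a' alleles.
cases, controls = (60, 35, 5), (40, 45, 15)
print(allelic_test(cases, controls), trend_test(cases, controls), bonferroni)
```

The allelic test treats the two alleles of each person as independent observations, whereas the trend test works on genotype counts and remains valid when Hardy-Weinberg equilibrium does not hold, which is one reason studies often report both.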
Association results for rs6457327 and follicular lymphoma (FL) and diffuse large B-cell lymphoma (DLBCL) in the San Francisco Bay Area (NC1/NC2, NC3), German (G1) and Canadian (C1) studies.
rs6457327 and rs2517448 lie 5 kb and 16 kb, respectively, downstream of the 3′ UTR of the C6orf15 (STG) gene, telomeric to HLA-C on chromosome 6p21.33 within the major histocompatibility complex. This 300-kb region has been extensively evaluated because of its association with psoriasis, in which the HLA-C gene was identified as a strong psoriasis susceptibility locus (PSORS1; psoriasis susceptibility region 1)⁹. A history of psoriasis has been associated with increased risk of T-cell lymphoma; however, this association may be attributable to treatment of psoriasis with immunosuppressive agents rather than to family history¹⁰. Here, we found little LD between rs6457327 or rs2517448 and SNPs in HLA-C (r²<0.35; Supplementary Fig. 2a). Although our data do not suggest that linked HLA-C SNPs are driving the association, we cannot conclusively rule out shared genetic associations between the two diseases.
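The LD statements in this section are expressed as r², the squared correlation between alleles at two SNPs. The sketch below applies the standard definition, r² = D² / (p_A(1 − p_A) p_B(1 − p_B)) with D = p_AB − p_A p_B, assuming phased haplotype frequencies are already available; the text does not say how its r² values were estimated, and the function name and frequencies are hypothetical.

```python
def ld_r2(p_AB, p_A, p_B):
    """Squared allelic correlation r^2 between two biallelic SNPs.

    p_AB: frequency of haplotypes carrying allele A at SNP 1 and allele B at SNP 2.
    p_A, p_B: frequencies of A and B (both strictly between 0 and 1).
    """
    D = p_AB - p_A * p_B  # linkage disequilibrium coefficient
    return D * D / (p_A * (1 - p_A) * p_B * (1 - p_B))


# Hypothetical haplotype frequencies, not study estimates.
print(ld_r2(0.30, 0.30, 0.30))  # A and B always occur on the same haplotype
print(ld_r2(0.15, 0.30, 0.50))  # alleles are independent across haplotypes
print(ld_r2(0.20, 0.30, 0.50))  # weak LD
```

Because r² = 1 requires the two SNPs to have identical allele frequencies and to be perfectly correlated, SNPs described as being in complete LD at r² = 1 are statistically interchangeable proxies for each other.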
To explore the genetic interval containing the associated variants, we genotyped 52 additional SNPs within 30 kb of rs6457327 in the NC1/2 sample and found 13 additional markers with q-values <0.05 (Supplementary Table 4). The strongest signals remained at rs6457327 (allelic p = …) and rs2517448 (allelic p = …). Five neighboring associated SNPs were correlated with the two top SNPs (r²>0.6; Supplementary Fig. 2b). All of the associated SNPs lie within a 26-kb block of high LD that covers STG and extends 23 kb downstream (Supplementary Fig. 2c). STG is the only gene overlapping this block, and no other associated SNPs were found outside it.
P-values for association testing of SNPs in the 6p21.33 region
We also imputed SNP genotypes within 30 kb of rs6457327 in the NC1/2 sample, confirming that the main signal lies in the 26-kb block containing STG. Twelve imputed SNPs showed trend p < … (Supplementary Table 5); 11 of these were located within the block, in a region of low recombination. We used conditional logistic regression to evaluate whether a single SNP could account for all observed association signals in this region. Conditioning on any of the three most statistically significant polymorphisms (rs6457327, rs2517448 and the imputed SNP rs3132562) in an additive test of association abolished the association signals from all other markers (Supplementary Table 5), suggesting that a single locus may be associated with FL in the PSORS1 region. We found no evidence of an epistatic effect on FL risk, and haplotype analyses provided no additional information. Because these three SNPs are in complete LD (r²=1 in HapMap-CEU) and recombination rates in this region are low relative to the genomic average¹², their effects could not be unambiguously separated, and we could not identify a more restricted associated region.
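The conditional analysis above asks whether a marker still carries signal once a top SNP is accounted for. A minimal sketch of that logic, assuming conditioning means adding the top SNP as an additive covariate in a logistic model and comparing nested models with a likelihood-ratio test (the exact model specification is not given here), is shown below. The data are simulated under a hypothetical LD structure in which only snp1 is causal, and all function names are illustrative.

```python
import math
import random


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson; returns the log-likelihood."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X'WX
        for row, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
            for a in range(k):
                grad[a] += (yi - mu) * row[a]
                for c in range(k):
                    info[a][c] += mu * (1.0 - mu) * row[a] * row[c]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    ll = 0.0
    for row, yi in zip(X, y):
        mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
        ll += math.log(mu) if yi else math.log(1.0 - mu)
    return ll


def lrt(X_full, X_reduced, y):
    """1-df likelihood-ratio statistic and p-value for the extra column in X_full."""
    stat = 2.0 * (fit_logistic(X_full, y) - fit_logistic(X_reduced, y))
    return stat, math.erfc(math.sqrt(max(stat, 0.0) / 2))


# Simulated data (hypothetical): snp1 is causal; snp2 copies snp1 90% of the time.
rng = random.Random(1)
snp1, snp2, y = [], [], []
for _ in range(2000):
    g1 = sum(rng.random() < 0.3 for _ in range(2))  # additive 0/1/2 coding
    g2 = g1 if rng.random() < 0.9 else sum(rng.random() < 0.3 for _ in range(2))
    risk = 1.0 / (1.0 + math.exp(0.5 + 0.8 * g1))   # protective effect of snp1
    snp1.append(g1)
    snp2.append(g2)
    y.append(1 if rng.random() < risk else 0)

base = [[1.0] for _ in y]
with1 = [[1.0, g1] for g1 in snp1]
with2 = [[1.0, g2] for g2 in snp2]
with12 = [[1.0, g1, g2] for g1, g2 in zip(snp1, snp2)]
marginal = lrt(with2, base, y)        # snp2 tested alone
conditional = lrt(with12, with1, y)   # snp2 tested after conditioning on snp1
print(marginal, conditional)
```

In this simulation snp2 looks strongly associated on its own but loses its signal once snp1 is in the model. When two SNPs are perfectly correlated (r² = 1), as for the three top SNPs in the text, the covariates become collinear and the model cannot attribute the effect to either one, which is why their effects could not be separated.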
STG was originally described as a taste bud-specific gene in rhesus monkeys, although the function of the STG protein in humans is unknown. STG has previously been reported to be highly expressed in multiple hematopoietic tissues¹¹. We examined STG expression in human whole blood and lymphoblastoid cell lines and found expression of an unspliced STG transcript, whereas a spliced STG transcript was found in tonsil tissue (Supplementary Fig. 3). Eight SNPs in the region, including six that we imputed or genotyped, were either non-synonymous or could disrupt regulatory sequences (Supplementary Methods) and are therefore functional candidates (Supplementary Table 6). Of particular interest was rs1265054, a non-synonymous SNP located in an exonic splicing enhancer motif and predicted to disrupt serine/arginine-rich protein binding and normal splicing. On genotyping rs1265054, we found that it is in complete LD with rs6457327 (r²=1.0). Future studies are needed to assess the potential relevance of STG splice variants, as well as of this candidate SNP, to FL risk.
Our initial GWAS were limited in power because of the relatively small sample sets included in the genome-wide genotyping phase, particularly for the CLL/SLL subtype. Although we found no statistically significant associations for CLL/SLL, three of seven SNPs highly associated with CLL in a recent GWAS¹² ranked among the top 0.3% of SNPs associated with CLL/SLL in our data. Specifically, rs735665 and rs13397985 in SP140 and rs872071 in IRF4 ranked 128th, 503rd and 1,395th, respectively, suggesting that these associations were detectable even in our modestly sized pooled study.
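The ranks quoted above translate into percentiles with simple arithmetic. The number of SNPs actually ranked in the pooled analysis is not given here, so the sketch below assumes 500,000, the count used for the Bonferroni threshold; the percentages are therefore approximate, and the helper name is hypothetical.

```python
def top_percent(rank, n_snps=500_000):
    """Percentage of ranked SNPs at or above `rank` (assumes n_snps SNPs were ranked)."""
    return 100.0 * rank / n_snps


for snp, rank in [("rs735665", 128), ("rs13397985", 503), ("rs872071", 1395)]:
    print(f"{snp}: rank {rank} -> top {top_percent(rank):.3f}%")
```

Under this assumption the lowest-ranked of the three SNPs sits at about the top 0.28%, consistent with the top 0.3% stated in the text.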
In summary, we have identified a novel FL risk locus on chromosome band 6p21.33 near PSORS1, with a combined allelic p-value of 4.7×10⁻¹¹. Although STG may be a plausible candidate FL susceptibility gene, we cannot exclude a potential role for other genes in this region. Further studies are required to identify the causal variant(s), to evaluate whether FL and psoriasis share common risk alleles, and to fully dissect the association between the PSORS1 locus and FL pathogenesis.