Conceived and designed the experiments: RPG CFS JJS AB-W. Performed the experiments: JMS SL LC BRB JMC RDG. Analyzed the data: JMS DD PMB. Contributed reagents/materials/analysis tools: DD SL BRB JMC JJS AB-W. Wrote the paper: JMS AB-W.
Non-Hodgkin lymphomas (NHL) are a heterogeneous group of solid tumours of lymphoid cell origin. Three important aspects of lymphocyte development include immunity and inflammation, DNA repair, and programmed cell death. We have used a previously established case-control study of NHL to ask whether genetic variation in genes involved in these three important processes influences risk of this cancer. 118 genes in these three categories were tagged with single nucleotide polymorphisms (SNPs), which were tested for association with NHL and its subtypes. The main analysis used logistic regression (additive model) to estimate odds ratios in European-ancestry cases and controls. 599 SNPs and 1116 samples (569 cases and 547 controls) passed quality control measures and were included in analyses. Following multiple-testing correction, one SNP in MSH3, a mismatch repair gene, showed an association with diffuse large B-cell lymphoma (OR: 1.91; 95% CI: 1.41–2.59; uncorrected p=0.00003; corrected p=0.010). This association was not replicated in an independent European-ancestry sample set of 251 diffuse large B-cell lymphoma cases and 737 controls, indicating this result was likely a false positive. It is likely that moderate sample size, inter-subtype and other genetic heterogeneity, and small true effect sizes account for the lack of replicable findings.
Non-Hodgkin lymphoma (NHL) is a collection of malignancies of lymphocyte origin. In Western countries, 85% of NHLs have a B-cell origin. NHL subtypes vary in prognosis, treatment options and outcome. Diffuse large B-cell lymphoma (DLBCL) patients with different molecular or genetic abnormalities can have diverse presentation and outcomes. Risk of developing NHL can be influenced by both environmental and genetic factors that affect the survival of lymphocytes.
Lymphocyte development is a complex process, with checkpoints in place to ensure that the cells whose function is to quickly and effectively protect the host from a variety of offences, will also withhold such an assault on host cells. Cell growth and cell death need to be regulated so that the number of lymphocytes is controlled in such a way that they are sufficient to fight infections, but not so numerous that they are a burden to maintain. Three important aspects of this control are: 1) immunity and inflammation to respond to stimuli that cause their activation and rapid cell cycle division; 2) DNA repair to counteract errors from cell division or lymphocyte receptor gene rearrangement; and 3) cell death to remove lymphocytes that are not able to meet cell cycle checkpoints and/or reduce autoimmunity.
Previous work by several research groups has identified genetic variants associated with NHL in genes related to B-cell survival, DNA repair, and immunity and inflammation. Collectively, genetic variants in these types of genes are likely to play a role in susceptibility to NHL. To survey for genetic factors associated with NHL in genes involved in immunity and inflammation, DNA repair or cell death, we selected 118 genes (listed in Table S1 in File S1) related to these biological processes, tagged them with SNPs and tested them for association with NHL in 569 cases and 547 controls. In addition, we selected 39 SNPs that had previously been associated with NHL in the literature, and tested them for replication in our study. After correction for multiple testing, we found evidence that a SNP in MSH3, a gene that has never before been implicated in NHL, may affect susceptibility to DLBCL; however, this association did not replicate in an independent NHL population.
The samples and genes tested in this study were part of a 1536-SNP Illumina GoldenGate panel that included SNPs from candidate genes related to other pathways and hypotheses. Details of the population, samples and methodology have been previously described.
All new NHL cases in the Greater Vancouver Regional District and Greater Victoria (Capital Regional District), British Columbia, from March 2000 to February 2004 were invited to participate. Cases aged 20 to 79 were included. Patients with prior transplant or HIV-positivity were excluded. Population controls were frequency matched by age (within 5-year groups), sex and area of residence. Family history of cancer was based on subject-reported data. Of the 821 cases and 848 controls available for this study, 797 cases and 790 controls had sufficient DNA for genotyping. The study was approved by the joint University of British Columbia/British Columbia Cancer Agency Research Ethics Board; all participants gave written informed consent.
DNA was extracted from whole blood (407 samples), lymphocytes isolated from blood (782 samples), mouthwash (24 samples), or saliva (48 samples) as previously described. A total of 326 of the 1587 samples, referred to as ‘WGA samples’, had low DNA yields; their DNA was amplified by whole genome amplification using the RepliG kit (QIAGEN, Mississauga, ON, Canada).
The 118 genes selected for this study (Table 1) were based on a review of the biological literature. For each gene, publicly available data from HapMap phase II was imported into Haploview for tagSNP selection using Tagger at r2=0.8. TagSNP selection was restricted to SNPs with minor allele frequency (MAF) >5%. In addition, 39 specific SNPs previously reported as associated with NHL, autoimmune disease or cancer were included to test for replication of these associations in our study. These ‘replication’ SNPs are listed in Table S2 in File S1. 51 ancestry-informative markers (AIMs) selected from Halder et al. were also included in the assay. Genotyping was done using the GoldenGate system (Illumina, San Diego, CA) at The Centre for Applied Genomics, the Hospital for Sick Children in Toronto, Canada, as described previously.
Quality control (Q/C) was conducted using Genome Studio version 2009.1 (Illumina, San Diego, CA) and systems and databases developed in the laboratory of DD. Genotypes derived from WGA DNA and genomic DNA were subjected to Q/C separately. 1411 samples (717 cases and 694 controls) passed Q/C (Table 2); 1116/1411 samples (569 cases and 547 controls) were of European ancestry and subsequently included in statistical analysis. The AIMs analysis in this study has been previously described, and supported analysis of the European-ancestry samples as one group.
Of 708 SNPs selected for genotyping of variants in genes related to lymphocyte development, 109 were excluded at the genotype Q/C stage (32 SNPs were rejected by the genotyping centre upon initial inspection, 14 for low GenTrain scores, 26 for being potential copy number variants, 12 for being monoallelic, 8 for having a call rate <0.95, 15 for having any error between duplicate genotypes, and 2 for deviating significantly from Hardy-Weinberg equilibrium [HWE]). An additional 160 SNPs failed Q/C only in WGA samples (8 upon initial inspection by the genotyping centre, 49 for low GenTrain score, 64 for call rate <0.95, 38 SNPs that had discrepant genotypes between WGA samples and pre-WGA matched DNA, and 1 SNP for being out of HWE), and 4 SNPs failed Q/C only in mouthwash or saliva samples. This left 599 SNPs (85%), listed in Table S3 in File S1, for analysis in all non-WGA samples and 439 SNPs in both blood and WGA samples.
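The exclusion counts above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python that re-adds the figures quoted in the text; the dictionary keys are shorthand labels chosen here for illustration and are not names used by the genotyping software.

```python
# Re-add the SNP exclusion counts quoted in the Q/C paragraph.
# Keys are illustrative shorthand; the numbers come from the text.
TOTAL_SELECTED = 708

excluded_all_samples = {
    "rejected_by_centre": 32,
    "low_gentrain": 14,
    "possible_cnv": 26,
    "monoallelic": 12,
    "call_rate_below_0.95": 8,
    "duplicate_discordance": 15,
    "hwe_deviation": 2,
}

excluded_wga_only = {
    "rejected_by_centre": 8,
    "low_gentrain": 49,
    "call_rate_below_0.95": 64,
    "discordant_with_pre_wga_dna": 38,
    "hwe_deviation": 1,
}

n_excluded_all = sum(excluded_all_samples.values())   # 109
n_excluded_wga = sum(excluded_wga_only.values())      # 160

passed_non_wga = TOTAL_SELECTED - n_excluded_all      # 599
passed_with_wga = passed_non_wga - n_excluded_wga     # 439
percent_passed = round(100 * passed_non_wga / TOTAL_SELECTED)  # 85

print(passed_non_wga, passed_with_wga, percent_passed)
```

The totals agree with the 599 and 439 SNPs reported for non-WGA samples and for blood plus WGA samples, respectively.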
Statistical analyses were conducted in SVS Suite 7 (Golden Helix, Bozeman, MT). Logistic regression (additive model) was fit for diffuse large B-cell lymphoma (DLBCL), follicular lymphoma (FL), marginal zone lymphoma (MZL), all B-cell NHLs and all T-cell NHLs. Other NHL subtypes were not individually tested, as sample numbers were insufficient. In all subtype analyses, selected cases were compared to all controls. The analysis was restricted to European-ancestry samples, corresponding to 148 DLBCL, 165 FL, 55 MZL, 523 B-cell NHL, 45 T-cell NHL and 547 control samples; other ethnicities (Asian, south-east Asian and “other”) were tested only when SNPs showed association in European-ancestry samples. These sample sizes corresponded to a minimum detectable odds ratio of 1.54 for DLBCL, 1.51 for FL, 1.88 for MZL, 1.33 for B-cell NHL and 1.99 for T-cell NHL. For each SNP, p-values were calculated for the model with the SNP of interest vs. the basic model (which accounted for 5-year age groups, sex, and region). Only for the SNPs that showed a statistically significant association, we then sought the best-fitting model by testing dominant and recessive models in genotypic tests using the chi-squared test, as well as a recessive model by logistic regression with the adjustments listed above (i.e. age groups, sex and region). SNPs that showed an association were also tested for interaction with sex by comparing a model including the SNP, age group, sex and region to a model that also included the SNP*sex interaction. In genes that contained multiple SNPs with an association, the associated SNPs were tested for interaction by comparing a model that included the two SNPs, age group, sex and region vs. a model with the addition of the SNP*SNP interaction. In addition, for genes with an association, haplotype analysis was conducted in SVS Suite 7.
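To make the additive-model test concrete, the sketch below fits a logistic regression with genotype coded as 0, 1 or 2 copies of the minor allele and compares it with an intercept-only model by a likelihood-ratio test. It is an illustration under stated assumptions, not the SVS Suite implementation: the covariates used in the study (5-year age group, sex and region) are omitted for brevity, the function names are invented here, and the genotype counts are hypothetical toy values rather than study data.

```python
import math

def fit_additive(cases, controls, iters=25):
    """Fit logit P(case) = b0 + b1*g by Newton-Raphson.

    cases[g] and controls[g] are counts of subjects carrying g copies
    of the minor allele (g = 0, 1, 2).
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        s0 = s1 = h00 = h01 = h11 = 0.0
        for g in (0, 1, 2):
            n = cases[g] + controls[g]
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
            resid = cases[g] - n * p        # score contribution
            w = n * p * (1.0 - p)           # information weight
            s0 += resid
            s1 += resid * g
            h00 += w
            h01 += w * g
            h11 += w * g * g
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = score
        b0 += (h11 * s0 - h01 * s1) / det
        b1 += (h00 * s1 - h01 * s0) / det
    return b0, b1

def log_lik(cases, controls, b0, b1):
    """Binomial log-likelihood of the grouped genotype counts."""
    ll = 0.0
    for g in (0, 1, 2):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
        ll += cases[g] * math.log(p) + controls[g] * math.log(1.0 - p)
    return ll

def additive_test(cases, controls):
    """Return (per-allele odds ratio, likelihood-ratio p-value, 1 df)."""
    b0, b1 = fit_additive(cases, controls)
    b0_null = math.log(sum(cases) / sum(controls))  # intercept-only MLE
    stat = 2.0 * (log_lik(cases, controls, b0, b1)
                  - log_lik(cases, controls, b0_null, 0.0))
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    # Upper tail of a chi-squared distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return math.exp(b1), p_value

# Hypothetical toy counts (NOT study data): index = copies of minor allele
toy_cases = [60, 90, 50]
toy_controls = [200, 160, 40]
toy_or, toy_p = additive_test(toy_cases, toy_controls)
print(f"per-allele OR = {toy_or:.2f}, LRT p = {toy_p:.2e}")
```

Dominant and recessive codings can be tried with the same routine by collapsing the genotype classes (for example, recoding genotypes 1 and 2 together for a dominant model) before fitting.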
To correct for multiple testing, we used a two-tiered approach, as previously described. The Benjamini-Hochberg procedure, implemented in R version 2.11.1, was applied to control the false-discovery rate (FDR) for SNPs within each gene, giving a corrected p-value denoted as pG. The smallest adjusted p-value for each gene was taken to represent the gene, and FDR was applied again across the genes in each of the three hypotheses (i.e. gene categories) tested (cell death, DNA repair and immunity and inflammation). This second corrected p-value was denoted pH. Adjusted p-values <0.05 were considered statistically significant. No multiple-testing correction was applied for the few interaction or haplotype tests.
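The two-tiered correction can be sketched as follows. This is an illustrative Python re-implementation, not the R code used in the study: `bh_adjust` is written here to compute Benjamini-Hochberg adjusted p-values, `two_tier` applies it first within each gene and then across genes of one hypothesis category, and the gene labels and p-values are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvals)
    # Walk from the largest p-value down, enforcing monotonicity
    order = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for k, i in enumerate(order):
        rank = n - k                      # rank of pvals[i] in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def two_tier(pvals_by_gene):
    """Apply FDR within genes (pG), then across genes (pH).

    pvals_by_gene maps a gene label to the uncorrected p-values of its
    SNPs, for the genes belonging to ONE hypothesis category.
    """
    # Tier 1: smallest within-gene adjusted p-value represents the gene
    p_g = {gene: min(bh_adjust(ps)) for gene, ps in pvals_by_gene.items()}
    # Tier 2: adjust the per-gene representatives across the category
    genes = list(p_g)
    p_h = dict(zip(genes, bh_adjust([p_g[g] for g in genes])))
    return p_g, p_h

# Hypothetical example: three genes in one category
toy = {
    "GENE_A": [0.0004, 0.03, 0.2],
    "GENE_B": [0.04, 0.5],
    "GENE_C": [0.6],
}
toy_pg, toy_ph = two_tier(toy)
for gene in toy:
    print(gene, round(toy_pg[gene], 4), round(toy_ph[gene], 4))
```

In this toy example only GENE_A would remain below 0.05 at both tiers; GENE_B is nominally significant before correction but not after the within-gene step.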
Since genes involved in mismatch repair pathways have been shown to be important for colorectal cancer risk, we tested whether rs33003 in MSH3 was associated with a family history of colorectal cancer. Colorectal cancer in one or more first-degree relatives of the NHL cases and controls was coded as a true/false “family history of colorectal cancer” variable, and was used in logistic regression analysis in European-ancestry samples, adjusting for sex, region and 5-year age groups. 28/569 cases and 33/547 controls of European ancestry had a family history of colorectal cancer.
The association of rs33003 with DLBCL was tested in a previously described independent population from the San Francisco Bay Area. Briefly, cases were identified through the Northern California Cancer Center between 2001 and 2005. All were residents of the San Francisco Bay Area, 20–84 years old, and provided informed consent. For this analysis, we used genotypes imputed by BEAGLE v.3.3 for 737 controls and 251 DLBCL cases that self-reported as “non-Hispanic white” and also clustered with Caucasian samples by principal component analysis. The imputation yielded 391 samples of GG genotype, 417 samples of GA genotype, 103 samples of AA genotype and 77 samples with unknown genotype. A logistic regression model under the additive model, with correction for age and sex, was used to estimate odds ratios.
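As a consistency check on the imputed genotype counts above, the minor allele frequency can be computed directly. The sketch below assumes the imputed genotypes are treated as hard calls and that the 77 samples with unknown genotype are simply excluded; the function name is illustrative.

```python
def minor_allele_frequency(n_major_hom, n_het, n_minor_hom):
    """Frequency of the minor allele among called genotypes."""
    n_called = n_major_hom + n_het + n_minor_hom
    # Each heterozygote carries one minor allele, each minor homozygote two
    return (n_het + 2 * n_minor_hom) / (2 * n_called)

# Genotype counts quoted in the text (cases and controls combined)
n_gg, n_ga, n_aa, n_unknown = 391, 417, 103, 77

total_samples = n_gg + n_ga + n_aa + n_unknown   # 988 = 737 controls + 251 cases
maf_a = minor_allele_frequency(n_gg, n_ga, n_aa)

print(total_samples, round(maf_a, 3))  # 988 0.342
```

The counts sum to the 988 samples described, and the resulting frequency of the A allele is about 0.34.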
Table S4 in File S1 lists all SNPs with p<0.05 (before any multiple testing correction). Table 3 lists the 59 SNPs with pG<0.05. Of note, none of the 39 SNPs selected to replicate previously reported associations were associated with lymphoma in our population. Only one SNP showed an association that was significant after multiple testing correction at both the individual gene and multi-gene (hypothesis) levels: rs33003, located in MSH3, was significantly associated with DLBCL (OR per allele: 1.91 [95% CI: 1.41–2.59]; pG=0.0002; pH=0.0103). It is a common SNP, with MAF 0.32. We found that the recessive model best fits the inheritance mode of rs33003 (Table S5 in File S1). Many SNPs in the same region had low p-values in the analysis with DLBCL (Figure 1). The second most strongly associated SNP in MSH3, rs181747, is in moderate linkage disequilibrium with rs33003, with r2=0.55 in HapMap data and r2=0.65 in our data set. There is evidence for an interaction between these two SNPs (p=0.0014). However, no haplotype of SNPs in this region was more strongly associated with DLBCL than either of these two SNPs alone. There was no statistically significant association of rs33003 or rs181747 with DLBCL in 21 cases and 69 controls of Asian descent (OR: 0.68 [95% CI: 0.28–1.64], pG=1.00; and OR: 0.94 [95% CI: 0.45–1.93], pG=1.00, respectively) or in 6 cases and 31 controls of South-Asian descent (OR: 1.86 [95% CI: 0.45–7.61], pG=0.5483; and OR: 2.43 [95% CI: 0.62–9.50], pG=0.4849, respectively); the number of samples in these groups is too small to make a statement about associations in these groups. There was also no evidence for interaction between rs33003 and rs181747 in Asian-ancestry samples (p=0.1957) or South-Asian-ancestry samples (p=0.9873).
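The reported odds ratio and confidence interval for rs33003 can be related to the uncorrected p-value with a short back-calculation. The sketch below assumes the interval is a Wald interval, symmetric on the log-odds scale, and derives a standard error, z statistic and two-sided p-value from it. The study's own p-values came from a model comparison, so exact agreement is not expected; the function name is illustrative.

```python
import math

def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover log-OR, standard error, z and two-sided p from a 95% CI.

    Assumes a Wald interval: log(OR) +/- z_crit * SE.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided tail probability of a standard normal
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p_two_sided

# Values quoted in the text for rs33003 and DLBCL
beta, se, z, p = wald_from_ci(1.91, 1.41, 2.59)
print(f"log-OR = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.1e}")
```

Under this assumption the back-calculated p-value is on the order of 3 × 10^-5, in line with the uncorrected p-value of 0.00003 given in the abstract.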
Testing rs33003 for association with a family history of colorectal cancer showed a nominal association under the recessive model (OR: 0.20 [95% CI: 0.03–1.43], p=0.034) but not under the additive or dominant models. The 95% confidence interval includes 1, however, indicating that this result could be a chance finding. Furthermore, adjusting the DLBCL susceptibility analysis by family history of colorectal cancer (in addition to 5-year age group, sex, and region) did not change the OR or p-values of the association of rs33003 with DLBCL susceptibility. We find no evidence that family history of colorectal cancer influences the association between rs33003 and susceptibility to DLBCL.
The association of rs33003 with DLBCL did not replicate in the San Francisco sample set (OR 1.03 [95% CI: 0.83–1.29], p=0.774). The minor allele frequencies of rs33003 are similar in the original population (MAF=0.32) and the San Francisco set (MAF=0.34). Furthermore, the r2 value between rs33003 and rs181747 is similar in the two populations (r2=0.65 in the original population and r2=0.67 in the San Francisco population). This indicates that the failure to replicate is unlikely to be due to population-specific differences in minor allele frequencies or LD structure in that area of the genome.
One other SNP, rs12609547, in RELB, was mildly associated with marginal zone lymphoma (OR: 2.03 [95% CI: 1.34–3.07], pG=0.0015). This association was not significant, however, after multiple testing correction at the hypothesis level (pH=0.0570).
After multiple testing correction within genes, there was evidence for associations of NHL subtypes with SNPs in two genes: RELB with MZL and MSH3 with DLBCL. Only the MSH3 association, however, was significant after the additional correction for multiple testing between genes. This association did not replicate in another North American population, indicating that it was likely a type I error.
MSH3 is involved in DNA mismatch repair (MMR), which corrects mismatched or unmatched bases and small insertion/deletion loops that result from DNA replication before cell division or from DNA repair processes. The MMR pathway is an important repair mechanism in normal lymphocyte development, as evidenced by mouse models and human patients deficient in this pathway. Studies of MMR deficiency and MMR gene deregulation in lymphomas have also illustrated the potential role of this pathway in NHL.
Because of the MMR pathway’s established role in hereditary non-polyposis colorectal cancer (HNPCC), we tested whether rs33003 was associated with a family history of colorectal cancer in first-degree relatives. Adjusting the DLBCL susceptibility analysis by family history of colorectal cancer in addition to 5-year age group, sex, and region did not change the analysis results, indicating that family history of colorectal cancer is not a confounder for susceptibility to DLBCL. Furthermore, we did not find that rs33003 was associated with a family history of colorectal cancer. This is not entirely surprising, as colorectal cancer is not associated with lymphoma, although mismatch repair cancer syndrome is characterized in part by a combination of colorectal polyposis and early-onset hematologic cancers.
DLBCL can be subdivided into at least three subgroups using molecular signatures. It is therefore possible that the MSH3 association is confined to patients with tumours belonging to specific DLBCL subgroups. We do not, however, have molecular signature data for the tumours of the DLBCL patients included in this study. It is also possible that there are true associations with NHL susceptibility that we are not able to detect in this study. This could be due to low sample sizes for some subtypes of NHL in our study, or perhaps population-specific effects. This could explain our inability to replicate candidate gene (Table 1) associations of SNPs in IRF4 with FL, or our observation of only weak associations (i.e. a SNP with p<0.05 that does not pass multiple testing correction) of SNPs in BID, APAF1 and CASP10 with NHL. We were also unable to replicate other associations for SNPs in the “replication” category, listed in Table S2 in File S1. Furthermore, HapMap coverage may not have been adequately deep to represent causal variants present in some genes we assayed, making our tagSNP approach vulnerable to false negative results. As in most other lymphoma studies, multiple testing correction was not done for the number of subtypes tested, as the subtypes are considered separate disease entities, with different presentation, possible etiology and hypotheses. Finally, any association reported here could be an association with survival as opposed to susceptibility, as patients who have less aggressive disease are more likely to have time to participate in the study and provide a DNA sample. This is not likely, however, given the low percentage of cases who died prior to contact (10.5% in the British Columbia study and 14.2% in the San Francisco set).
In summary, we found no replicated associations in the genes studied related to immunity and inflammation, DNA repair and programmed cell death.
Table S1. Candidate genes chosen based on biological interest. Table S2. SNPs tested for replication. Table S3. SNPs that passed quality control. Table S4. Logistic regression analysis results for SNPs with p<0.05 before multiple testing correction. Table S5. The best model for rs33003 in European-ancestry DLBCL vs. controls is the recessive model.
For their help with previous work on genotype data processing and quality control, the authors would like to thank Dr. Tara Paton of the Centre for Applied Genomics at The Hospital for Sick Children, Toronto, Canada; and Dr. Jinko Graham and Conghui Qu of Simon Fraser University. We also thank the study staff (Agnes Lai, Carmen Ng, Kuldip Bagga, Agnes Bauzon, Betty Hall, Lina Hsu, Pat Ostrow, Lynne Tse and Anthony Tung), Dr. Tim Lee and Zenaida Abanto for computer support, and Rozmin Janoo-Gilani for technical assistance. We also thank the members of the BC Cancer Agency Lymphoma Tumour Group for their continued support. Finally, we thank all the participants of the study for making this research possible.
This work was supported by grants from the Canadian Institutes of Health Research (CIHR) and the Canadian Cancer Society. DD holds a Tier II Canada Research Chair (Genetic Epidemiology of Common Complex Diseases). AB-W is a Senior Scholar, and DD and JG are Scholars of the Michael Smith Foundation for Health Research. For the UCSF study, CFS and PMB have support from the National Institutes of Health, National Cancer Institute (R01CA87014, R03CA143947, R03CA150037, R01CA104682, RO1CA122663, RO1CA154643). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.