Genomewide association studies of colorectal cancer (CRC) have identified genetic variants that reproducibly associate with CRC. Associations between twelve SNPs at 8q24, 9p24, and 18q21 (SMAD7) and CRC were investigated in a three-center collaborative study including two UK case-control cohorts (Sheffield and Leeds) and an American case-control study of CRC cases from high-risk Utah pedigrees.
Our combined resource included 1,092 CRC case subjects and 1,060 age- and sex-matched controls. Meta-statistics and Monte Carlo significance testing using Genie software provided a valid combined analysis of our mixed independent and related case-control resource. We also evaluated whether these associations differed by sex, age at diagnosis, family history, or tumor site.
At 8q24 we observed two independent significant associations at SNPs located in two different risk regions of 8q24: rs6983267 in region 3 (ptrend=0.01; per allele OR=1.17, 95%CI 1.03, 1.32) and rs10090154 in region 5 (ptrend=0.05; per allele OR=1.24, 95%CI 1.01, 1.51). At 18q21 associations were observed in distal colon tumors, but not in proximal or rectal cancers: rs4939827 (ptrend=0.007; per allele OR=0.77, 95%CI 0.64, 0.93; case-case pdiff=0.03) and rs12953717 (ptrend=0.01; per allele OR=1.27, 95%CI 1.06, 1.52). We were unable to detect any associations at 9p24 with CRC.
Our investigation confirms that variants across multiple risk regions of 8q24 are associated with CRC, and that associations at 18q21 differ by tumor site.
Genomewide association (GWA) studies have recently identified common variants at chromosome 8q24, 9p24 and 18q21 that are associated with CRC (1–5). At the 8q24 locus, multiple independent regions have been reported to be associated with several common cancers, including prostate, breast, colorectal and ovarian cancers (6–8). Most recently, five separate 8q24 cancer risk regions have been suggested (8). Of these five regions, the CRC GWA studies consistently indicate variants in region 3 as associated with CRC (1, 2, 5), and these have been followed by replication in candidate SNP studies of variants rs6983267 and rs10505477 (6, 8–12). One study included a broader set of 8q24 variants in its replication efforts and additionally suggested association of a region 5 variant, rs10090154, with CRC (6). A single GWA study proposed a risk locus at 9p24 (1), which has been replicated in a single candidate SNP study (9). Two GWA studies identified a risk locus for CRC at 18q21 (which contains the gene SMAD7, also a functional candidate gene for CRC) (3, 5). It has also been observed that variants at 18q21 may differ in frequency by CRC tumor site (5).
In this three-center collaborative investigation, we studied twelve previously identified SNPs at chromosomes 8q24, 9p24, and 18q21 for association with CRC, and evaluated whether these associations differed by sex, age at diagnosis, family history, or tumor site. Our total resource included two UK case-control cohorts (Sheffield and Leeds) and a case-control study of cases from high-risk Utah cancer pedigrees. Single SNP and multi-locus association analyses were carried out. Meta-association statistics and Monte Carlo significance testing were used to provide a valid combined analysis of the mixture of independent and related individuals.
In Sheffield, CRC cases were identified from subjects resident in Sheffield, UK and undergoing surgery for a primary colorectal tumor at the Royal Hallamshire or Northern General Hospitals, Sheffield between March 2001 and June 2005. Control subjects were identified from Sheffield General Practice registers and recruited between October 2001 and December 2005. In Leeds, CRC cases were identified from examination of pathology records at the Leeds Teaching Hospitals NHS Trust and age- and sex-matched controls were identified from the records of general practitioners of cases as described previously (13–15). In Utah, CRC cases were selected from 244 high-risk cancer pedigrees; one case per pedigree from 156 pedigrees (156 independent CRC cases), and two or more cases from 88 pedigrees (282 related CRC cases). A high-risk pedigree was defined as one containing a statistical excess of individuals with cancer, as assessed using the Utah Population DataBase (UPDB). The UPDB is a genealogical resource of approximately 2.3 million individuals that is record-linked to data from the Utah Cancer Registry (UCR). Utah controls, a convenience sample not specifically ascertained for this study, were selected to be cancer-free, and were matched by sex and 5-year birth cohort to the prevalent cases. As age of Utah controls represents their age at ascertainment for prior studies, age at diagnosis for cases and age at selection for controls do not necessarily correspond; however, cases and controls were well-matched for age based on birth cohort (see footnote 2 of Table 1). Study subjects in all three centers were of North European descent. The total resource included 1,092 cases and 1,060 controls that were genotyped for twelve variants at 8q24, 9p24, and 18q21.
Proximal colon site was defined as tumors of the cecum through transverse colon. Distal colon was defined as tumors of the splenic flexure, descending, and sigmoid colon. Rectal cancer was defined as tumors of the rectosigmoid junction and rectum.
Genotyping was carried out in 384-well plates using the Applied Biosystems SNPlex™ system which allows multiplex analysis of up to 48 SNPs (www.appliedbiosystems.com). At least 5% of samples were duplicated in the plates to assess the reproducibility of the genotype calls. For each SNP, duplicate concordance, call rate and test for compliance with Hardy-Weinberg equilibrium (HWE) in controls are shown in online Supplemental Table 1.
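The Hardy-Weinberg check on control genotypes mentioned above can be illustrated with a one-degree-of-freedom chi-square goodness-of-fit test. The Python sketch below is an illustration only and is not the procedure implemented in the genotyping or analysis software used in this study; the function name and the example genotype counts are hypothetical.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium at a biallelic SNP.

    Arguments are observed genotype counts (AA, AB, BB).
    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    # Expected genotype counts under HWE: n*p^2, 2*n*p*q, n*q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical control genotype counts, for illustration only.
chi2, p_value = hwe_chi_square(260, 500, 240)
print(f"chi2={chi2:.3f}, p={p_value:.3f}")
```

A small p-value would flag a SNP for possible genotyping error; an asymptotic test of this kind is a reasonable approximation only when expected genotype counts are not small.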
The analyses were conducted using Genie 2.6.2, a freely available software package (http://www-genepi.med.utah.edu/Genie/index.html). Genie provides valid genetic association testing in cases and controls that include related individuals using Monte Carlo significance testing (16, 17). Genotypes for all SNP markers were tested for deviation from Hardy-Weinberg equilibrium in controls. Pairwise linkage disequilibrium (LD) was assessed between SNP markers within each region using r2 and D′. Meta statistics (chi-square test of trend, odds ratios (OR), and 95% confidence intervals (CI)) for SNP markers and CRC risk controlling for study were calculated using Cochran-Mantel-Haenszel (CMH) techniques available in Genie (14). We also repeated our analyses controlling for sex and early/late age of diagnosis. These secondary results did not differ substantively and are therefore not shown. Stratified analyses by sex, age at diagnosis, family history, combined age at diagnosis and family history, and tumor site were performed. Cochran’s Q test was conducted to assess homogeneity of effect size across studies. Statistical heterogeneity was considered present if p<0.05. All p-values were empirically derived based on 10,000 simulations of the Genie null distribution as described (16, 17). Haplotypes were estimated based on an expectation-maximization (EM) algorithm, and the hapConstructor module of Genie v.2.6.2 was used to comprehensively analyze multi-locus haplotypes within the 8q24, 9p24, and 18q21 regions (18).
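For readers less familiar with the two LD measures named above, the following Python sketch computes D, |D′| and r2 for a pair of biallelic SNPs. It assumes that haplotype and allele frequencies are already known (for example, from an EM estimate, which is not shown here); the function name and the input frequencies are hypothetical and do not come from the study data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Pairwise LD between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of allele A at SNP 1 and allele B at SNP 2
    Returns (d, abs_d_prime, r2).
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b
    # D' rescales D by the largest magnitude it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    abs_d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r2 is the squared correlation between the two allele indicators.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, abs_d_prime, r2

# Hypothetical frequencies: allele B occurs only on haplotypes that carry allele A.
d, abs_d_prime, r2 = ld_stats(p_ab=0.2, p_a=0.5, p_b=0.2)
print(f"D={d:.3f}, |D'|={abs_d_prime:.2f}, r2={r2:.2f}")
```

In this hypothetical case |D′| is at its maximum while r2 is well below 1, because the two alleles differ in frequency. This is why a pair of SNPs can show |D′| close to 1 with only a moderate r2.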
Descriptions of the study populations in the combined resource are shown in Table 1. Cases from Utah had a higher proportion of first-degree relatives with CRC than cases in the UK cohorts (panova <0.001), and a higher proportion of early-onset cases (age 59 and younger), as would be expected for CRC cases selected from high-risk cancer pedigrees (panova=0.002). Utah also had a lower proportion of rectal cancer (panova <0.001). This is also as expected because the CRC high-risk pedigrees were ascertained primarily for excess of colon cancers. Table 2 describes the twelve SNPs studied. All SNPs were in Hardy-Weinberg equilibrium. The number of cases and controls, of each SNP genotype, by individual study center is provided in Supplemental Table 2 (online).
The meta association results for each SNP with CRC are shown in Table 3. Based on Cochran’s Q test, there were no results that exhibited significant heterogeneity across studies. In 8q24, we observed two nominally significant results for SNPs rs6983267 (ptrend=0.01) and rs10090154 (ptrend=0.05), associated with modest per-allele increased risks. For rs6983267 the high-risk G allele was the major allele in our study population; however, allele T was considered the reference allele to be consistent with previous studies. SNP rs6983267 resides in region 3 of the 8q24 locus (8). In our controls, the LD between rs6983267 and the other region 3 SNPs studied (rs10505477, rs10808556 and rs7013278) was r2 of 0.94, 0.65, and 0.57 (|D′| of 0.99, 0.99, and 1.00), respectively. However, none of the other region 3 SNPs were significant. SNP rs10090154 resides in region 5 of 8q24 (8). One other region 5 SNP with an r2 of 0.89 (|D′| of 0.98) with rs10090154 was studied (rs1447295), but was not significantly associated with CRC (p=0.10). We inspected the study-specific results for rs6983267 and rs10090154. Even though there was no statistically significant evidence for heterogeneity across the three studies for these two SNPs (both phom>0.30), we observed that the association signal for both was primarily driven by the two UK cohorts (Supplemental Table 3, online). Stratification by family history (Supplemental Table 4, online), age at diagnosis, a combination of family history and age at diagnosis (Supplemental Table 5, online), or tumor site, variables that differed between the UK and US sites, did not explain this observation. Potential sources of heterogeneity between the two UK sites and the US site include environmental factors, particularly alcohol consumption and smoking; and phenotypic heterogeneity, particularly that undiscovered high-risk alleles could exist in the Utah pedigrees. However, these potential sources of heterogeneity could not be directly assessed as the data were not available for this study.
To investigate multi-locus associations at 8q24, we applied a haplotype-mining method (18). Using data for all seven 8q24 SNPs, the method identified a 2-SNP haplotype (G-T) across rs6983267 and rs10090154 variants as the most significantly associated with CRC when compared to a referent haplotype of T-C (p=0.0004). It is also of interest that, in contrast to single-SNP analyses of rs6983267 and rs10090154, the two-locus model (carriage of G-T) association signal (ORmeta=2.04, 95%CI 1.42, 3.34) was more similar across the US and UK cohorts: ORSheffield=2.65 (1.34, 5.24); ORLeeds=2.55 (0.98, 6.63) and ORUtah=1.62 (0.80, 3.31). These two markers are not in LD (r2<0.01 and |D′|=0.13) and are considered to reside in two separate 8q24 regions (8). We simultaneously analyzed both SNPs, each modeled as log-additive. The significance for both SNPs, when included in the same model, was almost unchanged from their single SNP tests (Table 3): rs6983267 (p=0.02 compared to p=0.01 in the single SNP test); rs10090154 (p=0.065 compared to p=0.05 in the single SNP test). We further found no significant evidence for multiplicative interaction (p=0.14), although our power to detect interaction was lower than for main effects. Hence, although rs10090154 did not reach significance at the 0.05 level in the simultaneous analysis (p=0.065), the largely unchanged effects when rs6983267 and rs10090154 were considered together or separately, and the distinctly greater risk and significance of the two-locus model are consistent with the conclusions of others that these two SNPs may be independent risk loci for CRC (6). Independent roles for SNPs in regions 3 and 5 have also been suggested for prostate cancer associations (8).
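The similarity of the study-specific two-locus estimates can be examined informally with a fixed-effect, inverse-variance calculation and Cochran's Q. The Python sketch below uses only the three study-specific ORs and 95% CIs quoted above. It is a swapped-in approximation, not the CMH meta statistics or the Monte Carlo significance testing used in this study, and it assumes the reported intervals are symmetric on the log scale; the function names are hypothetical.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def log_or_and_se(odds_ratio, lower, upper):
    """Recover log(OR) and its standard error from a reported 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return math.log(odds_ratio), se

def pool_fixed_effect(estimates):
    """Inverse-variance pooling of (log_or, se) pairs.

    Returns (pooled_or, q, df), where q is Cochran's Q statistic.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    return math.exp(pooled), q, len(estimates) - 1

# Study-specific ORs (95% CI) for carriage of G-T, as reported in the text.
reported = {
    "Sheffield": (2.65, 1.34, 5.24),
    "Leeds": (2.55, 0.98, 6.63),
    "Utah": (1.62, 0.80, 3.31),
}
estimates = [log_or_and_se(*values) for values in reported.values()]
pooled_or, q, df = pool_fixed_effect(estimates)
# With three studies Q has 2 df, and the chi-square upper tail is exp(-Q / 2).
# This closed form holds only for 2 df.
p_hom = math.exp(-q / 2.0)
print(f"pooled OR={pooled_or:.2f}, Q={q:.2f} on {df} df, p_hom={p_hom:.2f}")
```

The pooled value from this approximation need not equal the reported ORmeta, which was derived from genotype data by CMH methods with empirically derived significance.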
There were no significant associations between the 9p24 or 18q21 SNPs and overall risk of CRC. Associations at 9p24 did not differ when the resource was stratified by sex, age, family history, age and family history, or tumor site. Associations at 18q21 were strengthened in the stratification by early-onset and family history (Supplemental Table 5 online), and were best observed in the stratification by tumor site (Table 4). Significant associations were observed for distal colon tumors compared to controls for two SNPs, but not for proximal colon and rectal tumors (Table 4). In a distal CRC-control comparison, the minor C allele at rs4939827 was associated with a decreased risk of distal colon tumors (ptrend=0.007) and the minor T allele at rs12953717 with increased risk (ptrend=0.01). The LD between rs4939827 and rs12953717 is reasonably high (r2=0.59, |D′|=0.92), suggesting redundancy of information between the SNPs that was confirmed in a simultaneous analysis of both SNPs. For rs4939827, there was a statistically significant difference between distal and other CRC sites in a case-case comparison (p=0.03). The finding for distal CRC was consistently observed across the UK and Utah study sites (Supplemental Table 3). Findings were less significant in each study center considered alone, thus illustrating the benefit of increased power to detect associations from larger sample sizes inherent in a multi-study collaboration. Haplotype-mining did not identify any interesting two-locus associations or a three-locus association with distal CRC.
Recent GWA studies have suggested small effect susceptibility loci for CRC at 8q24, 9p24 and 18q21 (1–5). In our three-center study we were able to replicate associations at 8q24 and 18q21 (SMAD7), but not 9p24. For 8q24, previous replications have been most abundant for SNPs in region 3 with CRC (1, 2, 5); however, region 5 has also been implicated (6). In region 3, variants rs6983267 and rs10505477 have predominantly been studied, with estimated per-allele ORs ranging from 1.17 to 1.27 (1, 2, 5, 6, 8, 11). The magnitude of our estimate of effect size for rs6983267 is consistent with these previous values (OR=1.17). In region 5, SNP rs10090154 has previously been suggested (6) with an estimated OR of 1.14. We were also able to replicate this finding with a similar estimate of effect size (OR=1.24). Multi-locus analyses were consistent with two independent risk loci at 8q24 for CRC. Of note in our replications of both 8q24 SNPs is that, although no statistically significant heterogeneity was observed for either SNP, we did observe notable differences in effect size and association significance between the results for the two UK sites (Sheffield and Leeds) and the American site (Utah) (Supplemental Table 3, online). Estimates of the ORs in a meta analysis consisting of only the two UK cohorts yielded substantially larger per allele risks (1.33 and 1.43 for rs6983267 and rs10505477, respectively, Supplemental Table 3 online). Differences in familial and early onset disease, as observed by Poynter et al. (9), did not explain this observation (Supplemental Table 5 online). Differences in phenotype origin or environmental exposure are plausible explanations for the differences between the UK and US studies. Utah cases from high-risk pedigrees could be influenced by yet undiscovered high-risk alleles and therefore less influenced by low risk 8q24 alleles, although it is pertinent to note that CRC cases in the pedigrees were screened for HNPCC variants and Amsterdam-type criteria, and none were found to be responsible for the clustering. Another plausible explanation is that there may be an important gene-environment interaction. It is likely that the UK subjects differ in their exposure to cigarette smoke and alcohol consumption from individuals in Utah, many of whom abstain from smoking and drinking alcoholic beverages. This observation is anecdotal here, but provides a hypothesis worthy of further exploration.
We were unable to confirm associations of variants rs719725 and rs7857826 at 9p24 with CRC. Results from our meta analysis, and all stratification analyses, indicated no evidence for association. SNP rs719725 was originally identified in a GWA study (1) and subsequently replicated in a candidate gene study (9); however, in the final internal replication of the GWA study, rs719725 was unsuccessfully replicated (p=0.61). Although the overall seven-site meta analysis was significant (p=0.023), 4 of the 7 individual sites were not significant, with three sites indicating ORs in the opposite direction. For small ORs we cannot definitively refute the 9p24 locus, but it is clear that the signal at this locus is less robust.
Our investigation of 18q21 variants is the first candidate SNP study to follow up the GWA findings for this locus, including both colon and rectal samples (3, 5). We were not able to replicate findings in a general CRC case-control comparison, although when subset by tumor site, we were able to find significant association for distal colon cancer compared to controls, but not proximal colon or rectal cancers in tumor-site specific analyses. Allele C at rs4939827 indicated decreased risk for distal colon cancer compared to other sites (p=0.03). Heterogeneity by tumor site has been reported elsewhere (5), although Broderick et al. observed no difference by site (3). While our results for distal colon cancer are in the same direction and of similar magnitude to those found by Tenesa et al. (5) for colon cancer, in a case-case comparison of rectal vs. colon, the current study findings are opposite those indicated in the Tenesa study (5). However, the discrepancies are in large part due to our lack of evidence in rectal cases and may also be due to Tenesa et al. not stratifying by proximal and distal colon. Therefore, additional site-specific investigations are warranted for this locus. We investigated multi-locus associations, and similar to previous haplotype analysis findings (3), our results are consistent with a single risk allele at the 18q21 locus.
It should be noted that the increased power gained to detect association using familial cases (19) is accompanied by an over-estimate of the effect size as measured by the odds ratio for the general population. Tests of the null hypothesis (effect size or independence) remain valid with the combined populations; the Utah site contains predominantly familial cases and as such, while our significance values are valid, our meta OR estimates may be inflated.
In conclusion, our results from a multi-center, combined independent and related case-control resource provide replication of association of two regions at 8q24 and the locus at 18q21 with CRC. At 8q24 two regions of association were evident, with a possible suggestion for gene-environment interaction. At 18q21 our results provide confirmatory evidence of association with CRC that differs by tumor site.
The genotype data and analysis were supported by National Institutes of Health grants CA123550 and CA98364 (to N.J.C.). Research was supported by the Utah Cancer Registry, which is funded by contract N01-PC-35141 from the National Cancer Institute's SEER program with additional support from the Utah State Department of Health and the University of Utah. Partial support for all datasets within the Utah Population Database was provided by the University of Utah Huntsman Cancer Institute. Recruitment, data collection and genotyping in Sheffield were supported by Yorkshire Cancer Research grants to A.C. and Professor Mark Meuth. Data collection in Leeds was supported by Cancer Research UK Programme Award (C588/A4994) to D.T.B. The authors are grateful to Study Coordinators, Laboratory Specialist Kim Nguyen, and Computer Specialist Jathine Wong.