Present investigations suggest that approximately 30% of colorectal cancer (CRC) cases arise on the basis of inherited factors. We hypothesize that the majority of inherited factors are moderately penetrant genes, common in the population. We use an affected sibling pair approach to identify genetic regions that are coinherited by siblings with CRC. Individuals from families with at least two siblings diagnosed with colorectal adenocarcinoma or high grade dysplasia were enrolled. Known familial CRC syndromes were excluded. A genome-wide scan on 151 DNA samples from 70 kindreds was completed using deCODE's 1100 short tandem repeat marker set at an average 4 cM density. Fine mapping on a total of 184 DNAs from 83 kindreds was done in regions suggesting linkage. Linkage analysis was performed with the MERLIN analysis package. Linkage analysis revealed three genetic regions with NPL LOD scores ≥ 2.0: Ch. 3q29, LOD 2.61 (p=0.0003); Ch. 4q31.3, LOD 2.13 (p=0.0009); and Ch. 7q31.31, LOD 3.08 (p=0.00008). Affected siblings with increased sharing at the 7q31 locus have a 3.8-year (±3.5) earlier age of CRC onset, although this is not statistically significant (p=0.11). No significant linkage was found near genes causing known syndromes or in regions previously reported (8q24, 9q22, and 11q23). The chromosome 3q21-q24 region reported to be linked in CRC relative pairs is supported by our study, albeit by a minor peak (LOD 0.9, p=0.02). No known familial cancer genes reside in the 7q31 locus; thus the identified region may contain a novel susceptibility gene responsible for common familial CRC.
Colorectal cancer (CRC) is the second most common cause of cancer death and is one of the most familial malignancies 1. Approximately 10% of the U.S. population has at least one first-degree relative with colorectal cancer 2. Kindred and twin studies estimate that as much as 35% of the risk for colorectal adenocarcinoma is attributable to heredity 3,4. The known inherited colon cancer syndromes, however, account for less than 5% of CRC, leaving the genetic etiology unknown for the majority of familial CRC cases.
Association studies have reported a few low-penetrance genetic variants associated with colon cancer risk that could account for a yet to be defined proportion of the familial CRC cases 5,6. Affected-relative-pair studies have also reported genetic regions that are coinherited more often in first-degree relatives with CRC than those without. These include 9q22.33, 3q21-24, and 11q23 7-10. A continued combination of association and linkage approaches will be important to defining new genes that predispose to common familial CRC and will in turn allow the synthesis of more risk directed CRC prevention and mortality reduction efforts.
Here we report the results of a genome-wide linkage analysis using 83 kindreds in which at least two siblings were diagnosed with colorectal cancer (invasive or high grade dysplasia). A set of 1100 highly polymorphic short tandem repeat markers (deCODE 4 cM set) was used for genotyping of the affected sibships. The results provide strong evidence for linkage of CRC to a locus at 7q31 and support linkage to the previously reported 3q21-q24 locus 9.
Subjects were enrolled in the Inheritance of Colon Cancer: A Sibling Pair Study following review and approval by the Institutional Review Boards of the eleven participating centers. Informed consent was obtained from all research participants. Kindreds were ascertained from Cancer Genetics Network members, population-based county-wide or state-wide cancer registries, clinical referral and direct-to-patient marketing. Each family had at least two individuals diagnosed over the age of 20 with colorectal adenocarcinoma or a polyp with high-grade dysplasia (also reported as carcinoma in situ, which is synonymous with the previous term). Diagnosis was confirmed by pathology reports in most cases or by cancer registry data. A total of 274 individuals were enrolled in the study; however, 88 individuals were excluded from linkage analysis because a sibling was not enrolled (deceased or not interested) or because they had a known colon cancer syndrome. In the end, 186 DNAs constituted complete sibships and were used for linkage analysis.
Families with known colon cancer syndromes including familial adenomatous polyposis, juvenile polyposis, Peutz-Jeghers syndrome, and Cowden syndrome were excluded by medical record review. A small possibility exists that families with attenuated or MYH polyposis could enter the study, but given the rarity of these conditions and the chance that at least one sibling would have a clinically recognizable condition (≥ 10 adenomas) resulting in study exclusion, their inclusion would have a very minor effect. When tumor blocks were available on one or more colon cancer cases, microsatellite instability (MSI) analysis was performed as previously described 11,12 to rule out hereditary nonpolyposis colorectal cancer (HNPCC or Lynch syndrome). Genomic DNA from cases that demonstrated MSI-high was sequenced for mutations in MLH1 and MSH2. Five mutation-positive families were excluded from the analysis. Individuals who were confirmed to have a known syndrome were referred to a genetic counselor and clinical genetic testing was offered.
A genome scan was performed on 138 affected and 13 unaffected siblings composing 70 kindreds (64 complete affected sibships of pairs (60) or trios (4)) using deCODE's 1100 short tandem repeat (STR) marker set at an average 4 cM density 13. Fine mapping included 40 STR markers run on 184 individuals from 83 kindreds (163 affected and 21 unaffected). This set contained 75 complete affected sibships of pairs (71) or trios (4). Unaffected siblings were available in 16 kindreds; however, there were insufficient data for discordant analysis (32 discordant relationships; 5 concordant unaffected relationships).
Genome-wide nonparametric linkage analysis was performed with Merlin 14 using the Kong and Cox linear model 15, and Whittemore and Halpern's Sall sharing statistic 16. Parametric dominant model was run with a disease allele frequency of 0.001 and an absolute disease risk of 0.025 for non-carriers and 0.5 for carriers. Parametric recessive model was run with an allele frequency of 0.05 and disease risk of 0.025 for noncarriers and heterozygotes, and 0.05 for homozygous carriers. Individuals were coded as affected if they had a confirmed diagnosis of invasive CRC or a colonic adenoma with high grade dysplasia or carcinoma in situ. Linkage analysis was run using two sources of allele frequencies for each marker; both are represented in Figure 1. The first was 157,810 genotypes generated from the complete data set (sibpair allele frequencies). The second was a set of 8000 genotypes from 13 CEPH parents who were part of HapMap. The CEPH DNAs were collected in 1980 from U.S. residents with northern and western European ancestry by the Centre d'Etude du Polymorphisme Humain (CEPH) and have been used as a reference panel for establishing genetic maps of the human genome including the HapMap, a catalog of common genetic variants that occur in human beings.
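As a worked check on the parametric model settings, the population prevalence implied by each model can be computed from the allele frequency and the genotype-specific risks. The sketch below assumes Hardy-Weinberg genotype proportions and reads the stated "absolute disease risk" values as penetrances; neither assumption is spelled out in the text, and the function name `implied_prevalence` is illustrative only.

```python
def implied_prevalence(q, f0, f1, f2):
    """Population prevalence implied by a single-locus disease model.

    q  : disease allele frequency
    f0 : penetrance with 0 copies of the disease allele
    f1 : penetrance with 1 copy
    f2 : penetrance with 2 copies
    Genotype frequencies are assumed to follow Hardy-Weinberg proportions.
    """
    p = 1.0 - q
    return p * p * f0 + 2.0 * p * q * f1 + q * q * f2


# Dominant model from the text: q = 0.001, risk 0.025 (non-carriers), 0.5 (carriers)
dominant = implied_prevalence(q=0.001, f0=0.025, f1=0.5, f2=0.5)

# Recessive model from the text: q = 0.05, risk 0.025 (non-carriers and
# heterozygotes), 0.05 (homozygous carriers)
recessive = implied_prevalence(q=0.05, f0=0.025, f1=0.025, f2=0.05)

print(f"dominant model prevalence:  {dominant:.6f}")
print(f"recessive model prevalence: {recessive:.6f}")
```

Under these assumptions both models imply a prevalence close to the 0.025 baseline risk, because carriers are rare in the dominant model and homozygotes are rare in the recessive one.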
To assess the association of allele sharing with variation in quantitative pathology traits we used the Variance Components module of Merlin which regresses the pairwise covariation of trait values against the identical by descent (IBD) sharing for each marker. Age at diagnosis, tumor grade (grade 1 is well differentiated, grade 2 is moderately differentiated, grade 3 is poorly differentiated), tumor location (right-sided cancers are defined as proximal to the splenic flexure), tumor size at largest dimension, lymph node involvement, and cancer stage (0-4) were modeled as continuous traits. MSI status was available on too few cases to use this as a covariate.
The mean age of colorectal cancer diagnosis was 58.3 (std. dev. ± 12.5) with a range of 28 to 91 years and a median of 60 years. Carcinoma in situ or high-grade dysplasia was the diagnosis in 14% of the cases, lymph node involvement was found in 30% of the cancers, and 12% had metastasized to other tissues. Cancer staging was available on 76% of the cases. Of these, 33% were Stage 1, 30% were Stage 2, 27% were Stage 3 and 10% were Stage 4. The racial make-up of the participants was 87% white, 4% black, 1% Native American, and 8% other. The ratio of affected males to affected females was nearly equal (82:81). MSI analysis was performed on at least one colorectal tumor from 46 of the 83 families. Of the 46 families, 39 were microsatellite stable or low, four had microsatellite unstable cancers, and three had both stable and unstable cancers. Germline mutations in MLH1 and MSH2 were not identified in five of the seven MSI unstable cases. Two did not undergo sequencing.
A genome-wide scan on 151 samples from 70 kindreds was completed using deCODE's 1100 STR marker set at an average 4 cM density. All DNA samples amplified successfully; however, two samples had low yield. Excluding these two samples, 91.9% of the genotypes were successfully obtained. Fine mapping on a total of 184 DNAs (163 affected and 21 unaffected) from 83 kindreds was done using 40 additional deCODE markers at peaks identified in the genome scan on chromosomal regions 4q24-4q26, 4q31-4q32, 7q11-7q22, and 7q31-7q33. In this 40-marker fine mapping run, 96.5% of the genotypes were successful.
Multipoint nonparametric linkage analysis identified three regions with NPL LOD scores greater than 2.0 on chromosomes 3q29 (tel), 4q31.3, and 7q31 (Figure 1 and Table 1). Of these, the 7q31.31 region spanned 13.6 cM (22.63 Mbp) with a maximum NPL LOD of 3.08 and p=0.00008 (CEPH allele frequencies) or 2.01 and p=0.001 (study allele frequencies) at D7S643. The peak contains 96 RefSeq genes. No known familial cancer genes reside within this region; thus the identified region may contain a novel susceptibility gene responsible for common familial colorectal cancer. Two genes in this region are involved in cancer development 17: MET, a proto-oncogene and hepatocyte growth factor receptor involved in hepatocellular carcinoma, and SMO, a G protein-coupled receptor involved in basal cell carcinoma. Candidate cancer genes in this region were also identified from a list of genes hypothetically involved in cancer based on similarity in Gene Ontology (GO term) and sequence properties to a set of 291 genes known to be involved in cancer development 17,18. Seven of these genes were predicted to have greater than 80% probability of being involved in cancer development: PAX4, DOCK5, TFEC, FOXP2, NRF1, and IRF5. Other interesting candidate genes in this region are GPR37 (G protein-coupled receptor 37, which is endothelin receptor type B-like) and two WNT ligands involved in the APC/beta-catenin signaling pathway, WNT2 and WNT16.
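The p-values quoted alongside the NPL LOD scores are consistent with the usual asymptotic conversion, in which 2 ln(10) × LOD is referred to a chi-square distribution with one degree of freedom and the tail probability is halved because only excess sharing counts as evidence for linkage. The text does not state how the p-values were computed, so the following is a minimal sketch under that assumption; the function name `lod_to_p` is illustrative.

```python
import math


def lod_to_p(lod):
    """Approximate one-sided p-value for a positive LOD score.

    Assumes 2*ln(10)*LOD is asymptotically chi-square with 1 degree of
    freedom, with the tail halved for a one-sided test. For 1 degree of
    freedom the chi-square tail equals erfc(sqrt(x / 2)).
    """
    if lod <= 0:
        raise ValueError("conversion is only defined here for LOD > 0")
    chi2 = 2.0 * math.log(10.0) * lod
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


# LOD scores reported for the three regions with NPL LOD >= 2.0
for region, lod in [("3q29", 2.61), ("4q31.3", 2.13), ("7q31.31", 3.08)]:
    print(f"{region}: LOD {lod:.2f} -> p ~ {lod_to_p(lod):.5f}")
```

Under this assumption the three converted values round to the reported p-values of 0.0003, 0.0009, and 0.00008.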
The results of parametric analysis under dominant and recessive models using study allele frequencies are presented in the supplementary data (Figure S1). The parametric models do not fit the data very well, suggesting that the parameter values supplied are not very realistic. The recessive model identifies a novel peak with a LOD of 2.7 (p=0.0004) at D2S2358. The gene for epidermal growth factor receptor 4 (ERBB4) resides close to this peak and has been associated with a small percentage of colorectal cancers 19-21.
There were no major peaks near genes known to cause colon cancer syndromes (MLH1, MSH2, MSH6, PMS2, APC, MYH, SMAD4, BMPR1A, PTEN, STK11). However, a minor peak is observed near MLH1 (62.7 cM) at D3S1612 (60.05 cM) with a LOD of 1.25 (unadjusted p=0.008). This peak is present only when CEPH allele frequencies are used; it may be an artifact or may represent a D3S1612 allele segregating with a common, mildly penetrant MLH1 variant predisposing siblings to developing colorectal cancer. Linkage analysis was also run excluding the 7 families with at least one microsatellite unstable cancer (supplementary data, Figures S2 and S3). The results of this analysis appear very similar to those for the full dataset, and the peak near MLH1 is not eliminated (supplementary data, Figure S4).
The 8q24 region recently associated with colorectal cancer risk was not supported by this study 6,22,23. The chromosome 9q22.33 region previously reported to be linked in affected relative pairs with advanced adenomas and colon cancers was also not confirmed by this study 7,9. Each of the markers in the 9q22.33 region gave negative LOD scores. However, the chromosome 3q21-q24 region reported to be linked in colorectal cancer relative pairs is supported by our study, albeit by a minor peak at D3S1292 (LOD 0.9, p=0.02) 8.
Pathology features of the colorectal cancers from the affected sibling pairs were analyzed to determine if coinheritance at the 7q31 locus was associated with any particular trait. Pairwise covariation for age of CRC diagnosis, histologic grade (1-3), location of cancer (right vs. left sidedness), size of tumor, lymph node involvement, and stage of CRC was regressed against the IBD sharing at this locus. Only age of diagnosis and grade of cancer contributed to the heritability; however, neither was significant (Table 2). Poorly differentiated tumors were more likely to be seen with increased sharing at 7q31. Of note is the trend toward younger age of onset of CRC with increased sharing at 7q31. Linear regression of age of diagnosis versus identity by descent (IBD) showed that the mean age of diagnosis decreases by 3.8 years as IBD goes from 0 to 1 alleles (std. err. ±3.5).
Using the affected sibling-pair approach to identify genetic regions involved in common familial colorectal cancer, we identified a major locus on chromosome 7q31. This region clearly stands out from the rest of the genome, demonstrating linkage at multiple genetic markers over a 13.6 cM region when either study allele frequencies or CEPH allele frequencies are used in the analysis. A region on chromosome 7 with linkage (NPL LOD 1.85, p=0.032) was also reported by Kemp et al. when analysis was restricted to 38 families with colorectal cancer as the diagnosis 8. The reported peak spans approximately 80 to 110 Mbp, whereas our peak spans 107 to 129 Mbp on chromosome 7.
Interestingly, LOH of one or more markers in the 7q31.1-31.2 region around marker D7S522 has been reported in 11 of 18 colon carcinoma cases, with the maximum LOH at D7S522 (115.9 Mbp), within the region identified in this study. The authors suggest that a tumor suppressor gene relevant to the development of epithelial cancers is present in this region 24. Notably, the MET gene lies in this region of LOH at 116.1 Mbp.
Additionally, the ERBB4 gene is near a peak identified through the recessive parametric model on 2q33.3. A functional polymorphism at -782G>T in this gene was recently reported as a risk factor in colorectal cancer (OR 2.21, 95% CI 1.22-3.99) and will be important to examine in this sibling pair population 19.
The study was originally designed to be much larger based on modeling and power estimates to detect linkage as described by Kerber et al., 2008 25 (manuscript in press). Considering the minimum observed p-value of 0.0001 as a critical threshold, and taking the relatively conservative value of 0.2 as an estimated attributable risk for inherited susceptibility to colon cancer, we see reasonable (≥ 10%) power only if this locus accounts for the majority of the heritable effect, either because of high penetrance or high prevalence. It is not clear that the majority of heritable CRCs can be explained by variation at this locus, so we might wish to consider alternative explanations. Two details of the modeling may have resulted in over-conservative power estimates when compared to the study as we have actually carried it out: 1) the markers employed are far more heterogeneous than the 4-allele markers simulated; and 2) the multipoint methods employed in the present study using intervals of 1 cM may have considerably more power than the simulated tests. These two factors probably interact to increase power, so the estimates may be quite conservative. Nevertheless, the p-values we have observed are surprisingly small considering the study design and sample size.
Although this is one of the largest colorectal cancer sibling-pair studies reported thus far, a major limitation is nonetheless its size: 75 families with DNA from two or more affected individuals. A strength as well as a challenge of our study was the restriction of enrollment to individuals with invasive colorectal cancer or an adenoma with high grade dysplasia or carcinoma in situ. This was a challenge because of the high mortality of colorectal cancer; it was difficult to enroll two siblings surviving colorectal cancer and willing to participate in research. Other groups have done similar studies, but included fewer families and less penetrant phenotypes, such as the presence of advanced adenomas (≥1 cm) 7 or any colonic polyp including benign hyperplastic polyps 8. As Kemp et al. report, distinct genetic loci are identified when the study set is restricted to colorectal cancer.
Use of the CEPH allele frequencies for linkage analysis consistently results in higher LOD scores than use of the study allele frequencies. This is most likely due to minor alleles being overrepresented in the CEPH data set. Since the CEPH allele set is limited to 13 individuals and is 5% of the size of the study allele set, this is not surprising. The one peak on chromosome 3 that appears only with CEPH allele frequencies is interesting in that it is close to MLH1 and might represent a marker commonly found in the study population and linked to a pathogenic MLH1 variant.
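The effect of the small CEPH panel on allele frequency estimates can be illustrated with simple binomial sampling arithmetic. The sketch below is an assumption-laden illustration rather than part of the study's analysis: it treats each individual as contributing two independent chromosomes, and the comparison panel of 184 treats the genotyped siblings as if they were unrelated, which overstates their effective sample size. The function names and the example frequency of 0.10 are hypothetical.

```python
import math


def min_nonzero_frequency(n_individuals):
    """Smallest nonzero allele frequency estimable from a diploid panel."""
    return 1.0 / (2 * n_individuals)


def frequency_standard_error(freq, n_individuals):
    """Binomial standard error of an allele frequency estimate,
    assuming 2 * n_individuals independent chromosomes."""
    n_chromosomes = 2 * n_individuals
    return math.sqrt(freq * (1.0 - freq) / n_chromosomes)


ceph_n = 13     # CEPH parents used for the reference frequencies (from the text)
study_n = 184   # genotyped siblings, treated as unrelated for illustration only

print(f"CEPH: smallest nonzero frequency = {min_nonzero_frequency(ceph_n):.4f}")
for label, n in [("CEPH", ceph_n), ("study", study_n)]:
    se = frequency_standard_error(0.10, n)
    print(f"{label}: SE of a 0.10 allele frequency = {se:.4f}")
```

With 26 chromosomes, any allele observed even once is assigned a frequency of at least about 0.038, so rare alleles that happen to be sampled are necessarily estimated as more common than they may be.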
Further confirmatory studies as well as expansion of the study population size are needed to confirm this 7q31 locus and identify other genetic loci contributing to hereditary colorectal cancer risk in the population. Investigation of genes within this region will be equally important, especially regulatory or coding SNPs that may be enriched in the colorectal cancer population. Finally, evaluation of LOH in tumor blocks in regions showing linkage, especially the 7q31 region, which has demonstrated LOH in sporadic colorectal cancers, would provide further evidence that the region contains a tumor suppressor gene responsible for familial colorectal cancer.
Additional support was provided by a University of Utah General Clinical Research Center Grant M01-RR00064 and N01-PC-67000, Utah Cancer Registry grant N01-PC-35141 from the National Cancer Institute's SEER program with additional support from the Utah State Department of Health and the University of Utah, the Huntsman Cancer Foundation. We also acknowledge University of Colorado Comprehensive Cancer Center for the infrastructure and the information technology group for data collection, and Alicia Salkowski at the Karmanos Cancer Institute for protocol development, piloting, and study coordination.
Financial Support: Project is funded by NIH grant numbers U24-CA78174 (G.P.M.), PO1-CA073992 (R.W.B.), U24-CA78134 (H.A.), U01-CA78285 (H.A.), N01-PC35145 (A.G.S.), U24-CA78148 (C.A.G.), CA74799 (D.JA.), U24-CA78142 (L.C.S.), CA78157-05S1 (J.E.S.), and U01-CA078284 (D.M.F.).