Colorectal cancer is the fourth most common type of cancer and the second most common cause of cancer death. Less than 5% of colon cancers arise in the presence of a clear hereditary cancer condition; however, current estimates suggest that an additional 15-25% of colorectal cancers arise on the basis of unknown inherited factors. The aim of this study was to identify additional genetic factors responsible for colon cancer.
A large kindred with excess colorectal cancer was identified through the Utah Population Database and evaluated clinically and genetically for inherited susceptibility.
A major genetic locus segregating with colonic polyps and cancer in this kindred was identified on chromosome 13q with a nonparametric linkage score of 24 (LOD score of 2.99 and p=0.001). The genetic region spans 21 Mbp and contains 27 RefSeq genes. Sequencing of all candidate genes in this region failed to identify a clearly deleterious mutation; however, polymorphisms segregating with the phenotype were identified. Chromosome 13q is commonly gained and over expressed in colon cancers and is correlated with metastasis suggesting the presence of an important cancer progression gene. Evaluation of tumors from the kindred revealed a gain of 13q as well.
This identified region may contain a novel gene responsible for colon cancer progression in a significant proportion of sporadic cancers. Identification of the precise gene and causative genetic change in the kindred will be an important next step to understand cancer progression and metastasis.
The lifetime risk of colorectal cancer in the U.S. is 6%, and this tumor is the second leading cause of cancer death after lung cancer. Colorectal cancer (CRC) screening programs that include the removal of precancerous adenomatous polyps are the key to prevention, early diagnosis, and survival. Understanding the genetic and environmental risk factors that impact colon cancer initiation and progression constitutes a complementary piece of these prevention efforts.
Sequentially ascertained pedigree and twin studies indicate that 20-30% of colon cancer cases appear to arise in the setting of inherited susceptibility 1-3. Three to five percent of colon cancer cases arise in the setting of well characterized inherited syndromes 4. These include syndromes in which colonic adenomatous polyps occur as a part of 1) familial adenomatous polyposis, 2) MUTYH associated polyposis or MAP, 3) hereditary nonpolyposis colon cancer (HNPCC or Lynch syndrome); and those where colonic hamartomatous polyps are found, 4) Peutz-Jeghers syndrome, 5) juvenile polyposis and, 6) Cowden syndrome. Each has now been associated with a gene or genes that, when mutated, give rise to the condition. Although the inherited mutations are rare in the population, the genes involved have been found to be very important for initiation and progression of all colon cancers. The genetic basis of the remaining 15 to 25% inherited colon cancer susceptibility is poorly understood.
Individuals with more than one first-degree relative with colon cancer or a single first-degree relative with colon cancer diagnosed at age ≤ 50 years have a 3- to 6-fold greater risk than those with no family history 5-6. Multiple recent studies have characterized “high-risk colon cancer families” that fulfilled clinical criteria for HNPCC, but were not one of the inherited syndromes based on phenotype as well as tumor and germline genetic testing 7-10. This nonsyndromic type of susceptibility is less penetrant than observed in the known inherited syndromes. The average age of colorectal cancer diagnosis in these nonsyndromic cases is in the mid 50's to early 60's, a decade earlier than the general population (70 years) whereas the average age in Lynch Syndrome is 44 years. Defining the colon cancer etiology of this population may again reveal genes that are generally important in colon cancer development.
Association studies report several low-penetrance genetic variants associated with colon cancer risk that could account for a yet to be defined proportion of the familial colon cancer cases 11-12. A genome-wide association study of approximately 7000 colon cancer cases and controls identified a region on 8q24 with an odds ratio for colon cancer of approximately 1.2 13. The odds ratio climbed to 2.6 with coinheritance of single nucleotide polymorphisms (SNPs) on 8q24, 11q23 and 18q21 in a follow up study of 14000 cases and controls 14. Affected-relative-pair studies have also reported genetic regions that are co-inherited more often in first-degree relatives with CRC than those without. These include 7q31, 9q22.33, 3q21-24, and 11q23 with some minor peaks in agreement across studies 15-18. These reports support the paradigm that common inherited colon cancer arises from a number of susceptibility genes of lower penetrance than the well described syndromes of colon cancer.
Large families whereby precise inheritance can be correlated with phenotype offer another approach to identify isolated specific loci with well defined recombinant boundaries. Large families identified through Utah Population Database (UPDB), a genealogic resource linked to vital records and Utah and Idaho statewide cancer registries, are the foundation of what we have come to understand about both sporadic and inherited colon cancers 2 19-23. We report one such large family ascertained through UPDB and identified as having a statistical excess of colorectal cancer. Phenotypic and genetic analysis revealed a significant genetic locus on chromosome 13q that is linked to the colonic adenomatous polyp and cancer phenotype.
This study was approved by the Institutional Review Board of the University of Utah. Informed consent was obtained from all research participants.
The family (Kindred 5275) was identified from the Utah Population Database (UPDB), a genealogic resource containing over 7.5 million individual records of people who had a significant life event (birth, death, childbirth) in Utah or who are ancestral to current members of the Utah population. Probabilistic record linking methods, which take into account common identifiers to link records from one source to another, have been used to link approximately 94% of Utah Cancer Registry records (1966-present) to individuals in UPDB 24. K5275 was identified from UPDB as having a statistical excess of colorectal cancer as compared to the database as a whole 24. The probability that some number of cancer cases is observed among the descendants of a founder, given some number of person-years of risk among his or her descendants, is $P(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$, where x is the number of cancers observed and λ is the number expected given the total person time experienced in each of some number of risk strata based on age and sex. Considering only situations in which the observed number of cases (x) is greater than the expected number (λ), the probability of x or more cases being observed in a given family is $P(X \geq x) = 1 - \sum_{j=0}^{x-1} \frac{e^{-\lambda}\lambda^{j}}{j!}$. In this formula, j is incremented from 0 to x-1, and the Poisson probabilities of observing j cases when λ are expected are summed over all possible values of j and then subtracted from 1. Pedigrees selected from UPDB by this method were reviewed for all cancers (to rule out obvious known syndromes), dominant inheritance patterns, and availability of age-appropriate participants as described in our previous report of six such families, including this one 10. The p-value (not adjusted for multiple comparisons) calculated under the assumption of no familial aggregation of colorectal cancer based on Poisson probability of observed number of cases was p=0.002 24.
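The Poisson calculation described above can be illustrated with a minimal sketch. The stratum person-years and incidence rates below are hypothetical values chosen only for illustration; they are not the data used for K5275, and the function names are assumptions rather than part of any UPDB software.

```python
import math

def expected_cases(strata):
    """Expected number of cancers (lambda): person-years times the
    stratum-specific incidence rate, summed over age/sex strata."""
    return sum(person_years * rate for person_years, rate in strata)

def poisson_pmf(j, lam):
    """Poisson probability of observing exactly j cases when lam are expected."""
    return math.exp(-lam) * lam ** j / math.factorial(j)

def prob_at_least(x, lam):
    """Probability of x or more cases: 1 minus the sum of the Poisson
    probabilities for j = 0 .. x-1."""
    return 1.0 - sum(poisson_pmf(j, lam) for j in range(x))

# Hypothetical strata: (person-years at risk, cases per person-year).
strata = [(400.0, 0.0005), (300.0, 0.002), (100.0, 0.006)]
lam = expected_cases(strata)          # expected number of cases
p_excess = prob_at_least(5, lam)      # chance of 5 or more cases under no aggregation
print(f"expected = {lam:.2f}, P(X >= 5) = {p_excess:.4f}")
```

A small tail probability indicates that the observed count would be unlikely if the family experienced only the population rates, which is the sense in which a kindred is flagged as having a statistical excess.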
The Familial Standardized Incidence Ratio (FSIR) of colorectal cancer (ratio of observed to expected colorectal cancers) was calculated at 12.4 for the 5-generation family as previously described 25.
Colorectal cancer cases in the family were contacted by mail by the Utah Cancer Registry, which asked them, or their next of kin, for permission to be contacted by the study. Before expanding the family, inherited CRC syndromes were excluded. Medical records were obtained on CRC cases and evaluated to rule out adenomatous and hamartomatous polyposis syndromes based on published guidelines 4. Hereditary nonpolyposis colorectal cancer (HNPCC, or Lynch Syndrome) was excluded by evaluating DNA microsatellite instability, a common feature of Lynch syndrome tumors, in archived tumor blocks from two index CRC cases (II-6 and III-13) 26-27. Tumor and normal DNA were extracted as described previously 28 and analyzed using the “reference marker panel” (BAT25, BAT26, D2S123, D5S346, and D17S250) 29. In addition, the MUTYH gene was sequenced from the germline DNA of individuals II-6, III-1, and III-15 (Figure 1) for the two common mutations, Y165C and G382D, which constitute approximately 85% of the mutations in the Caucasian population 30.
Once known inherited CRC syndromes were excluded, study staff contacted interested individuals and expanded the kindred through family referral. All reported CRC cases were confirmed by the cancer registry or pathology report. A medical history and physical exam were completed for each participant. Clinically indicated colonoscopy with polypectomy was performed by participating endoscopists with standard preparation and monitoring. Each polyp was noted by location and size before being removed and sent for histopathological evaluation. Individuals were coded as affected based on CRC status, size and number of adenomas, and the age when they were first diagnosed with an adenoma. Three individuals had a single adenoma without advanced features (smaller than 10 mm and no villous histology) and were over age 50, the age at which the frequency of adenomas exceeds 10% in the general population 31. This included two individuals with a single <5 mm adenoma at ages 80 (II-4) and 55 (III-4) and one individual with a single 7 mm adenoma at age 55 (III-5). Linkage analysis was run two ways: 1) with all adenoma and CRC cases as affected and spouses as unknown, and 2) with the 3 noted cases run as unknown.
Genome-wide linkage was performed using a highly polymorphic, custom set of 325 short tandem repeat (STR) genetic markers. The average heterozygosity was 0.78 and the average spacing was 10.8 cM. Genotyping was performed on 13 individuals who were enrolled at the time (2 spouses, III-17, III-2, and 11 kindred members, II-4, II-6, III-1, III-11, III-12, III-15, III-18, IV-1, IV-2, IV-3, IV-22 in Figure 1) by using automated probe hybridization instruments designed and built at the University of Utah Human Genome Center 32. Genotype data were screened, and misinheritances were reviewed and resolved by two technicians; these reviews were carried out for <5% of the genotypes. Genome-wide pairwise two-point linkage analysis of the genotypes was performed using the MLINK subroutine of the FASTLINK (v.4.0) and LINKAGE (v.5.1) programs 33-34. All markers were analyzed using the Marshfield genetic map 35 assuming equal allele frequency and an autosomal dominant model with a population frequency of 0.001 and a penetrance of 0.60.
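For readers unfamiliar with LOD scores, the idea behind a two-point analysis can be sketched in its simplest textbook form. The example below assumes phase-known, fully informative meioses and full penetrance, which is a deliberate simplification: MLINK evaluates a full pedigree likelihood under the dominant model with reduced penetrance described above, and the counts used here are hypothetical.

```python
import math

def lod_score(n_meioses, n_recombinants, theta):
    """Two-point LOD at recombination fraction theta for phase-known meioses:
    log10 of L(theta) / L(0.5), with L(theta) = theta^k * (1 - theta)^(n - k)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    k, n = n_recombinants, n_meioses
    likelihood = theta ** k * (1.0 - theta) ** (n - k)
    if likelihood == 0.0:
        # A recombinant was observed, so theta = 0 is excluded.
        return float("-inf")
    return math.log10(likelihood) - n * math.log10(0.5)

def max_lod(n_meioses, n_recombinants, steps=50):
    """Scan theta on an even grid over [0, 0.5]; return (best LOD, best theta)."""
    thetas = [0.5 * i / steps for i in range(steps + 1)]
    return max((lod_score(n_meioses, n_recombinants, t), t) for t in thetas)

# Hypothetical example: 10 informative meioses with 1 recombinant.
best_lod, best_theta = max_lod(10, 1)
print(f"max LOD = {best_lod:.2f} at theta = {best_theta:.2f}")
```

Under this simplification each nonrecombinant meiosis adds about 0.3 to the LOD at theta = 0, which is why large pedigrees with many informative meioses are needed to reach conventional significance thresholds.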
The family was expanded and genotyping was also done using Affymetrix GeneChip Human Mapping 10,000 SNP array (HMA10K) on 21 individuals (4 spouses, 17 kindred members). Samples were processed according to the GeneChip Mapping Assay Manual DNAARRAY_WS2 protocol (Affymetrix) on the Affymetrix Fluidics Station 400. Arrays were scanned with the Affymetrix GeneChip® Scanner 3000 and analyzed with Affymetrix GeneChip DNA Analysis Software (GDAS) to generate genotype assignments for each of the SNP probes on the array. The deCODE map was used with Affymetrix allele frequencies. Multipoint nonparametric linkage analysis of SNP genotypic data was performed using the program GENEHUNTER (v2.1) 36-37 as part of the graphical user interface easyLinkage (v4.01). Analysis was completed both with and without removal of SNPs that were uninformative or in linkage disequilibrium, with virtually identical results. Five STR markers were used for fine mapping of the chromosome 13 locus in a total of 40 individuals (7 spouses, 33 kindred members with phenotypic information). FASTLINK and GENEHUNTER were used for the combined analysis of STRs and SNPs data at the chromosome 13 locus using the analysis programs on a UNIX platform.
Primers to amplify genes of interest were designed using the Exon Primer (Institut für Humangenetik, Munich, Germany) utility found on the UCSC Genome Browser with a maximal target size of 300 bp. Primers were designed for all exons, 5′ untranslated region (UTR), 3′ UTR, and 2 kb of promoter for each gene. Amplicons were optimized using 2.5× LC Green Plus master mix (2.0 mM, Idaho Technology), 1.0 μMol primers with 20 ng of DNA in a 10 μl reaction with a temperature gradient of 62°-72°C. Melting acquisition was performed on a 96-well LightScanner high resolution melting instrument (Idaho Technology). The plate was heated from 76°C to 98°C and melting curve analysis was performed with LightScanner Software version 2.0 with normalization of data. A total of six individuals (4 affected, 2 controls) were screened for each amplicon. Samples giving abnormal curves were submitted to the University of Utah DNA sequencing core for analysis. DNA sequences were compared with published sequences using the UCSC Genome Browser BLAT utility found at (http://www.genome.ucsc.edu/cgi-bin/hgBlat?command=start).
Comparative Genomic Hybridization (CGH) was performed on archived formalin fixed paraffin embedded (FFPE) colorectal cancer from individual III-13 using the Agilent CGH arrays. Tumor and normal DNA were microdissected, deparaffinized, and extracted from 5-10 micron sections of the block. The Puregene DNA purification protocol (Gentra), with an extensive proteinase K treatment step, was used. Genomic DNA was digested with AluI and RsaI and labeled with Cy3-dCTP (normal) or Cy5-dCTP (tumor) using the Agilent Genomic DNA Labeling Kit. Labeled DNA was hybridized to Agilent's Human Genome CGH Microarray Kit 44B with an average resolution of 35 kb. Hybridized microarray slides were washed, dried and scanned using an Agilent G2505B Microarray Scanner. Data were read and processed using Agilent's Feature Extraction Software to prepare microarray data for analysis. CGH Analytics software (Agilent) was used to check data quality and analyze statistically significant gains and losses.
FFPE tumor DNA (individuals II-6, III-13 and IV-1) and adenoma DNA (individuals III-11 and IV-4) were compared with normal DNA for somatic copy number changes at the chromosome 13 locus. DNA was PCR amplified using primers for STR markers in the region (D13S170, D13S251, and D13S265), and products were resolved on an ABI3130xl capillary sequencing instrument and analyzed with ABI GeneMapper 3.7 software. The ratio of the peak areas of the two alleles in the tumor was compared with the corresponding ratio in the normal DNA. The ratio was calculated as: (peak area of tumor allele 2/peak area of tumor allele 1)/(peak area of normal allele 2/peak area of normal allele 1), and values over 1.5 were suggestive of copy number gain in the tumor.
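The allele ratio defined above is simple enough to express directly. The sketch below uses hypothetical peak areas, not values from the study, and the function names are illustrative only.

```python
def allele_ratio(tumor_a1, tumor_a2, normal_a1, normal_a2):
    """(tumor allele 2 / tumor allele 1) / (normal allele 2 / normal allele 1).
    Normalizing by the normal-tissue ratio corrects for allele-specific
    differences in amplification at a heterozygous STR marker."""
    return (tumor_a2 / tumor_a1) / (normal_a2 / normal_a1)

def suggests_gain(ratio, threshold=1.5):
    """Apply the cutoff stated in the text: values over 1.5 suggest gain.
    Note: a gain of allele 1 would push the ratio below 1; how that case
    was handled is not stated in the text, so it is not modeled here."""
    return ratio > threshold

# Hypothetical peak areas (arbitrary fluorescence units).
ratio = allele_ratio(tumor_a1=1000.0, tumor_a2=1800.0,
                     normal_a1=1000.0, normal_a2=1000.0)
print(f"ratio = {ratio:.2f}, suggests gain: {suggests_gain(ratio)}")
```

The calculation requires two distinguishable alleles, so a marker at which the individual is homozygous is non-informative for this comparison.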
The couple at the top of the kindred had 81 descendants recorded in UPDB (Figure 1). The extended 5 generation pedigree has been published and shows 5 documented colorectal cancers at ages 72, 86, 61, 42 and 35 years 10. The family clinically evaluated and used for genotyping and linkage analysis includes 4 lower generations and is shown in Figure 1. The average age of colon cancer in this branch was 46 and the average age when the first adenoma was detected was 49.5. DNA was obtained for genotyping on 40 individuals including spouses (Table 1). Colonoscopy was performed on 32 kindred members, all of whom have genotype data. Three family members had CRC, 9 family members and two spouses had adenomatous polyps, and 22 family members had no adenomatous polyps. Medical records from the CRC cases and colonoscopy procedures showed no evidence of known adenomatous or hamartomatous polyposis conditions. No one in the family had in excess of 5 adenomatous polyps. Two individuals had 8 hyperplastic polyps (III-5 and IV-13). Two CRC cases (II-6 and III-13) showed microsatellite stability, indicating that this family does not have Lynch Syndrome. We note that no affected family members had advanced adenomas (≥10 mm or advanced histology); however, two of the three colon cancers were metastatic at diagnosis (age 42 and 35), a subtle suggestion that these adenomas may rapidly advance to a metastatic state.
Two separate genome-wide scans and additional fine mapping identified a single major locus on 13q31 which segregates with adenomatous polyps and colon cancer in Kindred 5275 (Figures 2, 3, and 4). Both the STR scan and the HMA10K SNP scan supported linkage to this identical region. The linkage analysis, specifying an autosomal dominant model, generated a maximum 2-point LOD score of 2.43 for the STR marker D13S251 on chromosome 13. Two adjacent markers, D13S170 and D13S265, also yielded positive LOD scores, 1.26 and 1.21 respectively. Nonparametric analysis of the chromosome 13 fine mapping region including both SNP (n=12) and STR (n=4) markers with high heterozygosity yielded a maximum nonparametric linkage (NPL) score of 24.12 (LOD score of 2.99 and p=0.001) at D13S251 when individuals II-4, III-4, and III-5 were coded as unknown (Figure 4). Phenotyping and analysis with these three individuals is described in Methods. When these individuals were coded as affected, the NPL score for this region increased to 30.25 (LOD score of 3.12 and p=0.0005), but the maximum LOD score was then found at D13S265. This result is due to the inclusion of individual III-5, who shares 2 markers with the minimal disease haplotype (Figure 5). When this individual is excluded, the nonrecombinant region spans rs1870836 (75,454,410 bp; 13q22.2) to D13S265 (89,171,101 bp; 13q31.3).
Although this is a large genetic region of 21 Mbp, there are only 27 RefSeq genes in the nonrecombinant region. Included are eight genes, KLF5, KLF12, LMO7, C13orf7 (RNF219), SPRY2, GPC5, MYCBP2 and POU4F1, which have been implicated in cancer initiation or progression 38-39. Each of the exons in these 8 genes (or the RNA in the case of MYCBP2) was evaluated for genetic variants using the LightScanner followed by sequencing when a variant was detected. No unambiguously deleterious mutations have been identified in these genes; however, SNPs segregating with the colon cancer and polyp phenotype were identified (Table 2). There is no evidence to indicate whether these are in disequilibrium with the responsible change or are themselves causative. Although the frequencies of the noted SNPs in RNF219 and POU4F1 are not reported in the population, they have been observed in non-colon cancer individuals.
LOD scores were negative in regions surrounding genes known to cause familial colon cancer (APC, MYH, CTNNB1, MLH1, MSH2, MSH6, PMS2, STK11, PTEN, BMPR1A). The exception was D18S548 which is near SMAD4, with a LOD of 1.36; however additional markers close to SMAD4 (D18S363 and D18S858) gave negative LOD scores.
CGH analysis was performed on archived colon tumor and normal DNA from individual III-13 who is an obligate carrier of the chromosome 13 haplotype. The cancer was diagnosed at age 42 and had metastasized to five of 15 lymph nodes. CGH analysis revealed duplication of the majority of chromosome 13q (Figure S1, panel A); however at the region of linkage (Figure S1, panel B), there are only two small statistically significant amplifications (p=0.002) as indicated by the gold bar on the right. One region includes POU4F1, a gene sequenced and found to have a novel SNP ~ 50 bp upstream of the 5′UTR. Interestingly, this cancer had losses commonly observed in sporadic colorectal cancers including APC on chromosome 5, TP53 on the p-arm of 17, and DCC and SMAD4 on chromosome 18.
To further support this result, all 3 colon cancers from the family and two adenomas > 5 mm were evaluated for copy number gains by comparing peak areas of 3 STR markers at the chromosome 13 locus in neoplastic versus normal DNA. The adenomas showed no change, however all three cancers showed a copy number increase at one of three markers. The tumor from IV-1 had a ratio of 1.65 at D13S170, and tumors II-6 and III-13 had ratios of 1.89 and 1.67 respectively at D13S265. The other markers were either non-informative (homozygous, see Figure 5) or did not show copy number gain.
We describe a large extended family with excess colorectal cancer and no clinical or molecular features of the known hereditary colon cancer conditions. Colon cancers were identified at early ages (average of 59 years in the extended pedigree, and 46 years in the lower four generations who were enrolled in the study), suggesting that underlying hereditary factors were at play. A genome-wide scan with linkage analysis identified a major locus on chromosome 13q22.1-13q31.3 that is statistically correlated with colon cancer and adenomatous polyps in this family. Although linkage of 13q to CRC has not been previously reported, chromosome 13q is frequently gained and over expressed in primary colon tumors 40-43. Consistent with these sporadic tumors, comparative genomic hybridization analysis of one tumor from K5275 showed gain of 13q, along with changes in other chromosomal regions commonly altered in colon tumors. Two other cancers in the family show copy number gain at 13q at one of 3 markers. Genes within this region are thought to play an important role in CRC progression, but have not been precisely identified.
Linkage analysis in common inherited conditions like colorectal cancer can be challenging due to reduced penetrance, sporadic adenomas and the interplay of multiple genetic and environmental factors. Analysis was run coding three individuals with adenomas but not meeting the strict disease criteria as both affected and unknown. Because of the large size of the family, analysis could be restricted to the most informative branches, excluding all of the young (under 39 years of age) unaffected individuals in the lower generation which represent a mix of gene carriers and noncarriers (Figure 1). Inclusion of these additional individuals, even when assigning age liability classes with reduced penetrance, reduces the LOD score for marker D13S251 to 2.17 suggesting that many of the individuals in the younger generation are too young to express the phenotype. Having a family large enough to be able to eliminate phenotypic ambiguity may be essential to tease the signal from the noise in these rare inherited conditions that are also common in the general population.
As demonstrated by this study, investigation of large extended families, especially those identified through the UPDB, provides a powerful approach to define precise boundaries of genetic regions involved in CRC initiation and/or progression. One could argue that genetic regions found to increase colon cancer risk in these families are unique to the family and not applicable to the general population. However, the fact that 13q is gained in 30-50 percent of all CRCs and is particularly correlated with metastasis suggests that this is an important genetic region. Consistent with this correlation, it appears that simple adenomas may rapidly advance to metastatic colon cancer in this family as no family members had the intermediate phenotype of advanced adenomas.
Identification of the precise gene and causative genetic change in this kindred will be an important next step. Traditional exon sequencing of genes in this region has not yielded a clear answer. A more comprehensive approach using high throughput sequencing technology and evaluation of inherited copy number variation (CNV) will be applied to the entire locus from representative affected family members. There is a reasonable chance that a clear deleterious mutation will not be identified because evidence from sporadic CRCs suggests that there is a gain of function (oncogene) at 13q. The polymorphisms, noncoding sequences, and CNVs that are identified and that segregate with affected individuals will need to be evaluated for biologic significance. Once the gene is identified, additional cases/families can be evaluated to determine the precise fraction of colon cancer cases that this specific locus affects. If this gene is also involved in sporadic cancer progression and metastasis, there may be opportunities for management of the molecular process through prevention or treatment interventions.
University of Utah's sequencing, genotyping, and microarray core facilities provided services in support of this study. We are grateful to the study coordinators, Amy Lee Dalton, David Nilson, Jennifer Lilley and Michelle Lewandowski for their tireless work with the family. We also thank Cindy Solomon, Rebecca Hulinsky, Katrina Lowstuter and Kory Jasperson for counseling the families on cancer risk and Michelle Condie-Done for analysis of the tumor samples.
Funding: This study was supported by National Cancer Institute grants R01-CA40641 and PO1-CA73992; additional support was provided by a Cancer Center Support Grant P30-CA42014, General Clinical Research Center Grant M01-RR00064 and N01-PC-67000, Utah Cancer Registry grant N01-PC-35141 from the National Cancer Institute's SEER program with additional support from the Utah State Department of Health and the University of Utah, and by the Huntsman Cancer Foundation. Database support to the Utah Population Database is provided by the Huntsman Cancer Foundation.
Disclosures: No potential conflicts of interest are disclosed.
Licence for Publication: The Corresponding Author has the right to grant on behalf of all authors and does grant on behalf of all authors, an exclusive licence (or non-exclusive for government employees) on a worldwide basis to the BMJ Publishing Group Ltd and its Licensees to permit this article (if accepted) to be published in Journal of Medical Genetics and any other BMJPGL products to exploit all subsidiary rights, as set out in our licence (http://group.bmj.com/products/journals/instructions-for-authors/licence-forms).