In around 30% of families with colorectal adenomatous polyposis, no germline mutation in the previously-implicated genes APC, MUTYH, POLE, POLD1, or NTHL1 can be identified, although a hereditary etiology is likely. To uncover further genes with high-penetrance causative mutations, exome sequencing of leukocyte DNA from 102 unrelated individuals with unexplained adenomatous polyposis was performed. We identified two unrelated individuals with differing compound-heterozygous loss-of-function germline mutations in the mismatch repair gene MSH3. The impact of the MSH3 mutations (c.1148delA, c.2319-1g>a, c.2760delC, c.3001-2a>c) was indicated on RNA and protein level. Analysis of the diseased individuals’ tumor tissue demonstrated high microsatellite instability of di- and tetranucleotides (EMAST) and immunohistochemical staining illustrated a complete loss of nuclear MSH3 in normal and tumor tissue, confirming the loss-of-function effect and causal relevance of the mutations. The pedigrees, genotypes, and the frequency of MSH3 mutations in the general population are consistent with an autosomal recessive mode of inheritance. Both index persons had an affected sibling carrying the same mutations. The tumor spectrum in these four persons comprised colorectal and duodenal adenomas, colorectal cancer, gastric cancer, and an early-onset astrocytoma. Additionally, we detected one unrelated individual with biallelic PMS2 germline mutations, representing Constitutional Mismatch Repair Deficiency Syndrome (CMMRD). Potentially causative variants in 14 more candidate genes identified in 26 other individuals require further workup. In the present study we describe biallelic germline mutations of MSH3 in individuals with a suspected hereditary tumor syndrome. Our data suggest that MSH3 mutations represent an additional recessive subtype of colorectal adenomatous polyposis.
Adenomatous polyposis syndromes of the colorectum are precancerous conditions characterized by the presence of dozens to thousands of adenomatous polyps, which, unless detected early and removed, invariably result in colorectal cancer (CRC). The phenotypic spectrum ranges from an early-onset manifestation with high numbers of adenomas and a positive family history to isolated late-onset cases with low polyp burden.
To date, two major inherited monogenic forms of colorectal adenomatous polyposis can be delineated by molecular genetic analyses: (i) the autosomal dominant Familial Adenomatous Polyposis (FAP [MIM: 175100]), caused by heterozygous germline mutations in the tumor suppressor gene APC (MIM: 611731);1; 2 and (ii) the autosomal recessive MUTYH-Associated Polyposis (MAP [MIM: 608456]), caused by biallelic germline mutations of the base excision repair (BER) gene MUTYH (MIM: 604933).3; 4
Currently, high-throughput sequencing approaches, in particular whole exome sequencing (WES), are considered the most powerful tools to detect causative variants in novel genes in as yet unexplained Mendelian conditions.5; 6 Very recent WES investigations have identified two rare forms of colorectal adenomatous polyposis. The first is the autosomal dominant Polymerase-Proofreading-associated polyposis (PPAP [MIM: 612591]), caused by specific germline missense mutations in the polymerase genes POLE (MIM: 174762) and POLD1 (MIM: 174761).7-9 The second is another very rare, autosomal recessive colorectal adenomatous polyposis (MIM: 616415), caused by biallelic mutations in NTHL1 (MIM: 602656).10 The WES approach has also detected ZSWIM7 (MIM: 614535) and PIEZO1 (MIM: 611184) as promising candidate genes carrying variants causing colorectal adenomatous polyposis.11
However, in around 30% of polyposis cases, no underlying germline mutation is identified although a genetic basis is likely. Here, classical approaches to gene identification such as linkage analysis are not feasible, since most of these cases are either sporadic or are characterized by an uncertain family history.12-15 Over the past two decades, a number of candidate gene studies have been performed without convincing results.16-19 Neither loss of heterozygosity (LOH) analyses nor profiling of somatic mutations has contributed to the identification of promising novel genetic causes. A fraction of cases might be explained by deep intronic APC mutations;20 APC mutational mosaicism;21 rare APC missense mutations;22 or other cancer predisposition genes.23-26 In addition, rare germline copy number variants (CNV) and low-penetrance variants might contribute to the genetic predisposition for formation of colorectal adenomas.15; 27; 28
Another hereditary colorectal cancer syndrome, Lynch syndrome, is not accompanied by a florid colorectal polyposis and is characterized by microsatellite instable tumors. The underlying cause is a heterozygous germline mutation in one of the mismatch repair (MMR) genes MLH1 (MIM: 120436), MSH2 (MIM: 609309), MSH6 (MIM: 600678), PMS2 (MIM: 600259), or in EPCAM (MIM: 185535).29; 30 Biallelic mutations in these genes lead to constitutional MMR-deficiency (CMMRD [MIM: 276300]) with multiple tumors and onset in childhood.31; 32 In contrast, familial CRC without polyposis or microsatellite instability is etiologically very heterogeneous. In some of these families germline mutations in FAN1, encoding a nuclease involved in DNA repair, were recently identified using a WES approach.33
To uncover additional genes with high-penetrance mutations causing colorectal polyposis, the germline exomes of 102 unrelated individuals with unexplained adenomatous polyposis were sequenced. The identification of further genetic causes will extend knowledge of disease mechanisms, biological pathways, and potential therapeutic targets.
All 102 individuals had unexplained colorectal adenomatous polyposis, i.e. no germline mutation in APC or MUTYH was identified by Sanger sequencing of the coding regions or deletion/duplication analysis by Multiplex ligation-dependent probe amplification (MLPA).34 All participants were screened for APC mutations in mosaic state.21 All persons were examined for pathogenic deep intronic APC mutations: 67 persons were screened via transcript analysis and 35 persons were tested for known intronic mutations.20 Furthermore, the two hotspot mutations in POLE and POLD1 were excluded.9 In addition, a single nucleotide polymorphism (SNP) array-based CNV analysis was performed in all individuals, as described elsewhere.28
For all 102 persons included in this study, a hereditary cause of the disease was considered highly likely. The inclusion criteria were the presence of at least 20 synchronous, or 40 metachronous, histologically confirmed colorectal adenomas, irrespective of inheritance pattern or extraintestinal lesions. All participants were of central European origin according to family name and self-report. Relatives were only considered to be affected if their medical records confirmed fulfilment of the inclusion criteria. The study was approved by the local ethics review board (Medical Faculty of the University of Bonn, board no. 224/07), and all participants provided written informed consent prior to inclusion.
Genomic DNA was extracted from peripheral EDTA-anticoagulated blood samples using the standard salting-out procedure. WES was performed at the Yale Center for Genome Analysis via capture using the NimbleGen 2.1M human exome array, followed by paired-end sequencing on a HiSeq 2000 instrument (Illumina), as described elsewhere.35 Targeted bases were covered by a mean of 67 independent reads, with an average of 94% of all bases being covered eight or more times (Table S1). Reads were aligned to the hg19 human reference genome using ELAND (Illumina). SAMtools software was used to mark duplicated reads, perform local realignment around short insertions and deletions, recalibrate the base quality scores, and call single nucleotide variants (SNVs) and short indels.
Variant call quality was assessed using SAMtools. A minimum quality score of 100 and a minimum coverage of 10x were required. Synonymous or intronic variants other than those affecting consensus splice sites were excluded from further analysis.
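The quality and annotation filters described above can be sketched as a simple predicate. This is a minimal illustration, not the pipeline used in the study: the `Call` record, its field names, and the effect labels are hypothetical, and only the two thresholds (quality score 100, coverage 10x) and the exclusion rule come from the text.

```python
from dataclasses import dataclass

MIN_QUALITY = 100   # minimum variant call quality stated in the text
MIN_COVERAGE = 10   # minimum read depth (10x) stated in the text


@dataclass
class Call:
    """Hypothetical record for one variant call (field names are assumptions)."""
    quality: float
    depth: int
    effect: str                          # e.g. "missense", "synonymous", "intronic"
    at_consensus_splice_site: bool = False


def passes_basic_filters(call: Call) -> bool:
    """Return True if the call survives the quality and annotation filters."""
    if call.quality < MIN_QUALITY or call.depth < MIN_COVERAGE:
        return False
    # Synonymous and intronic variants are kept only at consensus splice sites.
    if call.effect in {"synonymous", "intronic"} and not call.at_consensus_splice_site:
        return False
    return True
```

A call with sufficient quality and depth is therefore retained unless it is synonymous or intronic and lies outside a consensus splice site.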
The resulting variants were filtered for (i) rare truncating (loss-of-function, LoF) alterations (nonsense mutations, frameshift deletions/insertions, and mutations at highly conserved splice sites), and (ii) missense variants at highly conserved nucleotide positions, which were predicted to be disease causing/damaging/deleterious by at least two out of three in silico analysis tools (PolyPhen-2, MutationTaster, and SIFT).
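The "at least two out of three" rule for missense variants amounts to a majority vote over the three predictors. The sketch below assumes each tool's output has already been reduced to a boolean (damaging or not); how the raw scores of PolyPhen-2, MutationTaster, and SIFT are thresholded is not specified in the text and is left out here.

```python
def predicted_damaging(polyphen2: bool, mutation_taster: bool, sift: bool,
                       min_votes: int = 2) -> bool:
    """Majority vote: True if at least `min_votes` of the three tools call the variant damaging."""
    votes = sum([polyphen2, mutation_taster, sift])
    return votes >= min_votes
```

For example, a variant called damaging by PolyPhen-2 and SIFT but benign by MutationTaster would still be retained.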
The variants were selected according to a recessive (presumed biallelic mutations) or a dominant (heterozygous mutations) mode of inheritance and an estimated disease frequency in the population of 0.01%. In the dominant and recessive disease models, variants with a minor allele frequency (MAF) of ≥ 0.01% and ≥ 1%, respectively, were considered benign polymorphisms or low-penetrance variants and excluded from further analysis. In addition, recurrent dominant (heterozygous) variants were selected using a less stringent frequency threshold (MAF 1%). Population allele frequencies are based on data from dbSNP; the 1000 Genomes Project (TGP); the Exome Variant Server (EVS); the Exome Aggregation Consortium (ExAC); and a large in-house exome database, which contains all germline variants identified in 2,816 exomes of individuals without known tumor disease sequenced under similar conditions. To exclude obvious sequencing artifacts, a detailed visual inspection of the remaining variants was performed using a read browser (Integrative Genomics Viewer, IGV).
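The frequency thresholds for the different disease models can be summarized in a small lookup. This is a sketch under stated assumptions: the function and dictionary names are hypothetical, and taking the highest frequency reported across databases is an illustrative choice, since the text does not say how conflicting database values were combined.

```python
from typing import Mapping

# Exclusion cutoffs from the text, expressed as fractions:
# dominant model >= 0.01%, recessive model >= 1%, recurrent dominant variants 1%.
MAF_CUTOFF = {
    "dominant": 0.0001,
    "recessive": 0.01,
    "recurrent_dominant": 0.01,
}


def is_rare(mafs: Mapping[str, float], model: str) -> bool:
    """Return True if the variant stays below the cutoff for the given model.

    `mafs` maps a database name (e.g. "ExAC") to the minor allele frequency
    reported there; a variant absent from all databases is treated as rare.
    """
    highest = max(mafs.values(), default=0.0)   # assumption: use the highest reported MAF
    return highest < MAF_CUTOFF[model]
```

With these cutoffs, a variant at an allele frequency of 0.008% passes under both models, whereas one at 0.5% passes only under the recessive model.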
We considered only genes affected by potentially pathogenic variants in at least two alleles of the cohort (heterozygous in ≥ 2 individuals or homozygous/compound-heterozygous in ≥ 1 individual). Finally, all genes carrying the remaining rare variants were inspected for the presence of rare, non-polymorphic heterozygous CNVs of ≥ 10 kb in order to identify additional recurrently mutated genes or biallelic, compound-heterozygous variants.28
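The gene-level criterion (at least two affected alleles in the cohort) can be expressed as an allele count per gene. The sketch below is illustrative only: the tuple layout and the placeholder gene names are hypothetical, and it cannot distinguish two heterozygous variants in trans from two in cis within one individual, which exome data alone do not resolve.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def recurrent_genes(observations: Iterable[Tuple[str, str, int]],
                    min_alleles: int = 2) -> Set[str]:
    """Return genes with at least `min_alleles` potentially pathogenic alleles in the cohort.

    Each observation is (gene, individual_id, allele_count), where allele_count
    is 1 for a heterozygous and 2 for a homozygous variant. A compound-heterozygous
    individual contributes two observations with allele_count 1.
    """
    alleles = Counter()
    for gene, _individual, allele_count in observations:
        alleles[gene] += allele_count
    return {gene for gene, n in alleles.items() if n >= min_alleles}
```

A gene seen once in a heterozygous state in a single individual is dropped, while a gene with one homozygous carrier, one compound-heterozygous carrier, or two heterozygous carriers is kept.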
Splicing efficiencies of the normal and mutant sequences were calculated using the following splice prediction programs: Human Splicing Finder;36 GeneSplicer from the University of Maryland Center for Bioinformatics and Computational Biology, MaxEntScan;37 and NNSPLICE 0.9 from the Berkeley Drosophila Genome Project.
The etiological relevance of the mutations was further explored by evaluation of their genetic intolerance to functional variation, as measured by the Residual Variation Intolerance score (RVIS);38 and the likelihood of haploinsufficiency, as measured by the Haploinsufficiency Score (from dataset S2, including imputed values).39 The expression of candidate genes was determined using the EST profiles of human colon tissue and protein detection data from human colon tissue (glandular cells) provided by UniGene and Human Protein Atlas.
Data concerning the frequency (percentage) of colorectal tumors with somatic mutations in candidate genes were obtained from the exome database of The Cancer Genome Atlas (TCGA). Somatic variants identified in exome data from colonic (n = 273) and rectal (n = 116) adenocarcinomas were downloaded from the TCGA data portal. To correct the data for the presence of passenger mutations, hypermutated tumors were excluded from the dataset. To this end, the distribution of somatic variants in the TCGA exomes was analyzed, and all tumors with > 200 variants (24% of the tumors) were excluded. The remaining 295 exomes (76% of tumors) were used to calculate the frequency of tumors with somatic mutations in candidate genes.28
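The exclusion of hypermutated tumors and the subsequent frequency calculation can be sketched as follows. The function names and the dictionary layout are hypothetical; only the cutoff of more than 200 somatic variants and the tumor counts (273 colonic, 116 rectal, 295 retained) are taken from the text.

```python
from typing import Dict, Iterable


def non_hypermutated(variant_counts: Dict[str, int], max_variants: int = 200) -> Dict[str, int]:
    """Keep tumors with at most `max_variants` somatic variants (tumors with > 200 are excluded)."""
    return {tumor: n for tumor, n in variant_counts.items() if n <= max_variants}


def mutation_frequency(tumors_with_mutation: Iterable[str], retained_tumors: Iterable[str]) -> float:
    """Percentage of retained tumors carrying a somatic mutation in a given gene."""
    retained = set(retained_tumors)
    if not retained:
        return 0.0
    return 100.0 * len(set(tumors_with_mutation) & retained) / len(retained)


# Consistency check on the reported numbers: 295 of 389 tumors is about 76%.
total_tumors = 273 + 116
retained_percent = round(100 * 295 / total_tumors)
```

The reported proportions are internally consistent: 295 of 389 tumors corresponds to roughly 76%, leaving 94 tumors (about 24%) classified as hypermutated.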
The identified truncating variants were validated via Sanger sequencing of the corresponding region using standard protocols. Genomic leukocyte-derived DNA was used to amplify the genomic region of the respective variant. PCR products were purified and sequenced on an ABI 3500xl Genetic Analyzer (Applied Biosystems, Darmstadt, Germany). Sanger sequencing of PMS2 was based on long-range PCR with primers specific for PMS2 to avoid pseudogene amplification as described previously.40
Venous blood samples were collected into PAXgene blood RNA tubes (Becton Dickinson, Heidelberg, Germany). Total RNA was extracted using the PAXgene blood RNA kit (Qiagen, Hilden, Germany), in accordance with the manufacturer's protocol. First strand cDNA was synthesized from 2 to 3 μg of total RNA using random hexamer-primed reverse transcription and the SuperScript first strand system for reverse transcriptase (RT)-PCR (Invitrogen GmbH, Karlsruhe, Germany), in accordance with the manufacturer's protocol. RT-PCR fragments were obtained according to standard PCR protocols, and different primers were used to generate the appropriate fragments. RT-PCR products were separated on 2% agarose gel and visualized with ethidium bromide using an UV imaging system (Bio-Rad, Hercules, CA). Individual bands were excised from the gel and eluted using the High Pure PCR product purification kit (Roche Diagnostics GmbH, Mannheim, Germany). Eluted DNA was reamplified with the same primer pairs, and sequenced as described above.
MSH3 (mutS homolog 3, MIM 600887) frameshift mutations, c.1148delA and c.2760delC, were generated via site-directed mutagenesis (QuikChange II Kit, Stratagene; primers: 5’-gttagggacaaaaaaagggcaacatt-3’ and 5’-aatgttgccctttttttgtccctaac-3’ or 5’-ggctcagattggctctatgttcctgcagaag-3’ and 5’-cttctgcaggaacatagagccaatctgagcc-3’, respectively) using the pcDNA3.1-/MSH3-wild type (wt) vector (kindly provided by Grazia Graziani, Italy).41 Plasmids were confirmed by sequencing. To mimic the splice site variants MSH3:c.3001-2a>c and MSH3:c.2319-1g>a, the resulting cDNAs including the premature stop codons were synthesized (Gene Art) and subcloned into the pcDNA3.1+ expression vector.
Transient transfection was carried out using HEK293T cells as described previously.42 In brief, HEK293T cells were transfected at 50-70% confluence with expression plasmids, pcDNA3.1-/MSH3 wt, pcDNA3.1-/MSH3 c.1148delA, pcDNA3.1-/MSH3 c.2760delC, pcDNA3.1+/MSH3 c.3001-2a>c or pcDNA3.1+/MSH3c.2319-1g>a (0.5μg/ml, respectively) using 2μl/ml of the cationic polymer polyethylenimine (Polysciences, Warrington, PA; stock solution 1mg/ml). 48h post transfection, cell extracts were prepared for Western blot analysis using anti-MSH3 (H-300, Santa Cruz Biotechnologies) and anti-ß-actin (Sigma). Fluorescence signals (680, 800, LiCor) were detected in a FLA-9000 (Fujifilm).
The effect of whole exon deletions on protein structure was illustrated in silico. MSH3 structure was obtained from Protein Data Bank (PDB) as pdb file “3thy” (MutSβ complexed with an insertion/deletion loop of two bases and ADP).43 We mapped the amino acids coded by exons 17 and 22 to the MSH3 structure using the PyMOL Molecular Graphics System, version 22.214.171.124, Schrödinger, LLC.
Immunohistochemical (IHC) staining of Formalin-fixed, Paraffin-embedded (FFPE) tissue samples was performed following established routine procedures on a fully automated Bond-III IHC stainer (Leica, Wetzlar, Germany) according to the manufacturer's protocol with the following primary antibodies: MLH1, MSH2, MSH6, PMS2 (all antibodies were purchased from Leica). The level of protein staining in tumor cells was compared to that in normal tissue. MMR protein level was considered deficient if the nuclei showed no, or only very weak immunostaining relative to normal tissue.
IHC of MSH3 was performed on 2-3 μm FFPE tissue specimens using an automated staining system (Medac 480 S Autostainer; Medac, Wedel, Germany). For antigen retrieval a Pre-Treatment (PT)-module (Medac) was used. A rabbit polyclonal antibody for MSH3, raised against an NH2-terminal polypeptide comprising amino acids 1–200, was used at a dilution of 1:100.44; 45 The reaction was developed with horseradish peroxidase (HRP)-conjugated detection system (C-DPVB 500 HRP, Medac) and the 3,3’-Diaminobenzidine (DAB) system (495192F, Medac).
DNA was extracted from 10 μm FFPE tissue sections. After deparaffinization, tumor tissue was macrodissected from unstained slides. A previously marked haematoxylin and eosin-stained slide served as a reference. Extraction of FFPE embedded tissue DNA was carried out using the BioRobot M48 Robotic Workstation and the corresponding MagAttract DNA Mini M48 Kit (Qiagen, Hilden, Germany), or the Maxwell™ 16 FFPE Tissue LEV DNA Purification Kit (Promega Corp.) in accordance with the manufacturer's protocol. Analysis of microsatellite status was performed with the previously described methods.46 For examination of somatic APC mutations, targeted sequencing with high coverage (read depths > 1000) was performed using the FAP MASTR Kit (Multiplicom) on a MiSeq platform (Illumina) in one person (individual 1661.1). The results were analyzed with the SeqPilot software (JSI Medical Systems).
Microsatellite analysis was performed on matched tumor and normal DNA samples using conventional fragment analysis or NGS-based analysis as previously described.46 This involved use of the National Cancer Institute (NCI) reference marker panel for the evaluation of microsatellite instability (MSI) in colorectal cancer. This panel consists of two mononucleotide (BAT25, BAT26), and three dinucleotide (D2S123, D5S346 and D17S250) repeats.47-49 Tumor DNA was extracted from microdissected tumor tissue. Normal DNA was extracted from normal tissue or peripheral blood leukocytes. Tumors were scored as highly instable (MSI-H) if two or more of these five markers exhibited additional alleles, and as stable (MSS) if none of the five markers showed instability.
The analysis was complemented by a second panel of five markers, consisting of four dinucleotide and one tetranucleotide (BAT40, D10S197, D13S153, MYCL1, and D18S58) repeats. The tumor was classified as MSI-H if two or more of the ten markers exhibited instability, and as MSI-low (MSI-L) if only one marker exhibited additional alleles.
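The scoring rules for the five-marker and ten-marker panels can be written as a single function. This is a sketch of the stated rules only: the label returned for one unstable marker in the five-marker panel ("extend panel") and the treatment of a ten-marker result with no unstable marker as MSS are assumptions, since the text does not spell out these two cases.

```python
def classify_msi(unstable: int, tested: int) -> str:
    """Classify microsatellite status from the number of unstable markers.

    MSI-H: two or more unstable markers (five- or ten-marker panel).
    MSS:   no unstable marker.
    MSI-L: exactly one unstable marker among all ten markers.
    """
    if not 0 <= unstable <= tested:
        raise ValueError("unstable must be between 0 and the number of markers tested")
    if unstable >= 2:
        return "MSI-H"
    if unstable == 0:
        return "MSS"
    # Exactly one unstable marker: only the ten-marker panel yields MSI-L.
    return "MSI-L" if tested >= 10 else "extend panel"
```

Under this reading, a tumor with one unstable marker in the reference panel is not classified until the second panel has been analyzed.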
To detect elevated microsatellite alterations at selected tetranucleotide repeats (EMAST), DNA from tumor and normal tissue was analyzed using five more tetranucleotide repeat markers (D20S82, D2S443, D21S1436, D9S747, and UTS037), as described elsewhere.50
To identify high-penetrance germline mutations causing colorectal adenomatous polyposis and located in genes not related to polyposis so far, whole-exome sequencing of leukocyte-derived DNA was performed in 102 unrelated individuals with unexplained adenomatous polyposis. Most of the cases presented with an attenuated colorectal phenotype (late onset disease and/or < 100 colorectal adenomas). The mean age at diagnosis was 44 years (range 14-73 years). The majority of persons in the whole cohort had no evident extracolonic lesions and most were sporadic or isolated cases. The basic clinical features of the cohort are summarized in Table S2 and S3.
The median coverage of mapped reads was 56x (66% on-target), and 84% of bases were covered at ≥ 20x. The overall performance of exome sequencing is described in Table S1. A principal component analysis demonstrated that all but one of the participants were of central European origin (Fig. S1). The outlier was excluded from further analysis. Two further persons were removed due to low coverage. A mean of 30,152 SNVs per sample was called in the coding and flanking intronic regions. A number of stringent filter steps were applied to select for rare, non-polymorphic, truncating (loss-of-function, LoF) variants, assuming a dominant or recessive disease model.
We identified two unrelated individuals each carrying two different mutations in MSH3 (mutS homolog 3, [MIM 600887]) (Fig. 1, Fig. 2, Fig. S2). We also detected one unrelated person with two different mutations in PMS2 (PMS1 homolog 2, [MIM 600259]). Furthermore, potentially pathogenic germline variants were identified in 14 additional protein-coding genes (Fig. S3, Table S4).
Case 1275.1 (II-4 in Fig. 1A) was a female diagnosed with colorectal adenomatous polyposis at age 36 years. She underwent a preventive sigmoidectomy at age 48, and a right hemicolectomy at age 53. Histology results were available for > 40 polyps, all of which were tubular or tubulovillous adenomas with low to intermediate dysplasia, often accompanied by inflammatory infiltration. Three distal hyperplastic polyps were also documented. In addition, this person had a history of proliferative disorders in other organs: thyroid adenoma at age 35; a small polyp of the corpus uteri and uterine leiomyomas at age 44; multiple small intraductal papillomas of (peripheral) mammary glands at age 44; and multiple adenomatous polyps in the duodenum at age 50. Hypertrophy of the retinal pigment epithelium was excluded by ophthalmological examination at age 50.
Case 1661.1 (II-2 in Fig. 1B) was a female diagnosed at age 32 years with colorectal tubular and tubulovillous adenomas with low grade intraepithelial neoplasia. At age 42, she underwent proctocolectomy and excision of large duodenal adenomas. This individual had a striking past medical history: at age 26, a grade II astrocytoma was diagnosed and surgically treated. At age 27, she underwent oophorectomy due to the presence of ovarian cysts, including one dermoid cyst. A hysterectomy was performed due to a myoma at age 34, and a thyroidectomy due to follicular adenomas at age 42. At age 43, she showed a cutaneous fibrolipoma, and at age 46, a flat epithelial atypia, multiple peripheral small intraductal papillomas, usual ductal hyperplasias, and cysts with apocrine metaplasia were detected in the mammary glands.
Both index persons had one affected sibling whereas the respective parents had no reported history of malignant gastrointestinal disease (Fig. 1). A sister (ID 1275.2, II-1 in Fig. 1A) of individual 1275.1 (II-4 in Fig. 1A) was diagnosed with a rectal adenocarcinoma at age 56, and a signet cell gastric carcinoma at age 59. The available histology reports described multiple tubulovillous adenomas of the entire colon and proximal duodenum, with up to high grade intraepithelial neoplasia, and two hyperplastic polyps of the transverse colon. Small bilateral renal cysts were reported as a secondary finding. The brother (ID 1661.2, II-3 in Fig. 1B) of individual 1661.1 (II-2 in Fig. 1B) was diagnosed with colorectal polyps at age 33 years; he underwent colectomy at age 37.
In total, the two index persons harbored four different MSH3 variants, all with a putative LoF effect. MSH3 (RefSeq NM_002439.4) on 5q14.1 is one of the six MMR genes identified to date in eukaryotic cells.51 It consists of 24 exons and encodes a protein comprised of 1137 amino acids, including several functional domains (Fig. 2). In both families, each affected individual carried one frameshift and one splice site mutation (Fig. 1, Fig. 2). All mutations were validated by Sanger sequencing (Fig. 2).
The frameshift mutation c.1148delA;p.Lys383Argfs*32 (g.chr5:79970921delA) of exon 7 in family 1275 is predicted to result in a premature stop codon after 31 amino acids. It corresponds to a known somatic cancer mutation located in a poly-A(8) tract of exon 7. The frameshift mutation c.2760delC;p.Tyr921Metfs*36 (g.chr5:80109507delC) of exon 20 in family 1661 is predicted to result in a premature stop codon after 35 amino acids. After transfection of MSH3 plasmids with each frameshift mutation in HEK293T cells, we demonstrated via Western blot that the altered proteins were shortened by the expected length (Fig. S4A and C).
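The relationship between the HGVS frameshift notation and the predicted truncated protein can be made explicit with a short calculation. The sketch assumes the standard HGVS convention that the number after `fs*` counts from the first changed amino acid to the new stop codon, inclusive; the function name and the regular expression are illustrative and cover only this simple notation.

```python
import re

# Matches simple HGVS protein frameshift descriptions such as p.Lys383Argfs*32.
FS_PATTERN = re.compile(r"p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})fs\*(\d+)")


def truncated_length(hgvs_p: str) -> int:
    """Number of amino acids in the predicted truncated protein."""
    match = FS_PATTERN.fullmatch(hgvs_p)
    if match is None:
        raise ValueError(f"unsupported notation: {hgvs_p}")
    first_changed = int(match.group(2))   # position of the first altered residue
    stop_offset = int(match.group(4))     # stop codon counted from that residue
    stop_position = first_changed + stop_offset - 1
    return stop_position - 1              # residues before the stop codon
```

For p.Lys383Argfs*32 this gives 413 residues (31 residues from position 383 onward, then a stop at position 414), and for p.Tyr921Metfs*36 it gives 955 residues, compared with 1137 amino acids in the wild-type protein.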
The splice site mutations c.2319-1g>a (g.chr5:80074538g>a) of intron 16 in family 1661 and c.3001-2a>c (g.chr5:80160630a>c) of intron 21 in family 1275 are both located in the highly conserved splice acceptor sites, and are predicted to alter splicing in four out of four prediction tools. To demonstrate their functional effect, we performed a transcript analysis using primers located in flanking exons (Fig. 3A-B). RT-PCR products obtained on cDNA of individual 1661.1 carrying the c.2319-1g>a mutation, showed two bands on an agarose gel. Sequencing the shortened transcript confirmed a loss of exon 17, which is predicted to result in an in-frame loss of 39 amino acids (aa 774-812) on protein level. These amino acids are involved in DNA recognition (Fig. 2, Fig. S5).43 Sequencing the short RT-PCR fragment of individual 1275.1 (II-4 in Fig. 1A) with the c.3001-2a>c mutation confirmed a loss of exon 22. This is predicted to result in a frameshift mutation with a premature stop codon after 16 amino acids, altering the dimerization domain (Fig. 2, Fig. S5). In addition, we demonstrated via Western blot that MSH3 cDNA variants lacking the respective exon lead to altered proteins shortened by the expected length (Fig. S4B-C).
In controls (ExAC database), the variant c.1148delA is listed ten times (as chr5:79970914CA/C), and the variant c.2760delC is listed twice (as chr5:80109505TC/T) in a heterozygous state, corresponding to an allele frequency of 0.008% and 0.0016%, respectively. The two other MSH3 variants are not reported in the general population, and none of the variants are listed in the Human Gene Mutation Database (HGMD). To determine the frequency of MSH3 LoF mutations in the normal population, we queried large exome data sets from controls (TGP, EVS, ExAC). These listed heterozygous LoF MSH3 mutations with a MAF of < 0.2%; however, no homozygous mutation was reported.
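The allele frequencies and their implication for a recessive model can be checked with a short calculation. Two points are assumptions rather than statements from the text: the total allele number of 125,000 is back-calculated from the reported counts and percentages, and the expected frequency of biallelic carriers is derived under Hardy-Weinberg equilibrium with a combined LoF allele frequency at the reported upper bound of 0.2%.

```python
def allele_frequency(allele_count: int, allele_number: int) -> float:
    """Allele frequency as a fraction."""
    return allele_count / allele_number


def expected_biallelic_frequency(q: float) -> float:
    """Expected frequency of individuals with two mutant alleles under Hardy-Weinberg (q squared)."""
    return q * q


ASSUMED_ALLELE_NUMBER = 125_000   # back-calculated, not stated in the text

freq_1148delA = allele_frequency(10, ASSUMED_ALLELE_NUMBER)   # 0.00008 = 0.008%
freq_2760delC = allele_frequency(2, ASSUMED_ALLELE_NUMBER)    # 0.000016 = 0.0016%

# Upper bound for biallelic LoF carriers if the combined LoF MAF is below 0.2%.
upper_bound = expected_biallelic_frequency(0.002)             # 0.000004, about 1 in 250,000
```

Under these assumptions, fewer than about 1 in 250,000 individuals would be expected to carry two MSH3 LoF alleles, which fits the absence of homozygous carriers in the control data sets.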
We were able to obtain paraffin embedded tumor and adjacent normal tissue from the affected sister 1275.2 (II-1 in Fig. 1A) and a blood sample from the affected brother 1661.2 (II-3 in Fig. 1B) and found the same two MSH3 mutations as those present in the index case (data not shown).
To confirm compound heterozygosity, a leukocyte-derived DNA sample was obtained from one unaffected sibling (II-7 in Fig. 1A) of individual 1275.1 (II-4 in Fig. 1A). Examination of both regions mutated in the index person revealed that the unaffected sibling carried only the frameshift mutation in heterozygous state and not the splice site mutation (Fig. 1, Fig. 3C). For individual 1661.1 (II-2 in Fig. 1B), the biallelic genotype was confirmed via transcript analysis: By creating an amplicon spanning both mutated regions, we could demonstrate that the transcript of normal length carried the single nucleotide deletion resulting in a frameshift mutation. In contrast, this single nucleotide deletion was not detected in the shortened product transcribed from the allele carrying the splice site mutation (Fig. 3D).
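The segregation argument used for family 1275 can be sketched as a simple rule. It rests on assumptions that are not tested here: neither variant arose de novo in the index person, and no recombination occurred between the two sites. The function name and return labels are hypothetical.

```python
from typing import Set


def phase_from_relative(index_variants: Set[str], relative_variants: Set[str]) -> str:
    """Infer the phase of two variants in the index person from a first-degree relative.

    If the relative carries exactly one of the two variants, the variants cannot
    lie on the same parental haplotype and are therefore in trans. A relative
    carrying both or neither variant is uninformative.
    """
    if len(index_variants) != 2:
        raise ValueError("expected exactly two variants in the index person")
    shared = index_variants & relative_variants
    return "trans" if len(shared) == 1 else "uninformative"
```

Applied to family 1275, the sibling carrying only c.1148delA places the frameshift and the splice site mutation on different alleles of the index person.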
To assess potential additional causal alterations, we performed supplementary sequencing and MLPA of genes with influence on MSH3. Neither of the two index persons showed a germline mutation in the other MMR genes (MLH1, MSH2, MSH6, PMS2), EPCAM, or TP53 (MIM: 191170), nor was there any evidence of APC mosaicism in leukocyte DNA or of somatic MSH2 or MSH6 mutations in tumor tissue. Furthermore, immunohistochemical staining of two adenomas per index person demonstrated strong presence of the MMR proteins MLH1, MSH2, MSH6, and PMS2 (Fig. S6).
Immunohistochemical staining with a rabbit polyclonal antibody proved complete nuclear loss of MSH3 in normal colon mucosa of individual 1275.1 and in adenomas of both index persons (Fig. 4). In the control samples, non-tumorous mucosa of an independent person with colorectal cancer showed MSH3 predominantly located in the nuclei.
In adenoma-derived DNA from both index persons, we found stability of mononucleotide repeats. Complementarily, we examined several dinucleotide and tetranucleotide markers in order to focus on lesions that are processed by MSH3. Individuals 1275.1 and 1661.1 exhibited instability of 1/4 (low microsatellite instability, MSI-L) and 3/4 (high microsatellite instability, MSI-H) dinucleotide markers, respectively. In addition, individuals 1275.1 and 1275.2 showed instability of 3/6 and 3/5 tetranucleotide markers, while individual 1661.1 displayed instability of 4/6 tetranucleotide markers (Fig. 5; Fig. S7). Thus, we demonstrated EMAST in tumors of all three individuals that were examined.
In addition, the adenoma-derived DNA of individual 1661.1 was used to screen for somatic APC mutations. Using targeted deep sequencing, we compared four independent polyps with leukocyte DNA, and found seven different somatic APC mutations (1-2 per polyp) in 6-36% of the reads (Fig. S8). All mutations were small deletions of two to eight nucleotides. In 4/7 of the mutations, the sequence context proved to be di- or trinucleotide repeats.
Individual 1138 harbored the PMS2 germline mutations c.2T>A;p.Met1? in exon 1 and c.863delA;p.Gln288Argfs*19 in exon 8 (RefSeq NM_000535.5) (Fig. S9A). Both mutations were validated by Sanger sequencing (Fig. S9B). Compound heterozygosity could be confirmed since the healthy mother only carried the mutation c.863delA in exon 8. The start-loss mutation c.2T>A;p.Met1? was predicted to be pathogenic/damaging by 2/3 in-silico tools. In accordance with the assumed protein truncation caused by the two germline mutations, immunohistochemical staining showed complete loss of PMS2 in both tumor and normal tissue (Fig. S9C).
Additional clinical information and careful re-evaluation of the medical history revealed that individual 1138 was diagnosed with early-onset colorectal polyposis with 20-25 adenomas at age 14 years, and had undergone proctocolectomy with pouch-anal anastomosis (IPAA) at age 16. In addition, a primitive neuroectodermal tumor (PNET) of the cerebellum had been diagnosed at 4 years, and a history of a pilomatrixoma, thyroid cysts, and three café-au-lait spots was reported. The family history was unremarkable.
In the remaining individuals with unexplained polyposis, 29 different rare mutations in 14 protein-coding genes were found in 26 other individuals (Table S4; Fig. S3). All genes are reported to be expressed in colon tissue, apart from one (DNAJB7 [MIM: 611336]). Two genes (MAGT1 [MIM: 300715], SLC27A5 [MIM: 603314]) were each affected by a homozygous LoF mutation in one individual, and seven genes (BTBD9 [MIM: 611237], CD36 [MIM: 173510], ECHDC3, SSC5D, UGGT2 [MIM: 605898], WDR35 [MIM: 613602], ZC3H8) were affected recurrently by heterozygous LoF mutations. Of these, three genes (CD36, WDR35, ZC3H8) have been implicated in cell adhesion or apoptosis. Five individuals carried > 1 heterozygous mutation. Based on the CNV data, we identified no further heterozygous or additional biallelic large duplication or deletion in these genes.
The majority of colorectal adenomatous polyposis cases is attributable to heterozygous germline mutations of the tumor suppressor gene APC, and thus diagnosed as FAP. However, the few novel subtypes delineated in recent years are caused by genes involved in DNA repair. While heterozygous mutations of the proofreading domain of the DNA polymerase genes POLE and POLD1 lead to the rare, dominantly inherited PPAP, the recessive MUTYH-associated polyposis and NTHL1-associated polyposis are caused by biallelic germline mutations of BER genes.3; 7; 10 After causal variants in those genes were initially described in only a few families, identification of additional cases expanded the mutation spectrum and allowed refinement of the respective phenotypes.8; 9; 52; 53
In a number of individuals with colorectal adenomatous polyposis, however, no germline mutation in the established genes can be identified. Although the syn- or metachronous occurrence of dozens to hundreds of adenomas is strongly suggestive of an underlying genetic basis, it remains unclear so far, whether the predisposing genetic factors mainly act in a monogenic fashion, or contribute as low or moderate penetrance variants to a more complex, oligo/polygenic trait.
Interestingly, increasing evidence suggests that biallelic germline mutations of the MMR genes can result in a phenotype with overlapping features of colorectal polyposis. Typically, these conditions are designated as constitutional or biallelic MMR-deficiency (CMMRD, BMMRD), and are characterized by early-onset CRC, brain tumors, hematological malignancies, and café-au-lait skin macules.31; 32 Nonetheless, several individuals with homozygous or compound-heterozygous PMS2, MSH2, or MSH6 germline mutations and an early-onset colorectal adenomatous polyposis in the first or second decade of life have been described;23; 31; 54 the majority of these cases were until then misclassified as mutation-negative FAP. In some individuals, however, in particular those with biallelic PMS2 mutations, the colorectal phenotype does not become manifest before the third or even fourth decade of life, resembling the clinical presentation in the present polyposis cohort.
To uncover further monogenic causes, we performed exome sequencing of leukocyte DNA in a cohort of 102 unrelated individuals with histologically confirmed, genetically unexplained adenomatous polyposis. The clinical and family characteristics of the participants are consistent with published data from other mutation-negative polyposis cohorts.14; 15; 19; 55 Using this approach, we identified two families with biallelic LoF germline mutations in the MMR gene MSH3, a genotype which has not yet been described as causative for a polyposis phenotype. In addition, we found one individual with biallelic PMS2 mutations, and several persons who harbored homozygous or heterozygous LoF variants in other genes.
The genotypes and pedigrees of the two unrelated persons with compound-heterozygous MSH3 germline mutations are in full agreement with a recessively inherited trait. Interestingly, unlike the majority of the examined cohort, these two individuals did have affected siblings and documented extraintestinal neoplasias. Neither index person had a germline mutation in any of the known genes associated with gastrointestinal polyposis, nor any further mutation in other MMR genes or EPCAM. In addition, IHC staining of MLH1, MSH2, MSH6, and PMS2 was normal. Furthermore, the haploinsufficiency score of MSH3 indicated a rather low probability of haploinsufficiency (score 0.486 (16.2%)).39 In large sets of controls, none of the LoF MSH3 germline mutations were identified in the homozygous state, and the frequency of heterozygosity is compatible with a rare recessive disease.
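The relationship between carrier frequency and a rare recessive disease can be made concrete with a Hardy-Weinberg calculation. The following is a minimal sketch, assuming random mating and a single combined frequency for all LoF alleles; the function name and the example carrier frequency are hypothetical and are not taken from the study data.

```python
import math

def expected_biallelic_frequency(carrier_frequency: float) -> float:
    """Expected frequency of biallelic genotypes under Hardy-Weinberg equilibrium.

    Solves 2*q*(1 - q) = carrier_frequency for the allele frequency q
    (taking the smaller root) and returns q**2.
    """
    if not 0.0 <= carrier_frequency <= 0.5:
        raise ValueError("carrier frequency must lie between 0 and 0.5")
    q = (1.0 - math.sqrt(1.0 - 2.0 * carrier_frequency)) / 2.0
    return q * q

# Hypothetical carrier frequency of 1 in 500, for illustration only.
print(expected_biallelic_frequency(1 / 500))
```

With a carrier frequency of this magnitude, the expected biallelic frequency is on the order of one per million, which illustrates why biallelic LoF genotypes would rarely, if ever, be observed in control sets.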
The confirmation of compound-heterozygosity is critical to demonstrate a recessive inheritance: In both families we clearly showed that the two mutations are located on different alleles, either by examination of an unaffected sibling who was heterozygous for just one mutation (family 1275), or by transcript analysis indicating that both mutations are located on different alleles (family 1661). Taken together, these data strongly support the hypothesis that deleterious MSH3 mutations follow a recessive mode of inheritance.
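The sibling-based reasoning for family 1275 can be sketched as a small decision rule. This is an illustrative simplification, assuming that both variants were inherited rather than de novo and that no recombination occurred between them; the function name and the variant labels are placeholders, not the actual genotypes.

```python
def infer_phase(index_variants, sibling_variants):
    """Infer whether two heterozygous variants of the index person lie in trans.

    Assumes both variants were inherited (not de novo) and that no
    recombination occurred between them.
    """
    index_variants = set(index_variants)
    if len(index_variants) != 2:
        raise ValueError("expected exactly two variants in the index person")
    shared = index_variants & set(sibling_variants)
    # A sibling carrying exactly one variant shows that the two variants
    # were transmitted on different parental alleles.
    if len(shared) == 1:
        return "trans"
    # Carrying both or neither is compatible with either phase.
    return "uninformative"

# Placeholder variant labels, not the actual genotypes of family 1275.
print(infer_phase({"variant_1", "variant_2"}, {"variant_1"}))  # trans
```

A sibling who carries both variants or neither does not resolve the phase, which is why a second line of evidence such as transcript analysis is needed in families without an informative relative.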
The MMR system is a critical pathway, which corrects base:base and insertion/deletion mispairs occurring as the result of errors during replication, thus increasing the fidelity of DNA replication.56 Defects in the MMR system result in a mutator phenotype, which manifests as MSI in the DNA of affected cells. In tumors with MSI, microsatellite loci containing mono-, di-, tri-, and tetranucleotide repeats may be affected.57
During DNA repair, mispaired bases are recognized by two heterodimers of MutS homologs (DNA mismatch recognition complex), MSH2-MSH6 (MutSα), and MSH2-MSH3 (MutSβ) with partially overlapping mispair recognition specificities.56 In humans, MutSα efficiently binds single-base substitutions and small (single-base) insertion/deletion mispairs, whereas MutSβ has a stronger affinity for larger base insertion/deletion loops (IDL) with up to ten unpaired nucleotides.57; 58 Thus, loss of MutSβ due to MSH3 inactivation in human cells not only results in MSI at loci containing dinucleotide repeats; it also results in MSI at certain loci with tetranucleotide repeats, termed EMAST.57; 59 It is known that di- and tetranucleotide repeats are affected in the majority of CRC with MSI-L.58 MMR-deficiency can also result from an imbalance in the relative levels of MSH3 or MSH6.60
All four MSH3 germline mutations detected in the present cohort are strongly predicted to have a LoF effect. According to previous work, somatic MSH3 frameshift mutations at the (A)8 repeat in exon 7 result in a loss of MSH3.45; 61 To evaluate the pathophysiological consequences of the four mutations in more detail, we performed several experiments. Using transcript analysis, we confirmed aberrant splicing caused by the two mutations located within the conserved consensus splice motifs. This would affect regions relevant for dimerization and for DNA recognition, according to the MSH3 structure described by Yang's group (Fig. S5).43
Three of the four identified MSH3 mutations are predicted to result in premature stop codons and thus might lead to nonsense-mediated mRNA decay (NMD). However, the mRNA analysis, performed with fresh blood samples not treated with NMD-inhibitors, demonstrated that the affected transcripts are expressed. This might be due to NMD-escape, a phenomenon that is known from several mutations in other polyposis and MMR genes and that was also described for MSH3. You et al. found that MSH3 transcripts with a frameshift mutation at the (A)8 repeat in exon 7 are not degraded by NMD, but instead experience repression of protein translation.62
The Western blot experiments illustrated that the altered MSH3 proteins are shortened by the expected length (Fig. S4), which would lead to a loss of the conserved C-terminal dimerization domain (Fig. 2, Fig. S5).43 Since the stable proteins were obtained in vitro using a human embryonic kidney cell line and a strong promoter for high-level expression, this observation is not per se transferable to the in vivo situation. In fact, immunohistochemical staining clearly demonstrated loss of nuclear MSH3 in both normal and colorectal tumor tissues of the affected individuals, confirming the expected MSH3-deficiency. Different mechanisms such as repressed protein translation or blocked nuclear transport by hampered dimerization or changes in protein conformation might explain the nuclear absence of MSH3.
Microsatellite analysis of adenoma-derived DNA demonstrated EMAST, high and low instability at dinucleotide markers, and no instability at mononucleotide repeats in any of the examined tumors. These findings further confirm the functional relevance of the MSH3 mutations. In addition, presumed effects of the MSH3-deficiency are well reflected by the inflammatory infiltration, a characteristic feature of MSI colorectal tumors, and the somatic APC mutation spectrum observed in the adenomas.
Several lines of evidence support the causal relevance of MSH3-deficiency to initiate genetic instability and tumorigenesis. Around 50% of MSI tumors contain somatic frameshift mutations in the (A)8 tract in codons 381-383 of MSH3.57; 63; 64 The detection of LOH in some of these tumors supports the role of MSH3 and MSH6 as primary mutators. In CRC and human colon epithelial cells, MSH3-deficiency is associated with EMAST ([AAAG]n repeats) and MSI-L at dinucleotide repeats, and results in the formation of double strand breaks (DSBs) and significant changes in the proteome.57; 65
In yeast and extracts of Msh3 −/− cells, Msh3-deficiency leads to a partial MMR defect and MSI.66-68 In mouse models, elimination of either Msh3 or Msh6 alone still maintains some functional MMR activity, which is consistent with the persistence of the MutSα or MutSβ heterodimers, respectively. Of all MMR-knockout mice, Msh3-deficient mice exhibited the lowest, yet still significantly elevated, mutation frequencies compared to wild type mice.68
While MLH1, MSH2, MSH6, and PMS2 are established genes associated with Lynch syndrome, the causal relevance of MSH3 germline variants in cancer predisposition has remained uncertain until now. To date, MSH3 mutations have neither been consistently linked to a Lynch-like phenotype, nor described in polyposis cases. In several previous studies, common MSH3 polymorphisms were significantly associated with CRC and prostate cancer as low-penetrance risk alleles.69-72 In contrast, a potentially high-penetrance pathogenic MSH3 germline mutation has very rarely been identified in persons with a suspected predisposition to cancer.73; 74 Msh3-deficient mice develop late-onset MSI gastrointestinal cancers. However, given the small number of reported tumors, the significance of this finding remains unclear, and survival did not differ significantly from that of wild type control animals.67; 75; 76
In a Chinese cohort with suspected familial breast cancer, Yang et al. found three heterozygous MSH3 germline variants (two in-frame deletions and one frameshift mutation) (Table S5). They examined eight tumor samples from three families, all showing MSS in the standard marker panel (BAT25, BAT26, D2S123, D5S346, D17S250) and on average two unstable loci among nine additional dinucleotide/EMAST markers. Two individuals showed a reduction in MSH3 staining of tumor (breast, ovary) compared to normal tissue.74 One family showed no evidence for Lynch syndrome; the MSH3 in-frame variant segregated incompletely with the disease, and the MSH3 staining showed no relevant deficiency in the tumors. The second family met the clinical criteria for Lynch syndrome and the MSH3 in-frame germline variant segregated well with the disease in three generations. However, the tumor spectrum was broad, including breast, ovarian, renal, and colon cancer. It was not reported whether genetic causes of Lynch syndrome had been excluded systematically. The third family carried the MSH3 frameshift variant, which segregated incompletely with the disease. A comparison with currently available frequency data in the general population (ExAC) suggests that both in-frame deletions are likely to be polymorphisms (Table S5).
In a family with suspected, but genetically unexplained Lynch syndrome, Duraturo et al. found that two brothers, each with three metachronous CRCs, had a compound-heterozygous MSH3 genotype, comprising a potentially pathogenic missense variant and a silent variant.73 However, no functional data to confirm the pathogenicity of the variants were reported. Moreover, the silent MSH3 variant is now listed as a frequent polymorphism (rs1805355, Table S5).
All the biallelic MSH3 mutation carriers identified in this study presented with an attenuated colorectal and duodenal involvement with no or late-onset cancer. This is similar to the phenotype observed in persons with MAP or attenuated FAP, and is consistent with the phenotype described in MSH3 knockout mice. Two of the four biallelic MSH3 mutation carriers are reported to have extraintestinal tumors: while various thyroid neoplasias also occur in FAP and MAP individuals, the early-onset astrocytoma fits well with the tumor spectrum observed in CMMRD.
A high frequency of EMAST was also observed in a wide range of extraintestinal sporadic malignancies such as skin, bladder, kidney, lung, ovarian, and head and neck cancer,59; 77; 78 although the underlying mechanism remained unclear and an association with MSH3 impairment was not proven. Recent studies, however, provide strong evidence that EMAST formation is driven by MSH3-deficiency, either through MSH3 mutations or, for example, through a nuclear-to-cytosol shift induced by oxidative stress.79; 80 Thus, it can be speculated that MSH3-induced EMAST is more common than previously thought and may occur in different tumor types. Consequently, the extraintestinal tumor spectrum in individuals with biallelic MSH3 germline mutations might be much broader than that observed in the persons identified in the present study.
Although the clinical information and underlying molecular changes point to a broader tumor spectrum and some degree of overlap with CMMRD, further individuals with biallelic MSH3 mutations are needed to explore the whole oncologic phenotype.
The identification of one individual with biallelic PMS2 mutations demonstrates that CMMRD is a rare but important cause in adenomatous polyposis cohorts. The c.2T>A;p.Met1? start-loss mutation is listed twice in ClinVar, and is considered pathogenic. According to ExAC data, the allele frequency of this mutation in the European population is 0.003%. Apart from the individual reported here, we recently identified another person with a CMMRD phenotype (B-cell lymphoma, acute lymphatic leukemia, carcinoma of the rectum, and a multifocal grade III-IV astrocytoma occurring between the ages of 9 and 15 years) in a multiple tumor cohort, who carried a similar start-loss (c.1A>T;p.Met1?) and a frameshift mutation (c.2117delA;p.Lys706Serfs*19) in PMS2. A similar potential founder mutation in the start codon (c.1A>G) was found in three unrelated individuals with CMMRD, all of whom had a compound-heterozygous PMS2 genotype and an isolated loss of PMS2 in IHC.81 The PMS2 locus-specific database (LOVD) lists a fourth family with the same genotype. Although the person identified in the present study had extracolonic features suggestive of CMMRD, these manifestations are often unspecific and may remain unreported or unrecognized (e.g. café-au-lait spots). This suggests that CMMRD is an underdiagnosed condition, which should be included in the differential diagnosis of any unexplained early-onset case of adenomatous polyposis.
Assuming a monogenic mode of inheritance with high penetrance, the frequency of causative germline mutations in the general population is expected to be low. In 26 of the 96 remaining individuals (excluding three samples after quality control and three resolved cases), we identified unique (i.e., not present in controls) or rare (i.e., frequency < 0.01% for the dominant or 1% for the recessive model in controls), potentially pathogenic germline variants in 14 protein coding genes. The causative relevance of these interesting candidate genes awaits exploration in larger patient cohorts and via functional analysis.
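The frequency criteria described above can be illustrated with a minimal sketch. Only the two cutoffs (0.01% for the dominant and 1% for the recessive model) come from the text; the function name, the dictionary, the strict less-than comparison, and the example frequencies are assumptions made for illustration and do not represent the filtering pipeline actually used in the study.

```python
# Cutoffs taken from the text; everything else here is illustrative.
MAX_CONTROL_FREQ = {"dominant": 0.0001, "recessive": 0.01}  # 0.01 % and 1 %

def is_unique_or_rare(control_freq: float, model: str) -> bool:
    """Return True if a variant is absent from controls ("unique")
    or below the model-specific control frequency cutoff ("rare")."""
    if model not in MAX_CONTROL_FREQ:
        raise ValueError(f"unknown inheritance model: {model!r}")
    return control_freq == 0.0 or control_freq < MAX_CONTROL_FREQ[model]

# Hypothetical control frequencies, for illustration only.
for freq in [0.0, 0.00005, 0.002, 0.02]:
    print(freq, is_unique_or_rare(freq, "dominant"), is_unique_or_rare(freq, "recessive"))
```

The looser recessive cutoff reflects that heterozygous carriers of a recessive allele are unaffected, so such alleles can persist in controls at higher frequencies than dominant high-penetrance alleles.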
In the present study, mutations may have been overlooked, e.g. in low coverage regions or within repeat tracts in coding sequences. Moreover, some causative mutations might be located beyond the exome, e.g., in non-coding regions or in unannotated genes.
In conclusion, this study describes the identification of biallelic pathogenic MSH3 germline mutations as the cause of an inherited tumor syndrome. Specifically, biallelic LoF MSH3 germline mutations appear to cause an additional rare, recessively inherited subtype of colorectal adenomatous polyposis, present in 2% of the study participants. Data from the present and previous studies consistently show that mutations in newly identified genes causing inherited tumor predisposition syndromes are very rare (0.3-0.5% in unexplained polyposis cohorts with familial cancer).7; 82 At least some of these syndromes appear to show extreme genetic heterogeneity, and large cohorts are therefore required to identify recurrently mutated genes.
Preliminary experiments indicate that MSH3-deficient cells are more sensitive to cisplatin treatment or platinum-based adjuvant treatment for CRC.57 Thus, MSH3-deficiency might also be of therapeutic relevance for individuals with an MSH3-associated polyposis.
We thank the patients and their families for participating in the study and Prof. G. Graziani (University of Rome Tor Vergata, Italy) for generously providing the pcDNA3.1-/MSH3-wild type vector.
This work was supported by the German Cancer Aid (Deutsche Krebshilfe e.V. Bonn, Grant number 108421); the Gerok-Stipendium of the University Hospital Bonn (Grant no. O-149.0098); and NIH Centers for Mendelian Genomics (5U54HG006504). RCB and MMN are members of the Excellence Cluster ImmunoSensation, funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG).
The funding sources had no involvement in the study design; the collection, analysis, or interpretation of data; the writing of the report; or the decision to submit the paper for publication. The corresponding author had full access to all the data in the study, and had final responsibility for the decision to submit the manuscript for publication.
dbSNP (The Single Nucleotide Polymorphism database): www.ncbi.nlm.nih.gov/SNP/
Ensembl Genome Browser (release 54): http://may2009.archive.ensembl.org/index.html
EST profiles: www.ncbi.nlm.nih.gov/unigene/
EVS (Exome Variant Server): http://evs.gs.washington.edu/EVS/
ExAC (Exome Aggregation Consortium): http://exac.broadinstitute.org/
HapMap Project: www.hapmap.org
HGMD (Human Gene Mutation Database): http://hgmd.cf.ac.uk
Human Protein Atlas: http://www.proteinatlas.org/
IGV (Integrative Genomics Viewer): www.broadinstitute.org/igv/
NNSPLICE 0.9: www.fruitfly.org/seq_tools/splice.html
Online Mendelian Inheritance in Man: http://www.omim.org
PMS2 locus-specific mutation database: www.lovd.nl/PMS2
Primer3 v.0.4.0: http://frodo.wi.mit.edu/primer3/input.htm
SIFT (Sorting Intolerant From Tolerant): http://sift.jcvi.org/
TCGA (The Cancer Genome Atlas): https://tcga-data.nci.nih.gov/tcga/
TGP (1000 Genomes Project): www.1000genomes.org
UCSC Genome Browser: http://genome.ucsc.edu
SUPPLEMENTAL DATA DESCRIPTION
Supplemental Data include nine figures and five tables.
The authors have no conflicts of interest to declare.