Reliable methods for predicting functional consequences of variants in disease genes would be beneficial in the clinical setting. This study was undertaken to predict, and confirm in vitro, splicing aberrations associated with mismatch repair (MMR) variants identified in familial colon cancer patients. Six programs were used to predict the effect of 13 MLH1 and 6 MSH2 gene variants on pre-mRNA splicing. mRNA from cycloheximide-treated lymphoblastoid cell lines of variant carriers was screened for splicing aberrations. Tumors of variant carriers were tested for microsatellite instability and MMR protein expression. Variant segregation in families was assessed using Bayes factor causality analysis. Amino acid alterations were examined for evolutionary conservation and physicochemical properties. Splicing aberrations were detected for ten variants, including a frameshift as a minor cDNA product, and altered ratio of known alternate splice products. Loss of splice sites was well predicted by splice site prediction programs SpliceSiteFinder (90%) and NNSPLICE (90%), but consequence of splice site loss was less accurately predicted. No aberrations correlated with ESE predictions for the nine exonic variants studied. Seven of eight missense variants had normal splicing (88%), but only one was a substitution considered neutral from evolutionary/physicochemical analysis. Combined with information from tumor and segregation analysis, and literature review, 16/19 variants were considered clinically relevant. Bioinformatic tools for prediction of splicing aberrations need improvement before use without supporting studies to assess variant pathogenicity. Classification of mismatch repair gene variants is assisted by a comprehensive approach which includes in vitro, tumor pathology, clinical, and evolutionary conservation data.
Lynch syndrome is an autosomal dominantly inherited disorder of cancer predisposition, caused by a functional defect in the DNA mismatch repair (MMR) complex. Identified mutations in the MMR genes MLH1 (MIM# 120436) and MSH2 (MIM# 609309) account for approximately 90% of Lynch syndrome families with detectable mutations, MSH6 (MIM# 600678) for up to 10%, and PMS2 (MIM# 600259) only very occasionally (Hampel, et al., 2005; Viel, et al., 1998; Wang, et al., 1999). Although Lynch syndrome families often exhibit a clinical phenotype of multiple cases of early onset colorectal and endometrial cancers across several generations (Jass, 1998; Lynch, et al., 1998), a more definitive diagnosis can be readily made on two important molecular pathological features of the tumour. Defects in MMR lead to tumour DNA microsatellite instability (MSI), detected in the laboratory as numerous insertion or deletion mutations in short repetitive sequence elements. In addition, loss of MMR protein expression is common, with no immunohistochemical (IHC) evidence of protein expression in >90% of cases with clearly pathogenic mutations. MSI or loss of MMR protein expression in the tumour of a young onset colorectal cancer case is indicative of Lynch syndrome (Caldes, et al., 2004; Lindor, et al., 2002). While IHC measurement of MMR protein appears to be the most sensitive and specific test for truncating and other immuno-unstable mutations (Southey, et al., 2005), this would obviously not be true for the subset of mutations encoding immuno-stable proteins with impaired function.
MMR gene truncation mutations considered to be pathogenic are identified in ~50% of colorectal cancer cases with presumptive Lynch Syndrome based on clinico-pathological features (Loughrey, et al., 2007). However MMR gene sequence variants of uncertain clinical significance (UVs) have been identified in a considerable proportion of such cases. UVs include rare nucleotide changes predicted to cause missense substitutions, small in-frame deletions, and sequence changes that may alter splicing due to their location near (but not within) consensus splice sites. Although individually rare, they collectively constitute a significant proportion of variants. At the time of study initiation, UVs had been identified in almost 10% of colorectal cancer case probands (n=37/383) ascertained through the Australasian site of the Colon Cancer Family Registry. Moreover, an analysis of the distribution of UVs by count and ascertainment revealed that UVs seen >10 times are detected in virtually equal numbers in clinic-based and population-based cases, but UVs detected once are over-represented at a ratio of 5:1 in “familial” clinic-based cases (Supp. Figure S1). Importantly, clinic-based participants in these series were screened for mutations irrespective of tumour IHC results. This suggests that rare UVs detected in individuals selected from family cancer clinics are more likely to be pathogenic. UVs present a challenge for genetic counselling, and thorough evaluation of their clinical significance will have direct impact on families concerned.
Estimates of pathogenicity are particularly difficult for UVs in MMR genes. They are individually rare, so classical family studies such as segregation analysis provide little power to provide precise estimates of disease risk on a per-family basis. Moreover, although multifactorial likelihood models incorporating genetic, tumour pathology, and evolutionary data have been developed for the clinical evaluation of UVs in other high risk genes (Chenevix-Trench, et al., 2006; Goldgar, et al., 2004; Spurdle, et al., 2008b), a similar application for MMR genes is complicated by the fact that many clinical laboratories precede gene screening with IHC testing to prioritize the relevant MMR gene for full sequencing. That is, there is a priori an increased likelihood that causal genetic variation will be detected in the gene selected for screening, but there is also a greater chance that a neutral variant will occur in cis with an undetected mutation, and appear to be causal from family inheritance studies. While recombinant vector-based functional studies may provide a more direct test of compromised function of individual variants, these are time-consuming and require specialised laboratory skills. Therefore, use of non-laboratory-based bioinformatic criteria for prediction of the functional consequences of UVs would be beneficial in the clinical setting, where resources for functional studies are usually limited.
Splicing aberrations underlie an increasingly recognized group of mutations in high-risk genes, including MMR genes. Mutations in the two highly conserved intronic dinucleotides 5'GT and 3'AG flanking exons are well recognized to result in exclusion of at least the adjacent exon from the transcript, and in the clinical setting are generally classified as pathogenic on basis of sequence information alone. However, mutations in intronic nucleotides in close proximity to splice sites, and in cryptic sites associated with splicing can also produce a non-functional protein (Davoodi-Semiromi, et al., 2000). Furthermore, there is evidence that variants predicted to cause missense substitutions can alter exonic splice enhancer (ESE) sites at the nucleotide level, to cause aberrant splicing (Aretz, et al., 2004; Cravo, et al., 2002; McVety, et al., 2005; Sharp, et al., 2004; Zatkova, et al., 2004). While these differ from vector-based studies in that germline material can be tested for specific splicing aberrations predicted to occur, extensive testing using even these relatively simple assays generally falls outside the resources of clinical laboratories. We tested a panel of UVs in vitro for possible splicing aberrations, and compared the accuracy of a variety of bioinformatic approaches in predicting the observed splicing aberrations associated with each UV.
Written informed consent was obtained from all family and control participants. Recruitment studies and this laboratory-based study have been approved by the Human Research Ethics Committees of participating institutions.
Nucleotide numbering reflects cDNA numbering with +1 corresponding to the A of the ATG translation initiation codon in the reference sequence, with the initiation codon as codon 1. cDNA references were NM_000249.2 for MLH1, and NM_000251.1 for MSH2.
MMR gene variants within consensus splice sites, or missense and small in-frame deletion variants, were identified by DHPLC and sequencing of MLH1 and MSH2 exonic regions in patients ascertained via the Australian site of the Cooperative Family Registry for Colorectal Cancer Studies (Colon CFR) (Jenkins, et al., 2006b). The majority of variant carriers were probands from families recruited through family cancer clinics, screened for mutations irrespective of tumour microsatellite instability characteristics, but a small subset were population-based probands recruited through a state-wide cancer registry and selected for mutation screening as a consequence of a microsatellite unstable colorectal tumour phenotype (4 families). Variants that were observed to co-occur with a known deleterious mutation were excluded from the study. A considerable proportion of these samples (69%) had also been previously analysed using MLPA for exonic deletions in MLH1 and MSH2, using the Salsa MLPA Kit P003 (MRC-Holland, Holland). The final sample set encompassed a total of 22 variants in 28 families, selected on the basis of lymphoblastoid cell line (LCL) availability from colorectal case probands for functional studies (see Table 1). At the time of study initiation, four of these 22 variants were considered deleterious from position alone but had no functional confirmation (MLH1 c.1668-1G>A, MLH1 c.1732-1G>A, MLH1 c.1990-1G>A, MSH2 c.1077-1G>T). The rest were designated UVs, including the more common variants MSH2 c.942+3A>T and MSH2 c.1906G>C p.Ala636Pro, now generally considered pathogenic (see Table 2).
The Colon CFR variant classifications at the start of this study were as follows: a deleterious mutation was any variant producing a stop codon or nonsense transcript, or having a change at the +1 or -1 position in an intron, or established from a thorough literature and International Society for Gastrointestinal Hereditary Tumours (InSiGHT) database search to portray the characteristics of a pathogenic mutation; a polymorphism was defined by prevalence within normal populations of 1% or greater; and remaining variants were considered to be unclassified. Colorectal case probands all reported white/Caucasian ancestry, with the exception of the proband carrying the MLH1 c.1732-1G>A sequence variant.
Variants were screened in 180 cancer-unaffected population controls reporting Caucasian ethnicity, ascertained as blood donors via the Australian Red Cross Blood Services (Marsh, et al., 2006), using direct sequencing of exonic regions containing variants under study. Variants found to be at polymorphic frequency (>1%) were not pursued further.
Tumor characteristics were determined for a subset of cancer-affected individuals using methods described previously. Briefly, formalin fixed, paraffin embedded tissue sections were stained for MMR proteins MLH1, MSH2, MSH6 and PMS2 (Lindor, et al., 2002). Tumors were analysed for MSI status (Lindor, et al., 2002) using 10 microsatellite markers (BAT25, BAT26, BAT40, BAT34, D5S346, D17S250, ACTC, D18S55, D10S197 and MYCL), comparing to normal tissue as reference where possible, and classified according to number of markers demonstrating instability: MSI-H for ≥3; MSI-L for 1-2; MSS for 0 unstable markers. Tumours with microsatellite unstable phenotype at entry into this study of unclassified variants were from probands carrying the following variants: MLH1 c.113A>G p.Asn38Ser (MSI-H and PMS2 loss only), MLH1 c.1852_1854delAAG p.Lys618del (MSI-H and MLH1/PMS2 loss), MSH2 c.913G>A p.Ala305Thr (MSI-L and normal protein expression), and MSH2 c.942+3A>T (MSI-H and MSH2/MSH6 loss).
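The MSI classification rule described above can be written as a short function. This is only an illustrative sketch of the stated cut-offs (MSI-H for 3 or more unstable markers, MSI-L for 1-2, MSS for 0); the function name `classify_msi` is hypothetical and does not come from the study.

```python
def classify_msi(unstable_markers: int) -> str:
    """Classify a tumour from the number of unstable microsatellite markers.

    Cut-offs follow the text: MSI-H for >=3, MSI-L for 1-2, MSS for 0.
    The function name and interface are illustrative assumptions.
    """
    if unstable_markers < 0:
        raise ValueError("number of unstable markers cannot be negative")
    if unstable_markers >= 3:
        return "MSI-H"
    if unstable_markers >= 1:
        return "MSI-L"
    return "MSS"


if __name__ == "__main__":
    for count in (0, 1, 2, 3, 10):
        print(count, classify_msi(count))
```

The sketch takes only the count of unstable markers; it does not model markers that failed to amplify or the comparison against normal tissue.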
Lymphoblastoid cell lines (LCLs) were treated with cycloheximide (two hours at 100μg/ml) as previously described (Kirschner, et al., 2000), and RNA extracted using the Qiagen Mini RNA Extraction Kit (Qiagen, Hilden, Germany). First-strand cDNA was then synthesized from 2μg of RNA using the Eppendorf cMaster RT kit primed with random hexamers, following the standard protocol. cDNA was amplified in 25μl reactions using HotMaster Taq and buffer (Eppendorf, Hamburg, Germany) with 20pmol of each primer and 10ng of DNA, using established primer sets and cycling conditions verified to selectively amplify the target amplicon only (details available on request). cDNA template was sequenced directly after clean-up with Millipore Montage PCR96 Cleanup Plates (Millipore, Bedford, MA, USA) using the standard protocol adjusted for optimal recovery of small PCR products. 1μl of the cleaned product was used for a 1/8-strength sequencing reaction using the BigDye Terminator v3.1 reagents and protocol (Applied Biosystems, Foster City, CA, USA) in a 12μl reaction, with 2pmol of the appropriate primer. Reactions were performed in both forward and reverse directions, except for control and relative screening for known variants. Sequencing product was cleaned with the DyeEx 96 Kit (Qiagen, Hilden, Germany) using the standard protocol. The product was dried, resuspended in HiDi Formamide (Applied Biosystems, Foster City, CA, USA) and analysed on an ABI PRISM 3100 (Applied Biosystems, Foster City, CA, USA). In addition, 10μl of PCR product was run out on 2% agarose gels to visualize aberrant bands using ethidium bromide staining and UV light. Visible bands were excised manually and processed using QIAquick Gel Extraction Kit (Qiagen, Hilden, Germany) for sequencing, as above. Results were compared to reference sequence NM_000249.2 for MLH1, and NM_000251.1 for MSH2.
Splicing aberrations were interpreted by comparison of observed results to carriers of other MLH1 and MSH2 variants screened at the same time.
Default thresholds were used for all initial analyses.
Exonic variants were analysed for possible ESE disruption with:
ESEfinder 2.0 (http://rulai.cshl.edu/tools/ESE)
Intronic variants were analysed using:
NNSPLICE V0.9 (http://www.fruitfly.org/seq_tools/splice.html)
For ESEfinder, increased custom thresholds were applied post-analysis (2 for SR protein ASF, and 3 for the other SR proteins), in an attempt to improve the sensitivity of ESE detection, as suggested in the literature (Gorlov, et al., 2004; Pettigrew, et al., 2005).
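As an illustration of the post-analysis filtering step, the following sketch applies the increased thresholds (2 for ASF, 3 for the other SR proteins) to a list of motif scores. The data layout, the function names, the protein labels in the example, and the choice of "greater than or equal to" as the comparison are all assumptions made for this example; they are not taken from ESEfinder itself or from the study.

```python
ASF_THRESHOLD = 2.0       # increased threshold for SR protein ASF (from the text)
OTHER_SR_THRESHOLD = 3.0  # increased threshold for the other SR proteins (from the text)


def threshold_for(protein: str) -> float:
    """Return the increased threshold for a given SR protein label.

    The accepted labels for ASF are an assumption of this sketch.
    """
    return ASF_THRESHOLD if protein in {"ASF", "SF2/ASF"} else OTHER_SR_THRESHOLD


def filter_hits(hits):
    """Keep only (protein, score) pairs whose score meets the increased threshold.

    Whether the comparison should be strict is not stated in the text;
    '>=' is assumed here.
    """
    return [(protein, score) for protein, score in hits
            if score >= threshold_for(protein)]


if __name__ == "__main__":
    # Hypothetical motif scores, for illustration only.
    example_hits = [("SF2/ASF", 2.5), ("SC35", 2.5), ("SRp40", 3.4)]
    print(filter_hits(example_hits))
```

Applied to the hypothetical scores above, the SC35 hit at 2.5 would be dropped, while the ASF hit at 2.5 and the SRp40 hit at 3.4 would be retained.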
DNA was available from 3 to 53 family members per proband (average 16) for segregation analysis. DNA of relatives was screened for the family variant by direct sequencing using genomic DNA primers (details available on request), using the protocol described above. Results were compared to reference sequence NC_000003.10 for MLH1, and NC_000002.10 for MSH2. Information for analysis, including cancer status and type of cancer, age at interview/diagnosis, and pedigree structure, was available through the Colon CFR. Bayes factor analysis was performed by computing the likelihood ratio of the pedigree and genotype data under causality compared with neutrality using the approach described in Thompson et al (Thompson, et al., 2003). For this analysis, we assumed the age-specific relative risks estimated in Quehenberger et al (Quehenberger, et al., 2005), and derived penetrance functions using these relative risks and Australian (NSW) population-based age- and sex-specific incidence rates. The derived penetrance model included 14 liability classes, with unaffected male (4 classes: age groups <50, 50-59, 60-69, ≥70), unaffected female (4 classes: age groups <50, 50-59, 60-69, ≥70), affected colorectal cancer (4 classes: age groups <50, 50-59, 60-69, ≥70), endometrial cancer (any age), and minor cancer (small bowel/stomach/ovary/urinary tract, diagnosis <70 only). For individual variants, the probability of pathogenicity was derived from the Bayes scores, assuming a prior probability of pathogenicity of 0.5. This probability of pathogenicity based on segregation was then interpreted according to the 5 class variant classification system of Plon et al (Plon, et al., 2008) (see below). 
It is implicit in the penetrance functions used that the term pathogenic indicates a variant has a high risk similar to the average MMR mutations estimated in Quehenberger et al (Quehenberger, et al., 2005), and that the methods used cannot exclude the possibility that a variant may be associated with a moderate or low risk of cancer.
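The conversion from a Bayes factor to a probability of pathogenicity follows from Bayes' rule on the odds scale: posterior odds equal the Bayes factor multiplied by the prior odds. The sketch below shows that arithmetic with the prior of 0.5 stated above. The function name is hypothetical, and the sketch does not reproduce the pedigree likelihood calculation that yields the Bayes factor.

```python
def posterior_probability(bayes_factor: float, prior: float = 0.5) -> float:
    """Convert a causality Bayes factor to a posterior probability.

    posterior odds = Bayes factor * prior odds, where the Bayes factor is the
    likelihood ratio of the data under causality versus neutrality.
    The default prior of 0.5 is the value stated in the text.
    """
    if bayes_factor < 0:
        raise ValueError("Bayes factor must be non-negative")
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = bayes_factor * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical Bayes factors, for illustration only.
    for bf in (0.1, 1.0, 19.0, 99.0):
        print(bf, round(posterior_probability(bf), 4))
```

With a prior of 0.5 the prior odds are 1, so the posterior probability reduces to BF / (1 + BF); a Bayes factor of 1 leaves the probability at 0.5.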
MLH1 and MSH2 protein multiple sequence alignments were built using M-Coffee (Wallace, et al., 2006), followed by minor hand editing, from sequences available at GenBank, the UC Santa Cruz genome browser, or Ensembl on January 2, 2008. The alignments contained MLH1/MSH2 orthologs from several mammals, plus chicken, frog, pufferfish, zebrafish, tunicate, sea urchin, and starlet anemone. These alignments, or updated versions thereof, are available online at <http://agvgd.iarc.fr/alignments.php>. Sequence conservation at relevant positions in the alignment was assessed by calculating the Grantham Variation (GV), the fit between missense substitutions and the observed range of variation was assessed by calculating the Grantham Deviation (GD), and then the GV-GD coordinate of each substitution was reduced to a 1-dimensional graded score (Tavtigian, et al., 2008; Tavtigian, et al., 2006). The program SIFT was also used, with the same sequence alignments, to assess if amino acid substitutions were likely to affect function (Ng and Henikoff, 2002). Amino acid substitutions were also assessed for pathogenicity using the MAPP-MMR algorithm (Chao, et al., 2008), using the web-based link http://mappmmr.blueankh.com. The greatest possible substitution score was noted for the two amino acid deletion variants MLH1 c.1852_1854delAAG p.Lys618del and MSH2 c.571_573delCTC p.Leu191del.
Previous reports of MMR gene variants under study were identified via the Mismatch Repair Genes Variant Database, which provides documentation of published reports of MMR gene variation (www.med.mun.ca/MMRvariants) (Woods, et al., 2007), and from an extensive review of functional analysis of mismatch repair unclassified variants (Ou, et al., 2007), and a single recent extensive study of functional abrogation of MLH1 variants (Takahashi, et al., 2007). The classification of variants as recorded on the InSiGHT MMR gene mutation database (http://www.insight-group.org/) was also assessed, as was information on likely pathogenicity, functional assays, and clinical reports on the MMR Gene Missense Mutation Database (http://www.mmrmissense.net/db/query/query.aspx). Each variant was placed in one of 5 classes describing its likely pathogenicity, according to the variant classification system of Plon et al (Plon, et al., 2008). For segregation data, class was derived directly from the probability score, as follows: class 5, definitely pathogenic, probability >0.99; class 4, likely pathogenic, probability 0.95-0.99; class 3, uncertain, probability 0.05-0.95; class 2, likely not pathogenic or of little clinical significance, probability 0.001-0.049; class 1, not pathogenic or of little clinical significance, probability <0.001. For in vitro splicing data, class was derived by interpretation of cDNA splicing products from variant carriers, according to the suggestions of Spurdle et al (Spurdle, et al., 2008a). For a subset of three variants only, there was insufficient evidence for classification from the current study, but interpretation of published information reporting abrogated function allowed their categorization as Class 5, Pathogenic.
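The mapping from segregation-based probability to the five classes can be sketched as a simple lookup. The class ranges come from the text, but the handling of exact boundary values is an assumption of this sketch: a probability of exactly 0.95 is assigned to class 4, and values from 0.001 up to (but not including) 0.05 are assigned to class 2. The function name `plon_class` is hypothetical.

```python
def plon_class(probability: float) -> int:
    """Map a probability of pathogenicity to a 5-class category.

    Ranges follow the text; treatment of exact boundary values (0.95, 0.05)
    is an assumption of this sketch, since the stated ranges touch or
    leave a small gap at those points.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability > 0.99:
        return 5  # definitely pathogenic
    if probability >= 0.95:
        return 4  # likely pathogenic
    if probability >= 0.05:
        return 3  # uncertain
    if probability >= 0.001:
        return 2  # likely not pathogenic or of little clinical significance
    return 1      # not pathogenic or of little clinical significance


if __name__ == "__main__":
    # Hypothetical probabilities, for illustration only.
    for p in (0.995, 0.97, 0.5, 0.01, 0.0005):
        print(p, plon_class(p))
```

This covers only the segregation-derived class; the class assigned from in vitro splicing data relies on interpretation of the cDNA products and is not captured by a numeric rule.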
Three of the 22 variants were observed at polymorphic frequency: MLH1 c.1039-8T>A (which co-occurred with MLH1 c.198C>T, 9.6%), MLH1 c.*35_37delCTT (3.3%) and MSH2 c.965G>A p.Gly322Asp (1.7%). These were not studied further. The results from prediction and in vitro analysis of splicing aberrations for the remaining 19 variants are shown in Table 1, and Figure 1 (a-f). Eight variants were associated with major splicing aberrations generally considered deleterious in a clinical setting (bold, with gray shading). These included the four variants at the +/-1 position and classified deleterious by the Colon-CFR on the basis of nomenclature at study initiation, another variant at the +2 position within the canonical splice junction GT-AG dinucleotides, and three intronic variants at the +5, +3 and -3 positions. In addition, two variants resulted in experimentally detectable splicing defects for which the clinical consequences are more difficult to predict (bold only): MLH1 c.113A>G p.Asn38Ser caused a frameshift as a minor cDNA product; MLH1 c.790+2_+3insT altered the ratio of known alternate splice products not observed in any other cycloheximide-treated samples (Figure 1b).
Bioinformatic prediction of splice site loss for in vitro-confirmed splicing aberrations associated with variants at or near intron-exon boundaries was equally good for the splice site prediction programs SpliceSiteFinder (9/10) and NNSPLICE (9/10), but poorer for NetGene2 (6/10). However, while loss of splice sites leading to aberrations was well predicted, this was not the case for existence and alteration of ectopic splice sites, preventing accurate prediction of the consequences of splice site loss. Existence of ectopic splice sites was not predicted at all by NetGene2, and was not absolute and did not always correlate for the other two splice site prediction programs.
No splicing aberrations correlated directly with ESE predictions for the nine exonic variants studied. There was no consistency in the prediction across the three programs used, in terms of number of sites altered, and effect of the variant (site created, destroyed, increased or reduced). Overprediction of ESEs was least for PESX (potential ESE/ESS alterations predicted for three variants), and poorer for RESCUE-ESE (six variants predicted to destroy/create/alter ESEs). Prediction using standard thresholds was poorest for ESEfinder (eight variants predicted to alter ESEs), but improved slightly with increased thresholds, which eliminated two of the original predictions.
In conjunction with results from splicing studies, an assessment of the likely pathogenicity may be provided by assessing whether multiple tumors from variant carriers show the features expected for a pathogenic MMR gene variant, namely IHC MMR protein loss consistent with the gene in which the variant exists, and MSI-H phenotype. In addition, segregation of the variant with Lynch syndrome tumors is another feature used to assess pathogenicity of rare gene variants. Results from such analyses are shown in Table 2. Tumor protein expression data was available from variant carriers (probands or family members) for 18 of the 19 variants. The data were consistent with loss of expression of the relevant MMR protein for the vast majority of variants. For 10/18 variants (highlighted in bold), the loss of expression was observed for multiple tumors from carriers of a specific variant, indicating a causative relationship between variant and loss of MMR protein expression. Information in support of causality for another three variants was based on a single tumor only. Exceptions included PMS2 loss only for MLH1 c.113A>G p.Asn38Ser (2/3 results) and MLH1 c.198C>T (1 tumor screened), and normal MMR protein expression for MSH2 c.913G>A p.Ala305Thr and MLH1 c.2059C>T p.Arg687Trp (1 tumor screened each). MSI data was available for 15/19 variants, and was consistent with the variant being associated with MMR defects for all but one variant, MSH2 c.913G>A p.Ala305Thr (1 tumor screened). MSI-H status was recorded for multiple tumors from carriers of 10 different variants (highlighted in bold). A general assessment of variant carrier status in genotyped family members affected with Lynch Syndrome associated cancers provided little information for four variants for which there was only a single family with only a single affected carrier. 
However, in the vast majority of the remaining families, all or most of the cancer-affected relatives carried the variant identified in the proband (noted in bold). Formal analysis of causality gave evidence of pathogenicity for four variants (probability ≥0.99), and suggested causality for another five variants (probability 0.88-0.99). There was no convincing evidence against causality for the remaining variants, in part due to limited information.
Assessment of amino acid evolutionary conservation and physicochemical properties for the eight variants predicted to affect a single amino acid position complemented results from splicing analysis and supported results from causality analysis. There were no splicing aberrations observed for seven of the eight missense alteration variants. Of these seven variants, two (MLH1 Lys618del and MSH2 Pro622Leu) fell into the almost certainly pathogenic grade C65, two more (MSH2 Leu191del and Ala305Thr) in the probably pathogenic grade C55, and two more (MLH1 Arg687Trp and Thr117Met) in the possible pathogenic grades C25 and C15. SIFT predicted all six of these to affect protein function (score 0.00). The remaining missense substitution, MSH2 Ala636Pro, occurred at a site demonstrating considerable cross species variability (Ala, Gly, Ser, and Gln were all observed in our alignment at this position), was scored in the likely neutral grade C0 by Align-GVGD, and received an innocuous score of 0.17 from SIFT. The minor product frameshift splicing aberration observed for the conservative substitution MLH1 Asn38Ser occurred at an evolutionary constrained site, fell in the probably pathogenic Align-GVGD grade C45, and received a likely pathogenic SIFT score of 0.00. MAPP-MMR predicted two variants to be borderline deleterious, and the remainder as deleterious, including MLH1 Asn38Ser, and MSH2 Ala636Pro.
An assessment of variant classification as recorded on the InSiGHT database (http://www.insight-group.org/) showed that 8 variants were reported, one considered unlikely to be pathogenic, and 7 considered pathogenic. A literature review facilitated by accessing the Mismatch Repair Genes Variant Database (www.med.mun.ca/MMRvariants) identified at least one published report for 14 variants, and the MMR Gene Missense Mutation Database (http://www.mmrmissense.net/db/query/query.aspx) was accessed for a review of functional studies of missense variants specifically. For the MSH2 c.965G>A p.Gly322Asp variant found at polymorphic frequency (>1%) in our control series, there was a single report of reduced function - measured as increased mutation rate in a yeast reporter gene assay (Drotschmann, et al., 1999). There was evidence for functional consequences of the variant(s) under study for 7/20 variants pursued in this study as possibly deleterious (Casey, et al., 2005; Luce, et al., 1995; Otway, et al., 2005; Ou, et al., 2007; Takahashi, et al., 2007), but only three variants had previously been subjected to RNA analysis (from lymphoblastoid cells (Otway, et al., 2005) or LCLs (Luce, et al., 1995), or by conversion analysis - separation of alleles into hybrids and subsequent analysis of mRNA expression and cDNA sequence changes (Casey, et al., 2005)).
For the four variants with no previous report in the InSiGHT, MMR Variant or MMR Gene Missense Mutation databases, three were likely pathogenic or pathogenic from splicing analysis, tumor and/or segregation data (MLH1 c.113A>G; MSH2 c.1077-1G>T, MSH2 c.571_573delCTC). Although the three reports for the MLH1 c.198C>T variant (Ghimenti, et al., 1999; Palicio, et al., 2002; Wehner, et al., 1997) suggested this variant was not pathogenic, this appeared to be based solely on the fact it is a synonymous substitution, with no functional, tumor or segregation analysis reported.
There were numerous previous reports for MLH1 variants c.350C>T p.Thr117Met and c.1852_1854delAAG p.Lys618del, supporting the pathogenic classification for these variants by InSiGHT. Importantly, functional studies suggest that these missense alterations result in functional abrogation (Ou, et al., 2007; Takahashi, et al., 2007), as does MLH1 c.2059C>T p.Arg687Trp, consistent with the fact that no splicing aberrations were detected for these variants in our study. Similarly, MSH2 c.1865C>T p.Pro622Leu and c.1906G>C p.Ala636Pro were generally considered pathogenic, have been shown to be functionally compromised (Ou, et al., 2007), and showed no splicing aberrations in this study.
The single report for MLH1 c.116+5G>C used conversion analysis to identify a 227 nucleotide inclusion of intron 1 (Casey, et al., 2005), as we have done here using relatively straightforward experimental methods. The single report for MLH1 c.208-3C>G provided evidence from functional analysis of LCL-derived cDNA for an in-frame deletion of exon 3 (Otway, et al., 2005), confirmed in this study. The variant was identified in an individual from a family fulfilling Amsterdam Criteria, but no other information was provided regarding its likely pathogenicity (Otway, et al., 2005). Several reports indicated that MLH1 c.589-2A>G was associated with features of a high-risk mutation, and a single study using an in vitro transcription/translation assay identified this variant to be associated with protein truncation (Luce, et al., 1995), as we would predict from our splicing analyses.
Evidence from reports for MLH1 c.1668-1G>A, c.1732-1G>A and c.1990-1G>A, and MSH2 c.942+3A>T, were consistent with pathogenicity, as expected from our splicing analysis showing that these variants are associated with exon deletions. The single report for MLH1 c.790+2_+3insT included the same family as in this study (Jenkins, et al., 2006a), but did not report estimates of risk for this variant alone.
MSH2 c.913G>A p.Ala305Thr has been reported to be pathogenic by InSiGHT, based on the submission relating to unpublished data of Wijnen et al. However, the very limited tumor data from a published study by the same authors (Wijnen, et al., 1997), and from this study, indicate that this variant is not obviously associated with the features of a high-risk mutation other than its identification in an Amsterdam family. Additional functional studies may provide more insight into the clinical relevance of this alteration.
The overall evidence from this and previous studies suggested that 14 variants were high-risk pathogenic and another two likely pathogenic. Overall, 14 variants had aberrant splicing or protein function, including two variants not previously described. In addition, we detected three variants at polymorphic frequency in unaffected controls, one of which had not previously been reported as polymorphic on InSiGHT or in the literature, and another for which a single report suggested functional abrogation.
This study was undertaken to assess the utility of various methods for assessment of splicing aberrations associated with rare variants in MMR genes. The principal aim was to correlate bioinformatic predictions to splicing aberrations detected in vitro, and so provide information for future prioritization of functional studies of such variants. We included variants with varying likelihood of causing splicing aberrations - variants in consensus splice sites (high), variants near consensus splice sites (moderate), and variants predicted to cause missense alterations, many of which are known to have functional consequences relating to protein stability, binding or function (low-nil).
Our results show that use of prediction programs for assessing effects of variants within canonical splice sites is useful to assess if an aberration is likely. However, use of several programs is preferable to predict the precise nature of the splicing aberration(s) that result from a genetic variant (in-frame deletion, frameshift etc), particularly to assess the presence of ectopic splice sites. This indicates that, at present, prediction programs for detection of aberrations associated with variants in canonical splice sites should not be considered a replacement for in vitro studies, but rather a means to prioritize variants for in vitro analysis and confirmation of any associated splicing defects. Furthermore, although our study does not provide data to allow direct comparison of splicing in LCLs and splicing in the target disease tissue (colon or endometrial epithelial cells), it does provide support for the utility of splicing assays performed on RNA extracted from LCLs as a means to infer functional consequences of variants due to the detection of loss of splice sites as predicted bioinformatically. This is further supported by the fact that many of the variants shown to have splicing aberrations in vitro also had high probability of pathogenicity based on Bayes causality scores, implying that the splicing aberrations observed in LCLs do translate directly into risk.
ESEfinder and RESCUE-ESE predictions were poor, consistent with published findings from other in vitro studies (Auclair, et al., 2006; Chenevix-Trench, et al., 2006; Lastella, et al., 2006). Use of increased thresholds appears to reduce overprediction, as we have previously shown for ESEfinder (Pettigrew, et al., 2005), but cannot always be easily applied. However, it appears that PESX may be a more conservative tool, as shown in one other study of MLH1 and MSH2 variants (Lastella, et al., 2006). While use of a higher threshold (perhaps as high as 6) may improve prediction of ESE/ESS sites by PESX, this would need experimental validation in a large sample set that preferably includes proven ESE alterations leading to splicing aberrations.
Another filter suggested to prioritize UVs within predicted ESEs for in vitro analysis is the location of variants and their associated ESEs within 125bp of an intron:exon boundary (Pettigrew, et al., 2005), based on molecular evidence for ESE positions relative to exon ends (Cartegni and Krainer, 2002; Majewski and Ott, 2002). This filter was irrelevant for the exonic variants studied here, since they were all located within this limit, but may prove useful in future studies of exonic variants. Other possibilities to be considered to formally improve prediction of aberrations outside of the consensus splice sites could include amino acid evolutionary alignment and physicochemical alterations to identify aberrations at the level of protein stability and protein-protein interaction. It is interesting to note that 6/7 exonic sequence variants without associated splicing aberrations had alterations predicted to range from almost certainly to possibly pathogenic by the A-GVGD combined evolutionary and physicochemical analysis, and were also predicted to affect protein function by a SIFT position specific scoring matrix analysis and by the MAPP-MMR bioinformatic algorithm.
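The 125bp positional filter described above can be illustrated with a minimal sketch. This is not code used in the study: the class name, field names, and coordinates are hypothetical, and the sketch assumes that distance is counted as the number of bases separating the variant from the nearer end of its exon, which is one possible reading of "within 125bp".

```python
from dataclasses import dataclass

# Distance limit taken from the text (Pettigrew, et al., 2005).
ESE_WINDOW_BP = 125


@dataclass
class ExonicVariant:
    """Hypothetical record; all coordinates share one 1-based system."""
    name: str
    exon_start: int  # first base of the exon
    exon_end: int    # last base of the exon
    position: int    # position of the variant


def distance_to_nearest_boundary(variant: ExonicVariant) -> int:
    """Bases between the variant and the nearer exon end (assumed definition)."""
    if not variant.exon_start <= variant.position <= variant.exon_end:
        raise ValueError(f"{variant.name} does not lie within its exon")
    return min(variant.position - variant.exon_start,
               variant.exon_end - variant.position)


def within_ese_window(variant: ExonicVariant,
                      window: int = ESE_WINDOW_BP) -> bool:
    """True if the variant would pass the positional filter."""
    return distance_to_nearest_boundary(variant) <= window


# Illustrative coordinates only, not variants from this study.
near_end = ExonicVariant("example_near", exon_start=1, exon_end=400, position=100)
mid_exon = ExonicVariant("example_mid", exon_start=1, exon_end=400, position=200)
```

Under these assumptions, a variant in any exon of 251 bases or fewer always passes, so the filter only discriminates among variants in longer exons.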
We have also shown how assessment of tumor characteristics (IHC MMR protein expression, MSI status), and particularly segregation analysis, can be helpful in assessing variant pathogenicity. For eight variants in this study, including two with potentially equivocal interpretation of splicing findings, MLH1 c.113A>G and c.790+2_+3insT, the segregation analysis alone would permit classification as pathogenic or likely pathogenic. We suggest that sequence variants be evaluated using the compilation of tumor and segregation data in combination with improved bioinformatic prediction methods to predict likely splicing alteration or physicochemical properties of amino acid substitutions. Such methods would assess likely pathogenicity and also identify the likely aberrations associated with UVs, to prioritize them for appropriate assays establishing their effect on protein function.
In summary, we have shown that bioinformatic tools for prediction of splicing aberrations need considerable improvement before they may be used without the support of functional studies as a tool to assess pathogenicity of variants in these and other high-risk genes. Our study provided evidence for splicing aberrations associated with ten variants, seven of which had no such data previously available, but which might have been considered to be pathogenic or possibly pathogenic from position alone. We have also shown that, given the availability of resources in terms of tumor characteristics and family segregation information, relatively simple approaches can be used to provide information about the likely clinical importance of MMR gene UVs, particularly if used in conjunction with functional studies. An important advance in this field of research would be the development of a more formal approach to quantitate likely clinical significance of the large number of MMR gene unclassified sequence variants reported by testing laboratories, such as the multifactorial likelihood analysis methods developed for evaluation of BRCA1 and BRCA2 gene variants (Chenevix-Trench, et al., 2006; Goldgar, et al., 2004; Spurdle, et al., 2008b). This would provide a quantitative assessment of risk for improved genetic counseling and management of patients and their family members (Plon, et al., 2008).
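As a rough illustration of how a multifactorial likelihood approach of the kind cited above combines evidence, the sketch below multiplies prior odds by a likelihood ratio from each data source and converts the result back to a probability. It assumes the data sources (for example segregation and tumor characteristics) are independent; the function name and the numbers in the example are hypothetical and are not results from this study.

```python
from math import prod
from typing import Iterable


def posterior_probability(prior: float,
                          likelihood_ratios: Iterable[float]) -> float:
    """Combine a prior probability of pathogenicity with independent
    likelihood ratios (pathogenic vs. neutral) into a posterior probability."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    ratios = list(likelihood_ratios)
    if any(lr <= 0 for lr in ratios):
        raise ValueError("likelihood ratios must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Illustrative values only: an even prior with two supporting data sources.
example = posterior_probability(0.5, [2.0, 5.0])
```

A likelihood ratio below 1 counts as evidence against pathogenicity, so the same calculation handles supporting and opposing data.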
We thank the many families who have participated in the research programs, and the Australian Red Cross Blood Services (ARCBS) donors who participated as healthy controls in this study. We are grateful to Rachel Morris and the staff at ARCBS for their assistance with the collection of risk factor information and blood samples, and Joanne Young, Melissa Barker, Melanie Higgins, Kimberley Hinze, Felicity Lose and members of the Molecular Cancer Epidemiology Laboratory for their assistance with collection and processing of blood samples.
Funding: This research was supported by: the Australian National Health and Medical Research Council; the National Cancer Institute; National Institutes of Health under RFA # CA-95-011; through cooperative agreement with Australasian Colorectal Cancer Family Registry (U01 CA097735); by a subaward agreement from the Mayo Clinic, Rochester, MN, funded by the National Institutes of Health Breast Cancer Specialized Program in Research Excellence grant P50 CA116201; and by the Canadian Institutes of Health Research Interdisciplinary Health Research Team Grant (#CRT-43821) and a National Cancer Institute of Canada research fellowship (#13493).