Multiple self-healing squamous epithelioma (MSSE; OMIM 132800), also known as Ferguson-Smith disease (FSD), is a rare inherited skin cancer syndrome characterized by multiple invasive keratoacanthoma-like skin tumors that regress spontaneously, leaving pitted scars (Ferguson-Smith, 1934; Goudie et al., 1993). The majority of the first-identified affected families have a shared Scottish ancestry, but MSSE has also been described in families of non-Scottish origin (D’Alessandro et al., 2007). Recently, the causative gene for MSSE was identified as TGFBR1, which encodes a transmembrane serine/threonine kinase receptor involved in TGF-β signaling (Goudie et al., 2011). TGFBR1 mutations in MSSE are functionally null, implying a tumor suppressor action in this disease, whereas missense mutations in the same gene can lead to Marfan syndrome-related disorders (Goudie et al., 2011; Loeys et al., 2005). TGFBR1 had previously been excluded as an MSSE candidate because the gene lies outside the original shared at-risk haplotype (SRH) in the Scottish families (Bose et al., 2006). However, using large-scale sequencing technology, Goudie et al. (2011) identified TGFBR1 germline mutations in a total of 18 MSSE families, including 12 Scottish families. Of the 9 Scottish families with the SRH, 7 shared the same TGFBR1 mutation (p.G52R); importantly, however, 2 families had different causative mutations (p.N45S and p.R414X). This suggests that unidentified rare variants associated with MSSE pathogenesis may exist in the non-TGFBR1 SRH region that has been conserved in these 9 families. The fact that this relatively large conserved haplotype has not undergone recombination over many generations in these 9 affected Scottish MSSE families suggests that manifestation of the disease may require both the causative TGFBR1 mutation and additional variant(s) located within the SRH region.
To search for these SRH variants, we used the Agilent SureSelect Target Enrichment system followed by sequencing on the Illumina GAII platform to sequence all of the target genes between the markers D9S197 and D9S1809 on 9q22-31 (D’Alessandro et al., 2007). Thirteen families, including 7 Scottish families with the SRH, were available (Table S1). Family numbers hereafter correspond to those used in the study by Goudie et al. (2011). We performed targeted sequencing in 5 families with the SRH and 5 families without the SRH, along with 6 CEPH controls. Coding regions of the 15 then-known genes were sequenced previously (D’Alessandro et al., 2007), but based on NCBI36 (hg18), there are now 29 known genes in the targeted region (Table S5), encompassing ~2.2 Mb in total, and the entire genomic region of these genes was sequenced. After filtering out common variants included in dbSNP and variants shared with CEPH controls, we identified 9 non-coding variants shared by all 5 families with the SRH. None of the 9 rare variants was detected in MSSE families without the SRH. These 9 variants are located in intronic regions of 5 different genes: FAM120A, PHF2, C9orf3, FANCC, and PTCH1 (Table 1). Sanger dideoxy sequencing confirmed these 9 variants in all 5 families plus 2 additional families (Table S1, Figure S1). Thus, all 9 variants were conserved over this ~2.2-Mb region in 7 Scottish families with the SRH, although the TGFBR1 mutations were different in families 2 and 18 (Figure 1a). In 231 unrelated healthy controls, including 118 Scottish individuals, the minor allele frequency (MAF) of the 9 variants was low, ranging from 0 to 0.022, and the association of these variants with the MSSE phenotype was highly significant (Table 1). This suggests that these 9 variants are MSSE-associated and segregate in individuals with this rare skin malignancy.
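The filtering logic described above (remove variants present in dbSNP or in CEPH controls, then keep those shared by every SRH family) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the function names, the representation of a variant as a (position, reference, alternate) tuple, and all toy positions and sample labels below are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set, Tuple

# A variant is modeled as (genomic position, reference allele, alternate allele).
# This representation is an assumption made for the sketch.
Variant = Tuple[int, str, str]
Calls = Dict[str, Set[Variant]]  # sample or family ID -> variants called in it


def shared_rare_variants(srh_families: Calls, controls: Calls,
                         dbsnp: Set[Variant]) -> Set[Variant]:
    """Variants present in every SRH family but absent from dbSNP and controls."""
    if not srh_families:
        return set()
    # Keep only variants carried by all SRH families.
    shared = set.intersection(*srh_families.values())
    # Pool every variant seen in any control sample.
    in_controls = set().union(*controls.values())
    return shared - dbsnp - in_controls


def absent_from(candidates: Set[Variant], other_families: Calls) -> bool:
    """True if none of the candidate variants occurs in the other families."""
    seen = set().union(*other_families.values())
    return candidates.isdisjoint(seen)


# Toy data with made-up positions and labels (not taken from the study).
srh = {
    "fam_a": {(100, "G", "C"), (200, "A", "T"), (300, "C", "T")},
    "fam_b": {(100, "G", "C"), (200, "A", "T"), (400, "T", "G")},
}
ceph = {"ceph_1": {(200, "A", "T")}}
known = {(300, "C", "T")}
non_srh = {"fam_c": {(400, "T", "G")}}

candidates = shared_rare_variants(srh, ceph, known)
print(sorted(candidates))
print(absent_from(candidates, non_srh))
```

In this toy example only the variant at position 100 survives: position 200 is removed because it appears in a control, and positions 300 and 400 are not shared by both SRH families. The second function mirrors the separate check that the candidate variants were not detected in families without the SRH.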
Interestingly, these variants are located at either end of the ~2.2-Mb target region, leaving a ~1.4-Mb central region where 24 genes are densely located (Figure 1a). Further analysis of MSSE families lacking the SRH identified distinct MSSE-associated rare variants in two Scottish families (Tables S2 and S3). In family 17, we identified 8 distinct variants that were detected in 4 affected family members (Table S3). Notably, these 8 variants were all clustered in the ~1.4-Mb central region of the SRH, which excludes the 9 MSSE-associated variants discussed above (Figure 1a). This family harbored the most complex TGFBR1 mutation (c.1059_1062delACTGinsCAATAA), which was not observed in other families. All of these variants are non-coding and were infrequent among the 162 healthy Scottish controls tested.
Three of the 9 variants were located in intronic regions of the PTCH1 gene, in which germline loss-of-function mutations are responsible for nevoid basal cell carcinoma syndrome (NBCCS) (Hahn et al., 1996; Johnson et al., 1996). Our previous study using mouse models suggested that a gain-of-function polymorphic variant in the mouse Ptch1 gene conferred susceptibility to squamous cell carcinoma (SCC) development (Wakabayashi et al., 2007). We therefore considered the possibility that these rare PTCH1 variants may play a role in predisposition to SCC development in MSSE patients. We carried out a computational analysis of the possible functional significance of all variants using the program AliBaba2.1. This analysis identified the 97309311G>C PTCH1 variant as having possible functional significance. An electromobility shift assay (EMSA) was performed for the 97309311G>C variant, whose site is reported to bind multiple transcription factors including SP1 and PU.1 (http://genome.ucsc.edu/ENCODE/). While the major allele (97309311G) bound SP1 and PU.1 transcription factors in nuclear extracts from HaCaT immortalized human keratinocyte cells, the minor allele (97309311C) showed complete disruption of this binding, indicating that the MSSE-associated rare variant perturbs the normal interaction between this binding site and transcription factors (Figure 1b).
In our study, we identified a rare MSSE-associated haplotype comprising 9 variants spanning a ~2.2-Mb region lying 1.5 Mb proximal to the causative TGFBR1 gene. Several of the genes in this region have been implicated in cancer development, and regulatory polymorphisms may affect tumor susceptibility (Sinha et al., 2008). The co-inheritance of these variants in the majority of Scottish families with MSSE suggests that these variants, or other as yet unknown linked variants (or their combinations), may affect the expression of the skin cancer phenotype induced by the mutations in TGFBR1. Their locations at opposite ends of the conserved haplotype may suggest that they modify chromatin loop structure to influence expression of genes within the region. Alternatively, the variants may directly affect expression of relevant genes, particularly the PTCH1 gene, which is implicated in susceptibility to different forms of skin cancer and carries three intronic variants, one of which affects transcription factor binding. Functional analysis of the possible roles of these and other variants in determining the MSSE phenotype, for example by gene expression or ChIP-seq analysis to assess differential transcription factor binding, will require further studies using keratinocytes or cultured primary tumor cells from MSSE-affected patients. Finally, because loss-of-function germline mutations in TGFBR1 are very rare, it is possible that the conserved haplotype acts as a modifier to increase survival of individuals carrying strong mutations in this potent developmental regulator. This mechanism is supported by the identification of unlinked variant genetic modifiers that are preferentially inherited in mice haploinsufficient for Tgfb1 (Benzinou et al., 2012).