This is the first large-scale study to show a clear association of alternative splicing of the 5′ exon 2 region of MUC1 with the rs4072037 A/G SNP. Both the experimental data and database trawling show a strong association between transcript type and exon 2 allele status, implying that SNP rs4072037 controls splice acceptor usage, as originally predicted from data obtained using cancer cell lines (Ligtenberg et al, 1991). In addition to the major spliceoforms, the polymorphism affects all the minor ones, including those with additional deletions in exon 2 (c) and those without a TR domain, the latter association being described here for the first time. Our study confirms that these components are present as a minor fraction of the MUC1 transcripts in ‘normal’ as well as cancer tissues. The alternative splicing at the start of exon 2 is independent of alternative splicing of the TR region, which is not controlled by rs4072037. While this work was in progress, an association of rs4072037 with the a transcripts was also reported in adult corneal tissue (Imbert et al, 2006), confirming the effect in another tissue. These authors did not, however, test genomic DNA but rather deduced genotype from cDNA.
Analysis of haplotypes 310 bp downstream and 330 bp upstream of MUC1TR using CEPH trio data shows a number of SNPs in very strong linkage disequilibrium with rs4072037 (data not shown). However, none was completely associated with rs4072037. In addition, extensive re-sequencing (NIEHS (http://egp.gs.washington.edu/) and A Teixeira and DM Swallow, unpublished) failed to identify any other intragenic SNPs; notably, there were none in intron 1 that might have been responsible for this splicing event. Indeed, no other common SNPs have been identified within the gene, showing that rs4072037 must be directly responsible for the splicing polymorphism.
However, this splicing polymorphism was not predicted by any of the exon prediction programs currently available. Indeed, for both alleles, the longer transcript was predicted. On the other hand, the A-to-G substitution is predicted to alter protein binding in Splice Sequences Finder version 2.2. The secondary structure of the pre-mRNA is also predicted to differ, with only the G allele forming a physiologically stable stem loop structure (as predicted by the mfold program; http://frontend.bioinfo.rpi.edu/applications/mfold/cgi-bin/rna-form1.cgi) (Mathews et al, 1999; Zuker, 2003). Indeed, the possible importance of this difference was noted previously (Ligtenberg et al, 1991). Whatever the mechanism, there appears to be some leakiness in the control: small amounts of the b transcript can be found in a few of the normal tissues from GG homozygotes, and among the sequences submitted to databases, a few a transcripts from cancer tissues were found that carry the A allele. The proportions of the transcripts are variable in heterozygotes, and this is not attributable to the leakiness, as shown by the restriction enzyme digestion experiments. It seems to reflect cell-to-cell (and thus tissue-to-tissue) as well as person-to-person differences in allelic expression (as opposed to splicing), which may be epigenetic or genetic in origin. Comparison of the relative differences in allelic expression with the relative length of the TR domain suggests that the genetically determined differences in transcript length do not significantly affect transcript quantity.
Putting all these observations together, it can be concluded that none of these transcripts is tumour-specific. Their over-representation in tumour material cannot really be evaluated without information on the SNP genotype as well. It is possible that inflammation and cancer affect the relative expression of the alleles and/or splicing. However, it is also possible that the genotypes were unevenly represented in the cohorts under study. In most studies, the a transcript (normally encoded by the G allele) was more abundant. Studies on gastric cancer and gastritis (Carvalho et al, 1997; Vinall et al, 2002) have shown that MUC1TR S alleles are over-represented, and the A allele (not the G allele) is normally associated with short TR alleles (Pratt et al, 1996; Vinall et al, 2002). However, in our own work, we have shown that it is in fact the recombinant GS alleles that are over-represented in gastric cancer and gastritis (A Teixeira et al, unpublished). Unfortunately, in most of the studies that suggest an association of tumorigenesis with the a transcript, neither the SNP genotype nor the TR length was recorded. Thus, it is possible that the associated rs4072037 G allele is often present as a recombinant haplotype with the short TR domain. These recombinant haplotypes are present at a reasonable frequency in the population (7–10%).
Interestingly, in one paper (Schmid et al, 2002) where genotyping was done, substantial quantities of a transcript were detected in three cell lines that were homozygous AA for rs4072037. This is suggestive of secondary ‘leakiness’ rather than allelic differences in expression and could be a genuine cancer-related change in splicing, which is consistent with the finding of some tumour-derived A allele a transcripts in the databases. These observations, together with the study described here, also indicate that transcript phenotype cannot be used as a reliable surrogate for genotype. It had been our intention to examine these splicing events in relation to H. pylori gastritis. It is clear, however, from the work reported here that there are too many variables to address this question without a very large cohort of samples and better quantitative methods.
It should be emphasised that allelic variation at the rs4072037 polymorphism may affect the function of the MUC1 protein. The alternative splicing event occurs within the signal peptide of MUC1, in the vicinity of the known proteolytic cleavage sites found for the b variant (Parry et al, 2001). Although there is no experimental evidence for the a variant, the SignalP 3.0 site (http://www.cbs.dtu.dk/services/SignalP/), which accurately predicts cleavage of the b variant, suggests that the signal peptide for the a variant will end at residue 23 from the start of translation, and an above-threshold signal peptide cleavage site was predicted to lie between the T and A residues at positions 22 and 23, respectively. This is within the region that contains the inserted amino-acid sequence. The predictions for all the 5′ transcript variants are shown in Figure 4.
Figure 4 Amino-acid sequences of the N-terminal ends of the a, b, c, and d encoded peptides and positions of the observed and/or predicted peptide cleavage. The sequence that undergoes alternative splicing is shown in grey on the a transcript.
The normal MUC1 biosynthesis pathway and glycosylation include targeting of the mature protein to the apical surfaces of epithelial cells, and we have previously shown that this is altered in H. pylori gastritis (Vinall et al, 2002). The alternative splicing events within the signal peptide sequence domain could possibly lead to differences in cellular trafficking, as observed in interleukin-15 (Kurys et al, 2000), altering the localisation of the mature protein. If, however, variation in the signal peptide does affect apical targeting, in either the normal or diseased mucosa, the effect is not ‘all or nothing’ because apical staining is seen in the normal mucosa of individuals of all three genotypes and no staining was seen in any of the gastritis specimens regardless of genotype (LE Vinall et al