Several technical approaches, including gel electrophoresis, RP-HPLC, mass spectrometry, and cDNA cloning and sequencing, allowed us to obtain a detailed description of the milk protein fraction in two mouse strains belonging to the Mus spretus and Mus musculus species, respectively. We first observed that the protein content of Mus spretus milk is far lower than that of C57BL/6J milk. In addition, we identified several polymorphisms differentiating these two species, as well as previously undescribed casein splice variants.
The protein content varies across mouse species
Piletz and Ganschow [
12], in a comprehensive study, have already reported that the milk protein concentration in fifty inbred strains of mice belonging to the
Mus musculus species ranges between 97 g/L (in the C3H/HeJ strain) and 213.6 g/L (in the YEBT/Ha strain). Such a strain effect, affecting more generally the concentration of all milk components, was also observed across five
Mus musculus strains [
13], although to a lesser extent (10% of the mean milk concentration). More recently, Riley
et al. showed that the protein concentrations in QS5i and CBA milk are 87.6 ± 7.7 and 91.6 ± 8.9 g/L, respectively [
14]. Here, we show that the protein concentration in the milk of SEG/Pas mice is four-fold lower than in the milk of C57BL/6J mice and therefore similar to the protein concentration previously reported in PWK/Pas
Mus m. musculus milk (32 ± 6 g/L) [
10]. Thus, two strains recently introduced in animal facilities and belonging to
Mus spretus and
Mus m. musculus, display a much lower milk protein concentration than the classical
Mus m. domesticus subspecies. Therefore, milk protein concentration varies greatly between mouse species, but also within species, between strains. The range of variation among mice is of the same order as that observed among phylogenetically distant species, since the milk protein concentration may exceed 200 g/L in some lagomorph species, whereas in human milk it does not exceed 10 g/L [
15].
Protein polymorphisms distinguish Mus spretus and Mus m. domesticus milk
Of the nine major mouse milk proteins, only three, namely αs1-casein, β-casein and WAP, showed obvious variations in electrophoretic mobility and/or chromatographic retention time between SEG/Pas and C57BL/6J, reflecting variation in charge, hydrophobicity and molecular weight. Sequencing cDNAs encoding Csn1s1, Csn2 and Wap in both mouse species revealed differential splicing patterns and SNPs in coding sequences, some of which were responsible for amino acid substitutions. Most of the splicing variants observed with casein mRNAs were shared by C57BL/6J and SEG/Pas. By contrast, SNPs inducing amino acid substitutions are the most discriminating features between SEG/Pas and C57BL/6J.
A polymorphism in the WAP-encoding gene between C57BL/6J and SEG/Pas was suspected from chromatographic as well as 1D and 2D electrophoretic behaviour. Indeed, the WAP variant in SEG/Pas migrated markedly more slowly and ran as a smeared spot at an apparent molecular weight higher than expected from its amino acid sequence. From the
Wap transcript sequences, proteins with different molecular weights and isoelectric points (pI) are predicted: MW: 12,432.61/pI: 5.00 and MW: 12,313.57/pI: 4.83 for the C57BL/6J and SEG/Pas WAPs, respectively. The 119.04 Da difference in molecular weight cannot account for the dramatic difference in mobility of the variants on SDS-PAGE gels, whereas the horizontal shift observed in 2D electrophoresis agrees with the 0.17 difference in isoelectric points. Genetic polymorphisms across mouse strains have been previously reported for the
Wap gene. Indeed, WAP-A and WAP-B designate the C57BL/6J and YBR variants, respectively [
17]. The protein variant encoded by
WAP-B has one fewer cysteine and one more arginine than the WAP-A variant. Comparison of C57BL/6J and SEG/Pas
Wap cDNA sequences revealed mutations associated with three amino acid substitutions (K36E, T94A and M99K; residues are numbered according to the C57BL/6J pre-protein) and one amino acid deletion (ΔS93). Interestingly, the deletion of S93, together with the T94A and M99K substitutions, yields a domain II peptide sequence that is closer to that of rat WAP than to that of C57BL/6J WAP [
18]. Likewise, the K36E substitution, located within domain I, introduces an acidic residue (E or D) that is conserved in most species, except for the
Mus musculus strains. In addition, we found that the KSPT (or ESPT) insertion in the C-terminal part of the rat protein is due to the incorporation of an intron sequence at the splice junction between exons 3 and 4. Since these mutations in the SEG/Pas WAP are not located in the four-disulfide core (4-DSC) domains containing the conserved cysteine residues, they are likely to have little effect, if any, on the three-dimensional structure of the protein. The amino acid sequences of mouse WAPs do not reveal any potential site for post-translational modification, in contrast to pig WAP which, from molecular weight considerations, appears to be glycosylated [
19]. Neither WAP-A nor the SEG/Pas WAP stained positively with periodic acid-Schiff reagent (data not shown), suggesting that they are not glycosylated. Moreover, Orbitrap mass spectrometry data confirmed the absence of post-translational modifications in WAP from both strains. Thus, the shift observed in 1D and 2D electrophoresis gels between C57BL/6J and SEG/Pas seems not to result from a difference in molecular mass, but rather to reflect changes in protein conformation that may affect the normally constant SDS/protein ratio or the shape of the SDS-protein complex. Indeed, numerous studies testing the sensitivity of electrophoresis in detecting protein polymorphisms have shown that protein migration in SDS gels often depends on protein shape, which in turn varies with conformation [
5,
20,
21]. WAP displays a lipoprotein-like structure, although the amount of lipid associated with WAP is heterogeneous [
17]. We therefore hypothesize that more lipid is associated with SEG/Pas WAP than with WAP-A, which would alter its migration.
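The mass argument above can be checked with simple arithmetic. The sketch below uses only the predicted values quoted in the text; the variable names are illustrative, and the relative difference is our own derived figure, not one reported in the study.

```python
# Predicted values quoted in the text (Da and pI units).
mw_c57bl6j = 12432.61
mw_segpas = 12313.57
pi_c57bl6j = 5.00
pi_segpas = 4.83

# Absolute and relative mass difference between the two WAP variants.
delta_mw = mw_c57bl6j - mw_segpas
relative_delta_percent = 100.0 * delta_mw / mw_c57bl6j

# Difference in predicted isoelectric points.
delta_pi = pi_c57bl6j - pi_segpas

print(f"Mass difference: {delta_mw:.2f} Da ({relative_delta_percent:.2f}% of the C57BL/6J mass)")
print(f"pI difference: {delta_pi:.2f}")
```

A difference of less than 1% of the predicted mass is in line with the statement that mass alone cannot explain the large mobility shift on SDS-PAGE.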
The
Wap gene was also sequenced from strains 129/SvJ [
22] and GR (GenBank:
MMU 38816) belonging to the
Mus musculus species. WAP from 129/SvJ is identical to the C57BL/6J WAP-A. By contrast, WAP from GR differs from WAP-A by three amino acid substitutions (L11R, P35Q and M90T), the first one being located in the signal peptide. Thus, at least three WAP variants exist within the
Mus musculus species and one in
Mus spretus (this work). Following the nomenclature used by Hennighausen and Sippel [
23], we propose to name the WAPs from GR and SEG/Pas WAP-C and WAP-D, respectively.
The frequency of SNPs in the coding region of Wap cDNA between SEG/Pas and C57BL/6J was estimated at 1.76%. This frequency was only 0.5% between the Wap coding regions of C57BL/6J and GR, which belong to the same Mus musculus species.
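As an illustration of how such a frequency can be computed, the following minimal sketch counts mismatches between two pre-aligned coding sequences. The function name, the decision to skip gapped columns, and the toy sequences are assumptions made for this example; they are not the actual Wap sequences or the procedure used in the study.

```python
def snp_frequency(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: indel columns are not counted as SNPs
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


# Hypothetical toy alignment: one substitution and one 3-nt deletion.
toy_ref = "ATGCGTTGCCTCATCAGCCTTGTTCTTGGC"
toy_alt = "ATGCGTTGCCTCATCAGTCTTGTT---GGC"
print(f"{snp_frequency(toy_ref, toy_alt):.2f}% of compared positions differ")
```

Whether codon deletions such as ΔS93 are excluded from or included in the count changes the result slightly, so the convention should be stated alongside any reported percentage.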
A comparison of the C57BL/6J nucleotide sequences published for β- and αs1-caseins with the sequences from the C3H/HeN and FVB/N mouse strains, respectively, did not reveal any genetic polymorphism. By contrast, the sequencing data provided here for C57BL/6J and SEG/Pas αs1- and β-casein cDNAs clearly show the existence of SNPs in the coding regions, leading to 5/6 amino acid substitutions, as well as in the 3'UTRs.
Comparisons of coding and non-coding (3'UTR) sequences of orthologous milk protein genes in
Mus musculus and
Mus spretus indicate that, on average,
Mus spretus exhibits one SNP in every 60 to 100 bp and 20 to 50 bp, respectively. We found that SNPs occur at higher frequencies in non-coding (3'UTR: 2.86%) than in coding (1.3%) sequences in
Csn1s1,
Csn2 and
Wap genes. These results agree with previous data indicating that SNPs occur at higher frequencies in non-coding (2.2% and 1.4% in introns and in both 3' and 5'UTRs, respectively) than in coding (0.6%) regions in
Mus spretus [
5]. However, the SNP frequencies in milk protein genes are about twice the values reported at the whole genome level. Comparing C57BL/6J and SEG/Pas, others have also estimated the polymorphism rate at the whole genome level to range between 1 and 2% [
1,
4].
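The relationship between the percentages and the "one SNP every N bp" figures quoted above can be verified with a short calculation. This sketch uses only the values given in the text; the dictionary keys and function name are illustrative, and the UTR comparison is approximate because the genome-wide value pools 3' and 5'UTRs whereas the milk protein value covers 3'UTRs only.

```python
def bp_per_snp(percent: float) -> float:
    """Average spacing between SNPs, in bp, for a given SNP frequency in percent."""
    return 100.0 / percent


# Frequencies quoted in the text (percent).
milk_genes = {"coding": 1.3, "utr": 2.86}   # Csn1s1, Csn2 and Wap
genome_wide = {"coding": 0.6, "utr": 1.4}   # previously reported values

for region in ("coding", "utr"):
    spacing = bp_per_snp(milk_genes[region])
    ratio = milk_genes[region] / genome_wide[region]
    print(f"{region}: one SNP every {spacing:.0f} bp, {ratio:.1f}x the genome-wide value")
```

Both spacings fall within the ranges stated in the text (60 to 100 bp for coding and 20 to 50 bp for 3'UTR sequences), and both ratios are close to two.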
Therefore, selection pressure seems to be lower on milk protein genes than on the rest of the genome. However, the highly conserved hydrophobic domains and multiple phosphorylation sites identified in both αs1- and β-caseins are less subject to mutation, confirming that functional constraints act to conserve the overall architecture of these molecules. Likewise, despite a significant rate of polymorphism between C57BL/6J and SEG/Pas, no mutations were detected in the WAP 4-DSC domains, which are essential for its structure and function.
αs1-casein molecular diversity is mainly due to post-transcriptional modifications
RP-HPLC analyses indicated the existence of several αs1-casein isoforms and polymorphisms, since αs1-caseins from C57BL/6J and SEG/Pas milks show different retention times. In a previous study [10], we reported that the minor and the major isoforms of αs1-casein exhibited different chromatographic elution behaviour between milk from C57BL/6J and PWK/Pas mice. Indeed, comparison of chromatographic profiles from C57BL/6J, SEG/Pas and PWK/Pas revealed that the two αs1-casein isoforms from C57BL/6J had a shorter retention time than the isoforms from PWK/Pas and SEG/Pas milks. Although they belong to different species, αs1-casein isoforms from PWK/Pas and SEG/Pas show a similar chromatographic elution pattern.
Since the tryptic peptide masses that allowed the identification of αs1-casein in both fractions did not distinguish between isoforms, the native masses of the proteins contained in each chromatographic fraction of αs1-casein were measured using MALDI-TOF mass spectrometry (data not shown). We obtained several masses ranging between 31,800 and 35,000 Da, which correspond to αs1-casein. However, within each fraction, it was not possible to assign the different isoforms identified from cDNA sequencing to the molecular weights obtained. This result suggests the existence of additional isoforms, including isoforms with different phosphorylation levels. Such a hypothesis is consistent with the 1D electrophoresis patterns of the αs1-casein chromatographic fractions, which were shown to contain at least two bands with both C57BL/6J and SEG/Pas milks (data not shown). Bands from the minor fractions migrate faster than those present in the major fractions, suggesting that the former contain the shortest variants (278+282 aa and 269+279 aa), whereas the major fraction contains the full-length protein (298 aa) together with protein variants arising from the c (292 aa) or c' (309 aa) mRNA isoforms. Moreover, the partial deletion of exon 21 in isoforms
a (C57BL/6J) and
a' (SEG/Pas) should strongly modify their chromatographic behaviour, since the deleted 19-aa segment contains a number of hydrophobic residues, including 4 phenylalanyl (F68, F72, F77 and F83), 1 isoleucyl (I71) and 2 alanyl (A77 and A82) residues. This large deletion is due to the usage of a cryptic splice site occurring in exon 21, 57 nucleotides downstream from the AG defining the proper end of intron 20, in frame and following a rather strong polypyrimidine tract (n = 20), albeit interrupted by three contiguous G residues. The same deletion was also reported in the αs1-casein from the mouse FVB/N strain (GenBank:
AAH40246). Even though such an event seems to be relatively rare, occasional improper splicing using an exonic cryptic splice site and leading to the loss of 132 aa residues was also reported in the equine
CSN2 mRNA [
24]. On the other hand, the loss of a CAG, induced by an error-prone junction sequence, is much more frequent. This defect in accuracy was observed both with β-casein (already mentioned in GenBank:
AK021328) and, for the first time, with αs1-casein (this work). It occasionally leads to the loss of a glutaminyl residue (Q), promoted by the nucleotide sequence at the junction between intron 7 and exon 8 for
Csn1s1 mRNAs and between intron 5 and exon 6 for
Csn2 mRNAs. The mechanism by which AG defining the 3' splice site is accurately and efficiently recognized involves a 5' to 3' scanning process [
25]. The first AG downstream of the branch point-polypyrimidine tract is selected preferentially. However, a competing AG located downstream from the proximal one can occasionally be used instead. The occurrence of a tandem CAG triplet at an intron-exon junction would be a facilitating feature. The occasional deletion of the CAG codon was first detected in casein-coding genes in goat [
26], and later in ovine [
27], bovine and water buffalo [
28]. Such splice-acceptor site slippage was also reported in human αs1-casein [
29,
30]. Examples of insertion/deletion of Q are well documented and occur in all calcium-sensitive casein pre-mRNAs, as well as in a number of other proteins in mice and humans such as ABCG8 [
31,
32], IGF-1 receptor [
33] and PAX3 [
34]. More generally, alternative splicing at short-distance tandem sites is widespread in many species [
35].
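The 5' to 3' scanning model and the tandem CAG slippage described above can be sketched as follows. This is a deliberately simplified illustration: the function name and the toy intron-exon junction are hypothetical, the scan starts at an arbitrary position standing in for the branch point, and real splice site choice depends on many additional signals.

```python
from typing import Optional, Tuple


def acceptor_sites(seq: str, scan_start: int = 0) -> Tuple[Optional[int], Optional[int]]:
    """Return (proximal, distal) exon start indices under a simple scanning model.

    proximal: position just after the first AG found scanning 5' to 3'.
    distal:   position 3 nt further downstream if the proximal AG is directly
              followed by a second NAG triplet (tandem acceptor), else None.
    """
    seq = seq.upper()
    ag = seq.find("AG", scan_start)
    if ag == -1:
        return None, None
    proximal = ag + 2
    has_tandem = seq[proximal + 1:proximal + 3] == "AG"
    distal = proximal + 3 if has_tandem else None
    return proximal, distal


# Hypothetical junction: polypyrimidine tract, intronic CAG, then an exon starting with CAG.
toy_junction = "TTTCTCTTTCCTTTTCAG" + "CAG" + "GAACTTGCT"
proximal, distal = acceptor_sites(toy_junction)

print("Exon using proximal AG:", toy_junction[proximal:])
if distal is not None:
    print("Exon using distal AG:  ", toy_junction[distal:])
    print("Codon lost by slippage:", toy_junction[proximal:distal])
```

Because the two candidate acceptors are exactly 3 nt apart, using the distal one removes a single CAG codon (glutamine) while leaving the reading frame intact, which matches the in-frame loss of Q described in the text.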
The insertion of 33 nucleotides upstream of exon 11 in SEG/Pas Csn1s1 is likely due to the usage of a cryptic splice site. Since the relevant genomic sequence of Mus spretus is not yet available, it is difficult to substantiate this hypothesis. However, there is an imperfect polypyrimidine tract containing several purine bases in the genomic sequence of Mus musculus, upstream from the cryptic 3' acceptor splice site.
Additionally, several αs1-casein isoforms identified in C57BL/6J and SEG/Pas as splicing variants arise from "clean" skipping of exon sequences during the processing of the primary transcripts. This phenomenon has been reported in small ruminants [
26,
30] and in humans [
29,
36]. Regarding exons 9 and 10 which are alternatively skipped, "
en bloc" from the C57BL/6J mRNAs, this could be explained, as far as exon 9 is concerned, by weakness in the consensus sequence at the 3' acceptor splice site of intron 8. However, there is no obvious explanation for exon 10 skipping. Likewise, in SEG/Pas, two exons (16.14 and 17) were simultaneously deleted. Since genomic sequences are not available for
Mus spretus, it is difficult to explain why this kind of event may have occurred. Exon 16.3, which is skipped in C57BL/6J, is located in a chromosomal region where major rearrangements have been reported in the mouse genome (GenBank:
NC_000071). This is also the region where, from the comparison of orthologous
CSN1S1 genes across several species, exon 15 in cattle (and more generally in Bovidae) can be posited to have arisen from the "exonisation" of a 24-nucleotide intronic region located between exons 14 and 16. In mouse
Csn1s1, there is a tandem repeat of 14 copies of an 18-nucleotide exon (exon 14 in Rijnkels
et al. [
37] and exon 16.1 in the numbering adopted in Figure ). Multiple alignments of genomic sequences spanning the first to the third exon copy suggest that this structure results from a first duplication event that involved exon 16 and part of the downstream intron (Figure ). Subsequently, tandem duplication of this basic motif might have occurred seven times. This exon 16, which is present in humans [
29], camel, horse, guinea pig and pig [
30,
37] is also present in bovine [
29]. It was shown to correspond to a short "virtual exon" occurring within intron 15 and surrounded by nearly perfect consensus splice sequences, except for the 5' donor splice site, which is absent in the bovine genome.
About 45 different genetic variants are expressed from the 6 main bovine milk protein loci (Miranda
et al., unpublished results) and considerable differences in allele frequencies were observed among breeds. The situation is even more complex in less selected ruminants such as goats, in which
ca. 35 alleles have been found at
CSN1S1 and
CSN3 loci [
30]. However, comparative analysis of the casein gene cluster genomic sequences across species shows that the organization and orientation of the genes are highly conserved. The conserved gene structure indicates that the molecular diversity of caseins is primarily achieved through variable species-specific use of exons (exon-skipping or differences in exon usage) and high evolutionary divergence. Caseins are the most divergent of the milk proteins with an average pairwise percent identity ranging between 44 and 55% across placental mammals [
38].
In contrast to the rapid evolution of casein genes previously put forth [
39], milk protein genes in general seemed to evolve more slowly than others in the bovine genome, despite selective breeding for milk production. The most conserved genes were those for proteins of the milk fat globule membrane, suggesting that the mechanism for milk-fat secretion is essential. Diversity in milk composition could not be explained by diversity of the encoded milk proteins and although gene duplication may contribute to species variation, this is not a major determinant [
40]. Thus, other regulatory mechanisms must be involved. For example, on the basis of analysis of the opossum genome, Mikkelsen et al. [
41] concluded that most of the genomic diversity between marsupials and placental mammals comes from non-coding sequences, arising from sequences inserted by transposable elements.
Sharp et al. [
42] proposed models for evolution of the WAP gene in the mammalian lineage either through exon loss from an ancient ancestor or by rapid evolution via exon shuffling, whereas a functional WAP gene has been lost in humans, cattle and goats.
The question remains, however, whether milk protein polymorphism is greater between inbred mouse strains than, for example, between breeds of ruminants.