This paper explores the genetic basis of MS pathogenesis through the lens of a mathematical model of genetic susceptibility and a critical analysis of the currently available epidemiological information about this illness. This is not to downplay the importance of environmental factors in disease pathogenesis. Indeed, these factors were the principal focus of previous work [10]. Rather, the focus of this paper is on the attempt to understand, not the genes that lead to MS susceptibility but, rather, the basis and importance of genetic susceptibility to this illness.
Several results seem particularly noteworthy. In earlier publications, the critical environmental factors have been suggested to be "population-wide" exposures [10]. Intriguingly, a similar conclusion can be reached by a mathematical analysis, in which these environmental exposures (whatever they are) can be shown to be extremely common events [10
]. By contrast, the genetics of MS seems to be of critical importance with regard to disease pathogenesis. Thus, the analysis presented in this manuscript demonstrates that the large majority of individuals who develop MS (possibly all) must have, in part, a genetic basis for their disease. Moreover, to underscore the importance of genetic susceptibility to disease pathogenesis, the mathematical analysis of the present manuscript demonstrates that, under any circumstance, only a tiny fraction of the general population (<2.2%) is genetically susceptible to getting this illness. Finally, the derived model demonstrates that the possibilities for the number of susceptibility loci (and the number of involved loci necessary to confer that susceptibility) are quite limited (Tables , , , , , , , and ). Indeed, it seems that genetic susceptibility is, by far, the most important factor in disease pathogenesis. Thus, whereas environmental factors (while necessary) are very common population-wide events [10
], only a very small fraction of the general population are genetically capable of getting the disease, regardless of what occurs to them during life. This conclusion is not altered at all by the recent report of Baranzini and co-workers [31], which found that there were no genetic or epigenetic differences between elderly monozygotic twins who were clearly discordant for MS. This finding is anticipated. Even if everyone who developed MS had to be genetically susceptible (as both MZ twins will be if one has MS), the expected concordance rate in MZ twins is still only 25% in northern North America and northern Europe. This study only serves to underscore the importance of an environmental contribution to MS pathogenesis, a conclusion that was clearly evident decades prior to this publication [2
A previous paper also explored a mathematical model of MS genetics in order to determine both the number of risk alleles required and their allelic frequency [32]. In that paper, the author approached the problem by using observed values for the lifetime risk of MS [P(MS)] and the monozygotic-twin concordance risk (CRMZ), together with different numbers of risk alleles with different frequencies, to predict the recurrence risk in both first and second degree relatives of an MS proband. Conceptually, this author's view of MS susceptibility is that MS risk increases either with the number of risk alleles in an additive or a multiplicative manner or as a step function where (in the case of 10 total alleles) the risk for 6 or fewer alleles was 0 and the risk for 7 or more alleles was 0.25 (i.e., the CRMZ). His conclusion was that the best fit with existing data was for autosomal dominant models having either a strong interaction between the different loci, in which risk increases rapidly with each additional disease allele or, better yet, a step function [32]. When 10 loci are present, the allelic frequency of the susceptibility alleles was calculated to be (0.15 - 0.31). The author then explored the impact of changing the number of presumed risk alleles (in models using a step function where the step occurred at 100%, 67% and 33% of the number of alleles). In each case, the allelic frequency was adjusted so that the predicted population probability of MS [P(MS)] was always 0.2%. Using this strategy, the author reported that four different models fit the observed recurrence risk data. Thus, dominant models in which 100% of 6 risk alleles were required for susceptibility, dominant models with 9 or more risk alleles and 67% required, dominant models with 15 or more risk alleles and 33% required, and a recessive model in which 100% of 2 or 3 risk alleles were required, all fit the data. Moreover, using this model, the author found no upper limit to the number of alleles possible although, with an increasing number, the allelic frequencies increased toward 1.0 [32
In many ways this earlier model [32] is a subset of the model proposed here. Thus, following the argument in Section 1, this model would still require a total of (n) susceptibility loci to be in a susceptible state in order for an individual to be genetically susceptible. Moreover, the notion of a step function (going suddenly from non-susceptibility to susceptibility once a certain number of risk alleles are present) is common to both schemes. However, in the previous model, this step function was inferred from the "closeness of fit" analysis of the predicted recurrence rates whereas, in the present paper, it is used as a convenience to describe a much more complicated and interactive underlying susceptibility structure (see Section 1). In addition, there is no provision in the previous model for a possible difference in penetrance of different combinations of risk alleles. Neither of these differences, however, is critical and, in fact, they are both likely to be of no importance whatsoever.
Rather, the critical difference between the models is that the previous model is unbounded precisely because it has not been tied securely to the epidemiological realities of MS and because its conceptualization of susceptibility seems biologically unlikely. For example, the author concludes that one possibility is a recessive model in which 2 or 3 risk alleles are present and all are required. Clearly, however, such a model is untenable. First, the HLA DRB1*1501 allele is an established risk allele for MS and it is known to be "dominant" in the sense that both heterozygous and homozygous states confer susceptibility [12]. Consequently, not all risk alleles can be recessive. Second, if 100% of anything were required, then every patient with MS would be in a susceptible allelic state at the HLA DRB1 locus, a circumstance which is claimed by no one. Thus, the previous model has failed to incorporate the known epidemiological information about the only allele (HLA DRB1*1501) that has been securely linked to MS susceptibility. Third, and most importantly, the cutoffs of 100%, 67% and 33% for the step function are arbitrary. In and of itself, the use of arbitrary cut-points makes little difference and, as the author states (somewhat confusingly), other cut-points were also explored. Rather, it is the use of such cut-points, in the first place, that indicates a fundamental conceptual difference in the nature of MS susceptibility between the two approaches. Thus, in the previous model [32], MS susceptibility is held, not to be the result of an individual possessing a specific combination of susceptibility alleles, but rather, to be the result of possessing a certain fraction of the total number of such alleles. Naturally, in such a circumstance, because the total number of possible risk alleles is unbounded (other than by the entire genome), so too is the number of alleles required for MS susceptibility. Naturally, also, the allelic frequency of these putative susceptibility alleles increases as their number increases in order to maintain the population prevalence at 0.2%.
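The way a fixed prevalence forces the frequency upward can be illustrated with a minimal sketch. This is not the calculation of the earlier paper [32]; it assumes, purely for illustration, that the loci are independent, that each is in a risk state with the same frequency p, and that risk is a step function equal to 0.25 once the required number of loci is reached. The function names are hypothetical.

```python
from math import comb

def prob_at_least(n_loci, k_required, p):
    """P(at least k_required of n_loci independent loci are in the risk state)."""
    return sum(comb(n_loci, k) * p**k * (1 - p)**(n_loci - k)
               for k in range(k_required, n_loci + 1))

def risk_state_frequency(n_loci, k_required, prevalence=0.002, penetrance=0.25):
    """Bisect for the per-locus frequency p at which
    penetrance * P(X >= k_required) equals the target prevalence.
    Assumes independent, equally frequent loci (an illustrative simplification)."""
    target = prevalence / penetrance
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        # The tail probability rises with p, so move toward the target.
        if prob_at_least(n_loci, k_required, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 100% of the loci required: the frequency needed to hold prevalence at 0.2%
# rises as the number of loci grows.
for n in (3, 6, 12, 24, 48):
    print(n, round(risk_state_frequency(n, n), 3))
```

Under these assumptions, when every locus is required the condition reduces to p raised to the number of loci equalling 0.008, so p necessarily approaches 1.0 as loci are added. Other cut-points (for example, 6 of 9 loci) can be explored by changing `k_required`.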
By contrast, in the present paper, the genetic susceptibility to developing MS is conceptualized as occurring when, from amongst a total of (x + 1) susceptibility loci (haplotypes) spread throughout the genome, an individual possesses a specific combination of some of these loci, each of which is in a specific susceptible state. Although, at each locus, there may be more than one "susceptibility" gene and, for any gene, there may be more than one "susceptibility" allele, the net effect of the interaction of these different genes and different alleles at a particular locus is presumed to put this locus into a "susceptible" state or not. Moreover, different specific combinations of different numbers of these susceptibility loci each having specific "susceptibility genotypes" are envisioned to produce susceptibility and the entire set of such "susceptible" genetic combinations is taken to define the subset of individuals in the general population who could potentially get MS in the right environmental circumstances. If an individual does not possess one of these susceptible genetic combinations, then they cannot get MS regardless of what environmental events they experience in their lives. Alternatively, of course, it is possible that only some (but not all) of MS is genetic (in the sense described above) and that some individuals may get this illness through a purely environmental mechanism. Nevertheless, the evidence (such as it exists) suggests that the vast majority (and likely all) cases of MS are the result of a genetically susceptible individual experiencing a sufficient (but complex) environmental exposure, which includes multiple different events occurring at different times during their life [10]. The available evidence also suggests strongly that genetic susceptibility to MS is a rare occurrence. Thus, as noted above, only 2.2% or less of the general population is susceptible to getting MS. Even among individuals who carry the HLA DRB1*1501 allele, the probability of being susceptible to getting MS is still only ~2.6% (Additional File 1; Appendix S1; Section 4).
As discussed in the Introduction, this conceptualization reflects a binary view of genetic susceptibility. This is not, however, a fundamental assumption of the model. Rather, the binary nature of the model is a consequence of the concept of susceptibility. For example, if everyone is genetically susceptible, then the influence of genetic factors is to alter the penetrance of the different genotypes. By contrast, if some individuals are not susceptible while others are, then susceptibility is binary, not because of the model but because of the nature of susceptibility. Importantly, in the model, even though susceptibility was conceived as binary, this was not forced into the final result. Thus, the term P(G) was unconstrained and could have been 100%. The limit of [P(G) ≤ 2.2%] was set by the constraint of epidemiological observations, not by a constraint inherent to the model.
Placed into this context, the model derived in this manuscript provides considerable insight into the nature of the genetic basis for MS. Indeed, the current epidemiological observations of (h = 0.24; hm = 0.55; P(MS) = 0.0015; and Pt0 = 0.25) suggest that the upper limit for the average number of susceptibility loci (n) that need to be in a susceptible state for an individual to be susceptible to getting MS is (11 ≤ n ≤ 18). Moreover, the total number of non-HLA DRB1 loci (x) that contribute to susceptibility seems to be between 50 and 200, and the frequency of susceptibility at these loci is approximately (h/r ≤ 0.12) or (r ≥ 2). In fact, the genetic configuration that best fits these epidemiological observations, the current prevalence estimates, and the concordance data for non-twin siblings, parents and children, children of conjugal MS couples, second degree relatives, and third degree relatives occurs when 80% of the loci are recessive and at (x = 100-107; n = 13; and r = 4). Indeed, the prevalence and recurrence risks predicted by these particular values match the actual epidemiological observations quite closely (Table ). It is of note that these predicted recurrence risks have been calculated using a penetrance estimate that has been down-weighted from the identical-twin concordance rate because of the apparently important influence of the shared intra-uterine or early post-natal environment [10
]. If an unadjusted penetrance had been used, all of the estimated recurrence rates would have been approximately double and none of the models (recessive, dominant, or mixed) would have provided a good fit with the actual epidemiological data. Such a finding, independently, tends to validate the importance of the intra-uterine and/or early post-natal environment in MS pathogenesis.
It also seems likely that either the large majority of the (x) susceptibility loci must be "recessive" (in the sense described in Additional File 1; Appendix S1) or there must be more than one susceptibility gene present at each susceptibility locus and these genes must combine in such a way that only a small fraction of the possible combinations produce a susceptible state at the locus (Additional File 1; Appendix S1). There are three reasons for this conclusion. First, and most important, the predicted recurrence risk for MS in siblings for a single dominant gene (even one with multiple different susceptibility alleles) seems too high to explain the epidemiological observations (Tables , and ). Second, the optimal fit of the predicted to the observed data occurs when only 20% of the loci are assumed to be "dominant" (Table ). Third, the observed odds ratios (OR = 1.1 - 1.3) for different candidate genes at non-HLA DRB1 loci in genome-wide association studies [12] seem too small to be easily explained by alterations of the parameters (h), (hm), and (Pt1) for "dominant" alleles. In addition, altering these parameters generally results in Closeness of Fit estimates that are both too high and worse than the estimate obtained using the observed parameter values of (h = 0.24), (hm = 0.55), and (Pt1 = 0.25). This last piece of evidence, however, may not make a compelling argument because the odds ratio can also be markedly affected by the use of single SNPs to identify alleles. Thus, depending upon the exact nature of the relationship between the state of the DNA at the SNP location and the polymorphic alleles of any particular susceptibility gene, the observed odds ratio (even for dominant alleles) can be dramatically reduced (see Additional File 1; Appendix S1; Section 5).
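The dilution of an odds ratio by an imperfect SNP tag can be illustrated with a small sketch. It is not the derivation given in Additional File 1; it assumes a simple allele-level comparison in which the SNP variant tags the true susceptibility allele together with other alleles that have no effect on risk. All numerical values and the function name are illustrative assumptions.

```python
def odds(p):
    """Convert a frequency into odds."""
    return p / (1 - p)

def observed_snp_odds_ratio(true_freq, true_or, tagged_neutral_freq):
    """Odds ratio seen at a SNP that tags the true susceptibility allele
    plus neutral alleles (assumed to carry no risk of their own).

    true_freq: control frequency of the true susceptibility allele
    true_or: allelic odds ratio of the true susceptibility allele
    tagged_neutral_freq: control frequency of neutral alleles sharing the SNP variant
    """
    # Frequency of the true allele among cases, implied by its odds ratio.
    case_odds = true_or * odds(true_freq)
    case_true = case_odds / (1 + case_odds)
    # Neutral tagged alleles keep their share of the non-susceptibility chromosomes.
    case_neutral = tagged_neutral_freq * (1 - case_true) / (1 - true_freq)
    snp_controls = true_freq + tagged_neutral_freq
    snp_cases = case_true + case_neutral
    return odds(snp_cases) / odds(snp_controls)

# Hypothetical example: a true allele at frequency 0.10 with OR = 2.0,
# tagged by a SNP variant whose overall frequency is 0.75.
print(round(observed_snp_odds_ratio(0.10, 2.0, 0.65), 3))
# With a perfect tag (no neutral alleles), the true odds ratio is recovered.
print(round(observed_snp_odds_ratio(0.10, 2.0, 0.0), 3))
```

In this hypothetical case, a true odds ratio of 2.0 would appear as roughly 1.13 at the SNP, which is within the range reported for non-HLA DRB1 candidate genes. The sketch only shows that such a reduction is arithmetically possible; it does not establish that it occurs at any particular locus.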
One difficulty with the use of genome-wide association screens to identify susceptibility loci is that, due to multiple statistical comparisons and random sampling error, all such screens will be quite susceptible to both the false positive and the false negative identification of loci. If the bar for association is set too low, false positives will greatly outnumber false negative identifications. By contrast, if the bar is set too high, false negatives will greatly outnumber false positive identifications. Compounding the difficulties of sorting out false positive and false negative identifications is the fact that the distinction between a true susceptibility locus and a disease-modifying locus will be problematic. Thus, although only (x + 1) susceptibility loci are present in the entire genome, there may be many other loci that can modify the clinical expression of MS either by changing the actual penetrance of MS in susceptible individuals or by changing the apparent penetrance, for example, by altering the disease severity or the phenotype of the illness.
Regardless of the mechanism, however, on a genome-wide association screen, any locus that has such an effect on penetrance (real or otherwise) will appear to be positively or negatively associated with the illness. For example, if the presence of a particular allele of a particular gene (not involved in MS susceptibility) doubled the penetrance of MS for all susceptible combinations, the odds ratio for an association of this allele with MS would be (2.0) and highly significant, despite the fact that this allele would not be a "susceptibility" allele and the locus that harbored this allele would not be a "susceptibility" locus in the sense defined earlier (i.e., this would be a false association). Moreover, because the model places no constraints on the possible number of these disease-modifying loci, many of the observed associations (even highly significant and/or well replicated ones) may have a substantial probability of representing a false association with the genetic susceptibility to MS. Consequently, because susceptibility loci and disease-modifying loci will be identified equally well by genome-wide screens, unraveling the two will not be possible using this approach. One possible method for establishing that an MS-associated allele was a true susceptibility allele (e.g., the HLA DRB1*1501 allele) would be to demonstrate that it doesn't alter the penetrance sufficiently to account for the observed odds ratio on genome-wide screens (e.g., Table ). For most associations, however, such a method will be difficult both because the available identical-twin data to assess penetrance differences is limited and because the observed odds ratios for candidate genes are typically small [15
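The example of a penetrance-doubling allele can be checked with a short calculation. The sketch below assumes that the modifier allele is distributed independently of genetic susceptibility and that controls are individuals without MS. The carrier frequency (0.3) and baseline penetrance (0.05) are hypothetical values chosen only for illustration; the 2.2% figure is the upper bound on susceptibility given in the text. The function name is also hypothetical.

```python
def modifier_odds_ratio(carrier_freq, p_susceptible, base_penetrance, penetrance_ratio):
    """Case-control odds ratio for a modifier allele that multiplies penetrance
    in genetically susceptible individuals but plays no role in susceptibility.
    Assumes the allele is independent of susceptibility status."""
    # Lifetime risk of MS in non-carriers and carriers of the modifier allele.
    risk_noncarrier = p_susceptible * base_penetrance
    risk_carrier = p_susceptible * base_penetrance * penetrance_ratio
    # Population fractions falling into each cell of the 2x2 table.
    cases_carrier = carrier_freq * risk_carrier
    cases_noncarrier = (1 - carrier_freq) * risk_noncarrier
    controls_carrier = carrier_freq * (1 - risk_carrier)
    controls_noncarrier = (1 - carrier_freq) * (1 - risk_noncarrier)
    return (cases_carrier / cases_noncarrier) / (controls_carrier / controls_noncarrier)

# Hypothetical modifier carried by 30% of the population that doubles penetrance.
print(round(modifier_odds_ratio(0.3, 0.022, 0.05, 2.0), 4))
```

Because MS is rare, the result stays very close to the penetrance ratio itself, and the carrier frequency cancels out of the calculation. An association of this size would therefore be indistinguishable, on a screen, from one produced by a true susceptibility allele.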
Considering the results of several genome-wide association screens [2], it has been relatively easy to identify the HLA DRB1 locus (haplotype) in general, and the 1501 allele in particular, as associated with MS. In addition, the observed odds ratio for an association of this chromosomal region with MS has been much larger (and much more consistent) compared to other potential candidate loci [12]. Indeed, the strength and uniqueness of this association has led many investigators to conclude that genetic variation within this chromosomal region is principally responsible for genetic susceptibility to MS [2
]. Consideration of the model proposed here and some of the observations made from it, however, might be taken to raise questions about such a conclusion. First, the HLA DRB1 locus (haplotype) seems to be only one among a hundred or more loci that are involved in MS susceptibility. Second, although the frequency of having at least one copy of the HLA DRB1*1501 allele in the general population is approximately four times the frequency of susceptibility at non-HLA DRB1 loci, the penetrance of susceptible genotypes that include this allele is no different from those that don't (Table ). Third, although the number of other susceptibility loci that need to be involved is smaller when this allele is present, the actual difference is less than 1 locus (Table ). In circumstances where a genetically susceptible genotype requires involvement of 11-18 total loci (Table ), this difference seems negligible. Fourth, almost half of the genetically susceptible individuals lack this allele entirely. Moreover, only a small fraction of those individuals who carry this allele (≤ 5.2%) are even susceptible to getting MS in the first place (Additional File 1; Appendix S1; Section 4). In this context, the apparent predominance of the HLA DRB1*1501 allele in MS pathogenesis seems likely to be related to three factors (see Additional File 1; Appendix S1; Section 5). First, this allele is one of the uncommon dominant susceptibility alleles and these have greater associated odds ratios than recessive alleles. Second, susceptible genotypes including this allele have a slightly smaller number of involved loci compared to those genotypes without it, a circumstance that will inflate the observed odds ratio for the HLA locus but not for the non-HLA loci. And, third, the use of SNPs to represent the allelic structure of the genome will markedly reduce the observed odds ratio for many (possibly most) true susceptibility non-HLA loci regardless of whether they are dominant or recessive.
This might also help to explain the observation that some of the SNPs identified at susceptibility loci have relative allelic frequencies (RAFs) that are unexpectedly high [13]. For example, the interleukin 7 receptor (IL7R) gene using SNP (rs6897932) has an RAF of (0.75), whereas the IL2 receptor alpha (IL2RA) gene using SNPs (rs12722489 and rs2104286) has RAFs of (0.85) and (0.75) respectively. Several phenomena may account for this apparent paradox. First, if any of these SNPs tagged more than one allele (see Additional File 1; Appendix S1; Section 5), this would increase the "apparent" allelic frequency of the true susceptibility allele. Indeed, the occurrence of different RAFs for the same locus (as is seen above for the IL2RA gene) presumably indicates that the allelic structure is not simple even though, in this case, the difference is quite small. Second, even if the SNP is located in the coding region of a particular gene and is known to cause a functional change in the coded protein by introducing a stop codon or a non-synonymous amino acid substitution, altering splice sites, or changing the binding characteristics of regulatory molecules (e.g. 33, 34), this does not prove either that this functional change is what caused susceptibility or that this gene is involved with susceptibility. Even in this circumstance, the association only identifies the region of the genome wherein susceptibility resides. Third, even in the circumstance where the SNP association has identified the correct gene and causes a functional change in the coded protein, this still falls short of proving causation. It could be that a second alteration in this gene, together with the identified SNP, identifies the true susceptibility allele (see Additional File 1; Appendix S1; Section 5). And fourth, the method of genome-wide association screening is set up to identify associations with SNPs of high frequency (i.e., major alleles) and to ignore minor alleles.
Indeed, the fact that the genes identified to date (with the exception of the HLA DRB1 locus) produce such low ORs [2], especially in the circumstance where the genetics of MS is, by far, the most important contributor to disease incidence, makes it seem likely that some, perhaps all, of these physiologic mechanisms are occurring. Clearly, we have a long way to go to understand the specifics of MS susceptibility. Nevertheless, gaining some insight into the number of susceptibility loci involved, the number of loci needed to be in a susceptible state, and the average frequency of susceptibility at these susceptibility loci represents progress.