For a given HLA gene locus, let {*m*_{1}, *m*_{2}, ..., *m*_{N}} denote a set of MHC alleles, each associated with a genotypic frequency *G*(*m*_{i}) in a population or ethnic group. To account for 100% of the alleles at the locus, the total genotypic frequency ∑*G*(*m*_{i}) should equal 1. If ∑*G*(*m*_{i}) is less than 1, an unidentified HLA allele with a genotypic frequency equal to the residual, 1 - ∑*G*(*m*_{i}), is added to the locus. If ∑*G*(*m*_{i}) is greater than 1, the genotypic frequency of each allele *m*_{i} is scaled down proportionately by dividing it by ∑*G*(*m*_{i}). Next, let {*e*_{1}, *e*_{2}, ..., *e*_{K}} denote a set of epitopes with known MHC binding or restriction data. For each epitope *e*_{k}, its restriction to an MHC allele *m*_{i}, *e*_{k}(*m*_{i}), is defined as follows:

$$e_k(m_i) = \begin{cases} 1 & \text{if } e_k \text{ is restricted to (or bound by) } m_i \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
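The frequency normalization described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `normalize_frequencies`, the placeholder allele names, and the label `"UNKNOWN"` for the unidentified allele are assumptions made for illustration.

```python
def normalize_frequencies(freqs, unknown_label="UNKNOWN"):
    """Return genotypic frequencies for one locus that sum to 1.

    freqs maps allele name -> genotypic frequency G(m_i).
    unknown_label is a hypothetical name for the unidentified allele.
    """
    total = sum(freqs.values())
    if total < 1.0:
        # Add an unidentified allele carrying the residual frequency.
        adjusted = dict(freqs)
        adjusted[unknown_label] = 1.0 - total
        return adjusted
    if total > 1.0:
        # Scale every allele down proportionately.
        return {allele: g / total for allele, g in freqs.items()}
    return dict(freqs)


# Illustrative inputs only; the allele names and values are made up.
under = normalize_frequencies({"m1": 0.2, "m2": 0.3})
over = normalize_frequencies({"m1": 0.9, "m2": 0.6})
```

In the first call the residual is assigned to the unidentified allele; in the second, each frequency is divided by the total of 1.5.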

First, for each MHC allele *m*_{i}, the total number of epitope "hits", *H*(*m*_{i}), was tabulated by counting the epitopes that are restricted to (or bound by) *m*_{i}:

$$H(m_i) = \sum_{k=1}^{K} e_k(m_i) \tag{2}$$
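As a hedged illustration of this tabulation, the sketch below counts hits per allele from a mapping of epitopes to their restricting alleles. The input layout (a dictionary of sets) and the name `hits_per_allele` are assumptions, not part of the original method description.

```python
from collections import Counter


def hits_per_allele(restrictions):
    """Count, for each allele, how many epitopes are restricted to it.

    restrictions maps epitope name -> collection of allele names
    (an assumed input format chosen for this example).
    """
    hits = Counter()
    for alleles in restrictions.values():
        # set() guards against an allele listed twice for one epitope.
        hits.update(set(alleles))
    return hits


# Illustrative data only.
example_hits = hits_per_allele({
    "e1": {"m1", "m2"},
    "e2": {"m1"},
})
```

Alleles that restrict no epitope simply report zero hits, because a `Counter` returns 0 for missing keys.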

Next, for each possible diploid MHC combination (*m*_{i}, *m*_{j}), a phenotypic frequency *F*(*m*_{i}, *m*_{j}) was calculated based on individual allele genotypic frequency:

$$F(m_i, m_j) = G(m_i) \times G(m_j) \tag{3}$$

For *N* MHC alleles, this corresponds to an *N* × *N* table of the phenotypic frequencies at which each specific pair of MHC alleles is found in the population from which the allele frequencies were derived. A similar table was generated for the number of epitope hits of each MHC combination, *H*(*m*_{i}, *m*_{j}). For heterozygous combinations, *H*(*m*_{i}, *m*_{j}) was calculated as the sum of the epitope hits associated with each of the two alleles, *H*(*m*_{i}) + *H*(*m*_{j}), because *m*_{i} and *m*_{j} are two different alleles and the hits of each are counted independently of the other. For homozygous combinations, which contain two identical alleles, the number of epitope hits was the same as that of the single allele:

$$H(m_i, m_j) = \begin{cases} H(m_i) + H(m_j) & \text{if } i \neq j \\ H(m_i) & \text{if } i = j \end{cases} \tag{4}$$
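The two pairwise tables described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name `pair_tables` and the dictionary-based table layout are hypothetical, and the input values are made up.

```python
from itertools import product


def pair_tables(G, H):
    """Build the pairwise frequency and hit tables for one locus.

    G maps allele -> genotypic frequency; H maps allele -> epitope hits.
    Returns two dicts keyed by the ordered pair (m_i, m_j).
    """
    F_pair, H_pair = {}, {}
    for mi, mj in product(G, repeat=2):
        # Phenotypic frequency of the pair is the product of allele frequencies.
        F_pair[(mi, mj)] = G[mi] * G[mj]
        # Homozygous pairs count the allele's hits once; heterozygous pairs add.
        H_pair[(mi, mj)] = H[mi] if mi == mj else H[mi] + H[mj]
    return F_pair, H_pair


# Illustrative inputs only.
F_pair, H_pair = pair_tables({"m1": 0.6, "m2": 0.4}, {"m1": 2, "m2": 1})
```

Because the allele frequencies sum to 1, the entries of the frequency table also sum to 1.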

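Based on the calculated *F*(*m*_{i}, *m*_{j}) and *H*(*m*_{i}, *m*_{j}) tables, a frequency distribution was assembled by summing the phenotypic frequencies of all MHC combinations associated with a given number of epitope/HLA combination hits (*h*):

$$F(h) = \sum_{i=1}^{N} \sum_{j=1}^{N} F(m_i, m_j)\, I\big(H(m_i, m_j) = h\big) \tag{5}$$

where *I*(·), which equals 1 when its argument is true and 0 otherwise, is an indicator function.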
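A small sketch may help make this tabulation concrete: every ordered allele pair contributes its phenotypic frequency to the bin for its number of hits. The function name `hit_distribution` and the example values are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import product


def hit_distribution(G, H):
    """Return a dict mapping number of hits h -> summed phenotypic frequency.

    G maps allele -> genotypic frequency; H maps allele -> epitope hits.
    """
    dist = defaultdict(float)
    for mi, mj in product(G, repeat=2):
        h = H[mi] if mi == mj else H[mi] + H[mj]
        dist[h] += G[mi] * G[mj]  # the indicator selects the bin for h
    return dict(dist)


# Illustrative inputs: m1 restricts two epitopes, m2 restricts none.
locus_dist = hit_distribution({"m1": 0.6, "m2": 0.4}, {"m1": 2, "m2": 0})
```

In this made-up example, only the m2/m2 homozygote (frequency 0.4 × 0.4 = 0.16) has zero hits; all other combinations fall in the bin for two hits.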

To calculate coverage by epitope sets restricted to MHC alleles of *k* different loci, a combined frequency distribution (*P*) as a function of the number of epitope/HLA combination hits (*n*) was generated by merging the *k* separate frequency distributions. This merging procedure assumes that the MHC loci are in linkage equilibrium, and was done as follows:

$$P(n) = \sum_{h_1} \sum_{h_2} \cdots \sum_{h_k} \left[\prod_{i=1}^{k} F_i(h_i)\right] I\!\left(\sum_{i=1}^{k} h_i = n\right) \tag{6}$$

where *I*(·), which equals 1 when its argument is true and 0 otherwise, is an indicator function, and *F*_{i}(*h*_{i}) is the phenotypic frequency associated with *h*_{i} epitope/HLA combination hits at locus *i*, calculated from equation 5.
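Under the independence assumption stated above, merging the per-locus distributions amounts to a discrete convolution: the frequencies multiply and the hit counts add. The following Python sketch illustrates this; the function name `merge_loci` and the example distributions are hypothetical.

```python
from collections import defaultdict


def merge_loci(distributions):
    """Combine per-locus hit distributions, assuming independent loci.

    distributions is a list of dicts, each mapping hits h_i -> frequency
    F_i(h_i). Returns a dict mapping total hits n -> combined frequency P(n).
    """
    combined = {0: 1.0}  # distribution before any locus is merged in
    for dist in distributions:
        merged = defaultdict(float)
        for n, p in combined.items():
            for h, f in dist.items():
                merged[n + h] += p * f  # hits add, frequencies multiply
        combined = dict(merged)
    return combined


# Illustrative two-locus example with made-up distributions.
P_example = merge_loci([{0: 0.5, 1: 0.5}, {0: 0.5, 2: 0.5}])
```

Merging one locus at a time gives the same result as the full multiple sum, while avoiding an explicit enumeration of every combination of *h*_{1}, ..., *h*_{k} at once.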

The population coverage (*C*), or fraction of individuals projected to respond to the epitope set, was then calculated as the sum of the combined phenotypic frequencies associated with at least one epitope hit/HLA combination:

$$C = \sum_{n \geq 1} P(n) \tag{7}$$

Based on equation 6, a histogram was generated to summarize the fraction of the population (*P*) as a function of the number of HLA/epitope combinations (*n*) recognized. A cumulative population coverage distribution (*Y*), giving the fraction of the population recognizing at least *n* HLA/epitope combinations, was also calculated:

$$Y(n) = \sum_{n' \geq n} P(n') \tag{8}$$
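The coverage and cumulative distribution can be sketched from a combined distribution *P* as shown below. The sketch assumes that *Y*(*n*) is the fraction of the population with at least *n* hits, which is the reading implied by the condition *Y*(*n*) ≥ 0.9 > *Y*(*n* + 1) used for *PC*90. The function names and example values are hypothetical.

```python
def coverage(P):
    """Fraction of the population with at least one epitope/HLA hit."""
    return sum(p for n, p in P.items() if n >= 1)


def cumulative(P):
    """Return Y as a dict: Y[n] = fraction of population with >= n hits.

    Assumes Y is accumulated from the largest hit count downward.
    """
    Y, running = {}, 0.0
    for n in range(max(P), -1, -1):
        running += P.get(n, 0.0)
        Y[n] = running
    return Y


# Illustrative combined distribution (made-up values).
P_demo = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
C_demo = coverage(P_demo)
Y_demo = cumulative(P_demo)
```

With this definition, *Y*(0) is always 1 and *Y*(1) equals the population coverage *C*.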

From this cumulative population coverage distribution of the whole epitope set, *PC*90, defined as the minimum number of epitope/HLA combination hits recognized by 90% of the population, was determined as follows:

$$PC90 = n + \frac{Y(n) - 0.9}{Y(n) - Y(n+1)} \tag{9}$$

where *n* satisfies *Y*(*n*) ≥ 0.9 > *Y*(*n* + 1). Because *PC*90 was determined by data interpolation, it can take any positive decimal value. Based on equation 9, if the population coverage is less than 90%, or *C* < 0.9, *PC*90 will be less than 1.
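The interpolation step can be illustrated with the sketch below. It assumes linear interpolation between *n* and *n* + 1, which is one interpolation scheme consistent with the properties stated above and is not confirmed here as the authors' exact procedure. The function name `pc90` and the example values are hypothetical.

```python
def pc90(Y, target=0.9):
    """Interpolated number of hits recognized by `target` of the population.

    Y maps n -> fraction of the population with at least n hits, for every
    integer n from 0 to the maximum hit count (Y[0] is expected to be 1).
    Linear interpolation between n and n + 1 is an assumption of this sketch.
    """
    for n in range(max(Y) + 1):
        y_next = Y.get(n + 1, 0.0)  # beyond the maximum, nobody qualifies
        if Y[n] >= target > y_next:
            return n + (Y[n] - target) / (Y[n] - y_next)
    raise ValueError("no n satisfies Y(n) >= target > Y(n + 1)")


# Coverage below 90%: Y(1) = 0.75, so the result falls below 1.
low = pc90({0: 1.0, 1: 0.75, 2: 0.5, 3: 0.25})
# Coverage above 90%: the result lies between 1 and 2.
high = pc90({0: 1.0, 1: 0.95, 2: 0.85})
```

The first call illustrates the case noted in the text, where a population coverage under 90% yields a value below 1.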

Additionally, the average number of epitope/HLA combination hits (*A*) recognized by the population is a weighted average and was calculated as follows:

$$A = \sum_{n} n\, P(n) \tag{10}$$
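As a brief illustration, the weighted average can be computed from a combined distribution *P* by weighting each hit count by its frequency. The function name `average_hits` and the example distribution are assumptions made for this sketch.

```python
def average_hits(P):
    """Weighted average number of hits; P maps hits n -> frequency P(n)."""
    return sum(n * p for n, p in P.items())


# Illustrative distribution (made-up values) spread evenly over 0 to 3 hits.
A_demo = average_hits({0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25})
```

Individuals with zero hits contribute nothing to the sum but still count toward the population through their share of the total frequency.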