Because clinical trials to assess the efficacy of vaccines against anthrax are not ethical or feasible, licensure for new anthrax vaccines will likely involve the Food and Drug Administration’s “Animal Rule,” a set of regulations that allow approval of products based on efficacy data only in animals combined with immunogenicity and safety data in animals and humans. US government sponsored animal studies have shown anthrax vaccine efficacy in a variety of settings. We examined data from 21 of those studies to determine if an immunological bridge based on lethal toxin neutralization activity assay (TNA) can predict survival against an inhalation anthrax challenge within and across species and genera. The 21 studies were classified into 11 different settings, each of which had the same animal species, vaccine type and formulation, vaccination schedule, time of TNA measurement, and challenge time. Logistic regression models determined the contribution of vaccine dilution dose and TNA on prediction of survival. For most settings, logistic models using only TNA explained more than 75% of the survival effect of the models with dose additionally included. Cross species survival predictions using TNA were compared to the actual survival and shown to have good agreement (Cohen’s κ ranged from 0.55 to 0.78). In one study design, cynomolgus macaque data predicted 78.6% survival in rhesus macaques (actual survival 83.0%) and 72.6% in rabbits (actual survival, 64.6%). These data add support for the use of TNA as an immunological bridge between species to extrapolate data in animals to predict anthrax vaccine effectiveness in humans.
Medical countermeasures against potential bioterrorism threats cannot be evaluated for efficacy in humans by traditional clinical trials. A set of regulations known as the Food and Drug Administration (FDA) “Animal Rule” [1, 2] may be used to approve such countermeasures by demonstration of efficacy in animals. The application of the Animal Rule requires a reasonably well-understood biological mechanism of action that includes prediction of efficacy in humans based on dose-dependent immune response and protection relationship in animals. No vaccines have been approved using the Animal Rule, but several are being developed that will require use of the Animal Rule.
Within the same study, a traditional approach for establishing a correlate of protection in vaccines is to use a cutoff of the proposed immune response, such that nearly all subjects that reach that cutoff are assumed protected. Recent statistical methods have been developed that use the entire distribution of the immune response to predict protection [4, 5]. Further theoretical work has more precisely defined what we can say about the causal nature of the proposed correlate and how that correlate may or may not be used as a surrogate for protection [6–9]. In contrast to these methods developed for within-species studies, less work has been done on methods and efforts for bridging between species to predict protection in humans. Typically, once a vaccine has been shown effective in humans directly, the assumed relationship between human immunogenicity and animal models is no longer needed. For example, in the case of the whole-cell pertussis vaccines, a relationship was established between a mouse intracerebral challenge potency test and clinical efficacy, but the mouse potency test is not considered a model of human disease and protection and would not be assumed to predict human efficacy when testing a new vaccine. For vaccines untestable in humans, there has been some discussion of bridging between efficacy studies in non-human species and immunogenicity in humans (for example, for Ebola virus vaccines and anthrax vaccines).
Efforts by the US government to expand available medical countermeasures to anthrax considerably increased after Bacillus anthracis spores were sent via US mail in 2001. A 2002 US government-sponsored workshop recommended the use of rabbits and nonhuman primates in animal models for anthrax aerosol challenge, and the use of aerosol challenge doses that could occur in an anthrax attack. The data analyzed in this paper were generated on the basis of those recommendations.
New recombinant protective antigen (rPA) vaccines and the licensed anthrax vaccine BioThrax® [anthrax vaccine adsorbed (AVA)] were studied. Both vaccines rely largely on the protection afforded by immunological responses against the PA protein [14–20]. Many studies have shown the protective efficacy of PA-based anthrax vaccines in several animal genera and species including guinea pigs [21–25], rabbits [26–29] and nonhuman primates (NHPs) [20, 26, 30–33]. A few studies have gone further to evaluate correlate of protection levels based on antibody to PA for AVA in rabbits [28, 29] and for rPA in rabbits and guinea pigs. Data sets now exist for multiple studies in multiple species using multiple vaccines. Additionally, quality assays are available to support the evaluation of meaningful endpoints across laboratories and in multiple species [34–37].
Here, we combine data from 21 US government-sponsored animal studies (15 of which are previously unpublished). The studies form an extensive series of nonclinical aerosol B. anthracis challenge experiments of AVA and different rPA candidates conducted in rabbits, rhesus macaques, and cynomolgus macaques. We assess the relationship of vaccine-induced antibody responses with survival and assess this relationship under different vaccine types, dilution doses, adjuvants, schedules, genus and species. Additionally, we use data from human immunogenicity studies to illustrate possible approaches to extrapolation from animal challenge model results to prediction of human protection.
The goal of this paper is threefold: (i) to explore the effect of vaccine-induced antibody response on survival in different animal model settings; (ii) to assess the role of vaccine dosage (such as antigen load) and antibody level within a specific species; and (iii) to determine whether it is meaningful to extrapolate the antibody protection relationship seen in animals to infer protection in humans. To achieve this threefold goal we examined antibody-survival relationships across genera and species, PA vaccine formulation (AVA or rPA), dose, adjuvant, time of immunological measurement, and vaccination schedule. For example with rabbits receiving two injections of adjuvanted rPA at various doses, we ask whether antibody levels usefully predict survival, and whether the vaccine dose has any additional impact on protection for fixed levels of antibody. If dose has little additional effect, it suggests that antibody levels alone may allow for reliable extrapolation. The third part is the most difficult. A formal statistical approach treating the effect from each species as a random draw from an assumed distribution has difficulty with precise predictions of survival in humans because we have data from only three non-human species. Fundamentally, extrapolation from animal genera to humans is not primarily a statistical issue but relies on judgment about how well the animal model recapitulates essential features of the infection, immune response, and protection processes in humans. We can indirectly address this issue by seeing how well a given animal species predicts survival in a different animal species or genus. If these cross-species predictions are reasonably accurate, this supports the proposition that they would be relevant to humans.
In this analysis, we combine data from US government anthrax studies in which a particular animal species was vaccinated at various dosages (different antigen levels and vaccine dilutions), measured for serum antibody response and challenged by aerosolized B. anthracis spores. Animals were monitored for survival and declared survived if they lived for at least 21 to 30 days (depending on the study) after challenge. A detailed listing of the included studies is given in Table 1. Six important aspects of the experiments change from study to study: (i) genus and species: the animal studies were conducted in two genera and three species, cynomolgus macaques (cynos, Macaca fascicularis), rhesus macaques (rhesus, Macaca mulatta), and New Zealand white rabbits (rabbits, Oryctolagus cuniculus); (ii) vaccines: studies used either the AVA vaccine or one or more of four rPA vaccines, where the rPA vaccines differ primarily by whether the PA protein was produced in Escherichia coli or B. anthracis; (iii) diluent: the rPA vaccines were diluted with either saline or adjuvant; (iv) the time at which the immune response was measured; (v) the day of challenge; and (vi) vaccine schedule. We partition the data from 21 studies listed in Table 1 into 11 different experimental settings such that each of those six important aspects is identical within a setting, but the vaccine dose varies within a setting. Note that for these data, once we match on the first five aspects, we necessarily match on the sixth. This partitioning allows us to create a series of simple models, rather than creating a complicated single model, which would require choosing from among the many possible ways of controlling the effects of the first five aspects, all of which improve prediction of survival.
Data from four complete studies and parts of two studies were not included in Table 1 for various reasons (the challenge happened before 4 weeks after the last vaccination; challenge day was not fixed within the study; or many different vaccine schedules were used within the study and some of those schedules did not match those of the existing settings). For tractability, differences besides those six aspects between the 21 included studies were not explicitly modeled. Animals were challenged with aerosolized B. anthracis Ames spores at target levels of 80 to 400 times the dose producing 50% death (LD50) for that genus and species. At such large challenge levels, differences between exposures are suspected to have little effect on survival [39, 40]. Settings 9 to 11 involving AVA in rhesus were focused on evaluating the duration of protection, while setting 5 was to evaluate a rabbit model for AVA. Among rabbit rPA experiments (settings 1 to 4) the role of vaccination adjuvant (1 and 3 versus 2 and 4), schedule (1 and 2 versus 3 and 4), and challenge time (1 and 2 versus 3 and 4) were all examined. Settings 6 and 7 examined the role of adjuvant for rPA vaccines in cynos, while setting 8 examined rPA vaccines in rhesus.
Because immune responses change over time, measurement of immune response at a similar time point post vaccination was used for comparisons between and within studies. Here, we chose 4 weeks after the final vaccination. This is a somewhat arbitrary choice because the vaccine schedules vary among the studies (see Table 1). The 4-week time was chosen because we have data close to that time point (within 2 weeks) for all the animal settings and additionally we have immunology measurements 4 weeks after the second vaccination for 334 humans vaccinated with AVA in the Centers for Disease Control and Prevention (CDC) Anthrax Vaccine Research Program clinical trial.
We begin by postulating that one particular immune response explains a substantial proportion of the survival of the animals in all the studies. If such an immunological response exists we will call it a correlate of protection. Because most if not all of the known correlates of protection for existing vaccines are related to antibody measurements, we postulate an antibody measurement for our correlate. In particular, we study antibodies to PA as measured by either a binding enzyme-linked immunosorbent assay (ELISA) or an in vitro anthrax lethal toxin neutralization activity assay (TNA) [36, 37, 41]. For some vaccines the ELISA and TNA are not necessarily highly correlated, so it is necessary to study both measurements unless there is a very high correlation between them.
Figure S1 plots TNA versus ELISA in humans, and similar plots are given for NHPs (fig. S2) and rabbits (fig. S3). The assay results are highly correlated, with correlation coefficient values of 0.94 for humans, 0.94 for rabbits, and 0.97 for NHPs. This level of correlation indicates that ELISA and TNA would work similarly in the models. In general, ELISA can detect measurements below the limit of detection of the TNA, and is less variable than the TNA. However, as a function-based assay, the TNA is considered to be species neutral, allowing direct comparison of neutralizing activity across species. Consequently only TNA responses will be evaluated here. This paper will focus on the ability of TNA to predict survival, and will not delve into the more nuanced issues of surrogacy and causality with respect to the TNA which are discussed in the Supplemental Appendix. Once an animal survives to 3 to 4 weeks after challenge, that animal is very unlikely to die from the challenge at a later time; therefore, the binary survival endpoint (that is, the animal either lived or died) we use is more appropriate than a time to death endpoint because the latter endpoint can emphasize unimportant differences in time to death during the first few days after challenge. We first examine the effect of TNA on survival using a simple logistic regression model:
$$\operatorname{logit}\{\Pr(\text{survival})\} = \log\frac{\Pr(\text{survival})}{1-\Pr(\text{survival})} = a + b\,x \qquad \text{(model 1)}$$
where x is a log-transformed antibody response, and a and b are parameters to be estimated. If b > 0, then antibodies are positively related to protection, sometimes called a correlate of protection.
Figure 1 displays a curve with 95% confidence intervals (CIs) for each of the 11 settings under model 1 based on the estimated parameters a and b given in Table 2. The antibody level that results in 50% predicted survival (PA50) is also provided as a useful comparator, not to indicate a threshold upon which to base decisions (Figure 2). For almost all settings, there is a statistically significant relationship between antibody levels and survival [P<0.001 for all settings except setting 6 (P=0.011) and setting 9 (P=0.067)], indicating that increases in antibody levels at 4 weeks after the last vaccination increase the probability of survival.
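As an illustration of how model 1 and the PA50 relate, the sketch below fits a two-parameter logistic regression by Newton-Raphson and derives the PA50 as the antibody level at which the fitted survival probability is 0.5. It is a minimal sketch, not the authors' code: the data values are hypothetical, the function name `fit_logistic` is invented for this example, and x is assumed to be log10 TNA, so that PA50 = 10^(−a/b).

```python
import math

def fit_logistic(x, y, max_iter=100, tol=1e-10):
    """Fit logit(P) = a + b*x by Newton-Raphson; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        # Score vector (g) and information matrix (h) of the log-likelihood
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g_a += yi - p
            g_b += (yi - p) * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system h * delta = g
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

# Hypothetical data (not from the studies): log10 TNA and survival (1 = lived)
log10_tna = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5, 2.7, 3.0, 3.2]
survived = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

a, b = fit_logistic(log10_tna, survived)
log10_pa50 = -a / b          # a + b*x = 0 gives 50% predicted survival
pa50 = 10 ** log10_pa50      # back-transform, assuming x is log10 TNA
print(f"a={a:.3f}, b={b:.3f}, PA50={pa50:.1f}")
```

A positive fitted b corresponds to antibody being positively related to survival. In practice a statistics package would also supply the standard errors needed for the confidence bands shown in Figure 1.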
The number of animals and doses, or more generally, the information content varies quite widely from setting to setting. Broadly speaking, settings 1 to 5 and 7 have narrower CIs for the prediction curve and the PA50 than the other settings. Tighter CIs are due primarily to large sample sizes and more complete coverage of the range of the curve for those settings. Setting 9 with 60 rhesus monkeys and four doses has the most uncertainty about the estimated slope and the antibody level that achieves 50% protection. The large uncertainty in setting 9 is due perhaps to the emphasis on highly diluted vaccine doses so that the antibody effect is harder to detect. Additionally, settings 9 to 11 completed a three-dose vaccine priming series (0, 1, and 6 months) and have a much longer time between our selected time of antibody measurement and the challenge (12, 30 and 52 months).
The settings were not designed for direct comparisons, but they do demonstrate that TNA PA50 estimates vary across settings (Figure 2). We can use the conservative test that the PA50 values for two settings significantly differ if the associated 95% CIs fail to overlap. For example, settings 4 and 5 are both two-injection regimens in rabbits with a challenge 70 days after the first vaccination. The PA50 values were 84 and 353 respectively with non-overlapping 95% CIs. These settings differ in terms of the vaccine administered (rPA versus AVA) and were done by different organizations at different times. Thus, different methods of attaining a fixed level of TNA can result in different PA50 values, although precisely what is causing this difference in protection is uncertain. Settings 1 and 3 evaluate rPA with adjuvant diluent in rabbits under one and two vaccinations with a challenge at 28 and 70 days, respectively. The PA50 values are 31 and 134, respectively, with non-overlapping CIs indicating that the settings differ. This might be due to the number of vaccinations, time of measurement, later challenge time, or something else. Thus, a fixed amount of TNA can have a different impact on survival depending on how and when it was achieved. Species-level overall estimates of PA50 are given in Figure 2. The overall rabbit and rhesus PA50 values are similar, while the overall cynos PA50 is smaller but is estimated from only two settings.
Each dose group and the common control within each setting can be viewed as a very small trial. Thus, we can estimate a vaccine efficacy (VE) for each dose group with substantial uncertainty. VE is estimated in the usual way as follows:
$$\mathrm{VE} = 1 - \frac{\Pr(\text{death} \mid \text{vaccine})}{\Pr(\text{death} \mid \text{control})}$$
VE can be estimated directly from the proportion of deaths in the vaccinated and control groups. When all placebo animals die, then the VE is simply the percent survival in the group given vaccine.
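The VE calculation for a single dose group can be sketched in a few lines. The function name and the example counts below are hypothetical; the formula is the usual one minus the ratio of death proportions in the vaccinated and control groups.

```python
def vaccine_efficacy(deaths_vaccine, n_vaccine, deaths_control, n_control):
    """VE = 1 - (death proportion in vaccinated) / (death proportion in controls)."""
    risk_vaccine = deaths_vaccine / n_vaccine
    risk_control = deaths_control / n_control
    if risk_control == 0:
        raise ValueError("VE is undefined when no control animals die")
    return 1.0 - risk_vaccine / risk_control

# Hypothetical dose group: 2 of 10 vaccinated animals die, all 10 controls die.
ve = vaccine_efficacy(deaths_vaccine=2, n_vaccine=10,
                      deaths_control=10, n_control=10)
print(f"VE = {ve:.2f}")
```

Because every control animal dies in this example, the VE equals the proportion of vaccinated animals that survive (8 of 10).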
Figure 3 plots the mean TNA and estimated VE, with a 95% CI, for each dose group within each setting. For example, at the left part of the display, there are three orange circles bisecting solid vertical lines corresponding to the three (non-placebo) dose groups of setting 1. We see that as the mean TNA increases, the VE also increases. In many of the dose groups the lower bound of the 95% CI is larger than 0 indicating a significant VE.
Ideally, we want to combine the VE estimates to examine the relationship of TNA to survival within species as suggested by Daniels and Hughes, to see whether achieved antibody as measured by TNA is a substitute for survival. To do this we regress the empirical logit of the estimated VE on the average immune response for a single dose group using least squares regression, that is, using the following model:
$$\operatorname{logit}(\widehat{\mathrm{VE}}) = a + b\,\log_{10}\{\operatorname{gmean}(\mathrm{TNA})\}$$
where a and b are parameters to be estimated, and gmean(TNA) is the geometric mean TNA immune response for a specific dose group within a specific setting. A separate curve is estimated for each species. These logistic curves are given in Figure 3. Although all three curves show an increasing survival with increasing mean TNA, when we test for statistical significance for predicting survival by this method we have low power because all effects of TNA within a (noncontrol) dose group are modeled to act only through the geometric mean for the animals in that dose group. Additionally, this is a crude approach because it mixes data from different settings within a species. Nevertheless, we find a significant protective TNA effect on survival for rabbits by this method (rabbits two-sided p=0.014) and nonsignificant trends toward protective effects for TNA for the monkeys (cynos p=0.14, rhesus p=0.28). The rest of this paper will focus on the more sophisticated approach that divides the studies into settings based on the study designs.
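The dose-group regression just described can be sketched as an ordinary least squares fit of logit(VE) on the log10 geometric mean TNA. This is only an illustration under stated assumptions: the dose-group values are hypothetical, the clipping of VE away from 0 and 1 before taking the logit is one simple choice and not necessarily the authors' definition of the empirical logit, and the use of log10 of the geometric mean as the predictor is assumed.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def clipped_logit(ve, eps=0.01):
    """Logit of VE after clipping to [eps, 1 - eps] (an assumed convention)."""
    ve = min(max(ve, eps), 1.0 - eps)
    return math.log(ve / (1.0 - ve))

def least_squares(x, y):
    """Simple linear regression y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical dose groups: (individual TNA values, estimated VE)
dose_groups = [
    ([20.0, 35.0, 50.0], 0.30),
    ([90.0, 150.0, 210.0], 0.60),
    ([400.0, 650.0, 900.0], 0.85),
]

x = [math.log10(geometric_mean(tna)) for tna, _ in dose_groups]
y = [clipped_logit(ve) for _, ve in dose_groups]
a_hat, b_hat = least_squares(x, y)
print(f"intercept={a_hat:.3f}, slope={b_hat:.3f}")
```

Each dose group contributes a single point, which is why this approach has low power: variation in TNA among animals within a dose group is not used.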
In the previous section, we showed that TNA can predict survival (Fig. 1), and that most positive doses have significant vaccine effects which appear to be related to average TNA value within a dose group (Figure 3). In this section, we examine whether dose has any effects on survival beyond its effects on TNA. To isolate the combined effects of antibody and dose on protection, we expand model 1 to allow increasing flexibility for the effect of dose in each of the 11 settings:
$$\text{Model 1:}\quad \operatorname{logit}\{\Pr(\text{survival})\} = a + b\,x$$
$$\text{Model 2:}\quad \operatorname{logit}\{\Pr(\text{survival})\} = a + b\,x + c\,d$$
$$\text{Model 3:}\quad \operatorname{logit}\{\Pr(\text{survival})\} = a_i + b_i\,x$$
where x and d are the log10 antibody response and log10 dose for a selected animal [for dose=0 we used log10 of one half of the smallest dose, that is, log10(0.005) for rPA and log10(1/512) for AVA], and i indexes dose group within a setting. In model 1 animals that achieve an immune response x have the same probability of survival no matter what dose was used. Model 2 allows for an additional smooth effect of dose so that an animal that achieves x with a high dose could have a somewhat greater probability of survival than an animal that achieves x through a low dose. Model 3 allows each dose to have its own relationship of immune response to survival, a very flexible approach. Visually oriented readers can see examples of models 1 and 2 in Fig. 4. Model 3 is the same as repeatedly using model 1 on each dose group.
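A small sketch may help make the dose coding and the difference between models 1 and 2 concrete. It assumes that model 2 adds a single linear term c·d in log10 dose to the model 1 linear predictor, which is consistent with the description of a one-parameter smooth dose effect; the coefficient values and function names are hypothetical. The smallest positive doses (0.01 for rPA, 1/256 for AVA) are inferred from the zero-dose substitutions stated above.

```python
import math

def log10_dose(dose, smallest_positive_dose):
    """log10 dose, coding a zero dose as half of the smallest positive dose."""
    if dose < 0:
        raise ValueError("dose must be non-negative")
    if dose == 0:
        return math.log10(smallest_positive_dose / 2.0)
    return math.log10(dose)

def survival_probability(x, d=0.0, a=0.0, b=1.0, c=0.0):
    """Inverse logit of a + b*x + c*d; c = 0 reduces to model 1."""
    return 1.0 / (1.0 + math.exp(-(a + b * x + c * d)))

# Zero-dose coding implied by the text
d_zero_rpa = log10_dose(0, 0.01)         # log10(0.005)
d_zero_ava = log10_dose(0, 1.0 / 256.0)  # log10(1/512)

# Hypothetical coefficients, for illustration only
a, b, c = -4.0, 2.0, 0.3
x = 2.0  # same log10 TNA reached through two different doses
p_low_dose = survival_probability(x, log10_dose(0.1, 0.01), a, b, c)
p_high_dose = survival_probability(x, log10_dose(10.0, 0.01), a, b, c)
p_model1 = survival_probability(x, a=a, b=b)  # dose ignored
print(p_low_dose, p_model1, p_high_dose)
```

With c > 0 the same antibody level gives a slightly higher predicted survival at the higher dose, whereas model 1 returns one value regardless of dose.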
Within a setting, the main question of interest is the extent to which the probability of survival differs for an antibody level of x achieved from a dose d compared to an antibody level of x achieved from a dose d′. Figure 4 displays model 2 for setting 3 where 130 rabbits in nine dose groups received two injections of an rPA vaccine with adjuvant diluent. Recall that in model 2, we require dose to have a smooth effect on survival beyond the effect of antibody. Here we have a strong effect of TNA on the probability of survival with the additional effect of dose (apart from its effect via TNA) being insignificant (P=0.18), where higher (lower) doses have estimated survival slightly higher (lower) than predicted from model 1 which solely uses TNA. For example, the very top dark orange curve gives the estimated probability of survival as a function of TNA achieved by dose=10. Visually, this curve gives only slightly higher probability of survival compared to the overall black curve that ignores dose, and most of the effect of the large dose is explained by the resulting increase in TNA. Other dose-specific curves are close to the overall black curve as well, reflecting the small additional contribution of dose at a fixed level of antibody.
Although the difference between model 1 and model 2 for setting 3 is not significant, with increasing numbers of animals we could potentially achieve statistical significance for an effect that is perhaps unimportant. Thus, we need some statistic to quantify the extent to which dose has an additional predictive effect beyond antibody within settings. To do this we introduce the statistic called percent of prediction explained (PPE). The coefficient of discrimination (CoD) is an R squared measure for logistic regression, defined as the difference in average survival probabilities for those who live minus the average survival probabilities for those who die. This ranges from 0 (when the model is useless) to 1 (for example, when all animals with x > some threshold T live, and all those with x < T die). Because we are comparing CoD from different models and we do not want the models with more parameters to have an unfair advantage, we use an adjusted CoD (CoDa), which we define analogously to the adjusted R squared value: CoDa = 1−[(n−1)/(n−p)](1−CoD), where n is the number of animals and p is the number of estimable parameters. To describe the improvement in prediction with the addition of dose to immune response, we form the percent prediction (of model 3) explained (by model 1) as follows:
$$\mathrm{PPE} = 100 \times \frac{\mathrm{CoD}_a(\text{model 1})}{\mathrm{CoD}_a(\text{model 3})}$$
The PPE attempts to describe the relative impact of the addition of dose to a model using antibody alone to predict survival. We bound the PPE so that it ranges from 0 (if antibody has no effect) to 100 (if dose has no effect given antibody). We can test whether PPE differs significantly from 100 by seeing if the upper 95% confidence limit excludes 100, and this is equivalent to testing whether model 1 differs from model 3. This testing strategy is consistent with that of Freedman et al. and Buyse and Molenberghs for examination of the surrogacy of a candidate x. Similar analyses were done comparing model 1 to model 2.
We explain the PPE for setting 3 in detail. In this setting, model 3 fits model 1 within each of the nine dose groups and thus allows for nine completely independent curves. It is thus geared to predict quite well. In setting 3, 84 of the rabbits survived and more than 50% of these survivors had a predicted probability of survival greater than 0.90 using model 3. Conversely, 46 of the rabbits died and more than 30% of them had predicted probabilities of survival less than 0.10 using model 3. Overall we see that the rabbits that survived generally had quite high probabilities of survival and that animals that perished had quite low probabilities of survival, reflecting the strong predictive ability of antibody for model 3. However, model 1, although much simpler, does fairly well in predicting survival. For model 1 (3), the mean probability of survival is 0.812 (0.853) for rabbits that ultimately lived. For model 1 (3), the mean probability of death is 0.657 (0.731) for rabbits that ultimately died. The CoDs are 0.469=[0.812 − (1−0.657)] and 0.584=[0.853 −(1−0.731)] for models 1 and 3, respectively. However, model 1 has only 2 estimable parameters while model 3 has 16 [2 for each dose group, except 1 for the 2 smallest dose groups where each animal has the same level (undetectable) of TNA], so the CoDa values are as follows: 0.465 = 1− (129/128)(1−0.469) and 0.529 = 1−(129/114)(1−0.584). Thus the proportion of the model 3 prediction explained by model 1, or PPE, is 100(0.465/0.529), which gives 87.9% in Table 2. The 95% CI for the PPE is 59.1 to 94.4%. Thus, 87.9% significantly differs from 100%, and dose provides a statistically significant though modest additional benefit beyond antibody alone in terms of predicting survival using the very flexible model 3.
The important point for this analysis is not the statistically significant additional benefit, but the fact that 87.9% of the possible predictability with dose included was already explained by the TNA before dose is added to the model.
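The setting 3 arithmetic can be retraced with a short script. This is a sketch of the definitions as stated in the text (CoD, the adjusted CoD, and the PPE bounded to the range 0 to 100); the function names are invented here, and only the summary figures quoted above are used, not the animal-level data.

```python
def coefficient_of_discrimination(probs, survived):
    """Mean predicted survival among survivors minus that among deaths."""
    lived = [p for p, s in zip(probs, survived) if s]
    died = [p for p, s in zip(probs, survived) if not s]
    return sum(lived) / len(lived) - sum(died) / len(died)

def adjusted_cod(cod, n, p):
    """CoDa = 1 - [(n - 1) / (n - p)] * (1 - CoD)."""
    return 1.0 - (n - 1) / (n - p) * (1.0 - cod)

def percent_prediction_explained(coda_simple, coda_full):
    """100 * CoDa(simple) / CoDa(full), bounded to [0, 100]."""
    if coda_full <= 0:
        raise ValueError("the fuller model has no adjusted discrimination")
    return min(100.0, max(0.0, 100.0 * coda_simple / coda_full))

# Summary figures for setting 3 as quoted in the text
n_animals = 130
cod_model1 = 0.812 - (1 - 0.657)   # mean P(survive | lived) - mean P(survive | died)
cod_model3 = 0.853 - (1 - 0.731)
coda_model1 = adjusted_cod(cod_model1, n_animals, p=2)
coda_model3 = adjusted_cod(cod_model3, n_animals, p=16)
ppe = percent_prediction_explained(coda_model1, coda_model3)
print(f"CoDa1={coda_model1:.3f}, CoDa3={coda_model3:.3f}, PPE={ppe:.1f}")
```

The adjusted values round to 0.465 and 0.529 as in the text. Carrying unrounded intermediates gives a PPE within about a tenth of a percentage point of the reported 87.9%, which is computed from the rounded CoDa values.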
Table 2 evaluates the additional benefit of incorporating dose in addition to TNA within each setting by comparing the simple model 1 to more complex models using PPEs. When comparing model 3 to model 1, 10 of 11 settings have PPE estimates more than 75%. Looking at the lower 95% confidence limits, we see that for most of the settings (6 of 11), we are 95% sure that the PPE is greater than 50%. When we repeat this exercise using model 1 versus model 2, the PPEs are similar, with all estimated PPEs greater than 75% and only three settings with lower 95% limits less than 50%. In two settings the upper limit is different from 100%, suggesting that dose, as used in model 2, is not always useless in predicting response once antibody is known; however, in general the magnitude of the improvement from model 1 to model 2 is quite modest with 8 of 11 of the estimated PPEs exceeding 95%.
Comparing both models 3 and 2 to model 1 via the PPEs provides complementary information. Model 3 is extremely flexible but because of this flexibility may over fit the data, leading to PPEs that are less reliably estimated. Model 2 is quite parsimonious and stringent, requiring only one parameter to explain the additional smooth effect of dose. This leads to PPEs that are more reliably estimated. In either comparison, the PPEs suggest that a substantial fraction of the effect of the dose of vaccine is captured by the immune response. This suggests that within a specific setting of species and formulation, model 1 may accurately predict survival for a new dose.
Because humans cannot be challenged with inhalation anthrax, we cannot directly evaluate how an animal-derived statistical model applies to humans. As a proxy, we evaluate how a statistical model derived from one species predicts survival in another species for a specific vaccine regimen. If these cross-species predictions are accurate, we have more confidence in extrapolating from the animal models to humans.
Two groups of settings have similar designs except species: Settings 3, 6, and 8 all have a day 0 and day 28 vaccine schedule with rPA vaccines, diluted with adjuvant, and challenge at day 70, while settings 4 and 7 have similar designs except they are diluted with saline. We compare the effect of TNA on survival between the species in two different ways.
First, we test for differences in the PA50 values. Consider the group with vaccine diluted with adjuvant first. The PA50 for setting 3 (rabbits) is 134, that for setting 6 (cynos) is 28, and that for setting 8 (rhesus) is 42. The ratio of the PA50 for setting 3 over the PA50 for setting 6 is 4.86 (95% CI, 0.28 to 3785; P=0.22). The corresponding ratios for settings 3 to 8 and 6 to 8 are 3.20 (95% CI, 0.28 to 17.3; P=0.22) and 0.66 (95% CI, 0.0003 to 15.6; P=0.79), respectively. The group with vaccine diluted with saline has a PA50 ratio (setting 4/setting 7) of 2.79 (95% CI, 1.35 to 5.80; P=0.01). Only the last ratio denotes species having PA50 values that are statistically significantly different from each other. For these analyses, the important issue is not if there are statistically significant differences between species, but how large that difference is. If two species are statistically significantly different, but the difference is small, then useful cross-species predictions can still be made. The wide CIs for the other ratios reflect the relatively poor statistical power in these data.
A way to focus on the practical differences between species is via prediction. We thus plug in the cyno immune responses into model 1 with a and b estimated from rabbits, and average the associated probabilities of survival. This rabbit-to-cyno predicted average can be compared to the proportion of cynos who survived challenge. Additionally, we can see how well the binary predicted survival (yes/no) for each cyno agrees with the actual survival using Cohen’s κ, an agreement coefficient that corrects for chance agreement. Cohen’s κ ranges from −1 to 1, with 0 indicating agreement is no different from chance, and 1 indicating perfect agreement. For determining agreement, animals with a predicted probability of survival greater than 0.5 were considered to predict survival. In general, κ values can be classified as fair (0.21 to 0.40), moderate (0.41 to 0.60), or substantial (0.61 to 0.80).
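The cross-species calculation just described can be sketched as follows: apply the source-species model 1 coefficients to each target-species animal, average the predicted probabilities, dichotomize at 0.5, and compute Cohen's κ against observed survival. The coefficients and animal data below are hypothetical and chosen only to make the steps visible; the function names are not from the paper.

```python
import math

def predicted_survival(log10_tna, a, b):
    """Model 1 survival probability with coefficients from the source species."""
    return 1.0 / (1.0 + math.exp(-(a + b * log10_tna)))

def cohens_kappa(predicted, observed):
    """Cohen's kappa for two binary (0/1) sequences of equal length."""
    n = len(predicted)
    p_agree = sum(1 for p, o in zip(predicted, observed) if p == o) / n
    p_pred_yes = sum(predicted) / n
    p_obs_yes = sum(observed) / n
    # Agreement expected by chance from the marginal proportions
    p_chance = p_pred_yes * p_obs_yes + (1 - p_pred_yes) * (1 - p_obs_yes)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_agree - p_chance) / (1.0 - p_chance)

# Hypothetical source-species coefficients (PA50 = 10**(4/2) = 100)
a_source, b_source = -4.0, 2.0

# Hypothetical target-species animals: log10 TNA and observed survival
target_log10_tna = [1.0, 1.5, 1.8, 2.2, 2.5, 3.0]
target_survived = [0, 0, 1, 1, 0, 1]

probs = [predicted_survival(x, a_source, b_source) for x in target_log10_tna]
mean_predicted = sum(probs) / len(probs)            # compare with observed proportion
binary_predicted = [1 if p > 0.5 else 0 for p in probs]
kappa = cohens_kappa(binary_predicted, target_survived)
observed_proportion = sum(target_survived) / len(target_survived)
print(mean_predicted, observed_proportion, kappa)
```

The averaged probability and κ answer different questions: the first checks calibration of the overall survival proportion, while κ checks whether the right individual animals are predicted to survive.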
Table 3 gives the results of the second type of cross-species protection estimates. Cross-species estimates were performed on all cases where there were two or more settings that differ only by species. In each of these cases, protection for humans was also estimated, although in the human case there are other setting differences in addition to species. Figure 5 illustrates the cross-species protection calculation of the first row of Table 3 where a model estimated in the 130 rabbits of setting 3 is applied to the 29 cynos of setting 6. The actual TNAs achieved by the cynos are transformed into estimated probabilities of survival using the rabbit prediction equation and averaged to 70.1%.
Consider settings 4 and 7. In setting 4, rabbits predict a low percent of cynos surviving (31.4%) and the actual surviving percent is 46.3%. The agreement between the observed and the binary predicted survival in the cynos is substantial (κ=0.61) and significantly different from chance. The cynos in setting 7 predict 71.6% of rabbits surviving, and the actual surviving percent is 59.4% with moderate agreement (κ=0.59). Some of the particular rows of Table 3 may not have sufficient power to show statistically significant agreement (some CIs for κ have the lower 95% limit equal to 0), but when we take the average κ coefficient for all of Table 3, we get substantial agreement (average κ=0.63; 95% CI by nonparametric bootstrap 0.32 to 0.74).
Although humans received AVA vaccines and the animal settings in Table 3 are for rPA vaccines, we also perform the calculation to predict human protection based on the animal models. In practice we would want animals and humans to use the same vaccine formulation, so these calculations are more for illustration. In these predictions for humans we use TNA 4 weeks after the second vaccination because this matches the timing and schedules in the animals. The predicted survivals range from 54 to 84% with higher predictions from the NHP models.
This paper has analyzed an extensive and heterogeneous suite of inhalation anthrax challenge experiments to determine if an immune measure could be correlated with protection in the varied experiments, and to ascertain if the models could predict protection across species. We find in the final models for all three animal species that increases in vaccine-induced TNA 4 weeks after the last vaccination were strongly associated with increased survival. Because of differences in study designs, 4 weeks after vaccination time has a different meaning in different studies (for example, 4 weeks after the last vaccination of a 0–1 month schedule and challenge at 10 weeks is different from 4 weeks after the last vaccination of a 0-1-6 month schedule and challenge at 1 year or later), and we emphasize that the 4 weeks is arbitrary and other times and other immune measures (such as antibodies as measured by ELISA) may work similarly. We have studied 11 different settings, where within each setting the species, vaccine type (rPA or AVA), vaccine schedule, vaccine formulation, time of TNA measurement, and challenge time are the same, and we have found that the TNA measurement can be used to predict survival in each setting. When we hold constant all factors except vaccine dose, we find that immune response explains a substantial fraction of the combined effect of dose and immune response. This lends support to the idea that immune response alone can be used to predict survival outcomes between different doses of vaccine. The fundamental question, however, is the extent to which this supports extrapolation to humans, which is much more ambitious than extrapolating to a different dose. We approached this indirectly by looking at cross-species extrapolation while holding other factors constant. 
For nonhumans the extrapolation appeared reasonably accurate (Cohen’s κ 0.55 to 0.78; Table 3), which supports the idea that such extrapolation from animal to human data may also be informative.
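The cross-species step can be illustrated with a minimal sketch: a logistic model fitted in one species is applied to the TNA values of another, and the predictions are averaged to give a predicted survival proportion. The coefficients and titers below are hypothetical placeholders, not values from this paper, and modeling survival on log10(TNA) is an assumption made for the illustration.

```python
import math

def predicted_survival(log10_tna, intercept, slope):
    """Logistic model: probability of survival given log10(TNA)."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * log10_tna)))

def mean_predicted_survival(tna_values, intercept, slope):
    """Average the source-species model's predictions over the
    TNA values observed in a target group."""
    preds = [predicted_survival(math.log10(t), intercept, slope)
             for t in tna_values]
    return sum(preds) / len(preds)

# Hypothetical coefficients standing in for a model fitted in a source species.
SOURCE_INTERCEPT = -4.0
SOURCE_SLOPE = 2.0

# Hypothetical TNA ED50 titers measured in the target species.
target_tna = [50.0, 100.0, 200.0, 400.0]

predicted = mean_predicted_survival(target_tna, SOURCE_INTERCEPT, SOURCE_SLOPE)
print(f"Predicted survival in target group: {predicted:.3f}")
```

The predicted proportion would then be compared with the survival actually observed in the target species, which is the comparison summarized by Cohen's κ in Table 3.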
How might this information be used for dose selection in humans? Knowing that the TNA endpoint is a reasonable correlate and that results of the animal challenge studies should be predictive of efficacy in humans, one can assess the likely protection afforded to a human population by a safe and logistically sound vaccination regimen.
Extrapolation from animals to humans is fundamentally not a statistical issue because humans cannot be intentionally challenged with inhalation anthrax. The estimate of what might happen to humans relies on a holistic judgment of many sorts of evidence, only some of which can be statistically manipulated as done in this paper. Knowledge of the pathogenicity of the bacterium and the likely protective mechanism of the vaccine form the basis of extrapolating from animals to humans. Anthrax is primarily a toxin-mediated disease and, as such, should be ameliorated by anti-toxin antibodies. In the 1950s an AVA-like vaccine was evaluated in a series of trials that randomized textile mill workers who worked with raw goat hair to vaccine or placebo. The overall reported VE for cutaneous anthrax was 0.925. This directly demonstrated the ability of PA-based vaccines to protect humans from B. anthracis infection. Additionally, passive transfer experiments convincingly demonstrate that infusing sufficient antibodies early in the infection process can reliably ensure survival in animals. Nonetheless, in passive immunization, the amount of circulating antibodies required for protection is much larger than that required in active vaccination. Thus, other players in the vaccine-induced acquired immune response, such as memory B cells, T cells, and/or unidentified processes, must have a causative role in protection. Implicitly, we accept for the vaccine formulations evaluated in this analysis that these other unmeasured players are likely associated with antibody measured by TNA 4 weeks after the last injection. Taken in total, the evidence supports that PA protein-based vaccines protect humans and also that the TNA measures a very important, although not exclusive, mechanism of protection.
There are subtle aspects to the statistical reasoning that go into these experiments. The extrapolation is built up within different nonhuman species and then bridged from nonhumans to humans. The amount of data and resources that are devoted to estimating the effect of antibody on protection, exploring the impact of dose, and the similarity of models across species needs to be judged with this in mind. For example, the ratio of PA50 values from rabbits (in setting 3) to cynos (in setting 6) was 2.79 (95% CI, 1.35 to 5.80). There is a statistically significant difference between species here, and we can estimate this ratio more precisely by studying more animals. Nevertheless, one needs to question whether increasing that precision will substantially fortify the extrapolation from animal models to human protection models. Clearly, there is a difference between rabbits and cynos. While greater precision about that difference may not tell us much about the difference between cynos and humans, it may at least reduce the uncertainty of extrapolation.
For anthrax, we have shown that TNA measured at a specific time after vaccination can obtain good agreement when predicting survival between species. We have also shown that the actual TNA values needed to predict at least 50% survival can vary between settings and genera.
Although other strategies could be used to try to more precisely isolate a causative mechanism, the analyses presented here support a rational approach for bridging VE in animals to vaccine effectiveness in humans. Detailed knowledge of the infectious process, how the immune system successfully defeats infection, and how the vaccine successfully enhances the immune response contributes to the ability to extrapolate from animal models to human use. We believe that the approaches described here are relevant to the requirements codified in the FDA’s Animal Rule.
The animal studies included were designed in a data-driven, iterative manner to develop and refine animal models that would support licensure of new anthrax vaccines using the Animal Rule. Data were provided by the Division of Microbiology and Infectious Diseases, National Institute of Allergy and Infectious Diseases (NIAID); the National Center for Immunization and Respiratory Diseases, CDC; and the United States Army Medical Research Institute of Infectious Diseases (USAMRIID), Department of Defense. Each institution performed various studies to assess the effect of vaccine dilution dose on survival, the immediacy and duration of vaccine-induced protection, the immunological responses to vaccination, or a combination of these aspects. Study designs varied depending on the primary purpose of the study. Although not all studies were specifically designed to determine immunologic correlates of protection, the basic approach of vaccination with a range of vaccine dilutions to modulate the immune response, humoral immune response assessment, and then challenge with high doses of aerosolized virulent B. anthracis spores allowed rational combination of data.
For the AVA vaccine, dose was varied by diluting the standard human dose with saline so that a fractional dose was obtained. Doses are expressed as the fraction of the full dose, that is, 1 is equal to a full dose, 0.5 is equal to a half dose and so forth. For the rPA vaccines, dose was expressed by micrograms of rPA protein and dose was varied by diluting the vaccine to desired protein concentration. The rPA vaccines were diluted either with buffer containing aluminum hydroxide adjuvant or with saline. Thus, when diluted with buffer containing adjuvant, the adjuvant concentration was held constant for all protein doses. When diluted with saline, the ratio of adjuvant to protein remained constant because the two components were diluted together. Control groups (placebo injections) were also included, some of which were injected with saline and others with buffer containing adjuvant.
ELISAs to detect antibody to PA were based on the CDC methods with minor variations among the laboratories. In general, recombinant PA was passively adsorbed to the wells of a 96-well plate overnight at about 4°C. Samples and reference sera were diluted in phosphate buffered saline (PBS) (pH 7.4)/5% skim milk/0.5% Tween 20. Plates were washed with PBS/0.1% Tween 20, and samples, reference sera and controls were plated and incubated for 60 min at 37°C. Plates were then washed and horseradish peroxidase conjugated immunoglobulin G-specific antibody appropriate for the species was diluted in PBS/5% skim milk/0.5% Tween 20 and added to the plates. After incubation at 37°C for 60 min, plates were washed, the appropriate substrate was added, and the plates were developed. Plates were generally read at dual wavelength and the samples were quantified against the reference material calibrated in micrograms per milliliter and analyzed with a four-parameter logistic regression.
The laboratories performed the TNA essentially as described [36, 37, 41]. Briefly, serum samples were titrated with twofold serial dilutions in a 96-well plate, followed by the addition of a constant amount of lethal toxin (LT) to each dilution. The concentration of LT added was that needed to kill about 95% of the cells in the absence of any neutralization. After pre-incubation of the test serum with the LT, the mixtures were transferred to another 96-well plate that had been seeded with J774A.1 cells in late log phase. LT that was not neutralized by anti-LT antibodies in the serum would intoxicate and kill the cells. Following intoxication, 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT) was added to the plates, followed by the addition of a solubilization buffer to lyse the cells and solubilize the MTT. The cell plates were then incubated, and the optical density (OD) values were read with a microplate reader to determine cell viability. All incubations were carried out at 37°C in about 5% CO2. Neutralization of anthrax LT was manifest as a suppression of cytotoxicity and, hence, the preservation of cell viability. A four-parameter logistic regression model was used to analyze the OD versus the reciprocal of the serum dilution. The inflection point was reported as the effective dilution at 50% inhibition (ED50).
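To make the ED50 definition concrete, the following is a minimal sketch of a four-parameter logistic (4PL) curve evaluated over a twofold dilution series. The parameterization shown is one common form of the 4PL and is assumed here; the starting dilution and all parameter values are hypothetical, and the laboratories' actual fitting software is not described in this section.

```python
def four_pl(x, lower, upper, ed50, hill):
    """Four-parameter logistic curve.

    x     : reciprocal serum dilution (e.g. 100 for a 1:100 dilution)
    lower : OD plateau when the toxin is not neutralized (cells killed)
    upper : OD plateau when the toxin is fully neutralized (cells viable)
    ed50  : reciprocal dilution at the midpoint between the two plateaus
    hill  : slope factor
    """
    return lower + (upper - lower) / (1.0 + (x / ed50) ** hill)

# Hypothetical curve parameters for one serum sample.
LOWER, UPPER, ED50, HILL = 0.1, 1.3, 400.0, 1.5

# Twofold serial dilutions starting at a hypothetical 1:50.
dilutions = [50.0 * 2 ** k for k in range(8)]
for d in dilutions:
    print(f"1:{d:<6.0f} OD = {four_pl(d, LOWER, UPPER, ED50, HILL):.3f}")

# At x = ED50 the curve sits exactly halfway between the plateaus.
midpoint = four_pl(ED50, LOWER, UPPER, ED50, HILL)
print(f"OD at ED50 = {midpoint:.3f}")
```

On a log-dilution axis the midpoint is also the inflection point of the curve, which is why the ED50 can be read off as a fitted parameter rather than interpolated from individual wells.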
These assays were conducted for all studies at the study site. An interlaboratory comparison was conducted that included most of the laboratories contributing to the data set for this study, and TNA data from all laboratories were found to be similar. A long-term assay performance study of the TNA estimated that the SE of log10(TNA) for replications was 0.11. Values below the limit of detection were set at one half the limit of detection.
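The substitution rule for values below the limit of detection is simple enough to state as code. This is a minimal sketch; the limit of detection and the titers are hypothetical, and taking log10 afterwards is an assumption based on the log10(TNA) scale mentioned above.

```python
import math

def impute_below_lod(values, lod):
    """Replace any value below the limit of detection with lod / 2."""
    return [v if v >= lod else lod / 2.0 for v in values]

def log10_titers(values, lod):
    """Impute values below the limit of detection, then take log10."""
    return [math.log10(v) for v in impute_below_lod(values, lod)]

HYPOTHETICAL_LOD = 20.0            # not a value from the paper
raw_titers = [5.0, 20.0, 150.0, 0.0]

print(impute_below_lod(raw_titers, HYPOTHETICAL_LOD))
print(log10_titers(raw_titers, HYPOTHETICAL_LOD))
```

One practical consequence of the rule is that every animal, including unvaccinated controls with no measurable response, has a finite value on the log scale.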
Other assays that examined multiple aspects of humoral and cell-mediated immunity were used in settings 9 to 11. For these settings, statistical analyses indicated that antibody measured with either TNA or ELISA at different time points provided the best prediction of survival with limited improvement with the addition of other assays.
Although the logistic regression models posit a specific parametric form for the model, we use nonparametric bootstrap methods with percentile CIs, which give asymptotically correct coverage even if the parametric models do not fit the data. For example, in Fig. 1, we refit the logistic model 1 for each of 2000 bootstrap samples, calculate the predicted survival at a fine grid of points, and then take the middle 95% of those 2000 replicates at each point in the grid. The PA50 values and the CIs (Fig. 2 and Table 2) are just one of those points in the grid. Statistical significance for each setting in Fig. 1 was determined by a permutation test on the TNA values, that is, by checking whether the observed slope of the logistic model was large compared to 2000 slopes, each calculated after randomly permuting the TNA values. In Figs. 1 and 2, percentile bootstrap CIs were used for the settings, and for the overall PA50, we used a random effects weighted mean model with the Paule-Mandel estimator, using within-setting variances estimated by bootstrap. A permutation test similar to that in Fig. 1 was done to test whether the slopes in Fig. 3 are significantly different from 0; in that case we permute only within setting, and the permutation test automatically adjusts for the fact that the same controls are used within each setting to calculate the VEs for that setting. For the empirical logits, the VEs were estimated using the ratio of the proportion of deaths in the vaccinated group to that in the control group, with the proportions adjusted to ensure that VE values were less than 100% using two sequential adjustments: first, the vaccinated proportion was estimated by adding 1/2 to the numerator and 1 to the denominator, and second, any control sample proportion less than the adjusted vaccine proportion was replaced with the unadjusted vaccinated proportion. Confidence limits for VE in Fig. 3 were obtained with an asymptotic method.
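The percentile bootstrap for the survival curve can be sketched as follows. This is an illustration only, not the authors' R code: the data are hypothetical, the logistic fit is a hand-written Newton-Raphson routine, and the small ridge penalty and step cap are assumptions added here so that the fit stays finite when a bootstrap sample happens to be perfectly separated. The usage example draws 300 replicates for speed, whereas the analysis described above used 2000.

```python
import random
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(x, y, iterations=25, ridge=1e-4, max_step=5.0):
    """Fit P(survive) = sigmoid(b0 + b1 * x) by Newton-Raphson.

    ridge and max_step are stabilizing assumptions, not part of the
    model described in the text.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1 = -ridge * b0, -ridge * b1        # penalized gradient
        h00, h01, h11 = ridge, 0.0, ridge        # negative Hessian
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += max(-max_step, min(max_step, step0))  # crude overshoot guard
        b1 += max(-max_step, min(max_step, step1))
    return b0, b1

def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, 0 <= q <= 1."""
    return sorted_values[int(round(q * (len(sorted_values) - 1)))]

def bootstrap_survival_ci(x, y, grid, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CIs for predicted survival at each grid point."""
    rng = random.Random(seed)
    n = len(x)
    replicates = [[] for _ in grid]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample animals
        b0, b1 = fit_logistic([x[i] for i in idx], [y[i] for i in idx])
        for k, g in enumerate(grid):
            replicates[k].append(sigmoid(b0 + b1 * g))
    alpha = (1.0 - level) / 2.0
    cis = []
    for values in replicates:
        values.sort()
        cis.append((percentile(values, alpha), percentile(values, 1.0 - alpha)))
    return cis

# Hypothetical data: log10(TNA) and survival (1 = survived) for 12 animals.
log10_tna = [1.0, 1.2, 1.5, 1.7, 2.0, 2.1, 2.3, 2.5, 2.7, 3.0, 3.1, 3.3]
survived = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1]
grid = [1.0, 2.0, 3.0]

b0, b1 = fit_logistic(log10_tna, survived)
cis = bootstrap_survival_ci(log10_tna, survived, grid, n_boot=300)
for g, (low, high) in zip(grid, cis):
    print(f"log10(TNA) = {g:.1f}: fitted {sigmoid(b0 + b1 * g):.2f}, "
          f"95% CI {low:.2f} to {high:.2f}")
```

Because whole animals are resampled and the model is refitted each time, the interval at each grid point reflects the variability of the fitted curve without relying on the logistic form being exactly right.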
The nonparametric bootstrap percentile method was used in the cross-species comparisons for CIs on predicted survival, ratios of PA50 values, and Cohen’s κ. This approach bootstraps both data from animals used in the creation of the logistic model and data from animals whose immune responses are predicted from the model. The CIs for the observed survival (Table 3) are exact. All P-values are two-sided. Calculations were done using R 2.15.0.
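For readers unfamiliar with Cohen's κ, a minimal sketch of the standard calculation for two sets of categorical labels follows. How predictions and outcomes were paired and categorized for Table 3 is not specified in this section, so the binary predicted and observed labels below are hypothetical.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    categories = set(labels_a) | set(labels_b)
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                   for c in categories)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1.0 - expected)

# Hypothetical labels: 1 = survival, 0 = death.
predicted = [1, 1, 0, 0]
actual = [1, 0, 0, 0]
print(cohens_kappa(predicted, actual))
```

In this toy example the observed agreement is 0.75 and the agreement expected by chance is 0.5, so κ = (0.75 − 0.5) / (1 − 0.5) = 0.5.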
The animal procedures done by Battelle were approved by Battelle’s Institutional Animal Care and Use Committee. The research was conducted in compliance with the Animal Welfare Act and followed the principles in the Guide for the Care and Use of Laboratory Animals. Similar ethics were followed for the animal studies at USAMRIID (see ). The human study was approved by several human investigations committees and was registered at clinicaltrials.gov (identifier: NCT00119067), and all subjects provided informed consent.
We acknowledge the Battelle Biomedical Research Center (BBRC) (NIAID Contracts N01-AI-25494 and N01-AI-30061, and CDC Contract 200-2000-100065) and the USAMRIID for conduct of studies that generated data analyzed in this paper. We are also grateful to Dr. Judy Hewitt (NIAID), Dr. Louise Pitt (USAMRIID) and Dr. Jim Estep (Battelle) for assistance in study planning, design and implementation. We are grateful to Larry Moulton, who suggested the strategy of cross-species extrapolation to examine surrogacy.
Appendix: Correlation, surrogacy, and causation.
Fig. S1. Human data.
Fig. S2. Nonhuman primate data.
Fig. S3. Rabbit data.
Program. Includes description of data and instructions on using R functions.
Author contributions: Drafted sections of paper [MPF, DAF, FL, EON], guidance in interpreting results and editing of paper [MPF, DAF, FL, JS, GS, RK, CPQ, EON], data analysis [MPF, DAF], analysis advice [RK, JS, FL, CPQ, EON], database assembly [JS, FL, MPF, GS], study design and coordination [FL, CPQ, EON, GS]. None of the authors declare any competing interests.