In a randomized controlled clinical trial that assesses treatment efficacy, a common objective is to assess the association of a measured biomarker response endpoint with the primary study endpoint in the active treatment group, using a case-cohort, case-control, or two-phase sampling design. Methods for power and sample size calculations for such biomarker association analyses typically do not account for the level of treatment efficacy, precluding interpretation of the biomarker association results in terms of biomarker effect modification of treatment efficacy, with detriment that the power calculations may tacitly and inadvertently assume that the treatment harms some study participants. We develop power and sample size methods accounting for this issue, and the methods also account for inter-individual variability of the biomarker that is not biologically relevant (e.g., due to technical measurement error). We focus on a binary study endpoint and on a biomarker subject to measurement error that is normally distributed or categorical with two or three levels. We illustrate the methods with preventive HIV vaccine efficacy trials, and include an R package implementing the methods.
Commonly, clinical efficacy trials randomize study participants to receive a treatment or control preparation (e.g., placebo) at one or more visits, and follow these participants for occurrence of the primary clinical study endpoint. The primary objective assesses treatment efficacy against the clinical endpoint, and a common secondary objective assesses the association of intermediate response endpoints (e.g., biomarkers) measured after the administration of treatment with primary endpoint occurrence in the active treatment group. Applications of this secondary objective include developing prognostic biomarkers and providing information for other analysis objectives such as surrogate endpoint and mediation assessment. Typical statistical approaches for assessing such correlates of risk (CoRs) have included logistic or Cox proportional hazards regression models that account for the sampling design that was used for measuring the biomarkers (e.g., [1–4]).
For power calculations to detect CoRs in a cohort such as an active treatment group, many methods have been developed for case-cohort studies, case-control studies (e.g., [6–7]), and the generalization of case-control studies to two-phase sampling studies. However, the available approaches typically do not account for the level of clinical treatment efficacy overall and in biomarker response subgroups, precluding interpretation of the results in terms of potential correlates of efficacy/protection. We develop an approach to CoR power/sample size calculations that accounts for this issue. The issue is important because, if the power calculations are based solely on the biomarker-outcome association in the active treatment group, one could design a case-control study to have, say, 90% power to detect a biomarker-outcome odds ratio of 0.5 without realizing that this power is achieved under a tacit assumption that the endpoint rate is higher in the active arm than in the control arm for the subgroup with the lowest biomarker responses. By specifying overall treatment efficacy and biomarker-specific treatment efficacies as input parameters, our approach makes transparent in the power calculations the link between the CoR effect size in the active treatment arm and the corresponding difference in biomarker-specific treatment efficacies.
In addition, our approach accounts for the component of inter-individual variability of the biomarker that is not biologically relevant (e.g., due to technical measurement error of the device employed to measure a biological response). This is important because the degree of measurement error of the biomarker heavily influences the power of the CoR analysis, so that accounting for it is needed to obtain accurate power calculations. In our approach the user inputs a parameter ρ, defined as the estimated fraction of the biomarker’s variance that is potentially biologically relevant for protection, and the calculator displays how power and sample size requirements vary with ρ.
Our approach can be used for a general binary clinical endpoint model with case-cohort, case-control, or two-phase sampling of the biomarker, using without-replacement or Bernoulli sampling. We illustrate the approach with a logistic regression model and case-control without-replacement sampling. For rare event studies (e.g., with cumulative endpoint rate less than 10%), we found in simulations that the power for the logistic regression model tends to be very similar to that for a Cox regression model; thus in this setting the approach may provide sufficiently accurate power results for time-to-event CoR analysis. The simplification afforded by using a binary outcome is helpful for focusing attention on the two issues described above.
Related research has developed power calculators of testing procedures for assessing the association of a true biomarker subject to measurement error and a sub-sampling design with an outcome (e.g., [10–12]). Here, we depart from this research objective by developing a power calculator of testing procedures for assessing the association of a measured/observed biomarker that has components of variability thought not to be associated with the outcome. Whereas the former testing procedures incorporate bias-correction techniques, leveraging, for example, validation sets or replicate biomarker measurements, our power calculator may be used with a large number of available hypothesis testing procedures from the case-cohort/case-control/two-phase sampling statistical methods literature (going back to Horvitz and Thompson), where the methods do not need bias-correction techniques. Thus, the contribution of this work is to provide more interpretable and accurate power calculations for the common scientific endeavor of understanding power for detecting the association of a measured/observed biomarker with the outcome. Moreover, previous work has developed power calculation formulas for associating a measured biomarker subject to measurement error with a dichotomous outcome; for example, [14–16] considered a normally distributed biomarker following a classical measurement error model, with application to logistic regression correlates analysis.
While the newly proposed power calculator applies for general randomized controlled two-group clinical trials, for definiteness we focus on preventive vaccine efficacy trials, which randomize study participants to receive a candidate vaccine or placebo at one or more visits, and follow these participants for occurrence of clinically significant infection with the pathogen under study. The primary objective assesses vaccine efficacy (VE), defined as the multiplicative reduction (vaccine versus placebo) in the rate of the primary endpoint, and a secondary objective assesses the association of immune response biomarkers measured shortly after vaccination with the primary endpoint. For settings where some trial participants were previously infected with the pathogen (e.g., influenza), this analysis is done for each of the vaccine and placebo groups or pooling over the groups. For settings where trial participants have not been previously infected with the pathogen (e.g., HIV), such that the immune response biomarker does not vary in the placebo group, this analysis is done either pooling over the vaccine and placebo groups or in the vaccine group only. In the vaccine field such analyses have been named CoR analyses (e.g., [19–20]), and for definiteness we focus on assessing a CoR in the vaccine group. The approach is illustrated with power calculations for the RV144 HIV vaccine efficacy trial after the primary analysis was conducted, and with sample size calculations for the prospective design of a sequel HIV vaccine efficacy trial being planned by the HIV Vaccine Trials Network.
Section 2 describes the study set-up, parameters of interest, and identifiability assumptions. Section 3 describes the power and sample size calculation approach. Section 4 illustrates the power/sample size calculator with the two examples, and Section 5 concludes with discussion. Supporting Materials Appendix A discusses how to unbiasedly characterize the biomarker distribution accounting for the sampling design, Supporting Materials Appendix B provides selected mathematical details of the power calculation methods, and Supporting Materials Appendix C addresses the important topic of how to estimate the noise level of the biomarker. Supporting Materials Appendix D presents supplementary figures for the two illustrations and Supporting Materials Appendix E summarizes how to use the R package.
We consider a double-blind clinical trial that randomizes participants to vaccine or placebo, with Z the indicator of assignment to vaccine and W baseline covariates. Let S be the immune response biomarker measured at a fixed time τ post-randomization, which we assume to be continuous or trichotomous, with the case of dichotomous S covered as a special case. Participants are followed for occurrence of the primary clinical study endpoint, clinically significant infection with the pathogen, with follow-up through time τmax, with T the time from randomization until the study endpoint and Y ≡ I[T ≤ τmax] the binary outcome of interest. Let Yτ ≡ I[T ≤ τ] and let Vτ be the indicator that a subject attends the visit at τ. In keeping with the motivating application, we focus on settings where it is only of interest to study the association of S with Y for subjects who did not experience the event before the biomarker is measured. Therefore, subjects with (1 − Yτ)Vτ = 1 are the subgroup observed to be at-risk at τ who could potentially have S measured for the association study.
Because S is expensive to measure, a case-cohort, case-control, or two-phase sampling design is often used; let R be the indicator that S is measured. Let Δ be the indicator that Y is observed, i.e., Δ = 0 if the subject drops out before time τmax and before experiencing the event, and Δ = 1 otherwise. Let L ≡ (R(z), R(z)S(z), Yτ(z), Vτ(z), Δ(z), Δ(z)Y(z)) be the potential outcomes if assigned treatment z, for z = 0, 1, where S(z) is defined if and only if Yτ(z) = 0, such that S(z) = * if Yτ(z) = 1. (Note that Yτ(z) = 1 and Vτ(z) = 0 each imply R(z) = 0.) The observed data for a subject are O ≡ (Z, W, R, RS, Yτ, Vτ, Δ, ΔY). The CoR power calculations are based on the N vaccine recipients observed to be at-risk at τ (those with Z(1 − Yτ)Vτ = 1), and test for whether P(Y = 1|S = s1, Z = 1, Yτ = 0) varies in s1. To understand our approach, it is critical to note that the CoR power calculations do not need the potential outcomes formulation, as they are based solely on the observable random variables O. The potential outcomes are used to define biomarker-specific vaccine efficacy and hence provide a way to relate CoR effect sizes to vaccine efficacy effect sizes.
To facilitate building this relationship, we assume the vaccine has no effect on the study endpoint before the biomarker sampling time τ: P(Yτ (1) = Yτ (0)) = 1; this assumption will be more credible and less influential for τ near baseline. This assumption is useful by ensuring that the biomarker-specific vaccine efficacy parameters measure causal effects of vaccination, and for equating the CoR parameter P(Y = 1|S = s1, Z = 1, Yτ = 0) to P(Y = 1|S = s1, Z = 1, Yτ (1) = Yτ (0) = 0), which links the CoR and V E parameter types (as described below). Henceforth all unconditional and conditional probabilities of Y (z) = 1 tacitly condition on Yτ (1) = Yτ (0) = 0.
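We suppose that each of the N vaccine recipients is in one of three latent/unknown baseline subgroups X: the “lower protected” (X = 0), the “medium protected” (X = 1), or the “higher protected” (X = 2). Define the x-specific outcome risks as
$$\mathrm{risk}^{lat}_z(x) \equiv P(Y(z) = 1 \mid X = x), \quad z = 0, 1,$$
such that the vaccine efficacy for latent subgroup x is $VE^{lat}(x) \equiv 1 - RR^{lat}(x)$ with $RR^{lat}(x) \equiv \mathrm{risk}^{lat}_1(x) / \mathrm{risk}^{lat}_0(x)$, for x = 0, 1, 2.
Define $P^{lat}_x \equiv P(X = x)$ for x = 0, 1, 2, and define the marginal risks $\mathrm{risk}_z \equiv P(Y(z) = 1)$ for z = 0, 1. Then the overall vaccine efficacy VE equals
$$VE = 1 - \frac{\mathrm{risk}_1}{\mathrm{risk}_0} = 1 - \frac{\sum_{x=0}^{2} \mathrm{risk}^{lat}_1(x)\, P^{lat}_x}{\sum_{x=0}^{2} \mathrm{risk}^{lat}_0(x)\, P^{lat}_x}.$$
We also define risks and vaccine efficacies for subgroups defined by S(1) or by (X, S(1)):
$$\mathrm{risk}_z(s_1) \equiv P(Y(z) = 1 \mid S(1) = s_1), \qquad \mathrm{risk}^{lat}_z(x, s_1) \equiv P(Y(z) = 1 \mid X = x, S(1) = s_1)$$
for x = 0, 1, 2, s1 = 0, 1, 2 and z = 0, 1, and
$$VE(s_1) \equiv 1 - \frac{\mathrm{risk}_1(s_1)}{\mathrm{risk}_0(s_1)}, \qquad VE^{lat}(x, s_1) \equiv 1 - \frac{\mathrm{risk}^{lat}_1(x, s_1)}{\mathrm{risk}^{lat}_0(x, s_1)}.$$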
The observed biomarker response s1 = 0 represents a “low” response in some fashion and s1 = 2 a higher response, with s1 = 1 an intermediate response. For example, s1 = 0 could be a negative/non-response and s1 = 2 a response above a pre-specified putative correlate of protection threshold. If S were measured without error, then X = S such that V E(s1) = V Elat(x, s1) and the latent variable formulation would not be needed; we use it to allow measurement error to create differences in V E(s1) versus V Elat(x, s1), with greater differences for noisier biomarkers (developed next).
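The relationship between the overall VE and the latent-subgroup quantities can be checked numerically. The following is a minimal sketch, assuming that overall risk in each arm is the prevalence-weighted average of the subgroup risks; the function name `overall_ve`, its argument names, and the numeric inputs are hypothetical and are not taken from the R package.

```python
def overall_ve(p_lat, ve_lat, risk0_lat):
    """Overall VE = 1 - risk1/risk0, averaging over latent subgroups x = 0, 1, 2.

    p_lat[x]     : P(X = x), the latent subgroup prevalence (hypothetical input)
    ve_lat[x]    : vaccine efficacy in latent subgroup x
    risk0_lat[x] : placebo-arm risk in latent subgroup x
    """
    if abs(sum(p_lat) - 1.0) > 1e-9:
        raise ValueError("latent subgroup prevalences must sum to 1")
    # Placebo-arm marginal risk: prevalence-weighted subgroup risks.
    risk0 = sum(p * r0 for p, r0 in zip(p_lat, risk0_lat))
    # Vaccine-arm subgroup risk is (1 - VE(x)) times the placebo subgroup risk.
    risk1 = sum(p * (1.0 - ve) * r0 for p, ve, r0 in zip(p_lat, ve_lat, risk0_lat))
    return 1.0 - risk1 / risk0


# Illustrative values only: 20% lower, 60% medium, 20% higher protected.
ve = overall_ve(p_lat=[0.2, 0.6, 0.2], ve_lat=[0.0, 0.5, 1.0],
                risk0_lat=[0.03, 0.03, 0.03])
```

With equal placebo risk across subgroups, the overall VE reduces to the prevalence-weighted average of the subgroup efficacies, which is why the illustrative inputs above average to 0.5.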
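To incorporate assay noise into the power/sample size calculations, we define protection-related sensitivity/specificity and false positive/negative parameters as:
$$\mathrm{Sens} \equiv P(S(1) = 2 \mid X = 2), \qquad \mathrm{Spec} \equiv P(S(1) = 0 \mid X = 0), \tag{4}$$
$$\mathrm{FP0} \equiv P(S(1) = 2 \mid X = 0), \qquad \mathrm{FN2} \equiv P(S(1) = 0 \mid X = 2), \tag{5}$$
$$\mathrm{FP1} \equiv P(S(1) = 2 \mid X = 1), \qquad \mathrm{FN1} \equiv P(S(1) = 0 \mid X = 1). \tag{6}$$
The probability that an observed at-risk vaccine recipient has a low or high response, P0 ≡ P(S = 0|Z(1 − Yτ)Vτ = 1) or P2 ≡ P(S = 2|Z(1 − Yτ)Vτ = 1), equals, writing $P^{lat}_x = P(X = x)$,
$$P_0 = \mathrm{Spec}\, P^{lat}_0 + \mathrm{FN1}\, P^{lat}_1 + \mathrm{FN2}\, P^{lat}_2, \tag{7}$$
$$P_2 = \mathrm{FP0}\, P^{lat}_0 + \mathrm{FP1}\, P^{lat}_1 + \mathrm{Sens}\, P^{lat}_2. \tag{8}$$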
We consider two approaches to the trichotomous biomarker power calculations. Approach 1 takes as inputs (Sens, Spec, FP0, FN2, FP1, FN1), whereas Approach 2 uses an additive measurement error model for a normally distributed continuous-readout biomarker S*, and defines the values of S by S = 0 if S* ≤ θ0, S = 2 if S* > θ2, and S = 1 otherwise, with θ0 and θ2 two user-specified constants with θ0 < θ2. In particular, for Approach 2 we consider a normally distributed latent ‘true’ biomarker X* and link S* to X* by an additive classical measurement error model
$$S^* = X^* + e, \qquad X^* \sim N(0, \sigma^2_{tr}), \qquad e \sim N(0, \sigma^2_e), \tag{9}$$
with X* independent of e, implying $S^* \sim N(0, \sigma^2_{obs})$ with $\sigma^2_{obs} = \sigma^2_{tr} + \sigma^2_e$. Here $\rho \equiv \sigma^2_{tr} / \sigma^2_{obs}$ is the fraction of the variability of S* that is potentially biologically relevant for protection, and is specified to reflect the quality of the biomarker. The ‘true’ trichotomous biomarker X is defined by two percentiles of X* that are determined mathematically by model (9) and the two percentiles θ0 and θ2 (see Supporting Materials Appendix B). Figure 1 illustrates the set-up for Approach 2.
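The additive measurement error set-up of Approach 2 can be illustrated by simulation. The sketch below assumes that X* and the error e are independent mean-zero normals whose variances are ρ and 1 − ρ times the observed variance, and it applies the trichotomization rule stated above; the function names, the seed, and the chosen ρ are hypothetical illustrations, not the R package's implementation.

```python
import random
from math import sqrt


def simulate_biomarker(n, rho, sigma2_obs=1.0, seed=0):
    """Draw n pairs (X*, S*) with S* = X* + e and Var(X*) / Var(S*) = rho."""
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    rng = random.Random(seed)
    sd_true = sqrt(rho * sigma2_obs)            # sd of the latent X*
    sd_noise = sqrt((1.0 - rho) * sigma2_obs)   # sd of the error e
    pairs = []
    for _ in range(n):
        x_star = rng.gauss(0.0, sd_true)
        s_star = x_star + rng.gauss(0.0, sd_noise)
        pairs.append((x_star, s_star))
    return pairs


def trichotomize(s_star, theta0, theta2):
    """S = 0 if S* <= theta0, S = 2 if S* > theta2, and S = 1 otherwise."""
    if theta0 >= theta2:
        raise ValueError("theta0 must be smaller than theta2")
    if s_star <= theta0:
        return 0
    return 2 if s_star > theta2 else 1


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


pairs = simulate_biomarker(n=20000, rho=0.7)
var_true = sample_variance([x for x, _ in pairs])
var_obs = sample_variance([s for _, s in pairs])
rho_hat = var_true / var_obs   # empirical fraction of protection-relevant variance
```

The empirical ratio `rho_hat` should land close to the input ρ, which is a quick sanity check that the noise variance has been specified on the intended scale.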
The above set-up handles a dichotomous biomarker as a special case, by setting the middle-category probabilities to zero, P(X = 1) = P(S = 1) = 0, in which case only the Sens and Spec parameters are needed for the calculations [because FN2 = 1 − Sens and FP0 = 1 − Spec; see equations (4)–(8)]. The R code handles the dichotomous biomarker as a special case.
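The marginal response probabilities follow from the sensitivity/specificity-type parameters and the latent subgroup prevalences by the law of total probability. The sketch below is a hedged illustration: the function `marginal_response_probs`, its argument names, and the numeric inputs are hypothetical, and the row layout P(S = s | X = x) is an assumption read off the response probabilities listed for the simulation step.

```python
def marginal_response_probs(p_lat, spec, fp0, fn1, fp1, fn2, sens):
    """Return (P0, P1, P2) = P(S = s) for s = 0, 1, 2 by total probability.

    Row x of the misclassification matrix holds P(S = 0 | X = x),
    P(S = 1 | X = x), P(S = 2 | X = x).
    """
    rows = [
        (spec, 1.0 - spec - fp0, fp0),   # X = 0, lower protected
        (fn1, 1.0 - fn1 - fp1, fp1),     # X = 1, medium protected
        (fn2, 1.0 - fn2 - sens, sens),   # X = 2, higher protected
    ]
    for row in rows:
        if min(row) < -1e-12:
            raise ValueError("misclassification probabilities exceed 1 in a row")
    return tuple(sum(p_lat[x] * rows[x][s] for x in range(3)) for s in range(3))


# Trichotomous illustration with made-up inputs.
p0, p1, p2 = marginal_response_probs([0.3, 0.4, 0.3], spec=0.8, fp0=0.05,
                                     fn1=0.1, fp1=0.1, fn2=0.05, sens=0.8)

# Dichotomous special case: no middle latent subgroup, FN2 = 1 - Sens, FP0 = 1 - Spec.
d0, d1, d2 = marginal_response_probs([0.4, 0.0, 0.6], spec=0.9, fp0=0.1,
                                     fn1=0.0, fp1=0.0, fn2=0.2, sens=0.8)
```

In the dichotomous call the middle response probability is zero by construction, so only Sens and Spec carry information, matching the special case described above.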
The formulation for a continuous biomarker is similar, where now the latent subgroups are defined by the true unobservable biomarker X* in model (9) above. Now
with and riskz(s1) P(Y (z) = 1|S*(1) = s1) for x* and s1 varying over the continuous support of X*(1) and S*(1), respectively.
For the power calculations we specify a fraction of subjects with the lowest X*(1) values ≤ ν to all have the same specified lowest level of vaccine efficacy V Elowest:
For example, V Elowest may be set to 0 and defined as the fraction of subjects without a positive vaccine-induced immune response. The constant ν is determined by , V Elowest, and the measurement error model (9): , where Φ−1(·) is the inverse of the standard normal cdf.
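The threshold ν can be computed as a normal quantile. This is a minimal sketch, assuming that X* is normal with mean zero and variance ρ times the observed variance, so that the specified fraction of vaccine recipients with the lowest X* values falls at or below ν; the function name `lowest_ve_threshold` and its arguments are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def lowest_ve_threshold(frac_lowest, rho, sigma2_obs=1.0):
    """Return nu such that P(X* <= nu) = frac_lowest when X* ~ N(0, rho * sigma2_obs)."""
    if not 0.0 < frac_lowest < 1.0:
        raise ValueError("frac_lowest must lie strictly between 0 and 1")
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must lie in (0, 1]")
    # Scale the standard normal quantile by the sd of the latent biomarker.
    return sqrt(rho * sigma2_obs) * NormalDist().inv_cdf(frac_lowest)


# Illustrative inputs: 40% of vaccine recipients at the lowest VE level, rho = 0.9.
nu = lowest_ve_threshold(frac_lowest=0.40, rho=0.9)
```

Because the fraction is below one half, ν is negative here; a noisier biomarker (smaller ρ) pulls ν toward zero since the latent X* then has less spread.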
For x* ≤ ν, is modeled as a constant following (10),
and, for x* > ν, is modeled via a logistic regression model
Using model (11)–(12), which specifies a lowest value of vaccine efficacy, is useful because the simpler alternative model that would specify (12) for all x would force VE(x) to be negative for the lowest values of x. In many applications this is undesirable, as enhanced risk of disease caused by vaccination may be considered unlikely and the most relevant power calculations would disallow this possibility. (The power calculator nonetheless works for VElowest specified negative.)
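We address the scientific objective of assessing a CoR among vaccine recipients. For trichotomous S this entails testing the following null versus alternative hypotheses
$$H_0: \mathrm{risk}_1(2) = \mathrm{risk}_1(1) = \mathrm{risk}_1(0) \quad \text{versus} \quad H_1: \mathrm{risk}_1(2) \le \mathrm{risk}_1(1) \le \mathrm{risk}_1(0),$$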
with ‘<’ for at least one of the two inequalities in H1. For continuous S* this tests
with ‘<’ for some . While for data analysis 2-sided tests would typically be used, the power calculations are clearer to interpret by testing for the 1-sided alternative H1 of lower clinical risk in vaccine recipients with increasing s1.
Two main approaches to selecting the subset of subjects for whom to measure the biomarkers are:
Our sample size calculations consider both approaches. The first approach has advantages including that the randomly sampled subjects can be used for unbiased assessment of the distribution of the biomarker in the study population, absolute risks can be assessed in biomarker subgroups, and the association of biomarkers with multiple study endpoints can be straightforwardly assessed. The second approach does not facilitate the latter two goals, and some re-weighting is required to use the sampled subjects for unbiased assessment of the distribution of the biomarker in the study population (see Supporting Materials Appendix A). An advantage of the second approach is that waiting until the primary analysis is completed before selecting the controls allows accounting for the vaccine efficacy results for optimizing the biomarker sampling design. This affords opportunities to improve efficiency of the analysis .
In addition to assuming iid random variables (Li, X*i, Xi) and (Oi, X*i, Xi) for i = 1, …, N, we assume the standard set of assumptions that have been used in correlates of risk and protection studies: SUTVA, ignorable treatment assignment (Z ⊥ L | W), equal early clinical risk (P(Yτ(1) = Yτ(0)) = 1), and random censoring (Y(z) ⊥ Δ(z) for z = 0, 1). We also assume S is missing at random (MAR): R depends only on the observed data O. To the extent that the investigator controls the biomarker sampling design, MAR is guaranteed to hold, although it could be in question due to happenstance missingness caused by not attending the visit at τ. Moreover, we focus on the scenario that, after accounting for the latent category (and any baseline covariates W included in the CoR analysis), the measured biomarker in vaccine recipients does not affect risk, i.e., risk1(x*, s1) = risk1(x*) for all s1 and x*, and similarly for risk as a function of trichotomous X and S.
We develop the power calculations for the relatively simple scenario of homogeneous risk in the placebo group, where risk0(x*, s1) = risk0 for all s1 and x*, and similarly for risk as a function of trichotomous X and S. In general, risk0(x*, s1) and risk0(s1) are not identifiable (because S(1) is a counterfactual random variable for subjects assigned Z = 0), and power calculations could be conducted under many scenarios for these functions. However, the special case risk0(x*, s1) = risk0 is very helpful for power calculations because risk0 can be specified based on the observed or projected incidence in the trial. Because the CoR data analysis itself would control for known baseline prognostic factors W, the homogeneous-risk scenario need only hold after conditioning on W for the power calculations to be accurate.
From (14), (15), and (16), analysis of the vaccine group data provides inference on the relative risks RRt ≡ risk1(2)/risk1(0) for a trichotomous biomarker and RRc ≡ risk1(s1)/risk1(s1 − 1) for a continuous biomarker. We refer to RRt and RRc as the user-specified “CoR effect sizes” of the power calculations. From the assumptions of Section 2.7, RRt and RRc are identified from the observed data measured from the subset of vaccine recipients with R = 1, because the assumptions imply risk1(s1) = P(Y = 1|Z = 1, R = 1, s1). Therefore, under the assumptions, the power calculations for testing H0 can be based on the set of vaccine recipients with S (or S*) measured at τ.
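For a trichotomous biomarker, straightforward calculation shows that RRt is linked to the latent VE parameters via the equation
$$RR_t = \frac{\left[\mathrm{FP0}\, P^{lat}_0\, RR^{lat}(0) + \mathrm{FP1}\, P^{lat}_1\, RR^{lat}(1) + \mathrm{Sens}\, P^{lat}_2\, RR^{lat}(2)\right] / P_2}{\left[\mathrm{Spec}\, P^{lat}_0\, RR^{lat}(0) + \mathrm{FN1}\, P^{lat}_1\, RR^{lat}(1) + \mathrm{FN2}\, P^{lat}_2\, RR^{lat}(2)\right] / P_0},$$
where $P^{lat}_x = P(X = x)$ and $RR^{lat}(x) = 1 - VE^{lat}(x)$.
This formula makes the estimable RRt interpretable in terms of a gradient in vaccine efficacies, where $RR_t = RR^{lat}(2) / RR^{lat}(0)$ for a noise-free biomarker with 1 − Sens = 1 − Spec = FP0 = FP1 = FN2 = FN1 = 0 (illustrated in Figure 5 below). Otherwise, under H1, RRt is closer to 1.0 than $RR^{lat}(2) / RR^{lat}(0)$.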
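The attenuation of the CoR effect size by assay noise can be illustrated numerically. The sketch below assumes homogeneous placebo risk (so that it cancels in the ratio) and that, given the latent subgroup, the measured biomarker does not affect risk; it then applies Bayes rule to obtain the vaccine-arm risk in each observed response category. The function name `cor_effect_size` and all numeric inputs are hypothetical.

```python
def cor_effect_size(p_lat, ve_lat, misclass):
    """RRt = risk1(S = 2) / risk1(S = 0), up to the common placebo risk.

    p_lat[x]       : P(X = x) for latent subgroups x = 0, 1, 2
    ve_lat[x]      : vaccine efficacy in latent subgroup x
    misclass[x][s] : P(S = s | X = x)
    """
    def relative_risk1(s):
        # Vaccine-arm risk given S = s, divided by the common placebo risk.
        weights = [p_lat[x] * misclass[x][s] for x in range(3)]
        total = sum(weights)
        if total <= 0.0:
            raise ValueError("response category has zero probability")
        return sum(w * (1.0 - ve_lat[x]) for x, w in enumerate(weights)) / total

    return relative_risk1(2) / relative_risk1(0)


p_lat = [0.3, 0.4, 0.3]
ve_lat = [0.1, 0.5, 0.9]            # illustrative latent vaccine efficacies
noise_free = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
noisy = [[0.80, 0.15, 0.05],        # Spec, ., FP0
         [0.10, 0.80, 0.10],        # FN1,  ., FP1
         [0.05, 0.15, 0.80]]        # FN2,  ., Sens

rr_noise_free = cor_effect_size(p_lat, ve_lat, noise_free)
rr_noisy = cor_effect_size(p_lat, ve_lat, noisy)
```

In this illustration the noise-free value equals the ratio (1 − 0.9)/(1 − 0.1), while misclassification moves the observable effect size toward 1.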
For a continuous biomarker S* following model (9), RRc is linked to the latent vaccine efficacy parameters via an equation that depends on s1. Because RRc depends on s1, it is not particularly useful to index power calculations by RRc. Instead, we interpret RRc as the effect size for a noise-free biomarker (ρ = 1). Under the logistic model (12), RRc is the relative risk per standard deviation increase in X* in the region above ν, where we use the approximation of a relative risk by an odds ratio.
Of the N vaccine recipients observed to be at-risk at τ, let ncases (ncontrols) be the number of observed cases (controls) from whom S (or S*) is measured, where cases have ΔY = 1 and controls have Δ(1 − Y) = 1. If the power calculations are done at the design stage then N, ncases, and ncontrols are projected numbers.
For a trichotomous S, the algorithm for the power calculations is as follows:
For Step 7 Approach 2, under the assumptions of Section 2 and specified values for , ρ, and , the remaining inputs Sens, Spec, FP0, FN2, FP1, FN1, (θ0, θ2) are determined by solving the equations (4)–(6) and (7)–(8); the R code does this using stochastic integration (Supporting Materials Appendix B).
Step 8 proceeds as follows. First, for each of the N vaccine recipients, determine the numbers that are in the three latent subgroups as rounded to the nearest integers, for x = 0, 1, 2. Second, determine the latent class membership of each of the ncases cases by a realization of a trinomial random variable with success probabilities (P(X = 0|Y = 1, Yτ = 0, Z = 1), P(X = 1|Y = 1, Yτ = 0, Z = 1), P(X = 2|Y = 1, Yτ = 0, Z = 1)), where P(X = x|Y = 1, Yτ = 0, Z = 1) is expressed in terms of the and risk1(x) via Bayes rule. This determines the number of cases ncases(x) in each category x = 0, 1, 2 satisfying . Third, within each subgroup x, simulate Si for the entire set of N subjects as a trinomial random variable. For x = 0 the response probabilities are (Spec, 1 − FP0 − Spec, FP0); for x = 1 the response probabilities are (FN1, 1 − FP1 − FN1, FP1); and for x = 2 the response probabilities are (FN2, 1−Sens−FN2, Sens). This determines the number of controls ncontrols(x) in each category x = 0, 1, 2 by subtracting off ncases(x), satisfying the constraint where ncontrols is fixed at K * ncases. Fourth, finalize the analysis data set by specifying Ri = 1 or Ri = 0 for each of the N subjects.
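The second and third parts of Step 8 can be sketched as follows. This is a minimal illustration, assuming the latent class of a case is drawn with probabilities proportional to prevalence times vaccine-arm risk (Bayes rule) and that S is drawn within each latent subgroup from the corresponding row of response probabilities; the function names, the seed, and the numeric inputs are hypothetical and do not reproduce the R code.

```python
import random


def case_latent_probs(p_lat, risk1_lat):
    """P(X = x | Y = 1) in the vaccine arm via Bayes rule."""
    joint = [p * r for p, r in zip(p_lat, risk1_lat)]
    total = sum(joint)
    return [j / total for j in joint]


def draw_case_counts(n_cases, probs, rng):
    """One trinomial realization: number of cases in each latent subgroup."""
    draws = rng.choices(range(3), weights=probs, k=n_cases)
    return [draws.count(x) for x in range(3)]


def draw_biomarker(x, misclass, rng):
    """Draw S in {0, 1, 2} for a subject in latent subgroup x."""
    return rng.choices(range(3), weights=misclass[x], k=1)[0]


rng = random.Random(2024)
p_lat = [0.3, 0.4, 0.3]
risk1_lat = [0.009, 0.005, 0.001]          # illustrative vaccine-arm risks
misclass = [[0.80, 0.15, 0.05],            # (Spec, 1 - FP0 - Spec, FP0)
            [0.10, 0.80, 0.10],            # (FN1, 1 - FP1 - FN1, FP1)
            [0.05, 0.15, 0.80]]            # (FN2, 1 - Sens - FN2, Sens)

probs = case_latent_probs(p_lat, risk1_lat)
case_counts = draw_case_counts(41, probs, rng)
case_markers = [draw_biomarker(x, misclass, rng)
                for x in range(3) for _ in range(case_counts[x])]
```

Repeating these draws over many simulated data sets, and fitting the chosen test to each, gives the Monte Carlo estimate of power.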
For Step 9, we use the tps(·) function in the R package osDesign that implements the 2-phase logistic regression method of Breslow and Holubkov , entering S as an ordered score variable with levels S = 0, 1, 2 and conducting a one degree of freedom Wald test. Alternatively a generalized two degree of freedom Wald test could be used. In addition, alternative analysis methods could be used that leverage correlations between the biomarker and auxiliary covariates measured in everyone, potentially increasing power . However, it will often be advantageous to base the power calculations on the simpler method both for the utility of having conservative power calculations and because the strength of correlation of the auxiliaries must be fairly high to yield a material power gain (often not available in practice).
For a continuous normally distributed biomarker S* scaled to have mean 0 and , the same simulated data sets (using Approach 2 in Step 7) can be used for the power calculations, with process as follows.
Under Bernoulli sampling, of the N vaccine recipients observed (or projected) to be at-risk at τ, ncases (ncontrols) is the expected number of observed cases (controls) from whom S and S* are measured, i.e., ncases and ncontrols are random. For a trichotomous biomarker, the power analysis proceeds as described in Section 3.1, except Step 8 uses Bernoulli sampling (classic case-cohort sampling ). In particular, for each of the N vaccine recipients, determine the case status Y conditional on X* = x* as a realization of a Bernoulli random variable with success probability . For a continuous biomarker, Step 8 is altered by determining the case status Y conditional on X* = x* as a realization of a Bernoulli random variable with success probability .
We first illustrate the correlates power calculations for the RV144 preventive HIV vaccine efficacy trial of a candidate vaccine versus placebo that was conducted in the general population in Thailand . For this example the power calculations are conducted after the trial was completed (e.g., [24–25]). The RV144 trial randomized 8198 (8197) HIV uninfected individuals to receive vaccine (placebo) and followed them for the primary endpoint of HIV infection over 42 months. Subjects received immunizations at Week 0, 4, 8, 24, and immune response biomarkers measured at τ = Month 6 (Week 26 visit) were assessed in vaccine recipients as CoRs of HIV infection by τmax = 42 months. Relevant for the CoR power calculations, estimated overall V E to prevent infections after τ through τmax was 0.26. Of the N = 7703 vaccine recipients observed to be at risk at Month 6, biomarkers were measured in the 41 subjects who were observed to subsequently experience the HIV infection endpoint, and in a frequency matched 5:1 controls:cases allocation random sample of 205 observed controls (i.e., without replacement two-phase sampling). Based on these data several papers have reported significant continuous, trichotomous, and dichotomous CoRs in the vaccine group, with initial paper Haynes et al. . These analyses were done using two-phase logistic regression  and two-phase Cox regression , which gave almost identical answers.
For the power analysis with a continuous biomarker following model (12) we assume , such that the 40% of vaccine recipients with lowest X* responses had vaccine efficacy V Elowest. We varied V Elowest from 0 to the overall V E estimate of 0.26. We estimated risk1 as n1/(n1 + n2) where n1 is the number of vaccine recipients observed to be at-risk at τ = 6 months who were diagnosed with HIV infection by the end of follow-up τmax = 42 months (n1 = 41) and n2 is the number of vaccine recipients observed to be at-risk at τ who completed follow-up HIV negative (n2 = 7662). Then we estimated risk0 as .
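The risk inputs for the RV144 illustration can be reproduced with a few lines of arithmetic. The sketch below uses the counts and overall VE estimate reported above; the placebo-risk step assumes the relationship VE = 1 − risk1/risk0, which is an assumption about how risk0 was obtained rather than a formula quoted from the text.

```python
n_cases = 41        # vaccine recipients at risk at Month 6 later diagnosed with HIV
n_controls = 7662   # vaccine recipients at risk at Month 6 completing follow-up HIV negative
ve_overall = 0.26   # estimated overall VE after tau through tau_max

# Observed vaccine-arm risk among those at risk at tau.
risk1 = n_cases / (n_cases + n_controls)

# Assumed placebo-arm risk implied by VE = 1 - risk1 / risk0.
risk0 = risk1 / (1.0 - ve_overall)
```

The two counts sum to the N = 7703 vaccine recipients observed to be at risk at Month 6, and the implied vaccine-arm risk is roughly half a percent over the follow-up period.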
Figure 2 shows the power curves for ρ = 1, 0.9, 0.7, 0.5. As expected power decreases with the degree of noise. The interpretation of the plot may be aided by annotating it with results from previous efficacy trials that identified CoRs. In particular, suppose a previous trial reported an estimated per sd increment in observed S*. Under the measurement error model (9), for ρ = 1 this equates to per sd increment in X*, and for fixed ρ < 1 this equates to per sd increment in X*. Typically there will be uncertainty as to the level of ρ in the previous trial, such that affixing the estimated relative risk per sd increment in X* to each curve provides a scenario analysis of the power available to detect the previously identified CoR under a spectrum of noise levels. In Figure 2 we use the gp70-V1V2 binding antibody correlate observed in RV144 , which had per sd increment in S*. If this biomarker is assumed to have no measurement error (ρ = 1), power is 0.19, whereas under substantial measurement error (ρ = 0.7), power drops to 0.13. In additional simulations, the power curves are higher if overall V E is higher (not shown).
To help interpret the power results in Figure 2, Figure 3 shows the V E(x) curves for six different scenarios of the true CoR relative risk effect size RRc (ρ = 1) and values of V Elowest for the RV144 scenario with estimated overall V E of 0.26. The null hypothesis RRc = 1 corresponds to a flat curve V E(x) = V E, and increasing departures from the null hypothesis H0 correspond to increasingly variable and steep VE curves. This figure shows that for the scenario risk0(s1, x) = risk0 and no measurement error, an association of the biomarker with infection risk in the vaccine group (a CoR) is equivalent to an association of the biomarker with V E. For interpreting Figure 2, if we focus on the ρ = 0.9 curve with effect size RRc = 0.53 and V Elowest = 0.04 (green solid curve), V E varies substantially in X* but power is low, only about 0.14.
Figure 4 shows the power curves (top panels) based on the same simulated data sets following the recipe given in Section 3.1 (using Approach 2 in Step 7), for a trichotomous biomarker with set to 0.1, 0.2, 0.3, or 0.4 with and tied to through the relationship expressed in Step 5 of Section 3.1. The results show that power majorly increases with P0 = P2, which is intuitively expected given that greater sample sizes at the poles of lowest and highest VE should yield the greatest power.
To help interpret the power results of Figure 4, Figure 5 shows the relationship between the CoR effect size RRt and the relative risk ratio $RR^{lat}(2) / RR^{lat}(0)$, where $RR^{lat}(x) = 1 - VE^{lat}(x)$, for the four values of ρ, with Table 1 showing how ρ maps to Sens, Spec, FP0, FN2, FP1, FN1 for each set of input parameters used in Figure 4. Figure 5 shows that $RR_t = RR^{lat}(2) / RR^{lat}(0)$ for a noise-free biomarker with ρ = 1, such that a CoR in the vaccine group is equivalent to the relative vaccine efficacy parameter, whereas $RR_t > RR^{lat}(2) / RR^{lat}(0)$ for imperfectly measured biomarkers with ρ < 1, such that the CoR effect size is closer to the null than the relative vaccine efficacy parameter. We illustrate a co-interpretation of Figures 4 and 5 for the ρ = 0.9 marker in Figure 5 and P(S = 0) = 0.4 (bottom-right panels). There is about 25% power to detect a CoR with effect size RRt = 0.60 (Figure 4), which corresponds to 25% power to detect the matching value of the relative risk ratio read from Figure 5. Supporting Materials Figure 1 shows an ROC curve (sensitivity versus one minus specificity) as ranges from 0.10 to 0.90. Our overall conclusion for this example is as follows: because estimated overall VE was low (at 0.26), the assumption of VE ≥ 0 for all biomarker subgroups constrains the possible CoR effect sizes to a limited range, hence yielding low power of the CoR analysis; in contrast, if VE were allowed to be negative for some subgroups then power would be greater.
Our second example considers calculations being used to plan the sample size of a Phase 3 HIV vaccine efficacy trial under design by the HIV Vaccine Trials Network. This trial randomizes HIV negative individuals to vaccine or placebo in a 1:1 allocation and follows subjects for HIV infection during a τmax = 36 month follow-up period. We assume 4% annual HIV incidence in the placebo group and 5% annual dropout incidence, as well as overall V E = 0.50. The immune response biomarkers to assess as CoRs are measured at month τ = 6.5. All vaccine group subjects diagnosed with HIV between month 6.5 and 36 have biomarkers measured, as do a random sample of HIV uninfected controls with controls:cases ratio 1:1, 3:1, 5:1, or 10:1. Figure 6 shows the trichotomous biomarker power curves versus the number of infections in the vaccine group (and the total sample size observed to be at risk at τ) to detect a CoR effect size of , for ρ fixed at value 0.9 that may be a realistic scenario for a biomarker assessed as a CoR. (Under the constant placebo risk assumption these calculations assume ) The calculations are for the scenarios ranging from 0.10 to 0.50. The results show that power sharply increases with the prevalences and increases with the controls:cases ratio, with only incremental gain moving from 5:1 to 10:1. Based on this analysis, to achieve 90% power to detect a CoR with P0 = P2 = 0.30, one choice would be the 5:1 allocation design, requiring 2800 total vaccine recipients observed to be at-risk at 6.5 months.
We developed an approach to power and sample size calculations for a typical “correlates of risk” (CoR) data analysis in a randomized controlled clinical efficacy trial for testing an association of an observed biomarker measured in a sub-sample (via a case-cohort, case-control, or two-phase sampling design) of the active treatment group with a clinical endpoint. The contribution of this work is to integrate into the calculations two issues: the level of treatment efficacy across biomarker subgroups and the fraction ρ of the variability of the biomarker that is potentially biologically relevant for protection. The first issue is important because, if ignored, a statistician may design the sample size of a CoR study without realizing the tacit assumptions being made about treatment efficacy. A particularly egregious mistake would be powering a study to detect a CoR with no recognition that achieving the desired power requires that treatment efficacy be negative for some biomarker subgroups, rendering the CoR study underpowered if treatment efficacy is never negative. Our approach provides a way to explicitly explore the relationship of the CoR effect size with treatment efficacy, including a way to specify the lowest treatment efficacy at a fixed value such as zero. The second issue is important because the degree of measurement error, captured by ρ, heavily influences power [14–16], such that accounting for ρ is needed for accurate power calculations, and may be useful for screening out biomarkers for which the CoR study would be underpowered given an unacceptably low value of ρ.
For the continuous biomarker calculations and for the Approach 2 trichotomous biomarker calculations, we have assumed a classical additive normal measurement error model for the observed continuous biomarker S*, the validity of which should be checked. In general, when planning biomarker CoR studies it is important to conduct biomarker assay laboratory validation studies to estimate ρ; we discuss approaches to this in Web Appendix C.
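To make the role of ρ concrete, the following minimal simulation sketch generates replicate measurements under a classical additive normal error model and recovers ρ as the correlation between the two replicates. This is a standard reliability estimate and is not necessarily one of the approaches in Web Appendix C. The function names, the replicate design, and the sample size are illustrative assumptions.

```python
import random

def simulate_replicates(n, rho, total_sd=1.0, seed=0):
    """Simulate n subjects with two replicate measurements each.

    Assumed model: observed S* = true value + independent normal error,
    with Var(true) = rho * total_sd**2 and Var(error) = (1 - rho) * total_sd**2,
    so that rho is the fraction of Var(S*) not due to measurement error.
    """
    rng = random.Random(seed)
    sd_true = (rho ** 0.5) * total_sd
    sd_error = ((1.0 - rho) ** 0.5) * total_sd
    pairs = []
    for _ in range(n):
        true_value = rng.gauss(0.0, sd_true)
        first = true_value + rng.gauss(0.0, sd_error)
        second = true_value + rng.gauss(0.0, sd_error)
        pairs.append((first, second))
    return pairs

def pearson(pairs):
    """Pearson correlation between the two replicates."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    sab = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    saa = sum((a - mean_a) ** 2 for a, _ in pairs)
    sbb = sum((b - mean_b) ** 2 for _, b in pairs)
    return sab / (saa * sbb) ** 0.5

# With independent errors, the replicate correlation estimates rho.
pairs = simulate_replicates(n=20000, rho=0.9, seed=1)
rho_hat = pearson(pairs)
print(f"estimated rho: {rho_hat:.3f}")
```

The estimate relies on the two replicates sharing the same true value and having independent errors; correlated assay errors (for example, from a shared plate or run) would bias it upward.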
Our power calculator applies to a univariate biomarker, yet studying the association of multiple biomarkers with the outcome is an important application. The calculator for a trichotomous biomarker may be useful for trials that collect possibly high-dimensional multivariate biomarkers and for which unsupervised clustering based on the biomarkers yields a cluster of “putatively not protected” subjects and a cluster of “putatively protected” subjects. In this scenario, the power calculator may be applied with all other subjects constituting the third cluster. In addition, the calculator for a normally distributed biomarker may be used for studying power to detect a linear combination of multiple biomarkers as a CoR.
Our CoR power and sample size calculations are for the scenario in which the biomarker is not associated with the clinical endpoint in the control group after accounting for baseline covariates W that would be controlled for in the CoR data analysis. This assumption is not needed for the CoR calculations themselves because they use data from the active treatment group only. However, it is used as a way to interpret the CoR power calculations in terms of biomarker-specific treatment efficacy, providing a mapping from the CoR calculations (in terms of risk gradients in the active treatment group) to gradients in treatment efficacy. Additional calculations may be conducted under alternative scenarios, where the approach here could be extended to allow functions risk0(x, s1) other than risk0(x, s1) = risk0. While the main application of the methods is more interpretable and accurate CoR power and sample size calculations, a second application is power and sample size calculations for assessing modification of treatment efficacy by the biomarker, i.e., assessing the vaccine efficacy curve VE(s1) directly, which is conducted in the principal stratification framework [18, 27]. Supporting Materials Figures 2 and 3 show such power curves for our two illustrative examples.
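The mapping from a risk gradient in the active treatment group to subgroup treatment efficacy can be sketched for a trichotomous biomarker as follows. Under constant placebo risk, VE in a subgroup equals one minus the ratio of that subgroup's treatment-group risk to the placebo risk, so the overall relative risk 1 − VE is the prevalence-weighted average of the subgroup relative risks. The sketch assumes P0 and P2 denote the prevalences of the lowest and highest biomarker subgroups, and that the middle subgroup's VE is specified by the user. The function and argument names are hypothetical, and this is not the R package's interface.

```python
def subgroup_ve(overall_ve, p_low, p_high, ve_medium, rr_high_vs_low):
    """Solve for VE in the low and high biomarker subgroups.

    Assumes constant placebo risk, so each subgroup's relative risk
    (treatment vs. placebo) is 1 - VE for that subgroup, and
        1 - VE_overall = p_low * (1 - VE_low)
                         + p_medium * (1 - VE_medium)
                         + p_high * (1 - VE_high),
    with the CoR effect size rr_high_vs_low = (1 - VE_high) / (1 - VE_low).
    """
    p_medium = 1.0 - p_low - p_high
    rel_risk_overall = 1.0 - overall_ve
    rel_risk_medium = 1.0 - ve_medium
    rel_risk_low = (rel_risk_overall - p_medium * rel_risk_medium) / (
        p_low + p_high * rr_high_vs_low)
    rel_risk_high = rr_high_vs_low * rel_risk_low
    return 1.0 - rel_risk_low, 1.0 - rel_risk_high

# Moderate overall efficacy: the CoR effect size is compatible with VE >= 0.
ve_low, ve_high = subgroup_ve(overall_ve=0.50, p_low=0.30, p_high=0.30,
                              ve_medium=0.50, rr_high_vs_low=1.0 / 3.0)
print(f"VE_low = {ve_low:.2f}, VE_high = {ve_high:.2f}")

# Low overall efficacy with a strong CoR effect size: VE_low turns negative,
# i.e. the power calculation would tacitly assume harm in the low subgroup.
harm_low, harm_high = subgroup_ve(overall_ve=0.10, p_low=0.30, p_high=0.30,
                                  ve_medium=0.10, rr_high_vs_low=0.2)
print(f"VE_low = {harm_low:.2f}, VE_high = {harm_high:.2f}")
```

Checking the sign of the implied lowest-subgroup VE before running a power calculation is one simple way to catch an effect size that is only attainable if the treatment harms some participants.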
Research reported in this publication was supported by the National Institute of Allergy and Infectious Diseases (NIAID) of the National Institutes of Health (NIH) under Award Numbers R37AI054165 and UM1AI068635. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The authors thank the participants, investigators, and sponsors of the RV144 trial, including the U.S. Military HIV Research Program (MHRP); U.S. Army Medical Research and Materiel Command; NIAID; U.S. and Thai Components, Armed Forces Research Institute of Medical Science; Ministry of Public Health, Thailand; Mahidol University; Sanofi Pasteur; and Global Solutions for Infectious Diseases.
Web-based Supporting Materials
Title: Appendices. Appendix A describes a technique for unbiased characterization of the biomarker distribution accounting for the biomarker sampling design. Appendix B provides selected mathematical details of the power calculation methods. Appendix C discusses how to estimate the noise level of the biomarker under study. Appendix D presents supplementary figures for the two illustrations. Appendix E summarizes how to use the R package implementing the methods.