

HHS Public Access. Author manuscript; accepted for publication in a peer-reviewed journal.
Clin Trials. Author manuscript; available in PMC 2010 April 21.
Published in final edited form as:
PMCID: PMC2857773

The intermediate endpoint effect in logistic and probit regression



An intermediate endpoint is hypothesized to be in the middle of the causal sequence relating an independent variable to a dependent variable. The intermediate variable is also called a surrogate or mediating variable and the corresponding effect is called the mediated, surrogate endpoint, or intermediate endpoint effect. Clinical studies are often designed to change an intermediate or surrogate endpoint and through this intermediate change influence the ultimate endpoint. In many intermediate endpoint clinical studies the dependent variable is binary, and logistic or probit regression is used.


The purpose of this study is to describe a limitation of a widely used approach to assessing intermediate endpoint effects and to propose an alternative method, based on products of coefficients, that yields more accurate results.


The intermediate endpoint model for a binary outcome is described for a true binary outcome and for a dichotomization of a latent continuous outcome. Plots of true values and a simulation study are used to evaluate the different methods.


Distorted estimates of the intermediate endpoint effect and incorrect conclusions can result from the application of widely used methods to assess the intermediate endpoint effect. The same problem occurs for the proportion of an effect explained by an intermediate endpoint, which has been suggested as a useful measure for identifying intermediate endpoints. A solution to this problem is given based on the relationship between latent variable modeling and logistic or probit regression.


More complicated intermediate variable models are not addressed in the study, although the methods described in the article can be extended to these more complicated models.


Researchers are encouraged to use an intermediate endpoint method based on the product of regression coefficients. A common method based on the difference in coefficients can lead to distorted conclusions regarding the intermediate effect.


The purpose of this article is to describe why standard statistical approaches for examining the intermediate endpoint or mediated effect using probit and logistic regression can be inaccurate and how to correct them. The focus of this article is the case in which the dependent variable is binary, such as dead/alive, presence/absence of disease, or use/nonuse of drugs. This article examines methods for modeling how an active intervention condition, compared to control, affects this outcome through a continuous intermediate or mediating variable. Logistic or probit regression is traditionally used to examine this mediation effect. Such models are ubiquitous in clinical trials where the effect of a dichotomous treatment variable on a diagnosis or other dichotomous outcome is mediated by an intermediate variable.

The notion of an intervening or mediating variable is well established in clinical studies of a binary disease outcome [1,2], where the mediated effect is termed the surrogate or intermediate endpoint effect. Surrogate endpoints are a subset of mediator variables; the theoretical and statistical requirements for a variable to be a surrogate endpoint are more stringent than those for mediators in general. Surrogates are expected to explain the full impact on a distal outcome, whereas a mediator may explain only part of this effect. One important use of an intermediate variable is in situations where the ultimate outcome is difficult or costly to obtain. In both prevention and treatment trials, the long time until disease or death occurs and the low incidence rates often require exorbitantly large sample sizes to study correlates of the disease. In this situation, researchers advocate the use of surrogate or intermediate endpoints [3,4]. Surrogate endpoints are more frequent than the ultimate outcome or more proximal to the prevention strategy, and are therefore easier to study. Examples of surrogate endpoints are serum cholesterol levels for the ultimate outcome of coronary heart disease [5], measures of immune system response for the ultimate outcome of death in HIV-infected individuals [6], and presence of polyps for the ultimate outcome of colon cancer [7]. The use of surrogate endpoints rests on the mediation assumption that the independent variable causes the surrogate endpoint, which, in turn, causes the ultimate outcome [8,9]. Because the surrogate endpoint is considered to be causally related to the outcome, decomposing the effects by which a surrogate explains impact on a distal outcome is critically important. The importance of surrogate endpoints was recently summarized by Begg and Leung ([8], p. 27): "Above all else, we believe that the issue of when and how to use surrogate endpoints is probably the pre-eminent contemporary problem in clinical trials methodology, so it merits much extensive scrutiny."

Prentice ([9], p. 432) defined a surrogate endpoint as "a response variable for which a test of the null hypothesis of no relationship to the treatment groups under comparison is also a valid test of the corresponding null hypothesis based on the true endpoint." Freedman et al. [10] proposed the proportion of the treatment effect explained by an intermediate variable to test this surrogate endpoint effect. A value of 100% indicates that the surrogate endpoint explains all of the relation between the treatment and the dependent variable, therefore satisfying Prentice's definition. A proportion mediated below 100% indicates that some of the relation may not be explained by this intermediate variable and that other causal mechanisms may be neglected [11]. The proportion measure reflects the size of the surrogate endpoint (i.e., mediated) effect as well as the amount of the treatment effect explained by the surrogate endpoint. The proportion mediated has not been accepted without criticism [12]: accurate identification of surrogate endpoints requires measurement of the ultimate outcome [8], the proportion measure can be negative or greater than 1 [13], and values of the proportion mediated are often very small. Furthermore, other research has shown that the proportion mediated tends to be unstable unless the sample size is large or the effect size is very large [14,15]. In response to the limitations of the proportion measure, Buyse and colleagues proposed two criteria for a surrogate endpoint (see [16,17] for more specific information on these criteria). The first criterion is that the ratio of the total effect of the treatment on the dependent variable to the effect of the treatment on the mediator or surrogate variable is close to 1. The logic here is that an intermediate endpoint should be affected by a treatment to the same magnitude as the treatment affects the dependent variable.
A second criterion requires that the relation between the mediator and the dependent variable also be substantial. Both the proportion mediated and the measures proposed by Buyse and colleagues are important because they combine a measure of effect size with a measure of the mediated effect. Because the Buyse et al. criteria do not include an estimator of the intermediate endpoint effect, they are not addressed in this article. Other approaches to identification of surrogate endpoints focus on the importance of meta-analytic combination of results from individual studies [18,19] and on more detailed approaches to causal inference from one study [20]. The present article focuses on estimators of the intermediate endpoint effect from a single study.

In addition to replacing costly outcome measures, intermediate variables can also be used to elucidate how an intervention leads to changes in the outcome variable. A common example is in the field of prevention research where programs are designed to change intermediate variables that prevent negative outcomes. An example of testing mediation with binary outcomes was described by Foshee and colleagues [21] in which a program to prevent adolescent dating violence was hypothesized to work by changing norms, decreasing gender stereotyping, improving conflict management, changing beliefs about the need for help, increasing awareness of services for victims and perpetrators, and increasing help-seeking behavior. The researchers compared the values of the logistic regression coefficient of treatment predicting the binary dating violence outcome with and without controlling for the proposed mediators. The authors considered evidence of mediation to occur when the difference between the two values was 20% of the value of the unadjusted coefficient (i.e., 20% of the program effect was explained by the mediators). Given this criterion, results indicated that the treatment effect on sexual violence was mediated by changes in norms, gender stereotyping, and awareness of victim services.

As described by several researchers, the examination of mediation in the evaluation of prevention programs and clinical trials is important for at least two major reasons [22–24]. First, most trials are based on changing mediating variables hypothesized to be related to the outcome measure. It is important to assess whether the program successfully changed the mediating variables it was designed to change. If the program was not able to change the mediating variable, then it would not be surprising that effects on outcomes were not observed. Second, the extent to which change in one or more mediating variables accounts for observed program effects sheds light on the theoretical basis of the program. This information has practical implications as well: intervention components may be dropped if they do not contribute to intervention success, or components may be added if the original intervention works through only some of the hypothesized mediators. Statistical methods for assessing mediation for continuous outcomes have been described [25,26], as have methods for specific statistical models such as covariance structure models [27] and multilevel models [28]. Issues related to mediation analysis for categorical outcomes have not received much research attention.

Two general approaches have been offered to quantify mediation [23,24,29]. In the situation where the dependent and mediating variables are continuous and normal-based models are used, one way to quantify mediation is to compare the regression coefficient of the outcome on the independent variable both without adjustment for the mediator (βz in Equation (5)) and with adjustment for the mediator (γz in Equation (1)). If the unadjusted independent variable coefficient βz is nonzero and the adjusted coefficient γz is zero when the mediator is included in the model, the effect of the independent variable is entirely mediated by the mediating variable. Standard errors for the mediated effect are available [26] and are generally accurate [15]. This is called the difference in coefficients method.

A second general method, based on path analysis and more commonly used in the social sciences for continuous outcomes, treats mediation as the product of two regression coefficients, the regression of the mediator on the independent variable (αz in Equation (2)) and the partial regression coefficient of the outcome on the mediator, adjusted for the independent variable (γm in Equation (1)) [30]. This method is called the product of coefficients method [26].

For standard least squares regression models without missing data, the difference in coefficients and the product of coefficients estimators are identical [15]. For binary response variables, the two estimators are not identical and can differ dramatically, as shown below. The most widely used estimator of the standard error of this mediated effect was derived by Sobel [27] using the multivariate delta method under the assumption of normality (see MacKinnon et al. [26] for examples of other standard errors).
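As a small numerical check of the equivalence claimed above, the sketch below simulates a hypothetical continuous mediator and outcome, fits the three least squares regressions, and compares βz – γz with αzγm. It also computes Sobel's first-order delta-method standard error, sqrt(γm² s²(αz) + αz² s²(γm)). Python with NumPy is an assumption (the article supplies no code), and the helper `ols` and all parameter values are illustrative, not taken from the article.

```python
import numpy as np

def ols(y, *cols):
    """Least squares with an intercept; returns (coefficients, standard errors)."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se

rng = np.random.default_rng(1)
n = 1000
z = np.repeat([0.0, 1.0], n // 2)            # two equal treatment groups
m = 0.4 * z + rng.normal(size=n)             # mediator (illustrative alpha_z = 0.4)
y = 0.3 * z + 0.5 * m + rng.normal(size=n)   # continuous outcome

beta_coef, _ = ols(y, z)                     # unadjusted: beta_z
gamma_coef, gamma_se = ols(y, z, m)          # adjusted: gamma_z, gamma_m
alpha_coef, alpha_se = ols(m, z)             # mediator on treatment: alpha_z

beta_z = beta_coef[1]
gamma_z, gamma_m = gamma_coef[1], gamma_coef[2]
alpha_z = alpha_coef[1]

difference = beta_z - gamma_z                # difference in coefficients
product = alpha_z * gamma_m                  # product of coefficients
# Sobel's first-order delta-method standard error of alpha_z * gamma_m
sobel_se = np.sqrt(gamma_m**2 * alpha_se[1]**2 + alpha_z**2 * gamma_se[2]**2)
print(difference, product, sobel_se)
```

For least squares fits on the same data the two estimates agree to floating-point precision, whatever values are simulated; this identity is what fails once the outcome is binary.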

The proportion mediated measure, 1 – γz/βz (from the equations below), was suggested by Freedman et al. [10]; we note that this measure is not necessarily between 0 and 1. Other possible definitions of proportion mediated measures are γmαz/βz and γmαz/(γz + γmαz) (from the equations described below). In the continuous dependent variable case with no missing data, all three proportion measures for the mediated effect are equal. Because logistic and probit regression are nonlinear models, the point estimates of these three quantities are not equal in that setting. In this article we show that some of the estimates have low bias while others require standardization procedures in order to be appropriate.
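The three proportion-mediated measures can be written as a small helper (the name `proportion_mediated` is hypothetical). With βz set exactly to γz + γmαz, as in a linear model, the three measures coincide; with an attenuated βz, such as an unstandardized logistic estimate might produce, they diverge. The numbers are illustrative only.

```python
def proportion_mediated(beta_z, gamma_z, gamma_m, alpha_z):
    """Return the three proportion-mediated measures discussed in the text."""
    mediated = gamma_m * alpha_z
    return (
        1 - gamma_z / beta_z,              # Freedman et al. [10]
        mediated / beta_z,                 # product over the unadjusted total effect
        mediated / (gamma_z + mediated),   # product over (direct + mediated)
    )

# Linear case: beta_z = gamma_z + gamma_m * alpha_z, so all three measures agree.
print(proportion_mediated(0.5, 0.3, 0.5, 0.4))
# An attenuated beta_z makes the three measures disagree.
print(proportion_mediated(0.45, 0.3, 0.5, 0.4))
```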

The mediated effect along with its standard error can be used to create confidence limits for the mediated effect [15]. For dichotomous outcomes, Freedman et al. [10] proposed a variance estimator of the difference in coefficients method. The results of this article suggest a preference for the product of coefficients method.
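A minimal sketch of normal-theory confidence limits for a mediated effect, assuming an estimate and a standard error (for example, Sobel's) are already available. The helper name and input values are hypothetical. Because the sampling distribution of a product of two coefficients is generally not normal, symmetric limits like these are only an approximation; resampling methods such as the bootstrap are a common alternative.

```python
def normal_ci(estimate, se, z_crit=1.959963984540054):
    """Symmetric normal-theory limits: estimate -/+ z_crit * se (95% by default)."""
    return estimate - z_crit * se, estimate + z_crit * se

# Hypothetical inputs: a mediated-effect estimate of 0.20 with standard error 0.05
lower, upper = normal_ci(0.20, 0.05)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```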

Complications with the estimation of the intermediate variable effect with a binary dependent variable

When the dependent variable is binary and logistic regression is used, the difference in coefficients and product of coefficients methods for calculating the mediated effect are not equal as MacKinnon and Dwyer [24] demonstrated with a simulation study. This article will examine estimation methods for mediation analysis with binary outcomes, describe a problem that arises with the estimation, and propose a solution. The solution is demonstrated with analytical methods and a simulation study that investigates both small sample properties and robustness to distributional assumptions. Finally, we present a case study of a randomized preventive trial to prevent later cigarette use in adolescence.

Estimation of the mediated effect with binary outcomes

Let Z be a binary variable representing intervention status (1 for active intervention and 0 for control), and let the binary Y be the major outcome variable (diagnosis, death, or improvement). Define X to be p-dimensional covariates measured at baseline. Let M be the hypothesized surrogate variable. In most trials, M is measured with a continuous variable, rather than a dichotomous one, since the former provides much greater statistical power. For both practical and statistical reasons, we consider the case of a continuous M variable, with normal or non-normal errors conditional on intervention condition.

As in the case of a continuous dependent variable, the mediated effect can be calculated in two ways when the dependent variable is binary [24]. To calculate the mediated effect both methods use information from two of the three equations listed below. In this model we assume no treatment by baseline interaction or moderation.

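First, Equation (1) shows a logistic regression model where Y depends on M as well as on baseline covariates X and intervention condition Z:

$$\operatorname{logit}\Pr(Y=1 \mid X, M, Z) = \gamma_0 + \gamma_x^{\top}X + \gamma_z Z + \gamma_m M \qquad (1)$$
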
where the γ coefficients refer to intercept (γ0), covariates (γx), and intervention (γz) after adjustment for the mediator. For probit regression, a similar expression exists, with logit replaced by the inverse distribution function for a standard normal.

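Second, a linear relationship between Z and M is assumed:

$$M = \alpha_0 + \alpha_x^{\top}X + \alpha_z Z + \varepsilon_m \qquad (2)$$
$$E(\varepsilon_m \mid X, Z) = 0 \qquad (3)$$
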
The α coefficients refer to the prediction of M (ignoring Y completely). Like the previous model, this model assumes no baseline by intervention interaction. Also, most analytic models impose a distributional assumption on the error, such as normality with a homogeneous variance. This distributional assumption is more restrictive than the conditional mean specifications given in Equations (2) and (3). To specify this exact distribution, we let $f_{M\mid XZ}(m \mid x, z)$ denote the conditional density of M given covariates X = x and intervention condition Z = z.

The subscript i is used to indicate subject-level covariates (Xi), intervention assignment (Zi), mediator (Mi), and distal outcome (Yi), for subjects i = 1, ..., n. Each mediator is assumed to be independent of all other mediator outcomes and to follow Equations (2) and (3) conditional on all covariates and intervention assignments. Also, given all covariates, intervention assignments, and mediators, the distal outcomes Y1, ..., Yn are all independent and follow the distribution specified in Equation (1).

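Under the above assumptions, consistent (asymptotically unbiased) estimates of the α's and γ's can be obtained through ordinary least squares and logistic regression, respectively. Further, the marginal relationship between Z and Y, conditional on X but unconditional on M, is given by

$$\Pr(Y=1 \mid X=x, Z=z) = \int \operatorname{logit}^{-1}\left(\gamma_0 + \gamma_x^{\top}x + \gamma_z z + \gamma_m m\right) f_{M\mid XZ}(m \mid x, z)\, dm \qquad (4)$$
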
Equation (4) shows the true relation between Z and Y, conditioning on multiple covariates X but not depending on M. This relation is called the marginal relation between Z and Y in this article. Without knowing the explicit form of fM|XZ, the above integral cannot be evaluated explicitly, and regardless of the form of this error distribution, there is no simple form for this integral. Instead, researchers often introduce a standard logit model for this marginal relationship between Y and X and Z. In particular, consider the following model:

$$\operatorname{logit}\Pr(Y=1 \mid X, Z) = \beta_0 + \beta_x^{\top}X + \beta_z Z \qquad (5)$$

Equation (4) reduces to the logit form given in Equation (5) only if there are no covariates [31]. When there are no covariates, the binary nature of Z allows specification of these parameters in a standard logit model. Even in this case of no covariates, however, there is no analytic way to obtain the exact value of βz from the α and γ coefficients; the integral needs to be evaluated numerically or approximated.
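To make the numerical evaluation concrete, the sketch below evaluates Equation (4) for the no-covariate case, assuming normal mediator errors, using Gauss–Hermite quadrature from NumPy (`numpy.polynomial.hermite_e.hermegauss`, whose weight function is exp(−x²/2)). The exact marginal βz is then the difference of the two marginal log odds. The parameter values and the helper names (`expit`, `logit`, `marginal_prob`) are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def logit(p):
    return np.log(p / (1.0 - p))

def marginal_prob(z, g0, gz, gm, a0, az, sigma_m, n_nodes=60):
    """Pr(Y = 1 | Z = z) from Equation (4) with no covariates and
    M | Z ~ Normal(a0 + az * z, sigma_m**2), by Gauss-Hermite quadrature."""
    x, w = hermegauss(n_nodes)                  # nodes/weights for weight exp(-x**2 / 2)
    m = a0 + az * z + sigma_m * x               # mediator values at the nodes
    return np.sum(w * expit(g0 + gz * z + gm * m)) / np.sqrt(2.0 * np.pi)

# Illustrative values (gamma_0 = alpha_0 = 0, unit mediator error variance)
g0, gz, gm, a0, az, sigma_m = 0.0, 0.39, 1.0, 0.0, 0.59, 1.0
p1 = marginal_prob(1, g0, gz, gm, a0, az, sigma_m)
p0 = marginal_prob(0, g0, gz, gm, a0, az, sigma_m)
beta_z_exact = logit(p1) - logit(p0)            # exact marginal log odds ratio
latent_total = gz + gm * az                     # gamma_z + gamma_m * alpha_z
print(f"exact beta_z = {beta_z_exact:.4f}, gamma_z + gamma_m*alpha_z = {latent_total:.4f}")
print(f"beta_z - gamma_z = {beta_z_exact - gz:.4f}, gamma_m*alpha_z = {gm * az:.4f}")
```

For these values the exact βz falls below γz + γmαz, so βz – γz understates γmαz even though no model is misspecified.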

Quantifying mediation using the difference of coefficients method

There are two methods for examining the degree of mediation, the difference in coefficients and the product of coefficients methods, which are equivalent for linear models with continuous outcomes. As will be shown below, these two methods are not the same for binary outcome models. A natural method for quantifying mediation is to compare the coefficient for the marginal relationship between Z and Y to the coefficient adjusting for M. No difference in these coefficients implies that knowledge of M plays no role in predicting how intervention affects the distal outcome. This method has been proposed for dichotomous outcomes as well, but applied to the approximate marginal model, Equation (5), rather than the exact marginal model given by Equation (4). Thus this straightforward (but, as we will see, inaccurate) difference in coefficients method is based on the parameter difference βz – γz. It is estimated by fitting two separate logistic regression models, one using covariates and intervention status as predictors, the other including the mediator as well. We know of no applied research examples that use the difference in coefficients method with the correct marginal relationship, Equation (4).
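A sketch of the difference in coefficients computation on a hypothetical simulated trial with no covariates: two equal treatment groups, a normal mediator, and a binary Y generated by thresholding a latent variable with logistic error at zero. The two logistic regressions are fitted with a small Newton–Raphson routine (`fit_logistic`, a hypothetical helper) so that only NumPy is needed; a statistics package would normally be used instead. Parameter values are illustrative.

```python
import numpy as np

def fit_logistic(y, *cols, n_iter=25):
    """Logistic regression with an intercept, fitted by Newton-Raphson."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))
        grad = X.T @ (y - p)                          # score vector
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        b = b + np.linalg.solve(info, grad)
    return b

rng = np.random.default_rng(2)
n = 20_000
alpha_z, gamma_z, gamma_m = 0.59, 0.39, 1.0             # illustrative true values
z = np.repeat([0.0, 1.0], n // 2)                       # equal group sizes
m = alpha_z * z + rng.normal(size=n)                    # Equation (2), normal errors
y_star = gamma_z * z + gamma_m * m + rng.logistic(size=n)
y = (y_star > 0).astype(float)                          # Y = 1 if and only if Y* > 0

beta_z_hat = fit_logistic(y, z)[1]       # approximate marginal model, Equation (5)
gamma_z_hat = fit_logistic(y, z, m)[1]   # conditional model, Equation (1)
difference = beta_z_hat - gamma_z_hat
print(f"beta_z - gamma_z = {difference:.3f} (true gamma_m * alpha_z = {gamma_m * alpha_z:.2f})")
```

With these values the estimated βz – γz falls noticeably below the true γmαz = 0.59, even though both logistic models are fitted by maximum likelihood.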

Quantifying mediation using the product of coefficients method

The second method for assessing mediation in continuous outcome models relies on a path diagram approach. Mediation depends on the product of the relation between intervention and mediator, and the relation between mediator and distal outcome, adjusted for intervention. A direct application of this method to the regression model of M on Z and the logistic regression model of Y on both M and Z leads to examination of αzγm.
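The product of coefficients estimate uses the same kinds of fits but different coefficients: the least squares slope of M on Z and the logistic slope of Y on M adjusted for Z. A self-contained sketch under the same hypothetical simulation settings as a difference in coefficients example would use (helper names `fit_logistic` and `ols_coef` are illustrative; NumPy assumed):

```python
import numpy as np

def fit_logistic(y, *cols, n_iter=25):
    """Logistic regression with an intercept, fitted by Newton-Raphson."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))
        b = b + np.linalg.solve(X.T @ (X * (p * (1.0 - p))[:, None]), X.T @ (y - p))
    return b

def ols_coef(y, *cols):
    """Least squares coefficients with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(3)
n = 20_000
alpha_z, gamma_z, gamma_m = 0.59, 0.39, 1.0             # illustrative true values
z = np.repeat([0.0, 1.0], n // 2)
m = alpha_z * z + rng.normal(size=n)
y = (gamma_z * z + gamma_m * m + rng.logistic(size=n) > 0).astype(float)

alpha_z_hat = ols_coef(m, z)[1]            # M regressed on Z, Equation (2)
gamma_m_hat = fit_logistic(y, z, m)[2]     # Y on Z and M, Equation (1)
product = alpha_z_hat * gamma_m_hat
print(f"gamma_m * alpha_z = {product:.3f} (true value {gamma_m * alpha_z:.2f})")
```

In large samples this estimate should be close to the true γmαz, consistent with the result stated for the product of coefficients method.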

Mediation using latent variables

Since there is equivalence between the difference in coefficient and product of coefficient methods with linear, continuous models, it is useful to embed the problem within a continuous model. Investigation of the nonequivalence of the methods clarifies how the two approaches differ statistically and conceptually. The equivalence of the two methods is demonstrated using an alternative representation of these logistic regression models in terms of an underlying latent variable, an approach that goes back at least to Finney [32].

Equation (1) can be expressed equivalently by introducing the unobserved latent variable Y* that is linearly related to Z and M as well as covariates X.

$$Y^* = \gamma_0 + \gamma_x^{\top}X + \gamma_z Z + \gamma_m M + \varepsilon_y \qquad (6)$$

In this model we let εy represent the residual variability having a standard logistic distribution, that is,

$$\Pr(\varepsilon_y \le t) = \frac{1}{1 + e^{-t}} \qquad (7)$$

We also assume that this error in predicting Y* is independent of the error predicting M from Z in Equation (2). The dichotomous, observable Y is derived from Y* through the relation Y = 1 if and only if Y* > 0. This definition of Y in terms of Y* and the logit error distribution together produce a model that is identical to that of Equation (1).
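The latent variable equivalence can be checked by simulation: for a fixed linear predictor η, the share of draws with η + εy > 0, where εy is standard logistic, should match the inverse logit of η. The sketch (NumPy assumed; η = 0.7 is arbitrary) also checks that standard logistic errors have variance π²/3, the value used when the logistic coefficients are standardized.

```python
import numpy as np

rng = np.random.default_rng(4)
eta = 0.7                                       # an arbitrary value of the linear predictor
eps_y = rng.logistic(size=200_000)              # standard logistic errors
share_positive = np.mean(eta + eps_y > 0)       # simulated Pr(Y* > 0) = Pr(Y = 1)
logistic_prob = 1.0 / (1.0 + np.exp(-eta))      # inverse logit, as in Equation (1)
print(share_positive, logistic_prob)
print(np.var(eps_y), np.pi ** 2 / 3)            # logistic error variance vs pi^2 / 3
```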

In Equation (6), we can substitute the expression for M in terms of Z that is explicit in Equation (2), obtaining
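
$$\begin{aligned}
Y^* &= \gamma_0 + \gamma_x^{\top}X + \gamma_z Z + \gamma_m\left(\alpha_0 + \alpha_x^{\top}X + \alpha_z Z + \varepsilon_m\right) + \varepsilon_y && (8)\\
&= (\gamma_0 + \gamma_m\alpha_0) + (\gamma_x + \gamma_m\alpha_x)^{\top}X + (\gamma_z + \gamma_m\alpha_z)Z + (\gamma_m\varepsilon_m + \varepsilon_y) && (9)\\
E(Y^* \mid X, Z) &= (\gamma_0 + \gamma_m\alpha_0) + (\gamma_x + \gamma_m\alpha_x)^{\top}X + (\gamma_z + \gamma_m\alpha_z)Z && (10)
\end{aligned}$$
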
The next to last line above shows that the mean difference on Y* for active intervention versus control when not adjusting for M is (γz + γmαz). The mean difference on Y* when adjusting for M is γz. Thus the product of coefficients, γmαz is identical to the difference in coefficients on the Y* scale. This underlying equivalence of the two methods when applied to the mean of Y* holds regardless of the distribution of εm as long as these errors have zero conditional means as in Equation (3).

Estimation and variance of mediated effects with binary outcomes

This section shows that the product of coefficients method produces asymptotically unbiased estimates of the mediated effect, while estimates derived from the difference in coefficients method must be scaled properly in order to reduce zero-order bias. We also show that variance estimates from the Generalized Estimating Equations (GEE, see [33]) approach can be used to form confidence intervals and hypothesis tests. Appendix A shows the consistency of the product of coefficients method for logistic regression when Y|XMZ is given by Equation (1) and the conditional mean of M|XZ is linear as in Equations (2) and (3). Appendix A also shows the asymptotic independence of γ̂ and α̂z.

Standardizing mediated effects based on the difference of coefficients method

In this section, it is shown that the ordinary estimates obtained from a logistic regression of Y on X and Z based on Equation (5) are not consistent for γz + γmαz. We start from the true latent variable representation in Equations (8)–(10). Then
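
$$\Pr(Y=1 \mid X, Z) = \Pr\left(\gamma_m\varepsilon_m + \varepsilon_y > -\left[(\gamma_0 + \gamma_m\alpha_0) + (\gamma_x + \gamma_m\alpha_x)^{\top}X + (\gamma_z + \gamma_m\alpha_z)Z\right] \,\middle|\, X, Z\right)$$
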
Unless γm = 0, this probability function will not have a logistic form. More importantly, because the variance of γmεm + εy is larger than that of a standard logistic error, all of the logistic regression estimates of the intercept and slopes for X and Z will be biased. It is possible to adjust for this higher variance, noting that a standard logistic error has variance Var(εy) = π²/3 [35], while the marginal variance is Var(γmεm + εy) = γm² Var(εm) + π²/3. Thus if we first fit a logistic regression of Y on X and Z, then multiply all the logistic regression estimates, including $\hat\beta_z$, by the ratio of these two standard deviations,

$$\hat\beta_z^{\text{logit.standard}} = \hat\beta_z\,\sqrt{\frac{\hat\gamma_m^2\,\widehat{\operatorname{Var}}(\varepsilon_m) + \pi^2/3}{\pi^2/3}}$$

the coefficient will at least be scaled properly. In the tables below we refer to this as the standardized logistic solution and investigate how this simple adjustment compares with the unstandardized solutions. Similarly, the proportion mediated can then be estimated by $1 - \hat\gamma_z/\hat\beta_z^{\text{logit.standard}}$.
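The logistic standardization is a one-line rescaling once $\hat\beta_z$, $\hat\gamma_m$ and an estimate of Var(εm) (for example, the residual variance from the least squares regression of M on Z) are available. A sketch with hypothetical estimates; the function name `standardize_logit` is illustrative:

```python
import math

def standardize_logit(beta_z_hat, gamma_m_hat, var_eps_m):
    """Multiply a marginal logistic coefficient by
    sqrt((gamma_m**2 * Var(eps_m) + pi**2 / 3) / (pi**2 / 3))."""
    logistic_var = math.pi ** 2 / 3            # variance of a standard logistic error
    factor = math.sqrt((gamma_m_hat ** 2 * var_eps_m + logistic_var) / logistic_var)
    return beta_z_hat * factor

# Hypothetical estimates from fits of Equations (1), (2) and (5)
beta_z_hat, gamma_z_hat, gamma_m_hat, alpha_z_hat = 0.85, 0.39, 1.0, 0.59
var_eps_m = 1.0                                # residual variance of M given Z

beta_z_std = standardize_logit(beta_z_hat, gamma_m_hat, var_eps_m)
print("standardized difference:", beta_z_std - gamma_z_hat)
print("product of coefficients:", gamma_m_hat * alpha_z_hat)
print("proportion mediated:", 1 - gamma_z_hat / beta_z_std)
```

With these hypothetical inputs the standardized difference (about 0.58) lands close to the product of coefficients (0.59), whereas the unstandardized difference would be 0.46.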

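For probit regression, the residual variance is fixed at one, while Var(γmεm + εy) = γm² Var(εm) + 1. Thus the standardized probit solution is to multiply the routine probit regression estimate $\hat\beta_z$ by the marginal residual standard deviation:

$$\hat\beta_z^{\text{probit.standard}} = \hat\beta_z\,\sqrt{\hat\gamma_m^2\,\widehat{\operatorname{Var}}(\varepsilon_m) + 1}$$
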
When using the standardized estimate of βz, the standard error of the estimate needs to be scaled as well. Bootstrap resampling can also be used to obtain the standard error.
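For probit regression the rescaling factor is sqrt(γm² Var(εm) + 1). The sketch below (hypothetical helper names and values; NumPy assumed) scales both the estimate and its standard error by that factor, which treats the factor as known and so ignores the sampling variability in $\hat\gamma_m$ and the estimated Var(εm). It also includes a generic nonparametric bootstrap standard error; in practice the `statistic` passed to it would refit all models on each resample, whereas the usage line applies it to a simple mean only to show the mechanics.

```python
import numpy as np

def standardize_probit(beta_z_hat, se_beta_z, gamma_m_hat, var_eps_m):
    """Scale a probit coefficient and its SE by sqrt(gamma_m**2 * Var(eps_m) + 1).
    The factor is treated as fixed, ignoring uncertainty in gamma_m and Var(eps_m)."""
    factor = np.sqrt(gamma_m_hat ** 2 * var_eps_m + 1.0)
    return beta_z_hat * factor, se_beta_z * factor

def bootstrap_se(data, statistic, n_boot=1000, seed=0):
    """Nonparametric bootstrap SE: resample rows with replacement and take the
    standard deviation of the recomputed statistic."""
    rng = np.random.default_rng(seed)
    n = len(data)
    reps = [statistic(data[rng.integers(0, n, n)]) for _ in range(n_boot)]
    return np.std(reps, ddof=1)

# Hypothetical probit estimates: beta_z = 0.60 (SE 0.05), gamma_m = 1.0, Var(eps_m) = 1.0
est, se = standardize_probit(0.60, 0.05, 1.0, 1.0)
print(est, se)

# Mechanics only: the statistic here is a mean; for the standardized mediated
# effect it would refit every model on each resample.
sample = np.random.default_rng(5).normal(size=400)
print(bootstrap_se(sample, np.mean))
```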

Simulation study

A simulation study was conducted to demonstrate the discrepancy between βz – γz and γmαz and to investigate the performance of different estimators for the proportion mediated. The mean and variance of different estimators of the mediated effect and the proportion mediated were investigated as a function of sample size and size of relations among variables. In addition, the simulation allows the exploration of robustness to both the logistic assumption in Equation (4) and to the normality assumption for the distribution of M|Z.

The Z variable was always dichotomous with equal numbers in each group, and the mediator M was either normally distributed, distributed as a t-distribution with three degrees of freedom, or had errors with large skewness, generated from a chi-square distribution with three degrees of freedom. This latter situation was chosen to mimic the primary departure from normality that we observed in our actual data example below. For all of these cases we still apply least squares so that we can examine the impact of misspecifying the error distribution for M|Z.

The outcome variable Y*|MZ was generated to have either an underlying logistic or a normal error distribution. Thus with a logistic error, Y|MZ follows Equation (1), with a similar probit model for normal errors. Estimates were compared for both logistic and probit regression with underlying probit errors so that we can assess the impact of misspecifying the conditional model for Y|MZ. Simulated sample sizes were 50, 100, 200, 500, 1000, and 5000. Parameter values for αz, γm, and γz were chosen to correspond to four different effect sizes ([36], pp. 412–414): small (2% of the variance), medium (13% of the variance), large (26% of the variance), and very large (40% of the variance). These parameter values correspond to 0.14, 0.39, 0.59, and 1, respectively, in a population mediation model with continuous variables. For each of the 64 (4 × 4 × 4) combinations of parameter values and each of the six sample sizes, there were 500 replications. In addition, for every combination of parameter values we computed one replication with a sample size of 20 000, using this as the true expected value for infinite sample size.
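The simulation design can be enumerated directly. The sketch lists the 64 parameter combinations and the 384 combination-by-sample-size cells, and draws mediator errors for the three conditions using NumPy's random `Generator`. Centering the chi-square draws so that the errors have mean zero is an assumption; the text does not say whether the errors were also rescaled, and the helper name `mediator_errors` is hypothetical.

```python
from itertools import product

import numpy as np

values = [0.14, 0.39, 0.59, 1.0]            # small, medium, large, very large
sample_sizes = [50, 100, 200, 500, 1000, 5000]
n_reps = 500

param_grid = list(product(values, repeat=3))   # (alpha_z, gamma_m, gamma_z) triples
cells = [(a, gm, gz, n) for a, gm, gz in param_grid for n in sample_sizes]
print(len(param_grid), len(cells), len(cells) * n_reps)

def mediator_errors(kind, size, rng):
    """Mediator error draws for the three conditions. Chi-square draws are
    centered (mean 0, as Equation (3) requires); rescaling is not specified."""
    if kind == "normal":
        return rng.standard_normal(size)
    if kind == "t3":
        return rng.standard_t(3, size)
    if kind == "chisq3":
        return rng.chisquare(3, size) - 3.0
    raise ValueError(f"unknown error distribution: {kind}")

rng = np.random.default_rng(6)
for kind in ("normal", "t3", "chisq3"):
    print(kind, np.round(mediator_errors(kind, 5, rng), 2))
```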

The accuracy of the point estimates was compared to the parameter values generating the latent Y* data. Also, by comparing the means across different estimators for the largest sample size (20 000), we can examine the variation in population averages of the different parameterizations. Analysis of variance was used to estimate effects of parameter value and sample size in order to summarize results. To conserve space, details of the analysis of variance are not presented.

Graphical comparison of the difference in coefficient and product of coefficient methods

We begin with a graphical comparison of the product of coefficients with both the standardized and unstandardized difference in coefficients methods, all derived under a correctly specified model. For logistic regression, a comparison of the unstandardized $\hat\beta_z - \hat\gamma_z$, the standardized $\hat\beta_z^{\text{logit.standard}} - \hat\gamma_z$, and the product of coefficients $\hat\gamma_m\hat\alpha_z$ estimates of the mediated effect is shown in Figure 1 for the case where M|Z is normally distributed. In this figure, the large-sample expected values, obtained through 500 simulations of the three estimators of the mediated effect with n = 20 000, are plotted as a function of increasing values of the γm coefficient, averaging across values for the relation between the independent variable and the mediator (αz). As the relation between the mediator and dependent variable gets larger, the expected mediated effect should increase linearly, because this relation is one of the two components of the mediated effect. As shown in Figure 1, with increasing values of the γm parameter, all three estimators increase, but at different rates. The mean of the estimated product of coefficients measure, γmαz, has the highest rate, while the difference in coefficients measure, βz – γz, tends to flatten out at larger values of γm. Also shown in this figure, the standardized measure $\hat\beta_z^{\text{logit.standard}} - \hat\gamma_z$ is very close to the $\hat\gamma_m\hat\alpha_z$ values. This result demonstrates the distortions possible when using the unstandardized difference in coefficients method. A similar pattern in the three estimates exists for probit regression and for other combinations of logit and probit error and logistic and probit regression.

Figure 1
Average values of mediated effect estimates: logistic regression

Figures 2 and 3 demonstrate problems with the βz – γz method of estimating mediation. Figure 2 shows the value of βz – γz as a function of γz for the case where the true model holds αz at zero and allows γz to vary. The true mediated effect is zero for all conditions in the plot, and the αzγm estimator of the mediated effect correctly has a value of zero for all values, while the βz – γz estimate departs from zero and becomes more negative as γz increases. Figure 3 shows similar information for the two estimators with αz held at 0.14 and γm varied, so that the true mediated effect should increase as γm is increased. The value of βz – γz varies as a function of the value of γm, and for γz equal to 1, βz – γz increases slightly and then declines even though the true mediated effect is increasing.
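The pattern in Figure 2 can be reproduced without simulation for normal mediator errors. The sketch below (NumPy assumed; the helper name `marginal_log_odds_ratio` is hypothetical) evaluates Equation (4) by Gauss–Hermite quadrature with αz = 0, so the true mediated effect is zero, and prints the exact βz – γz for a few values of γm and γz.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def marginal_log_odds_ratio(gz, gm, az, sigma_m=1.0, n_nodes=60):
    """Exact marginal beta_z from Equation (4): no covariates, gamma_0 = alpha_0 = 0,
    normal mediator errors; the integral is evaluated by Gauss-Hermite quadrature."""
    x, w = hermegauss(n_nodes)                  # weight function exp(-x**2 / 2)

    def marginal_logit(z):
        eta = gz * z + gm * (az * z + sigma_m * x)
        p = np.sum(w / (1.0 + np.exp(-eta))) / np.sqrt(2.0 * np.pi)
        return np.log(p / (1.0 - p))

    return marginal_logit(1) - marginal_logit(0)

# alpha_z = 0, so the true mediated effect gamma_m * alpha_z is zero
for gm in (0.0, 0.39, 1.0):
    for gz in (0.14, 0.59, 1.0):
        diff = marginal_log_odds_ratio(gz, gm, 0.0) - gz
        print(f"gamma_m = {gm:.2f}, gamma_z = {gz:.2f}: beta_z - gamma_z = {diff:+.4f}")
```

With γm = 0 the printed difference is zero up to rounding; with γm > 0 it is negative and larger in magnitude for larger γz.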

Figure 2
Estimated mediated effects as function of γm when αz is held at 0: logistic regression
Figure 3
Estimated mediated effect $\hat\beta_z - \hat\gamma_z$ as function of γm and γz with αz = 0.14: logistic regression

Numerical comparison of mediation measures under the correct and incorrect models for the dependent variable

The first two tables are based on logistic and probit models with each fit using the appropriate model. The third table fits a logistic model to data generated from a probit model. For Tables 1–6 below, we provide three rows for comparison. The first two rows compare the replication averages to population quantities whereas the last row examines variability due to replications. The first row, labeled Mean (μn), provides a mean value of the estimates for each sample size (n), averaged over the 64 combinations of parameter values. Since all of the parameter values chosen are positive, a mean value that is smaller than those used to generate the data, labeled in the table as μ, indicates attenuation. Changes across this row reflect sample size bias. The second row, labeled Standard Deviation (σn), measures the average standard deviation of the replication-averaged means over the 64 different parameter value simulations at that sample size. These values can be compared to the true standard deviation across the 64 sets of parameter values (σ). The last row, labeled Average Squared Distance, measures the average squared distance of the estimate from the expected value of the estimate. The expected value of the estimate was obtained through 500 simulations with n = 20 000.
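One reading of the three summary rows, sketched for a single estimator at a single sample size: the replication-averaged mean for each parameter set is averaged (row 1), its standard deviation across parameter sets is taken (row 2), and the squared distance of every estimate from its large-sample expected value is averaged (row 3). The array shapes, the `ddof=1` choice, the helper name `summary_rows`, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def summary_rows(estimates, expected):
    """estimates: (n_param_sets, n_reps) array for one estimator at one sample size.
    expected: (n_param_sets,) large-sample expected values of that estimator."""
    rep_means = estimates.mean(axis=1)                           # per parameter set
    mean_row = rep_means.mean()                                  # Mean (mu_n)
    sd_row = rep_means.std(ddof=1)                               # Standard Deviation (sigma_n)
    sq_dist_row = np.mean((estimates - expected[:, None]) ** 2)  # Average Squared Distance
    return mean_row, sd_row, sq_dist_row

# Synthetic example: 64 parameter sets, 500 replications each
rng = np.random.default_rng(7)
expected = rng.uniform(0.0, 1.0, size=64)
estimates = expected[:, None] + rng.normal(0.0, 0.1, size=(64, 500))
print(summary_rows(estimates, expected))
```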

Table 1
Comparison of the product of coefficients and difference in coefficients estimators of the mediated effect under a correct logistic and correct normality assumption
Table 6
Comparison of the product of coefficients and difference in coefficients estimators of the mediated effect under an incorrect logistic and incorrect normality assumption (probit and χ²(2))

Table 1 compares the three estimators of mediated effect when the underlying data are both generated and modeled with a logistic distribution. The distribution of M|Z is generated from the normal and modeled as normal. For the product of coefficients method, which is listed first, the first row shows there is very little overall bias for large samples (0.281) and very modest bias for small samples (0.314), compared to the true population mean (0.281). In terms of variability of these estimates across the sets of parameters, the second row shows substantial variation at sample size of 50 (σn = 0.439 relative to σ = 0.258), but minimal difference in these estimates with sample sizes of 500 or above. The last row indicates that the replication variances for the product of coefficients method decrease smoothly to zero.

The unadjusted difference in coefficients method exhibits several unfavorable behaviors. The first row shows that it produces systematic attenuation, with μ5000 = 0.213 compared to μ = 0.281. Also, these means are nearly constant over the different sample sizes. Variability across the 64 parameter combinations in the average difference in coefficients estimates shows a similar pattern of decrease with sample size compared to the product of coefficients method.

Standardizing the difference in coefficients under a logistic model substantially reduces the attenuation, as indicated by the first row in the third part of this table. The values of σn are also in line with σ once the 8% attenuation that still occurs at a sample size of 5000 is taken into account.

Table 2 shows the same comparisons of the mediated effect when the underlying data are both generated and modeled with a probit distribution. Again, the distribution of M|Z is taken from the normal and modeled as such. As shown in the three parts of Table 2, the γmαz and βz – γz estimators are not equal, but standardization makes βz – γz very close to γmαz.

Table 2
Comparison of the product of coefficients and difference in coefficients estimators of the mediated effect under a correct probit and correct normality assumption

The product of coefficients method behaves just as well under a probit model as it does under a logit model. There is very little bias (row 1), and variability in these estimates behaves quite well for large and small sample sizes (row 2).

Examination of the unadjusted difference in coefficients method in Table 2 leads to a very different conclusion. First, this method shows large attenuation (μ5000 = 0.160 compared to μ = 0.281), even more than under a logistic model. Also, compared to the product of coefficients method, σn changes more as a function of sample size (σ50/σ5000 = 1.76) for this difference in coefficients method. However, standardization succeeds very well, making each standardized estimate of βz – γz nearly identical to the corresponding estimate of γmαz.
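Why the unstandardized difference is attenuated under a probit model can be seen in closed form. The sketch below assumes a simple data-generating model (binary Z, normal mediator, probit outcome, no X) and a standardization that rescales βz by the ratio of the latent-variable standard deviations implied by the models with and without M; that form of standardization is our assumption, not a transcription of Equation (12). Integrating the normal mediator out of the probit model leaves a probit in Z whose latent error variance is 1 + γm²σ²M, which shrinks βz toward zero.

```python
import math


def probit_population(alpha_z, gamma_m, gamma_z, sigma2_m=1.0, p_treat=0.5):
    """Population mediated-effect quantities for an assumed probit model:

        M  = alpha_z * Z + e_M,              e_M ~ N(0, sigma2_m)
        Y* = gamma_z * Z + gamma_m * M + e,  e   ~ N(0, 1),   Y = 1 if Y* > 0

    with Z binary and P(Z = 1) = p_treat. Intercepts do not affect the slopes
    and are omitted.
    """
    # Integrating M out leaves a probit in Z with latent error variance
    # 1 + gamma_m**2 * sigma2_m, so the marginal slope is shrunk by its root.
    beta_z = (gamma_z + gamma_m * alpha_z) / math.sqrt(1.0 + gamma_m**2 * sigma2_m)

    # Latent-variable standard deviations implied by each fitted model
    # (the probit residual variance is fixed at 1 in both).
    var_z = p_treat * (1.0 - p_treat)
    var_m = alpha_z**2 * var_z + sigma2_m
    cov_zm = alpha_z * var_z
    sd_full = math.sqrt(gamma_z**2 * var_z + gamma_m**2 * var_m
                        + 2.0 * gamma_z * gamma_m * cov_zm + 1.0)
    sd_reduced = math.sqrt(beta_z**2 * var_z + 1.0)

    # Assumed standardization: put beta_z on the scale of the model with M.
    beta_z_std = beta_z * sd_full / sd_reduced

    return {
        "product": alpha_z * gamma_m,
        "difference": beta_z - gamma_z,
        "std_difference": beta_z_std - gamma_z,
    }


r = probit_population(alpha_z=0.5, gamma_m=0.6, gamma_z=0.3)
print(r)
```

With these illustrative values the product is 0.3, the unstandardized difference is about 0.214, and the standardized difference equals the product up to floating-point rounding, mirroring the pattern in Table 2.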

The sensitivity of these estimates to the assumption of a logistic distribution is investigated in this section. Table 3 shows the behavior of the different estimators for the mediated effect when the underlying data are generated with a probit distribution but analyzed with logistic regression. We note that these two distributional assumptions are difficult to distinguish in practice unless large amounts of data are available. Thus methods that behave differently under these logistic and probit assumptions are not robust against this fairly innocuous change in models. Again, the distribution of M|Z is taken from the normal and modeled as such.

Table 3
Comparison of the product of coefficients and difference in coefficients estimators of the mediated effect under an incorrect logistic and correct normality assumption

Unlike the two previous tables, where the true mediated effects could be computed directly from the parameters themselves, this table's true mediated effects had to be defined numerically by finding the best-fitting linear logistic model for data generated by a linear probit model. We found μ = 0.475 and σ = 0.442.
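The numerical definition of the true values can be approximated along the following lines: simulate a very large sample from a linear probit model and fit logistic regressions to it. This is a hedged sketch, not the authors' procedure; the data-generating values, the sample size, and the helper names `fit_logistic` and `logistic_fit_to_probit` are illustrative assumptions, and the logistic fits use plain Newton-Raphson.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Logistic regression MLE by Newton-Raphson; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (y - p)                        # gradient of the log-likelihood
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # negative Hessian
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def logistic_fit_to_probit(alpha_z, gamma_m, gamma_z, n=200_000, seed=0):
    """Approximate best-fitting logistic coefficients for probit-generated data."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n).astype(float)            # binary treatment
    m = alpha_z * z + rng.standard_normal(n)                # normal mediator
    y_star = gamma_z * z + gamma_m * m + rng.standard_normal(n)
    y = (y_star > 0).astype(float)                          # probit outcome
    ones = np.ones(n)

    # Least squares slope of M on Z, and logistic fits with and without M.
    alpha_hat = np.linalg.lstsq(np.column_stack([ones, z]), m, rcond=None)[0][1]
    full = fit_logistic(np.column_stack([ones, z, m]), y)   # (gamma_0, gamma_z, gamma_m)
    reduced = fit_logistic(np.column_stack([ones, z]), y)   # (beta_0, beta_z)
    return {
        "alpha_z": alpha_hat,
        "gamma_z": full[1],
        "gamma_m": full[2],
        "beta_z": reduced[1],
        "product": alpha_hat * full[2],
        "difference": reduced[1] - full[1],
    }


res = logistic_fit_to_probit(0.5, 0.6, 0.3)
print(res)
```

Because the logistic latent error has standard deviation π/√3 rather than 1, the fitted logistic slopes come out roughly 1.6 to 1.8 times the probit slopes, which is why the true values for this table differ from those of the probit table.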

As shown in Table 3, the three methods produce quite different results: the difference in coefficients method gives the smallest values, the product of coefficients method gives the largest values, and the standardized method has a mean 13% smaller than that of the product of coefficients method. For large sample sizes, the product of coefficients method agrees very closely with the mean over the 64 sets of parameter values (μ5000 = 0.475). We do, however, observe in row 1 of this table a relatively large bias of order $n^{-1}$. This overestimation in small samples under an incorrect logit assumption is somewhat larger than the overestimation when the true probit model is used. In contrast, the unadjusted difference in coefficients method shows severe attenuation in large samples, even more than it did under a correctly modeled probit. The unadjusted difference in coefficients method also has a modest negative bias of order $n^{-1}$, which makes the attenuation larger at smaller sample sizes.

Simulation for the proportion mediated measures

We next compared five different estimators of the proportion mediated, based on the product of coefficients and on unstandardized and standardized estimates of the difference in coefficients. Note that the first estimator is based on coefficients that need no standardization; because the second and third estimators use the unstandardized βz in the denominator, we anticipate that they will be biased. The fourth and fifth estimators use the standardized βz in the denominator, and because of this we expect a priori that they will do nearly as well as the first. For Table 4, the dependent variable is generated from a logit and modeled using logistic regression, with the mediator having a normal distribution and modeled as such. The true mean of the proportion mediated over the 64 simulations is 0.343, with σ = 0.227. Both of these agree well with μ5000 and σ5000 for the first estimator, $\hat\alpha_z\hat\gamma_m/(\hat\alpha_z\hat\gamma_m + \hat\gamma_z)$. However, this estimator performs extremely poorly for samples of 500 or smaller, because at these sample sizes the variability in the estimates is extreme. This is caused by the denominator occasionally being near zero. The second and third estimators, which are both based on the unstandardized βz, are severely biased in large samples and are therefore not recommended. The last two estimators, which are based on denominators with the standardized βz, both have modest bias in large samples (μ5000 = 0.360 and 0.335, respectively, versus a true value of 0.343) and agree well across all 64 simulations, as indicated by the fact that σ5000 is only modestly larger than σ = 0.227. Somewhat surprisingly, both of these estimators perform better than the first in small samples, although there is still substantial variability across replications.
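The five estimators can be written down compactly. In this sketch the ordering of estimators 2 to 5 (1 – γz/βz before αzγm/βz) is our assumption, taken from the order used in the case study, and `beta_z_std` stands for βz after whichever standardization is applied; the function name is hypothetical.

```python
def proportion_mediated(alpha_z, gamma_m, gamma_z, beta_z, beta_z_std):
    """Five proportion-mediated estimators (ordering of 2-5 assumed).

    beta_z     : coefficient of Y on Z without M (unstandardized)
    beta_z_std : the same coefficient after standardization
    """
    ab = alpha_z * gamma_m                  # product of coefficients
    return [
        ab / (ab + gamma_z),                # 1: needs no standardization
        1.0 - gamma_z / beta_z,             # 2: unstandardized beta_z
        ab / beta_z,                        # 3: unstandardized beta_z
        1.0 - gamma_z / beta_z_std,         # 4: standardized beta_z
        ab / beta_z_std,                    # 5: standardized beta_z
    ]


print(proportion_mediated(0.5, 0.6, 0.3, 0.5, 0.6))
```

The sketch also makes the small-sample problem with the first estimator easy to see: with αzγm = 0.3 and γz = –0.29, for example, its denominator is 0.01 and the estimate is about 30.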

Table 4
Comparison of the product of coefficients and difference in coefficients estimators for the proportion mediated estimators under a correct logistic assumption and correct normal assumption

The results for probit regression under a true probit model mirror these results for a correct logistic regression and are therefore described rather than tabulated. Again, the large-sample fits using $\hat\alpha_z\hat\gamma_m/(\hat\alpha_z\hat\gamma_m + \hat\gamma_z)$ are the best, followed by the last two estimators that use the standardized βz. The two unstandardized estimators have very large bias in large samples. Also, as in the previous case with logistic regression modeling a true logit model, the first estimator is extremely unstable for sample sizes of 500 or below. We conclude that the proportion mediated is inherently difficult to estimate in small to moderate samples.

We now turn to estimation of the proportion mediated when the data are generated under a probit model but analyzed incorrectly with logistic regression. As the results are similar to those of the two correctly analyzed logistic and probit regressions just described, no table is shown. Again, the large-sample bias of $\hat\alpha_z\hat\gamma_m/(\hat\alpha_z\hat\gamma_m + \hat\gamma_z)$ is negligible, but this estimator is still subject to extreme variance in small to moderate samples. The last two estimators, with the standardized βz in the denominators, perform much better in small samples and show only a small amount of bias at a sample size of 5 000. However, the two estimators that rely on the unstandardized βz show substantial bias in large samples and cannot be recommended.

Simulation results when both conditional distributions are incorrectly specified

To investigate how sensitive the results are to the underlying distributional assumptions, we examined the behavior of the three estimators of the mediated effect in two situations where the conditional distributions of Y and M were both misspecified. Again, for both of these simulations we assumed a true probit model but used logistic regression to model how Y depends on M and Z. In the first simulation described below, we assumed that M|Z was normal, but the true error distribution for the mediator was a t-distribution with three degrees of freedom. That is, $M = \alpha_z(Z - 3/2) + T_3/\sqrt{3}$, $Z = 1, 2$, with the scale factor chosen so that the variance of the error term is one. This symmetric error distribution, heavy-tailed but with finite variance, is often used in studies of robustness to a small proportion of large errors. The second simulation was generated to resemble the data in our example described below. These data had substantial skewness (κ1 ≈ 2). In our simulation, the mediator was generated as $M = \alpha_z(Z - 3/2) + (\chi^2_2 - E(\chi^2_2))/2$, so that the skewness of the error term was exactly 2. The errors were weighted so that the standardized regression coefficients for the mediator were again small (0.14), medium (0.39), and large (0.59) to match the earlier simulations.
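The two misspecified mediator error distributions can be generated as in the following sketch. The function names are hypothetical; the scale factors follow from Var(T3) = 3/(3 – 2) = 3 and from χ²₂ having mean 2 and variance 4, and the skewness of χ²₂ is √(8/2) = 2, which a linear rescaling leaves unchanged.

```python
import numpy as np


def mediator_errors(kind, n, rng):
    """Unit-variance mediator errors for the two misspecification scenarios."""
    if kind == "t3":
        # A t variate with 3 df has variance 3, so divide by sqrt(3).
        return rng.standard_t(3, size=n) / np.sqrt(3.0)
    if kind == "chi2_2":
        # chi-square(2) has mean 2 and variance 4: centre it, then halve it.
        return (rng.chisquare(2, size=n) - 2.0) / 2.0
    raise ValueError(f"unknown error kind: {kind}")


def simulate_mediator(alpha_z, z, kind, rng):
    """M = alpha_z * (Z - 3/2) + error, with Z coded 1 or 2."""
    z = np.asarray(z, dtype=float)
    return alpha_z * (z - 1.5) + mediator_errors(kind, z.size, rng)


rng = np.random.default_rng(1)
e = mediator_errors("chi2_2", 10**6, rng)
skew = ((e - e.mean()) ** 3).mean() / e.std() ** 3
print(round(e.mean(), 3), round(e.var(), 3), round(skew, 2))
```

The printed sample mean, variance, and skewness should be close to 0, 1, and 2; the t(3) errors are not checked this way because their sample variance converges slowly.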

In Table 5, where the t-distribution is incorrectly modeled as normal, the product of coefficients method performs quite well. The simulation results for this estimator are nearly identical to those in Table 3, where conditional normality for M was correctly assumed: the means and standard deviations are almost identical (0.475 versus 0.475 for μ5000 and 0.443 versus 0.444 for σ5000). There is also very good agreement between the product of coefficients method and the standardized difference in coefficients method across all sample sizes, much better than in Table 3. Both of these methods show positive bias at smaller sample sizes. As before, the unstandardized difference in coefficients method shows substantial bias in large samples and little change in bias as a function of sample size.

Table 5
Comparison of the product of coefficients and difference in coefficients estimators of the mediated effect under an incorrect logistic and incorrect normality assumption (probit and t(3))

In Table 6, with an asymmetric error distribution, we anticipated some instability in these estimators because they are computed under two incorrect distributional assumptions. Only the product of coefficients method shows very little overall bias. In particular, μ5000 = 0.484, which agrees very well with the value μ5000 = 0.475 from Table 3 and 0.475 from Table 5, the two previous simulations with the same misspecified logistic model. There is somewhat more variation in these estimates with an asymmetric error distribution than in the previous two cases (σ5000 = 0.577 compared to 0.444 when normality is correctly assumed). These differences, however, are minimal compared to the instability of both the unstandardized and standardized difference in coefficients methods across the normal, long-tailed, and asymmetric error distributions. In particular, μ5000 varies from 0.235 to 0.364 for the unstandardized difference in coefficients and from 0.410 to 0.582 even with standardization.

Case study with intentions as a mediator for cigarette use

Mediation for a binary dependent variable is illustrated here with a data set from a school-based drug prevention program. The Midwestern Prevention Project (MPP) was a longitudinal school- and community-based drug prevention program. The intervention program was introduced to 6th and 7th grade students in Kansas City and consisted of school and parent programs, health policy changes, and mass media coverage (for full details see Pentz et al. [37]). Schools were randomly assigned to treatment or control conditions. In this example we posit that behaviors are preceded by behavioral intentions, and we test whether intention to use cigarettes mediates the program effect on cigarette use. The data set consists of 864 students with complete data on three variables: Z, treatment condition; M, intention to use tobacco in the following 2-month period, measured approximately 6 months after baseline; and Y, report at follow-up of tobacco use in the prior one-month period. Table 7 contains the cross-tabulations of these three variables, as well as the coding schemes.

Table 7
Example data set from Midwestern Prevention Project (N = 864)

The mediated effect was estimated using Equation (11) and least squares regression of M on Z. When calculated as $\hat\alpha_z\hat\gamma_m$, the mediated effect was –0.171 (standard error $\hat\sigma_{\hat\alpha_z\hat\gamma_m} = 0.065$, CI = –0.298, –0.044). When calculated as $\hat\beta_z - \hat\gamma_z$, the mediated effect was –0.129 (standard error $\hat\sigma_{\hat\beta_z - \hat\gamma_z} = 0.089$, CI = –0.303, 0.046), a discrepancy of 25% and a formal difference in whether significant mediation was found: only the first interval excludes zero. This discrepancy is also present in the various proportion mediated measures, with $1 - \hat\gamma_z/\hat\beta_z = 0.254$, $\hat\alpha_z\hat\gamma_m/\hat\beta_z = 0.338$, and $\hat\alpha_z\hat\gamma_m/(\hat\gamma_z + \hat\alpha_z\hat\gamma_m) = 0.312$. Although the proportion mediated estimates in this example are all above 0.20, the estimates that use $\hat\alpha_z\hat\gamma_m$ are 20 to 30% greater than the estimate based only on $\hat\beta_z$ and $\hat\gamma_z$.
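The two confidence intervals follow from the point estimates and standard errors under a normal approximation. The sketch below recomputes them, assuming the usual 1.96 multiplier (the multiplier is not stated in the text); the helper names are hypothetical.

```python
def normal_ci(estimate, se, z_crit=1.96):
    """Normal-theory interval: estimate +/- z_crit * se."""
    return (estimate - z_crit * se, estimate + z_crit * se)


def excludes_zero(ci):
    lo, hi = ci
    return lo > 0 or hi < 0


product_ci = normal_ci(-0.171, 0.065)      # alpha_z * gamma_m
difference_ci = normal_ci(-0.129, 0.089)   # beta_z - gamma_z (unstandardized)
print([round(v, 3) for v in product_ci], excludes_zero(product_ci))        # [-0.298, -0.044] True
print([round(v, 3) for v in difference_ci], excludes_zero(difference_ci))  # [-0.303, 0.045] False
print(round(1 - 0.129 / 0.171, 3))                                         # 0.246
```

The second upper limit prints as 0.045 rather than 0.046, presumably because the reported inputs are rounded; the last line reproduces the roughly 25% discrepancy between the two point estimates.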

Equation (12) was used to standardize the βz logistic regression estimate. After standardization, the difference in coefficients mediated effect and the two proportion measures involving βz were recalculated. The standardized difference in coefficients estimate was –0.197, much closer to the product of coefficients estimate (–0.171). The standardized proportion measures are $1 - \hat\gamma_z/\hat\beta_z = 0.343$ and $\hat\alpha_z\hat\gamma_m/\hat\beta_z = 0.297$. These results are also consistent with the simulation study: standardization brings the estimates much closer together. In this example, the use of the unstandardized estimates would lead to the conclusion that the mediated effect was smaller than it truly was.
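The reported case-study summaries are internally consistent, which can be checked by back-calculating the coefficients they imply. The values of βz, γz, and the standardized βz computed below are derived by us from the reported numbers, using 1 – γz/βz = (βz – γz)/βz; they are not reported in the study, and discrepancies in the third decimal reflect rounding of the inputs.

```python
def implied_case_study_measures():
    """Back-calculate coefficients from the reported summaries and recompute
    the proportion mediated measures (a consistency check, not new results)."""
    ab = -0.171          # reported alpha_z * gamma_m
    diff = -0.129        # reported unstandardized beta_z - gamma_z
    diff_std = -0.197    # reported standardized beta_z - gamma_z
    pm1 = 0.254          # reported 1 - gamma_z / beta_z (unstandardized)

    beta_z = diff / pm1              # since 1 - g/b = (b - g)/b
    gamma_z = beta_z - diff
    beta_z_std = gamma_z + diff_std  # implied standardized beta_z

    return {
        "ab_over_beta_z": ab / beta_z,               # reported 0.338
        "ab_over_total": ab / (gamma_z + ab),        # reported 0.312
        "pm1_std": 1.0 - gamma_z / beta_z_std,       # reported 0.343
        "ab_over_beta_z_std": ab / beta_z_std,       # reported 0.297
    }


for name, value in implied_case_study_measures().items():
    print(f"{name}: {value:.3f}")
```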


The purpose of this article was to examine two different approaches to estimating the intermediate endpoint or mediated effect when the dependent variable is binary and logistic or probit regression is used. Unlike the linear model situation, the difference in coefficients and product of coefficients estimators can lead to substantially different estimates and inferential conclusions. We recommend the product of coefficients method for the following reasons. It generally has less bias than the difference in coefficients method, it reflects the population mediated effect, and in our simulations it was quite robust against departures from the logistic or probit assumption as well as from the normality assumption for the distribution of the mediator. This estimator behaved well under both a symmetric, heavy-tailed error distribution and an asymmetric error distribution. Another advantage of this method is that it is the easiest to compute.

We do not recommend the difference in coefficients method. The unstandardized procedure can indicate no change in the mediated effect when it is actually increasing. This happens because the logistic regression coefficients of Y on Z with and without adjustment for M come from separate logistic regression equations and are therefore on different scales. There may be situations where the mediated effect is defined in terms of a difference in coefficients, in which case the method may be appropriate. The same problem also affects the proportion mediated measures based on the difference in coefficients. Standardizing coefficients before calculating these quantities does improve the performance of the difference in coefficients method.

None of the five estimators of the proportion mediated was fully satisfactory. The estimator based on the product of coefficients performed well in samples over 500, both in terms of bias and robustness to the probit assumption, but it showed a high degree of variability in small samples. The two estimators based on the unstandardized difference in coefficients showed marked bias even in large samples. Standardizing the denominator of these two estimators markedly decreased the bias relative to the unstandardized versions, and the results were somewhat more stable across sample sizes. We recommend not using any of these estimators for samples below a few hundred; in large samples, we recommend estimating the proportion mediated with the product of coefficients method.

Although the results discussed in this article are for a binary Z, the results were very similar when the study was conducted with a continuous Z. The only important difference concerned the $\hat\alpha_z\hat\gamma_m/(\hat\gamma_z + \hat\alpha_z\hat\gamma_m)$ proportion mediated estimator, which performs much better when Z is continuous.

A limitation of the current study is that it does not discuss multiple mediator models. More complicated models would have the same issues with mediation estimation as described in this article. Software for the latent variable approach is available for more complicated models [38]. Similarly, it was assumed that the form of the true mediation model was known and that no moderation existed. With actual data, the model is not known, and the true relations among Z, M, and Y may be more difficult to disentangle.


This paper is dedicated to James H. Dwyer. We thank our colleagues in the Prevention Science and Methodology Group (PSMG) as well as anonymous reviewers for many helpful comments on this article. We thank Ghulam Warsi and Myeong sun Yoon for assistance with this research and Christopher Winship for comments on the manuscript. We thank Ellen Laing and L. Terri Singer for their help in producing this manuscript. We also thank Mary Ann Pentz for the use of data from the Students Taught Awareness and Resistance study. Part of this work was presented at the 1991 Psychometric Society Meeting. This research was supported by U. S. Public Health Service Grants R01-DA09757, R01-MH40859, R01-MH42968, and P30-MH068685.

Appendix: Mediated effects based on the product of coefficients method

This section shows the consistency of the product of coefficients method for logistic regression when Y|XMZ is given by Model 1 and the conditional mean of M|XZ is linear as in Equations (2) and (3). A formula for the variance of this product of coefficients estimator is then given. Define $\hat\gamma$ as the maximum likelihood solution for the logistic regression coefficients $\gamma = (\gamma_0, \gamma_x, \gamma_m, \gamma_z)$ in Equation (1), the standard logistic regression model of Y predicted by M as well as Z and X. As the sample size gets large, $\hat\gamma$ provides consistent estimates of γ. Similarly, for the regression of M on Z adjusted for X in Equation (2), the least squares estimates $\hat\alpha$ are consistent estimates of α provided the conditional mean is correctly specified as in Equation (3). Thus the product $\hat\alpha_z\hat\gamma_m$ is consistent for $\alpha_z\gamma_m$.

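Further, the following sandwich-type variance estimates are consistent as well. First define the ith observation's score function as

$$S_{\gamma i} = \frac{\partial}{\partial\gamma}\log\Pr\{Y = y_i \mid X = x_i, M = m_i, Z = z_i\},$$

evaluated at $\hat\gamma$. The information matrix for γ is defined as

$$I_\gamma = \sum_{i=1}^{n}\frac{\partial^2}{\partial\gamma\,\partial\gamma^{T}}\log\Pr\{Y = y_i \mid X = x_i, M = m_i, Z = z_i\}.$$

Then

$$\operatorname{Avar}(\hat\gamma) = I_\gamma^{-1}\Big(\sum_{i=1}^{n} S_{\gamma i}S_{\gamma i}^{T}\Big)I_\gamma^{-1}.$$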
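Turning to the regression of M on Z and X, if $f_{M|XZ;\alpha}(m \mid x, z)$ is known, it would be used to maximize the corresponding likelihood. For example, if this distribution is normal, define $\phi(u; \mu, \sigma^2)$ as the normal density. Then the ith observation's score function, with parameters $\theta = (\alpha_0, \alpha_x, \alpha_z, \sigma^2_{m.xz})$, is given by

$$S_{\theta i} = \frac{\partial}{\partial\theta}\log\phi(m_i;\ \alpha_0 + \alpha_x x_i + \alpha_z z_i,\ \sigma^2_{m.xz}),$$

evaluated at the least squares solution $\hat\theta$. Similarly, the information matrix for this problem is given by $I_\theta = \sum_{i=1}^{n}\partial S_{\theta i}/\partial\theta^{T}$. Then

$$\operatorname{Avar}(\hat\theta) = I_\theta^{-1}\Big(\sum_{i=1}^{n} S_{\theta i}S_{\theta i}^{T}\Big)I_\theta^{-1},$$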
and the diagonal element corresponding to $\alpha_z$, $\operatorname{Avar}(\hat\alpha_z)$, is therefore a consistent estimate of the variance of $\hat\alpha_z$ provided the mean of M is linear in X and Z.
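For the logistic part of the model, the sandwich estimate can be sketched with NumPy as follows. The fitting routine is plain Newton-Raphson, and the simulated data (coefficient values, sample size, seed) are illustrative assumptions; following the convention above, `info` is the sum of second derivatives of the log-likelihood, and its sign cancels in the sandwich. When the logistic model is correctly specified, the sandwich and inverse-information standard errors should nearly agree.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Logistic regression MLE by Newton-Raphson; X includes an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        step = np.linalg.solve(X.T @ (X * (p * (1.0 - p))[:, None]), X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def logistic_covariances(X, y, beta):
    """Return (sandwich, model-based) covariance matrices at the MLE beta."""
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    scores = X * (y - p)[:, None]                    # row i is S_gamma_i
    info = -(X.T @ (X * (p * (1.0 - p))[:, None]))   # I_gamma: sum of second derivatives
    info_inv = np.linalg.inv(info)
    sandwich = info_inv @ (scores.T @ scores) @ info_inv
    model_based = -info_inv                          # usual inverse-information estimate
    return sandwich, model_based


# Illustrative data from a correctly specified logistic model (assumed values).
rng = np.random.default_rng(2)
n = 50_000
z = rng.integers(0, 2, size=n).astype(float)
m = 0.5 * z + rng.standard_normal(n)
X = np.column_stack([np.ones(n), m, z])              # columns: intercept, m, z
eta = -0.2 + 0.6 * m + 0.3 * z
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

beta_hat = fit_logistic(X, y)
sandwich, model_based = logistic_covariances(X, y, beta_hat)
se_ratio = np.sqrt(np.diag(sandwich) / np.diag(model_based))
print(beta_hat, se_ratio)
```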

If the underlying distribution for M|XZ is not known, then consistent estimates of α can still be obtained as long as the conditional model for the mean is correctly specified. In this case, the least squares approach described above provides consistent estimates of both the regression parameters and their standard errors [34].

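Finally, we show that the GEE estimators $\hat\gamma$ and $\hat\alpha_z$ are asymptotically independent. Using a Taylor expansion of the score function for γ about its true value,

$$\hat\gamma - \gamma = -I_\gamma^{-1}\sum_{i=1}^{n} S_{\gamma i}(\gamma) + O_p(n^{-1}),$$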
where $O_p(n^{-1})$ is a random term of order $n^{-1}$ in probability. Explicitly, the ith observation's contribution to the logistic score function is

$$S_{\gamma i} = (y_i - p_i)\,(1, x_i, m_i, z_i)^{T},$$

where $p_i = \Pr\{Y = 1 \mid X = x_i, M = m_i, Z = z_i\} = E(Y \mid X = x_i, M = m_i, Z = z_i)$.



Note that like Iγ, α^z depends only on T = {(xi, mi, zi), i = 1, . . . , n} and not on y. Then by conditioning first on T, we have


This result can easily be extended to any GEE-type estimator of γ, of which maximum likelihood estimation for an exponential family distribution is a special case.
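Because $\hat\alpha_z$ and $\hat\gamma_m$ are asymptotically independent, a first-order delta-method (Sobel-type [27]) standard error for the product needs no covariance term. The sketch below assumes the two standard errors come from the diagonal elements of the variance estimates above; the function names and the example values are hypothetical.

```python
import math


def product_se(alpha_z, se_alpha_z, gamma_m, se_gamma_m):
    """First-order delta-method SE of alpha_z * gamma_m for independent estimators:
    Var ~= alpha_z**2 * Var(gamma_m) + gamma_m**2 * Var(alpha_z)."""
    return math.sqrt(alpha_z**2 * se_gamma_m**2 + gamma_m**2 * se_alpha_z**2)


def product_ci(alpha_z, se_alpha_z, gamma_m, se_gamma_m, z_crit=1.96):
    """Point estimate, delta-method SE, and normal-theory interval for the product."""
    est = alpha_z * gamma_m
    se = product_se(alpha_z, se_alpha_z, gamma_m, se_gamma_m)
    return est, se, (est - z_crit * se, est + z_crit * se)


print(product_ci(0.5, 0.1, 0.6, 0.2))
```

For example, `product_ci(0.5, 0.1, 0.6, 0.2)` gives an estimate of 0.3 with standard error √0.0136 ≈ 0.117.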


¹ In the social science literature, four symbols are often used to express the mediation equations. Our notation in this article allows other covariates and is more in line with notation used in standard logistic regression modeling. This footnote shows their comparability. The symbol a, the regression of the mediator on the intervention condition, is given by αz in this paper's notation. The symbol b, the partial regression of the outcome on the mediator adjusted for the intervention condition, is given by γm in our notation. The symbol c, the regression of the outcome on the intervention condition without adjusting for the mediator, is given by βz in our notation. Finally, c′, the partial regression of the outcome on the intervention condition adjusted for the mediator, is given by γz.


1. Susser M. Causal thinking in the health sciences: concepts and strategies of epidemiology. Oxford University Press; New York: 1973.
2. Susser M. What is a cause and how do we know one? A grammar for pragmatic epidemiology. Am J Epidemiol. 1991;133:635–48.
3. Statistical and mathematical modeling of the AIDS epidemic. Papers from a conference; Baltimore, Maryland; 17–18 November 1987. Stat Med. 1989;8:1–139.
4. Henney JE. Remarks by Jane E. Henney, M.D., Commissioner of Food and Drugs. International Conference on Surrogate Endpoints and Biomarkers; National Institutes of Health; 1999 [21 November 2001]. Available at
5. Criqui MH, Cowan LD, Tyroler HA, et al. Lipoproteins as mediators for the effects of alcohol consumption and cigarette smoking on cardiovascular mortality: results from the lipid research clinics follow-up study. Am J Epidemiol. 1987;126:629–37.
6. Lin DY, Fleming TR, De Gruttola V. Estimating the proportion of treatment effect explained by a surrogate marker. Stat Med. 1997;16:1515–27.
7. Freedman LS, Schatzkin A. Sample size for studying intermediate endpoints within intervention trials or observational studies. Am J Epidemiol. 1992;136:1148–59.
8. Begg CB, Leung DHY. On the use of surrogate end points in randomized trials. J R Stat Soc A. 2000;163:15–28.
9. Prentice RL. Surrogate endpoints in clinical trials: definition and operational criteria. Stat Med. 1989;8:431–40.
10. Freedman LS, Graubard BI, Schatzkin A. Statistical validation of intermediate endpoints for chronic diseases. Stat Med. 1992;11:167–78.
11. Fleming TR, DeMets DL. Surrogate end points in clinical trials: are we being misled? Ann Intern Med. 1996;125:605–13.
12. Molenberghs G, Buyse M, Geys H, Renard D, Burzykowski T, Alonso A. Statistical challenges in the evaluation of surrogate endpoints in randomized trials. Controlled Clin Trials. 2002;23:607–25.
13. Verbeke G, Molenberghs G. Linear mixed models for longitudinal data. Springer; New York: 2000.
14. Freedman LS. Confidence intervals and statistical power of the “Validation” ratio for surrogate or intermediate endpoints. J Stat Plan Infer. 2001;96:143–53.
15. MacKinnon DP, Warsi G, Dwyer JH. A simulation study of mediated effect measures. Multivar Behav Res. 1995;30:41–62.
16. Buyse M, Molenberghs G. Criteria for the validation of surrogate endpoints in randomized experiments. Biometrics. 1998;54:1014–29.
17. Buyse M, Molenberghs G, Burzykowski T, Renard D, Geys H. The validation of surrogate endpoints in meta-analyses of randomized experiments. Biostatistics. 2000;1:49–67.
18. Burzykowski T, Molenberghs G, Buyse M. The evaluation of surrogate endpoints. Springer; New York: 2005.
19. Gail MH, Pfeiffer R, van Houwelingen HC, Carroll RJ. On meta-analytic assessment of surrogate outcomes. Biostatistics. 2000;1:231–46.
20. Frangakis CE, Rubin DB. Principal stratification in causal inference. Biometrics. 2002;58:21–29.
21. Foshee VA, Bauman KE, Arriaga XB, et al. An evaluation of Safe Dates, an adolescent dating violence prevention program. Am J Public Health. 1998;88:45–50.
22. Botvin GJ. Preventing drug abuse in schools: social and competence enhancement approaches targeting individual-level etiologic factors. Addict Behav. 2000;25:887–97.
23. Judd CM, Kenny DA. Process analysis: estimating mediation in treatment evaluations. Evaluation Rev. 1981;5:602–19.
24. MacKinnon DP, Dwyer JH. Estimating mediated effects in prevention studies. Evaluation Rev. 1993;17:144–58.
25. Kenny DA, Kashy DA, Bolger N. Data analysis in social psychology. In: Gilbert DT, Fiske ST, Lindzey G, editors. The handbook of social psychology. McGraw-Hill; Boston: 1998. pp. 233–265.
26. MacKinnon DP, Lockwood CM, Hoffman JM, West SG, Sheets V. A comparison of methods to test mediation and other intervening variable effects. Psychol Methods. 2002;7:83–104.
27. Sobel ME. Asymptotic confidence intervals for indirect effects in structural equation models. Sociological Methodology. 1982;13:290–312.
28. Krull JL, MacKinnon DP. Multilevel mediation modeling in group-based intervention studies. Evaluation Rev. 1999;23:418–44.
29. Baron RM, Kenny DA. The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. J Pers Soc Psychol. 1986;51:1173–82.
30. Bollen KA. Structural equations with latent variables. Wiley; New York: 1989.
31. Hosmer DW, Lemeshow S. Applied logistic regression. Wiley; New York: 2000.
32. Finney DJ. Probit analysis: statistical treatment of the sigmoid response curve. Cambridge University Press; Cambridge, England: 1964.
33. Zeger SL, Liang KY, Albert PS. Models for longitudinal data: a generalized estimating equation approach. Biometrics. 1989;45:1049–60.
34. Draper NR, Smith H. Applied regression analysis. 3rd ed. Wiley; New York: 1998.
35. McCullagh P, Nelder JA. Generalized linear models. Chapman and Hall; London: 1989.
36. Cohen J. Statistical power analysis for the behavioral sciences. Erlbaum; Hillsdale, NJ: 1988.
37. Pentz MA, Dwyer JH, MacKinnon DP, et al. A multi-community trial for primary prevention of adolescent drug abuse: effects on drug use prevalence. JAMA. 1989;261:3259–66.
38. Muthén LK, Muthén BO. Mplus 4.1: User's guide. Author; Los Angeles: 2007.