Analytical solutions for point and variance estimators of the mediated effect, the ratio of the mediated to the direct effect, and the proportion of the total effect that is mediated were studied with statistical simulations. We compared several approximate solutions based on the multivariate delta method and second order Taylor series expansions to the empirical standard deviation of each estimator and theoretical standard error when available. The simulations consisted of 500 replications of three normally distributed variables for eight sample sizes (N = 10, 25, 50, 100, 200, 500, 1000, and 5000) and 64 parameter value combinations. The different solutions for the standard error of the indirect effect were very similar for sample sizes of at least 50, except when the independent variable was dichotomized. A sample size of at least 500 was needed for accurate point and variance estimates of the proportion mediated. The point and variance estimates of the ratio of the mediated to nonmediated effect did not stabilize until the sample size was 2,000 for the all continuous variable case. Implications for the estimation of mediated effects in experimental and nonexperimental studies are discussed.
A mediator is a variable that accounts for all or part of the relation between a predictor and an outcome (Baron & Kenny, 1986). More formally mediation occurs when an independent variable causes a mediator which causes a dependent variable (see Sobel, 1990). Hypotheses regarding mediated or indirect effects have a long and important history in social science research (Alwin & Hauser, 1975; West & Wicklund, 1980). A well-known example of mediation in psychology is the extent to which intentions mediate the effects of attitudes on behavior (Ajzen & Fishbein, 1980). The effect of father’s socioeconomic status on son’s socioeconomic status that is mediated by son’s educational achievement has received considerable attention in sociology (Duncan, Featherman, & Duncan, 1972). Similar mediational hypotheses are present in other social science and psychological research (Baron & Kenny, 1986; James & Brett, 1984).
Mediational analyses are especially useful in studies where several mediators are targeted by an experimental manipulation, for example, mediators of the effect of threat on protective behavior (Breznitz, 1984) and mediators of the effects of different types of influence on persuasion (Cialdini, 1984). If the mediator is the only construct targeted by the manipulation, then the manipulation potentially allows for examination of the causal relationship between the mediator and the outcome. Often the manipulation changes other mediators in addition to the targeted mediator. In many studies, the experimental manipulation is designed to change many mediators rather than one of them. More of these types of research studies are appearing in the research literature, and statistical tests of the mediated effect are conducted in some of them (Bierman, 1986; Guerra & Slaby, 1990; Hansen & Graham, 1991; MacKinnon et al., 1991).
One of the best examples of a manipulation targeting several mediators is the estimation of mediated effects in the experimental evaluation of health related prevention programs. Prevention and intervention programs are designed to change critical mediating constructs thought to be causally related to health outcomes (MacKinnon & Dwyer, 1993). For example, the Multiple Risk Factor Intervention Trial (MRFIT) was designed to reduce smoking, lower cholesterol, and lower blood pressure to prevent heart disease (Multiple Risk Factor Intervention Trial Research Group, 1990). Drug prevention programs seek to reduce drug use by increasing skills to resist drug offers and engendering norms less tolerant of drug use (Flay, 1985). AIDS and sexually transmitted disease prevention programs are designed to increase knowledge about early detection of disease, reduce barriers to screening, and change norms regarding screening (Murray et al., 1986; Shapiro, 1976). In the prevention case, mediation analysis assesses the extent to which the program changed the mediator which in turn changed the outcome variable. Mediational analysis in prevention studies is important because the processes that lead to behavior change are studied (MacKinnon, 1994). Estimation of mediated effects in these experimental prevention studies may differ from correlational studies because the independent variable is typically binary (0 = control group and 1 = treated group) rather than continuous.
The three variable mediation model, shown in Figure 1, is studied in this report. This one mediator model was studied because it facilitates the analytical and simulation tasks necessary for this research. The path from the program to the mediator to the outcome is the process of mediation. The indirect or mediated effect is equal to αβ. Other effects in the model include the direct effect, τ′, and the total effect τ = τ′ + αβ. These mediation effect measures can be supplemented with two measures of relative effect when the direct effect is nonzero. The proportion of the total effect that is mediated (αβ/(τ′ + αβ)) and the ratio of the mediated effect to the nonmediated effect (αβ/τ′) provide important information on the relative magnitude of the mediated effect. With these measures, for example, a researcher could state that half of an effect was mediated or that 30% of the total effect was explained by a particular mediator.
Until Sobel (1982, 1986) and Folmer (1981) derived the standard error of the mediated effect using the multivariate delta method, researchers had used a series of hypothesis tests to provide evidence for mediation (Baron & Kenny, 1986; Judd & Kenny, 1981) or had calculated mediated effects without a confidence interval for the effect. Stone and Sobel (1990) conducted a simulation study of the variance of the indirect effect assuming multivariate normal continuous measures and found that a sample size of at least 200 was required for adequate mediated effect variance estimation in the recursive model studied. Stone and Sobel did not study the case of mediation in experimental studies with categorical independent variables defining treatment conditions. MacKinnon and Dwyer (1993) conducted a simulation study for both binary and continuous independent variables but for one set of parameter values only and obtained results similar to Stone and Sobel. The point and variance estimators of the ratio of the mediated to the nonmediated effect and the proportion mediated have not been studied although Sobel (1982) provides the first order multivariate delta solution for their variance. There has been prior work on the densities of functions of random variables such as product and ratio however (Lominicki, 1967; Springer, 1979).
The purpose of this article is to describe the results of a simulation study investigating two major aspects of estimating mediated effects. First, the performance of point and variance estimators of mediated effect measures when the independent variable is binary (0 for the control group or 1 for the prevention program group) was investigated, as this is the typical case in the analysis of mediation in experimental studies. Second, simulation studies were conducted to compare point and variance estimators of mediated effect measures to be described in the Method section. For the mediated effect, these estimators were: (a) the multivariate delta method approximation to the standard error of the mediated effect (Sobel 1982, 1986), (b) second order or exact variance of a product, (c) unbiased estimator of the exact variance of a product (Goodman, 1960), and (d) a method based on an alternative point estimator of mediation (McGuigan & Langholtz, 1988). First and second order Taylor series solutions for the variance estimators of the proportion mediated and the ratio of the indirect to the direct effect were studied. The behavior of all estimators was examined in a wide range of sample sizes and combinations of parameter values.
The mediated effect can be calculated in two ways. In the first method, the following two regression equations are estimated.
Model 1: $Y_O = \tau X_P + \varepsilon_1$
Model 2: $Y_O = \tau' X_P + \beta X_M + \varepsilon_2$
$Y_O$ is the outcome variable, $X_P$ is the program or independent variable, $X_M$ is the mediator, τ codes the relationship between the program and the outcome in the first equation, τ′ is the coefficient relating the program to the outcome adjusted for the effects of the mediator, β is the coefficient relating the mediator to the outcome adjusted for the program, ε1 and ε2 code unexplained variability, and the intercept is assumed to be zero.
In the first regression, the outcome variable is regressed on the independent variable. In the second regression, the outcome is regressed on the independent variable and the mediator. The value of the mediated or indirect effect equals the difference in the program coefficients (τ − τ′) in the two regression models (Judd & Kenny, 1981). If the treatment coefficient (τ′) is zero when the mediator is included in the model, then the program effect is entirely mediated by the mediating variable.
A second method also involves estimation of two regression equations, and is illustrated in Figure 1. First, the coefficient relating the mediator to the outcome (β) is estimated in Model 2 above. Second, the coefficient (α) relating the program to the mediating variable is estimated.
Model 3: $X_M = \alpha X_P + \varepsilon_3$
The product of these two parameters (αβ) is the mediated or indirect effect. The coefficient relating the treatment variable to the outcome adjusted for the mediator (τ′) is the nonmediated or direct effect. The rationale behind this method is that mediation depends on the extent to which the program changes the mediator (α) and the extent to which the mediator affects the outcome variable (β).
As described above, the τ − τ′ and αβ methods are alternative computational approaches to the estimation of the mediated effect. As shown in the derivation below, the two methods yield identical estimates of mediation when the dependent variable is continuous and ordinary regression is used to estimate Model 2 and Model 3 above (τ′ = direct effect; τ = total effect; τ − τ′ is the difference in the total and direct effects; αβ = indirect or mediated effect). Although the derivation is based on population values, in every sample we have examined, c − c′ = ab, where a, b, c and c′ are the estimators of α, β, τ and τ′ respectively.
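The population identity follows from substituting Model 2 into the covariance of the program with the outcome: the covariance equals τ′ times the variance of the program plus β times the covariance of the program with the mediator, and dividing by the variance of the program gives τ = τ′ + αβ. The sample version can be checked numerically. The following is a minimal sketch, not the authors' SAS code: it assumes ordinary least squares fitted with NumPy, includes an intercept in every model, and uses illustrative parameter values and a fixed seed chosen only for this example.

```python
import numpy as np

def ols_slopes(y, *predictors):
    """Least squares slopes of y on the predictors (an intercept is fitted, then dropped)."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

rng = np.random.default_rng(0)           # fixed seed, an assumption for reproducibility
n = 200
alpha, beta, tau_prime = 0.3, 0.5, 0.1   # illustrative values only

x = rng.standard_normal(n)                              # program / independent variable
m = alpha * x + rng.standard_normal(n)                  # mediator (Model 3)
y = tau_prime * x + beta * m + rng.standard_normal(n)   # outcome (Model 2)

(c,) = ols_slopes(y, x)           # Model 1: total effect estimate
c_prime, b = ols_slopes(y, x, m)  # Model 2: direct effect and mediator coefficient
(a,) = ols_slopes(m, x)           # Model 3: program-to-mediator coefficient

print("c - c' =", c - c_prime)
print("a * b  =", a * b)
```

The two printed values should agree up to floating point error, because the same cases and the same least squares criterion are used in all three models.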
The variance of the product of two independent random variables such as estimators a and b of coefficients α and β is discussed in several mathematical statistics texts (Mood, Graybill, & Boes, 1974; Rice, 1988). The estimators a and b are independent as described in Sobel (1982), and this independence can be derived from the asymptotic covariance matrix among parameter estimators described in Bollen (1989, pp. 107–109). The variance of the product of two independent random variables a and b is equal to $\sigma^2_{ab} = \mu_a^2 \sigma_b^2 + \mu_b^2 \sigma_a^2 + \sigma_a^2 \sigma_b^2$, where μ denotes the mean and σ denotes the standard deviation of the random variable. The sample variance estimator is $s^2_{ab} = \bar{a}^2 s_b^2 + \bar{b}^2 s_a^2 + s_a^2 s_b^2$, where $s_a$ is the sample standard deviation of a and ā is the sample mean of a. The asymptotic variance of the indirect or mediated effect was derived by Sobel (1982, 1986) using the multivariate delta method. The multivariate delta method is a general method to determine the variance of functions of random variables that follow a multivariate normal distribution (Bishop, Fienberg, & Holland, 1975). The method consists of pre- and post-multiplying the covariance matrix of the relevant random variables by the partial derivatives of the new functions of the random variables. Sobel’s variance estimator based on the multivariate delta method does not include the term $s_a^2 s_b^2$ because it is claimed to be small compared to the other two terms and the analytic solution is based on first order derivatives. Goodman (1960) has shown that the unbiased estimator of the variance of the product of two random variables subtracts rather than adds this term.
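As a worked sketch of the pre- and post-multiplication step for the product ab (our reconstruction, assuming a and b are uncorrelated so that their covariance matrix is diagonal), the first order delta method gives:

```latex
f(a, b) = ab, \qquad
\nabla f = \begin{pmatrix} \partial f / \partial a \\ \partial f / \partial b \end{pmatrix}
         = \begin{pmatrix} b \\ a \end{pmatrix}, \qquad
\Sigma = \begin{pmatrix} s_a^2 & 0 \\ 0 & s_b^2 \end{pmatrix}

\widehat{\operatorname{Var}}(ab) \approx \nabla f^{\top} \, \Sigma \, \nabla f
  = b^2 s_a^2 + a^2 s_b^2
```

Because only first derivatives enter, no $s_a^2 s_b^2$ term appears; that term arises only when second order terms of the expansion are retained.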
McGuigan and Langholtz (1988) derived the variance of the indirect effect for the τ − τ′ method of determining mediation. The statistical test of mediation is given by the formula $t = (c - c') / \sqrt{s_c^2 + s_{c'}^2 - 2 r s_c s_{c'}}$, where c and c′ are the estimators of τ and τ′, s is the sample standard deviation, r is the correlation between c and c′, and the covariance between c and c′ ($r s_c s_{c'}$) is the Mean Square Error in Model 2 divided by the sum of squares of the independent variable.
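Four analytical solutions for the variance of the mediated or indirect effect were studied:
(a) First order Taylor series, multivariate delta method (Sobel, 1982, 1986): $s^2_{ab} = a^2 s_b^2 + b^2 s_a^2$
(b) Second order Taylor series, or exact variance of a product: $s^2_{ab} = a^2 s_b^2 + b^2 s_a^2 + s_a^2 s_b^2$
(c) Unbiased estimator (Goodman, 1960): $s^2_{ab} = a^2 s_b^2 + b^2 s_a^2 - s_a^2 s_b^2$
(d) Difference in coefficients (McGuigan & Langholtz, 1988): $s^2_{c - c'} = s_c^2 + s_{c'}^2 - 2 r s_c s_{c'}$
where r is the correlation between c and c′, and the covariance between c and c′ ($r s_c s_{c'}$) is the Mean Square Error in Model 2 divided by the sum of squares of the independent variable. Each of the four estimators was compared to the analytical variance based on true variances of a and b (Hanushek & Jackson, 1977).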
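The four variance solutions named in the text can be sketched in a few lines of Python. This is an illustration built from the verbal descriptions above (the first order solution omits the product of the two squared standard errors, the second order solution adds it, and the Goodman estimator subtracts it); the function names and the numeric inputs are hypothetical and are not taken from the study.

```python
import math

def var_first_order(a, b, s_a, s_b):
    """First order (multivariate delta) variance of the product ab."""
    return a**2 * s_b**2 + b**2 * s_a**2

def var_second_order(a, b, s_a, s_b):
    """Second order / exact form: adds the product of the squared standard errors."""
    return var_first_order(a, b, s_a, s_b) + s_a**2 * s_b**2

def var_unbiased(a, b, s_a, s_b):
    """Unbiased form attributed to Goodman (1960): subtracts that product instead."""
    return var_first_order(a, b, s_a, s_b) - s_a**2 * s_b**2

def var_difference(s_c, s_c_prime, r):
    """Variance of c - c' given the standard errors and the correlation r of c and c'."""
    return s_c**2 + s_c_prime**2 - 2 * r * s_c * s_c_prime

# Hypothetical estimates and standard errors, for illustration only.
a, b, s_a, s_b = 0.5, 0.4, 0.1, 0.2
for estimator in (var_first_order, var_second_order, var_unbiased):
    print(estimator.__name__, math.sqrt(estimator(a, b, s_a, s_b)))
print("var_difference", math.sqrt(var_difference(0.3, 0.25, 0.6)))
```

With standard errors that are small relative to the coefficients, the three product-based solutions differ only through the small product term, which is consistent with the similarity among them reported for larger samples.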
The ratio of indirect to the direct effect is defined as αβ/τ′ and the proportion mediated is αβ/(τ′ + αβ). The estimators of the ratio of indirect to the direct effect and the proportion mediated are ab/c′ and ab/(c′ + ab), respectively. Four analytical solutions for the variance of the estimators of the proportion mediated and the ratio of the indirect to the direct effect were compared. No exact solutions are available so we use approximations in MacKinnon and Warsi (1991).
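For orientation, a first order delta method approximation for the two relative measures can be written out as follows. This is a sketch derived here from the definitions of the estimators, not necessarily the form given in MacKinnon and Warsi (1991). It assumes that a is uncorrelated with both b and c′, and it writes $s_{bc'}$ for the covariance between b and c′; setting $s_{bc'} = 0$ gives the version that assumes no correlation.

```latex
% Ratio of the indirect to the direct effect: f = ab / c'
\operatorname{Var}\!\left(\frac{ab}{c'}\right) \approx
    \frac{b^{2}}{(c')^{2}}\, s_a^{2}
  + \frac{a^{2}}{(c')^{2}}\, s_b^{2}
  + \frac{a^{2} b^{2}}{(c')^{4}}\, s_{c'}^{2}
  - \frac{2 a^{2} b}{(c')^{3}}\, s_{bc'}

% Proportion mediated: g = ab / (c' + ab)
\operatorname{Var}\!\left(\frac{ab}{c' + ab}\right) \approx
  \frac{b^{2} (c')^{2} s_a^{2} + a^{2} (c')^{2} s_b^{2}
        + a^{2} b^{2} s_{c'}^{2} - 2 a^{2} b\, c'\, s_{bc'}}
       {(c' + ab)^{4}}
```

Both expressions have powers of the denominator (c′ or c′ + ab) below the line, so they grow rapidly when the denominator estimate is near zero.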
The second order Taylor series expansions for the ratio and the proportion are available by writing to the first author.
The SAS© (Statistical Analysis System) programming language was used to conduct the statistical simulations. The data were generated from the normal distribution using the Box–Muller transformation (Box & Muller, 1958) in the RANNOR function. The current time was used as the seed for each simulation. Eight different sample sizes of 10, 25, 50, 100, 200, 500, 1000 and 5000 were chosen to reflect sample sizes in social science studies. For each of the parameters α, β and τ′, four different values, .1, .3, .5 and .7, were used for a total of 4³ = 64 different parameter combinations. The parameter values were chosen to reflect a variety of commonly observed relationships. Crossing the eight sample sizes with the 64 parameter combinations yields 512 simulations. The population value of the error variances was equal to 1. Each simulation was replicated 500 times. Simulations for the binary and continuous independent variable were identical except that the independent variable was dichotomized prior to the regression analysis in the binary independent variable case.
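The design described above can be sketched in Python with NumPy. This is not the authors' SAS program. Several details are assumptions made only for illustration: the independent variable is drawn as standard normal, the dichotomization cut point is zero, and a fixed seed replaces the clock-based seed so the example is reproducible. The names `DESIGN` and `generate_sample` are hypothetical.

```python
import itertools
import numpy as np

SAMPLE_SIZES = [10, 25, 50, 100, 200, 500, 1000, 5000]
PARAM_VALUES = [0.1, 0.3, 0.5, 0.7]

# Every (n, alpha, beta, tau_prime) cell of the design: 8 * 4**3 = 512 simulations.
DESIGN = [
    (n, alpha, beta, tau_prime)
    for n in SAMPLE_SIZES
    for alpha, beta, tau_prime in itertools.product(PARAM_VALUES, repeat=3)
]

def generate_sample(n, alpha, beta, tau_prime, rng, binary=False):
    """Draw one replication of the three variable model with unit error variances."""
    x = rng.standard_normal(n)                              # assumed standard normal
    m = alpha * x + rng.standard_normal(n)                  # mediator
    y = tau_prime * x + beta * m + rng.standard_normal(n)   # outcome
    if binary:
        # Dichotomize only after the data are generated; the cut at zero is an assumption.
        x = (x > 0).astype(float)
    return x, m, y

rng = np.random.default_rng(12345)   # fixed seed, unlike the clock seed in the study
x, m, y = generate_sample(100, 0.3, 0.5, 0.1, rng, binary=True)
print(len(DESIGN), "design cells;", int(x.sum()), "of", len(x), "cases coded 1")
```

Each design cell would then be replicated 500 times, with the regression models fitted to every replicate.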
The performance of each analytical solution was assessed with measures of bias and relative bias (Stone & Sobel, 1990), Mean Square Error, and empirical confidence limits. We use the same measures of approximate bias (βi) and relative bias (Rβi) used in Stone and Sobel to compare estimates of the mediated effect (ŵ) to true values (w) or the approximate true value when the exact true value is not known,
where ŵi and wi are the estimate and approximate true value at each replication. The Mean Squared Error of the estimators was obtained by squaring the bias measure. The same measures were used to evaluate estimates of the proportion mediated and ratio of mediated to direct effect and their standard errors.
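The bias measures can be illustrated with a short Python sketch. The exact formulas are not reproduced in the text here, so the definitions below are an assumption: bias is taken as the average difference between estimate and true value across replications, and relative bias as that average divided by the true value. The function names and the numbers are hypothetical.

```python
import numpy as np

def bias(estimates, true_value):
    """Average of (estimate - true value) over replications (assumed definition)."""
    return float(np.mean(np.asarray(estimates, dtype=float) - true_value))

def relative_bias(estimates, true_value):
    """Bias expressed as a fraction of the true value (assumed definition)."""
    return bias(estimates, true_value) / true_value

# Three made-up replications of a mediated effect whose true value is 0.15.
estimates = [0.10, 0.20, 0.30]
print(bias(estimates, 0.15), relative_bias(estimates, 0.15))
```

A relative measure is useful when pooling across parameter combinations, because the true mediated effect ranges from .01 to .49 in this design.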
Confidence intervals were examined by determining the proportion of times confidence intervals were to the left or right of the value of the mediated effect. The large sample 95% confidence limits were constructed using the mediated effect estimate plus and minus 1.96 times the estimate of the standard error of the mediated effect. It is expected that 2.5% of the confidence intervals will be to the left of the true value of the mediated effect and 2.5% will be to the right of the true value for a total of 5% of the confidence limits that will not include the true value.
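The tally of intervals falling to the left or right of the true value can be sketched as follows. This is an illustrative Python function, not the original SAS code; the name `ci_miss_rates` and the example numbers are hypothetical. An interval counts as "left" when its upper limit is below the true value and as "right" when its lower limit is above it.

```python
import numpy as np

def ci_miss_rates(estimates, std_errors, true_value, z=1.96):
    """Proportion of large sample intervals lying entirely left or right of the true value."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    left = float(np.mean(upper < true_value))    # whole interval below the true value
    right = float(np.mean(lower > true_value))   # whole interval above the true value
    return left, right

# Four made-up replications with a true value of 1.0.
left, right = ci_miss_rates([0.0, 1.0, 2.0, 1.0], [0.1, 0.1, 0.1, 1.0], 1.0)
print(left, right)
```

With a well behaved estimator both proportions should be close to .025, and the difference between them gives the asymmetry measure analyzed later.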
The simulation outcome measures were computed for point estimators of the mediated effect (αβ), the ratio (αβ/τ′) of mediated to nonmediated effect and the proportion of mediated to the total effect (αβ/(τ′ + αβ)). These measures were also computed for four different estimators of the standard error of the mediated effect and two estimators for the proportion mediated and the ratio of the mediated to nonmediated effect.
Following recommendations that simulations be treated like any other experimental study (Hauck & Anderson, 1984), we conducted an analysis of variance with sample size, binary or continuous independent variable, parameter values, and interactions to model the variability in the simulation outcome measures. ANOVA effects on bias and relative bias measures were estimated. The difference between the number of confidence intervals to the left minus the number of confidence intervals to the right of the value of the effect was also determined.
We conducted the entire simulation described below with 100 replications, but then increased the number to 500 to see if the small number of replications may have altered the results. There were no major differences between the results for 100 and 500 replications, so we only present the results for 500 replications.
Tables 1–9 pool the results across the 64 parameter value combinations so the mean for αβ = .16, αβ/τ′ = .67, αβ/(τ′ + αβ) = .30. Differences for specific parameter values are discussed below when the complete simulation results are analyzed with ANOVA.
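The pooled true values can be reproduced by averaging each measure over the 64 combinations of α, β and τ′. The following Python check is a small sketch that assumes all three parameters take the values .1, .3, .5 and .7 with equal weight.

```python
import itertools

values = [0.1, 0.3, 0.5, 0.7]
combos = list(itertools.product(values, repeat=3))   # (alpha, beta, tau_prime)

mean_mediated = sum(a * b for a, b, t in combos) / len(combos)
mean_ratio = sum(a * b / t for a, b, t in combos) / len(combos)
mean_proportion = sum(a * b / (t + a * b) for a, b, t in combos) / len(combos)

print(round(mean_mediated, 2), round(mean_ratio, 2), round(mean_proportion, 2))
```

The three averages match the pooled values quoted above to two decimal places.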
As shown in Table 1, the point estimates of αβ had very little bias for any sample size. In contrast, as shown in Tables 2 and 3, the estimates of the ratio and the proportion were quite different from the true values, and often did not even have the correct sign for small sample sizes (10, 25, 50 and 100). These sample sizes are deleted from Tables 2 and 3. The bias of these estimates is slightly higher when the independent variable is dichotomized except for sample size 200 for the ratio case. The proportion point estimator ab/(c′ + ab) appears to stabilize at a sample size of 500, but is less stable for the case of a dichotomous independent variable. The point estimator of the ratio, ab/c′, does not appear to stabilize until the sample size is 5000. The stability of the estimator ab/(c′ + ab) does depend on values of α and β.
Although the standard error estimators of the mediated effect were quite similar for all sample sizes (see Table 4), there was one major difference. For the case of the binary independent variable, the McGuigan and Langholtz (1988) estimator was approximately two to three times larger than the true standard error. There were several small differences among the estimators, as shown in Tables 4 and 5. Generally, the first order Taylor series estimator in Sobel (1982) performs the best. As in Stone and Sobel (1990), the standard error of the mediated effect is overestimated for smaller sample sizes.
As shown in Table 6, for both ab/c′ and ab/(c′ + ab), variance estimators obtained from first order Taylor series approximations were quite large, especially for small sample sizes. Reasonably accurate estimates were obtained at sample sizes of 500 or higher for the proportion mediated, and 5000 for the ratio. The second order Taylor series estimator was superior to the first order solution and estimators incorporating a possible correlation between the b and c′ estimates were better than estimators that assumed no correlation between b and c′.
To more clearly identify when the point and standard error estimators of the ratio and proportion stabilized, we conducted the same simulations with 100 replications and sample sizes of 2000, 3000, and 4000, as shown in Tables 7 and 8. The point and variance estimators of the ratio appear to stabilize around a sample size of 2,000 for the case of continuous measures. For the binary independent variable case, the ratio stabilizes at 4,000. As described earlier, the proportion measure stabilized at a sample size of 500 and becomes more accurate at larger sample sizes.
The proportion of confidence limits based on the three estimators of the standard error of the mediated effect to the left and right of the true value is shown in Table 9. At all sample sizes, the percentage of confidence intervals to the left and right of the true value is approximately 5%. As in Stone and Sobel (1990), however, at smaller samples more confidence intervals are to the left of the true value. As sample size is increased the proportions on either side become more similar.
An ANOVA on the relative bias dependent measure was conducted to determine whether the estimators were affected by sample size, binary or continuous independent variable, and the value of parameters α, β and τ′. Relative bias decreased with increasing sample size. The binary independent variable was associated with significantly more relative bias in point estimates and standard errors. In several cases, relative bias was a function of parameter values. The relative bias in the mediated effect point and variance estimators decreased as the α and β parameters increased.
The relative bias in point and variance estimators of the ratio and proportion mediated decreased as τ′ increased. There was also evidence that relative bias in the standard error of the ratio increased when the α parameter increased. The relative bias in the standard error of the proportion mediated was dependent on all the parameter values and their interactions. The relative bias of the standard error of the proportion decreased as the α, β, and τ′ parameters increased, but the pattern of interaction effects suggested a complicated relationship among the parameter values. For example, for α = .1 and .3, relative bias increased or stayed the same as β increased, but at α = .5 and .7, relative bias decreased with larger values of β.
To determine whether sample size, binary or continuous independent variable, and the parameter values had any effect on the asymmetry of the percentage of confidence intervals to the left and right of the true value of the mediated effect, an ANOVA was performed on the difference of the left and right values shown in Table 9. The parameter τ′ had no significant effect, while the sample size, the parameters α and β, and their interaction αβ had significant effects. In general, the asymmetry decreased with increasing sample size. For the McGuigan and Langholtz (1988) estimator, the scale of the independent variable, whether binary or continuous, had an effect. The statistically significant effects of the parameter values and their interaction suggest that the results of the simulation may have been different if only one or a few sets of parameter values were used.
Point estimates for the mediated effect had very little bias for all sample sizes studied, and all estimators of the standard errors for the mediated effect are quite close for sample sizes greater than 50. The first and second order Taylor series estimators were quite similar. When the independent variable is coded 0 or 1 (control versus intervention), the standard errors of the mediated effect are larger. Even when the independent variable is binary, the standard errors are quite close to the theoretical standard error for all but the τ − τ′ standard error. The standard error estimator described in McGuigan and Langholtz (1988) should not be used in the analysis of studies with binary coding of the independent variable such as experimental studies.
Ratio and proportion point estimates did not stabilize until a sample size of 500 for the proportion and 5000 (for a binary independent variable) for the ratio. The required sample size also varies with the magnitude of effects. The standard error did not stabilize until a sample size of 500 for the proportion. The second order Taylor series solutions for the ratio and proportion performed slightly better than the first order solution. The standard errors of both ab/c′ and ab/(c′ + ab) are quite large and unpredictable, except at very large sample sizes and for certain parameter values. Since the bias in point and variance estimates is quite large, researchers should be very cautious in interpreting the ratio value. For the proportion, a sample size of at least 500 appears to be necessary, although this was a function of parameter values. The proportion and ratio are likely to be unstable because they are ratios of random variables rather than a product, as for ab. The proportion mediated is probably more stable than the ratio because the denominator, c′ + ab, will be larger than c′, the denominator for the ratio.
A simple completely specified mediation model with no latent variables was studied here. The normal distribution was assumed for all variables, except Xp, in which case the least squares and the maximum likelihood estimates are identical. We are now examining mediation in more complicated mediation models, including non-normal distributions, multiple mediators, latent variables, logistic and probit regression, and longitudinal models (MacKinnon, Dwyer & Warsi, 1992). The preliminary results of these studies of more complicated models are generally consistent with those presented here for the three variable mediator model. The results for the three variable model studied here are important because the model provides information in deciding how an experimental manipulation such as a prevention program achieved its effects.
This research was supported by a Public Health Service grant (DA06211). Part of this work was presented at the 1991 Psychometric Society Meeting. We thank Michele Nowling for manuscript preparation and Michael Sobel, Leona Aiken, Sanford Braver, and Steve West for comments on the manuscript. The following derivations can be obtained by writing to the first author: the independence of the estimators a and b, true variances of a and b, and the first and second order Taylor series solutions for the proportion and ratio measures. The first author may be reached at the Department of Psychology, Arizona State University, Tempe, Arizona 85287-1104.
Statistical Analysis System is a registered trademark of SAS Institute, Inc., Cary, North Carolina.
David P. MacKinnon, Arizona State University.
Ghulam Warsi, Arizona State University.
James H. Dwyer, University of Southern California.