The purpose of this article was to evaluate two alternatives to improve confidence limit coverage for the indirect effect. One method incorporated the distribution of the product to construct confidence limits, and the second used resampling approaches that make fewer assumptions about the distribution of the indirect effect. Both methods led to improved confidence limit coverage compared to the traditional method based on the normal distribution. In Study 1, the confidence limit coverage for the method based on the distribution of the product was nearly always better than that of the traditional method across a large number of combinations of effect size. In Study 2, resampling methods, with the exception of the jackknife, performed better than the method based on the normal distribution. Not all resampling methods were superior to the distribution of the product methods, however, which suggests caution in selecting a resampling method. The best single sample tests are based on the confidence limits using *M* from the Meeker et al. (1981) tables or the empirical-*M* confidence limits based on empirically generated critical values. Both use the distribution of the product to create confidence limits and test the significance of the results. Either of these tests is the single sample method of choice. The empirical-*M* test did have slightly better performance than the *M* test in terms of times outside the robustness interval, but overall power and Type I error rates were very similar. If a researcher does not have the raw data available for analysis, the resampling methods described in this article cannot be conducted and these single sample methods are the only methods available.

The bias-corrected bootstrap is the method of choice if resampling methods are feasible. There are limitations to the use of resampling methods, however, including the lack of statistical software to conduct the analysis and the increased computational time required. The bias-corrected bootstrap is included as an option in the AMOS (Arbuckle & Wothke, 1999) covariance structure analysis program. The EQS and LISREL programs can be used to conduct resampling analyses, but the user must write a separate program to compute the bootstrap method after these programs write out the results of each individual bootstrap sample. All of the resampling tests used in this article are now included in an updated version of the SAS program in Lockwood and MacKinnon (1998). Given the speed of current computing, computation time is not a compelling argument against resampling methods, although relatively large models may still take considerable time.
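For readers without access to those programs, the bias-corrected bootstrap is straightforward to sketch directly. The following is an illustrative sketch, not the Lockwood and MacKinnon (1998) program: the function name, the simple two-regression mediation fit, and the default settings are assumptions for this example.

```python
import numpy as np
from statistics import NormalDist

def bc_bootstrap_limits(x, m, y, n_boot=2000, alpha=0.05, seed=0):
    """Bias-corrected bootstrap confidence limits for the indirect effect
    a*b, where a is the slope of M on X and b is the slope of Y on M
    controlling for X."""
    x, m, y = (np.asarray(v, dtype=float) for v in (x, m, y))
    n = len(x)
    rng = np.random.default_rng(seed)

    def indirect(xi, mi, yi):
        a = np.polyfit(xi, mi, 1)[0]                       # X -> M path
        design = np.column_stack([np.ones(len(xi)), xi, mi])
        b = np.linalg.lstsq(design, yi, rcond=None)[0][2]  # M -> Y path given X
        return a * b

    ab_hat = indirect(x, m, y)
    boot = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)                        # resample cases with replacement
        boot[i] = indirect(x[idx], m[idx], y[idx])

    nd = NormalDist()
    z0 = nd.inv_cdf(float(np.mean(boot < ab_hat)))         # bias-correction constant
    zc = nd.inv_cdf(1 - alpha / 2)
    lo_p, hi_p = nd.cdf(2 * z0 - zc), nd.cdf(2 * z0 + zc)  # shifted percentile levels
    return float(np.quantile(boot, lo_p)), float(np.quantile(boot, hi_p))
```

The shifted percentile levels are what distinguish the bias-corrected interval from the ordinary percentile bootstrap; when the bias-correction constant z0 is zero, the two intervals coincide.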

Another potential limitation of resampling methods is that the difference between resampling methods and single sample methods is small in many cases. In fact, Bollen and Stine (1990) report percentile bootstrap confidence limits rather than bias-corrected bootstrap confidence limits because the two were so similar. In the present study, for example, the bias-corrected bootstrap and the *z* test came to different conclusions regarding whether the population value was inside or outside the confidence interval only 5.5% of the time across all confidence intervals examined. The discrepancy between the bias-corrected bootstrap and the *M* test was only 4.5%. As shown in the example data, the confidence limits were very similar across the methods, although only the traditional *z* interval included zero. There is some evidence from this study that when the values of δ_{α} and δ_{β} are large, resampling methods do not provide much improvement over single sample methods, and the additional effort they require may not be justified. When confidence limits are required for rather small values of δ_{α} and δ_{β}, the resampling methods are more accurate than single sample tests. A final limitation of resampling tests has been called the first law of statistical analysis (Gleser, 1996). The law requires that any two statisticians analyzing the same data set with the same methods should come to identical conclusions. With resampling methods (other than the jackknife), different conclusions could be reached because the resampled data sets generated by different statisticians will differ.
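In practice the force of this objection can be blunted by reporting the random seed along with the results. The sketch below, using a made-up data vector and a simple percentile bootstrap of a mean, shows that fixing the seed makes the analysis exactly reproducible, while different seeds yield slightly different limits.

```python
import numpy as np

def percentile_limits(data, seed, n_boot=1000):
    """Percentile bootstrap limits for the mean; the seed fixes the resamples."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    means = np.array([rng.choice(data, size=len(data)).mean()
                      for _ in range(n_boot)])
    return np.quantile(means, [0.025, 0.975])

data = np.arange(20.0)
limits_a = percentile_limits(data, seed=1)        # analyst 1
limits_b = percentile_limits(data, seed=2)        # analyst 2: slightly different limits
limits_a_again = percentile_limits(data, seed=1)  # same seed: identical limits
```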

The results of this study highlight a difference between testing significance based on critical values versus confidence limits. A problematic result arises when the significance of the indirect effect is tested with the distribution of the product using the critical value derived for α = 0 and β = 0, which is 2.18 for a .05 Type I error rate. In this situation, either α or β can be nonsignificant yet the test based on the distribution of the product may be significant, indicating that the indirect effect is larger than expected by chance alone while one of the regression coefficients contributing to it is not. The statistical test of whether α = 0 and β = 0 is more likely to be judged significant when the true values are α = 0 and β ≠ 0 or α ≠ 0 and β = 0, because the distribution of the indirect effect with these parameter values differs from the distribution for α = β = 0. By selecting the critical value of 2.18 from the distribution of the product for α = 0 and β = 0, the null hypothesis is *H*_{0}: α = 0 and β = 0, which is rejected in three situations: when α ≠ 0 and β ≠ 0, when α = 0 and β ≠ 0, or when α ≠ 0 and β = 0. The traditional *z* test is a test of the null hypothesis *H*_{0}: αβ = 0, which is rejected only when α ≠ 0 and β ≠ 0, but the assumption of a normal distribution does not appear to be accurate for the product of α and β, altering the Type I error rates and statistical power of this test. The *M* test of significance based on confidence limits also tests whether αβ = 0 and requires different critical values for the upper and lower limits. In this case, a test of *H*_{0}: α = 0 and β = 0 yields a different result than the test based on confidence limits.
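The 2.18 critical value can be checked by direct simulation: under *H*_{0}: α = 0 and β = 0, the two standardized coefficients are approximately independent standard normal variables, so their product follows the distribution of the product, whose upper .025 tail begins near 2.18 rather than the normal 1.96. A quick Monte Carlo sketch:

```python
import numpy as np

# Simulate the product of two independent standard normal variables,
# i.e., the null distribution of z_alpha * z_beta when alpha = beta = 0.
rng = np.random.default_rng(0)
product = rng.standard_normal(1_000_000) * rng.standard_normal(1_000_000)
critical = np.quantile(product, 0.975)   # close to 2.18, not 1.96
```

The larger critical value reflects the heavier tails of the product distribution relative to the normal, even though the distribution is more peaked near zero.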

There are other methods to evaluate indirect effects. These include the steps described in Baron and Kenny (1986) and Judd and Kenny (1981) and the joint significance test of α and β described in MacKinnon et al. (2002), none of which includes explicit methods to compute confidence limits. There are also methods to compute confidence limits for the indirect effect based on the standard error of the difference in coefficients, τ − τ′ (e.g., Allison, 1995b; Clogg, Petkova, & Cheng, 1995; Clogg, Petkova, & Shihadeh, 1992; Olkin & Finn, 1995), but these methods perform similarly to the traditional *z* test described in this article, in part because a normal distribution for the indirect effect is assumed.

There are several limitations of this article. The results may not extend beyond the single indirect effect model investigated. The influence of different distributions of *X*, *X*_{M}, and *Y*_{O} was not evaluated. However, Study 1 included a binary independent variable as another condition; because the results were virtually identical to those for a continuous independent variable, they were not reported in this article. Use of the *M* test confidence limits for indirect effects consisting of the product of three or more paths would require the distribution of the product of three or more random variables. There are analytical solutions for this distribution (Springer, 1979), but tabled values are not available. The resampling methods can be applied to these more complicated models.
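To make the last point concrete, a percentile bootstrap for a three-path effect requires nothing beyond resampling cases and multiplying the path estimates, with no tabled distribution needed. The sketch below is illustrative only: each path is fit as a simple slope rather than within the full simultaneous model, and the function name and settings are assumptions.

```python
import numpy as np

def three_path_limits(x, m1, m2, y, n_boot=1000, seed=0):
    """Percentile bootstrap limits for a three-path indirect effect a*b*c
    (X -> M1 -> M2 -> Y), for which tabled critical values of the
    distribution of the product are not available."""
    arrays = [np.asarray(v, dtype=float) for v in (x, m1, m2, y)]
    n = len(arrays[0])
    rng = np.random.default_rng(seed)

    def product(idx):
        xi, m1i, m2i, yi = (a[idx] for a in arrays)
        a_path = np.polyfit(xi, m1i, 1)[0]    # X  -> M1 (simple slope)
        b_path = np.polyfit(m1i, m2i, 1)[0]   # M1 -> M2 (simple slope)
        c_path = np.polyfit(m2i, yi, 1)[0]    # M2 -> Y  (simple slope)
        return a_path * b_path * c_path

    estimate = product(np.arange(n))
    boot = np.array([product(rng.integers(0, n, n)) for _ in range(n_boot)])
    lo, hi = np.quantile(boot, [0.025, 0.975])
    return estimate, float(lo), float(hi)
```

A bias-corrected version would adjust the two percentile levels exactly as in the two-path case.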

This article has also been silent regarding important conceptual issues in interpreting indirect effects. Here it is assumed that the indirect effect model is known and that *X* precedes *X*_{M} and *X*_{M} precedes *Y*_{O}. In practice, the hypothesized chain of effects may be wrong, and several equivalent models may explain the relations equally well. For example, the mediating variable may actually change the independent variable, which may then affect the dependent variable. In the case of a randomized experiment, randomization of the independent variable improves interpretation because the independent variable must precede the mediating variable and the dependent variable, but even in this situation the interpretation of indirect effects is more complicated than might be expected (Holland, 1988; Sobel, 1998). Attention to the specificity of the effect to one or a few of many mediating variables, together with future experiments targeted at specific mediating variables, improves the veracity of indirect effects (West & Aiken, 1997). None of the methods to test the statistical significance of or compute confidence limits for indirect effects answers these critical conceptual questions, but when combined with careful replication studies these relations are clarified.

There are several ways that the confidence limits described in this article may be further improved. One approach is to use more extensive resampling methods such as bootstrapping residuals, the iterated bootstrap, or methods to compute confidence limits based on the permutation test (Manly, 1997). More extensive tables of *M* test critical values may also improve the performance of this method. Analytical work on the distribution of the product of two regression coefficients, especially for small sample sizes, may also lead to more accurate confidence limits.

The practical implication of the results of this article is that the traditional *z* test confidence limits can be substantially improved by using a method such as the *M* test that incorporates the distribution of the product of two normal random variables. The bias-corrected bootstrap provided the most accurate confidence limits and greatest statistical power, and is the method of choice if it is feasible to conduct resampling methods.