This article describes a program, PRODCLIN (distribution of the PRODuct Confidence Limits for INdirect effects), written for SAS, SPSS, and R, that computes confidence limits for the product of two normal random variables. The program is important because it can be used to obtain more accurate confidence limits for the indirect effect, as demonstrated in several recent articles (MacKinnon, Lockwood, & Williams, 2004; Pituch, Whittaker, & Stapleton, 2005). Tests of the significance of and confidence limits for indirect effects based on the distribution of the product method have more accurate Type I error rates and more power than other, more commonly used tests. Values for the two paths involved in the indirect effect and their standard errors are entered in the PRODCLIN program, and distribution of the product confidence limits are computed. Several examples are used to illustrate the PRODCLIN program. The PRODCLIN programs in rich text format may be downloaded from www.psychonomic.org/archive.
An indirect effect implies a causal relation in which an independent variable generates a mediating variable, which in turn generates a dependent variable (Sobel, 1990). Indirect effects are important in basic and applied research. For example, the effect of attitude on behavior is hypothesized to be mediated by intention (Ajzen & Fishbein, 1980). Parental education level affects the child’s education, which then affects the child’s potential income (Duncan, Featherman, & Duncan, 1972). Likewise, neighborhood degradation affects neighborhood cohesion, which then affects crime rates (Sampson, Raudenbush, & Earls, 1997). Applied health promotion and disease prevention programs provide many other examples of indirect effects, and such programs are designed to change mediators that are hypothesized to be causally related to an outcome (Judd & Kenny, 1981; MacKinnon & Dwyer, 1993).
The statistical properties of estimators of the indirect effect and its standard error have received much research attention recently. MacKinnon, Lockwood, Hoffman, West, and Sheets (2002) and Shrout and Bolger (2002) demonstrated the low power of some tests of the indirect effect. Methods of computing confidence limits for the indirect effect often have substantial imbalances, in part due to the assumption that the indirect effect follows a normal distribution (MacKinnon, Lockwood, & Williams, 2004). Simulation studies and other research have demonstrated that confidence limits for the indirect effect based on the distribution of the product method (MacKinnon et al., 2002; Pituch, Whittaker, & Stapleton, 2005) or resampling methods (MacKinnon et al., 2004) are more accurate than other methods. In particular, the confidence limits computed using the distribution of the product method are asymmetric, consistent with the nonnormal distribution of the indirect effect. MacKinnon et al. (2004) demonstrated that the method used to construct confidence limits based on the distribution of the product, described in MacKinnon et al. (2002), was more accurate than other methods. For example, the distribution of the product confidence limits have more power than the normal-theory confidence limits. Most recently, Pituch et al. provided another demonstration of the improvement obtained by confidence limits derived using the distribution of the product method described in MacKinnon et al. (2002). The purpose of this article is to describe a computer program called PRODCLIN, which computes confidence limits for the indirect effect based on the distribution of the product but is more precise than the distribution of the product programs used in prior research. PRODCLIN has not been described in the published literature until now. Other programs that compute indirect-effect measures (Lockwood & MacKinnon, 1998; Preacher & Hayes, 2004) have proven useful for researchers.
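The single-mediator model is described by three regression equations:

$$Y = \beta_{01} + \tau X + \varepsilon_1 \qquad (1)$$

$$Y = \beta_{02} + \tau' X + \beta M + \varepsilon_2 \qquad (2)$$

$$M = \beta_{03} + \alpha X + \varepsilon_3 \qquad (3)$$

In these equations, Y is the dependent variable, X is the independent variable, and M is the mediating variable. $\tau$ is the coefficient relating the independent variable and the dependent variable, and $\tau'$ is the coefficient relating the independent variable to the dependent variable adjusted for the effects of the mediating variable. $\beta_{01}$, $\beta_{02}$, and $\beta_{03}$ represent the intercepts in Equations 1, 2, and 3, respectively, and $\varepsilon_1$, $\varepsilon_2$, and $\varepsilon_3$ represent residuals. The residuals are assumed to be independent across equations and to have an expected mean of zero.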
This article focuses on a computer program for a product of coefficients method of assessing the indirect effect that involves estimation of Equations 2 and 3. First, the coefficient relating the mediating variable to the dependent variable, $\hat{\beta}$, is estimated in Equation 2. Second, as shown in Equation 3, the coefficient relating the independent variable to the mediating variable, $\hat{\alpha}$, is estimated. The product of these two coefficients, $\hat{\alpha}\hat{\beta}$, is the estimator of the indirect effect. The coefficient relating the independent variable to the dependent variable, adjusted for the mediating variable, $\hat{\tau}'$, is the estimate of the direct effect.
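As a minimal sketch of this estimation step (it is not part of PRODCLIN), the following Python code fits the three regression equations by ordinary least squares using only the standard library and returns the product of the two estimated coefficients. The function name `product_of_coefficients`, the dictionary keys, and the toy data are hypothetical, chosen only for illustration.

```python
from statistics import fmean


def _centered(values):
    """Subtract the mean so that intercepts drop out of the normal equations."""
    mean = fmean(values)
    return [v - mean for v in values]


def _dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def product_of_coefficients(x, m, y):
    """OLS estimates for the single-mediator model (hypothetical helper)."""
    xc, mc, yc = _centered(x), _centered(m), _centered(y)
    sxx, smm, sxm = _dot(xc, xc), _dot(mc, mc), _dot(xc, mc)
    sxy, smy = _dot(xc, yc), _dot(mc, yc)

    tau = sxy / sxx                    # Equation 1: Y regressed on X
    alpha = sxm / sxx                  # Equation 3: M regressed on X
    det = sxx * smm - sxm ** 2         # determinant of the 2x2 normal equations
    tau_prime = (smm * sxy - sxm * smy) / det  # Equation 2: coefficient of X
    beta = (sxx * smy - sxm * sxy) / det       # Equation 2: coefficient of M
    return {
        "tau": tau,
        "alpha": alpha,
        "beta": beta,
        "tau_prime": tau_prime,
        "indirect": alpha * beta,
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    x = [0, 1, 2, 3, 4, 5]
    m = [0.1, 0.3, 1.3, 1.4, 2.2, 2.2]
    y = [2.1, 2.4, 3.7, 4.0, 5.1, 5.2]
    print(product_of_coefficients(x, m, y))
```

This sketch returns point estimates only; the standard errors needed for confidence limits would come from the usual regression output.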
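An estimator of the variance of the indirect effect, $\hat{\sigma}^2_{\alpha\beta}$, is based on the variance of the product of the $\hat{\alpha}$ and $\hat{\beta}$ regression coefficients. The exact variance of the product of two independent random variables (Mood, Graybill, & Boes, 1974, p. 180), such as $\hat{\alpha}$ and $\hat{\beta}$, derived using a second-order Taylor series, is

$$\hat{\sigma}^2_{\alpha\beta} = \hat{\alpha}^2\hat{\sigma}^2_{\beta} + \hat{\beta}^2\hat{\sigma}^2_{\alpha} + \hat{\sigma}^2_{\alpha}\hat{\sigma}^2_{\beta} \qquad (4)$$

where $\hat{\sigma}_{\alpha}$ and $\hat{\sigma}_{\beta}$ are the standard errors of $\hat{\alpha}$ and $\hat{\beta}$.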
Sobel (1982, 1986) derived the approximate variance of the indirect effect using the multivariate delta method (Bishop, Fienberg, & Holland, 1975) and showed its application to research data (see also Folmer, 1981). The formula based on the multivariate delta method,

$$\hat{\sigma}^2_{\alpha\beta} = \hat{\alpha}^2\hat{\sigma}^2_{\beta} + \hat{\beta}^2\hat{\sigma}^2_{\alpha} \qquad (5)$$
is used to calculate the standard error of the indirect effect in many statistical software packages, such as EQS (Bentler, 1997) and LISREL (Jöreskog & Sörbom, 1993). The approximate variance in Equation 5 is based on first derivatives, so it does not include the $\hat{\sigma}^2_{\alpha}\hat{\sigma}^2_{\beta}$ term (the product of the squared standard errors of $\hat{\alpha}$ and $\hat{\beta}$) found in Equation 4, which is usually small in comparison with the other two terms. An unbiased estimator of the variance subtracts $\hat{\sigma}^2_{\alpha}\hat{\sigma}^2_{\beta}$ from Equation 5, as shown by Goodman (1960). All three of these estimators assume that the coefficient vector containing $\hat{\alpha}$ and $\hat{\beta}$ is consistent, efficient, and asymptotically normal.
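The three variance estimators differ only in how they treat the product of the two squared standard errors, which a short sketch makes concrete. The function name `indirect_effect_variances` and the dictionary keys are hypothetical; the inputs are the two coefficient estimates and their standard errors.

```python
import math


def indirect_effect_variances(a, se_a, b, se_b):
    """Return the three variance estimators for the product a * b.

    a, b       : estimated coefficients (alpha-hat and beta-hat)
    se_a, se_b : their standard errors
    """
    first_order = a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2  # delta method (Equation 5)
    cross = se_a ** 2 * se_b ** 2                          # product of squared standard errors
    return {
        "second_order": first_order + cross,  # Equation 4
        "first_order": first_order,           # Equation 5
        "unbiased": first_order - cross,      # Goodman (1960)
    }


if __name__ == "__main__":
    # Coefficients and standard errors from the PHLAME example in this article.
    variances = indirect_effect_variances(0.3937, 0.1872, -0.8798, 0.1910)
    for name, value in variances.items():
        print(f"{name}: variance={value:.6f}, se={math.sqrt(value):.6f}")
```

For the PHLAME values the cross term is small relative to the other two terms, so the three standard errors differ only in the third decimal place.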
These variance estimators can be used to calculate standard errors and confidence limits for the indirect effect. For nonzero values of both α and β, Monte Carlo studies suggest that all three variance estimators have relative bias of less than 5% for a sample size of 100 or more in a simulation study of the single indirect-effect model (MacKinnon et al., 1995) and a sample size of 200 for the multivariate delta standard error in a simulation study of a recursive model with seven indirect effects (Stone & Sobel, 1990). In many studies, the indirect effect is divided by its standard error and the resulting ratio is then compared with the normal distribution to test its significance (Bollen & Stine, 1990; MacKinnon et al., 1991; Wolchik, Ruehlman, Braver, & Sandler, 1989). Confidence limits for the indirect effect lead to the same conclusion with regard to the null hypothesis. Confidence limits are constructed using Equation 6,

$$\hat{\alpha}\hat{\beta} \pm z_{1-\omega/2}\,\hat{\sigma}_{\alpha\beta} \qquad (6)$$
where z1−ω/2 is the value on the z-distribution corresponding to the desired Type I error rate, ω.
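The normal-theory limits can be sketched in a few lines, assuming the first-order (multivariate delta) standard error is used. The function name `normal_theory_limits` is hypothetical, and `NormalDist` from the Python standard library supplies the z critical value.

```python
import math
from statistics import NormalDist


def normal_theory_limits(a, se_a, b, se_b, omega=0.05):
    """Symmetric confidence limits for a * b using the first-order standard error."""
    se_ab = math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)
    z = NormalDist().inv_cdf(1.0 - omega / 2.0)  # about 1.96 when omega = .05
    ab = a * b
    return ab - z * se_ab, ab + z * se_ab


if __name__ == "__main__":
    # PHLAME example values reported later in this article.
    print(normal_theory_limits(0.3937, 0.1872, -0.8798, 0.1910))
```

Applied to the two worked examples in this article, the sketch agrees with the normal-theory limits quoted there to about four decimal places.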
Although the variance and standard error estimates of the indirect effect may be unbiased, confidence limits based on these values are often inaccurate. Simulation studies (MacKinnon et al., 2004; MacKinnon et al., 1995; Stone & Sobel, 1990) have shown an imbalance in the number of times a true value falls to the left or right of the confidence limits. For an indirect effect where α and β are both positive or both negative, the confidence limits are more often to the left rather than to the right of the true value. Bootstrap estimation of the indirect effect confidence limits leads to similar imbalances (Bollen & Stine, 1990; Lockwood & MacKinnon, 1998; MacKinnon et al., 2004). An explanation for the imbalance in confidence limits is that the confidence limit estimation assumes a normal distribution of the indirect effect, when in fact the distribution of the product is skewed for nonzero indirect effects and has different values of kurtosis for different values of the indirect effect (MacKinnon et al., 2004).
The indirect effect divided by its standard error does not have a normal sampling distribution in many situations. MacKinnon, Lockwood, and Hoffman (1998) developed an alternative method to test for the indirect effect based on the distribution of the product of two normally distributed random variables (Aroian, 1944; Craig, 1936). Because the indirect effect is the product of regression estimates that are normally distributed (Hanushek & Jackson, 1977), the distribution of the product can be used to test the indirect effect through the product $z_\alpha z_\beta$, where $z_\alpha = \hat{\alpha}/\hat{\sigma}_{\alpha}$ and $z_\beta = \hat{\beta}/\hat{\sigma}_{\beta}$.
The distribution of the product of two normal variables is not normal (Lomnicki, 1967; Springer & Thompson, 1966). In the null case, where both α and β (or zα and zβ) have means equal to zero, the distribution is symmetric with predicted kurtosis of six (Craig, 1936), even for very large sample sizes. When the product of the means, αβ or zαzβ is nonzero, the distributions are skewed and have excess kurtosis, although Aroian, Taneja, and Cornwell (1978) showed that the product approaches the normal distribution as either zα, zβ, or both get large in absolute value. The four central moments of the product of two correlated normal variables were given by Craig and Aroian et al. Below are the central moments of zαzβ when the variables are uncorrelated, as is the case here.
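The display that followed this paragraph is not available here. As a reconstruction rather than a copy of the original display, the standard results for the product of two independent normal variables with means $z_\alpha$ and $z_\beta$ and unit variances can be written as follows, with skewness and kurtosis given in standardized form:

```latex
\begin{align*}
\text{Mean} &= z_\alpha z_\beta \\
\text{Variance} &= z_\alpha^2 + z_\beta^2 + 1 \\
\text{Skewness} &= \frac{6\, z_\alpha z_\beta}{\left(z_\alpha^2 + z_\beta^2 + 1\right)^{3/2}} \\
\text{Excess kurtosis} &= \frac{12 z_\alpha^2 + 12 z_\beta^2 + 6}{\left(z_\alpha^2 + z_\beta^2 + 1\right)^{2}}
\end{align*}
```

Setting $z_\alpha = z_\beta = 0$ gives zero skewness and a kurtosis of six, matching the null case described above, and both standardized quantities shrink toward zero as either mean grows in absolute value.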
Although the general analytical solution for the distribution of the product of two independent standard normal variables does not approximate familiar distributions commonly used in statistics, Aroian (1944) showed that the gamma distribution can provide an approximation in some situations. The analytical solution for the distribution of the product is a Bessel function of the second kind with a purely imaginary argument (Aroian, 1944; Craig, 1936). Springer and Thompson (1966) provided a table of the values of this function when α = β = 0 (or zα = zβ = 0). The formula for the case when α = β = 0 (or zα = zβ = 0) is equal to (1/π)K0, which can be programmed in the Mathematica (Wolfram, 1996) computer program, and the general formula for any value of zα and zβ is shown at the bottom of this page (Craig, 1936; Hayya & Ferrara, 1972: Equation 10), where K is the Bessel function and where Σ is a Laurent series equal to
where r is the order of the Laurent series (e.g., for Σ2, r equals 2) and
Meeker, Cornwell, and Aroian (1981; see pp. 129–144) presented tables of the distribution of the product of two standard normal variables based on an alternative formula more conducive to numerical integration. These tables of fractiles of the standardized distribution function for ($\hat{\alpha}\hat{\beta}$ − αβ)/σαβ are given for different values of α, β, σα, and σβ. The tables assume that the population values of σαβ, σα, and σβ are known, but the authors suggest that sample values can be used in place of the population values as an approximation.
For the 95% standard normal confidence limits for the indirect effect, a critical value of 1.96 is used for z1−ω/2 and the standard error, such as the multivariate delta solution in Equation 5, is used. For the distribution of the product confidence limits, there are different critical values for the upper and lower confidence limits because of the asymmetry in the distribution. Using the Meeker et al. (1981) tables, the upper and lower limits are obtained using a table of critical values from the distribution of the product using the sample values zα and zβ for probability values of .025 and .975. Once the critical values are found from the tables, they must then be converted back into the standardized metric of the original values for α and β using Equation 14:
The standardized critical values are then substituted into Equation 6 in place of z1−ω/2 to create confidence limits for the indirect-effect estimate $\hat{\alpha}\hat{\beta}$, where the standard error is the square root of Equation 5. Note that the critical values for confidence limits based on the distribution of the product are not the same as those for the normal-theory confidence limits, and they are not identical for the upper and lower limits, as they are for the normal-theory limits. Also, for cases in which the mediated effect is negative, the upper and lower critical values are reversed and multiplied by −1. This operation is necessary because the tables in Meeker et al. give only positive values for α and β.
The most important aspect of the PRODCLIN program (see Archived Materials) is the use of a Fortran program to compute the critical values for the distribution of the product. Because the tables provided by Meeker et al. (1981) contained critical values for combinations of zα and zβ only from 0 to 4 by .4 increments, and then for 6, 9, and 12, it was desirable not only to find a more accurate method of obtaining critical values that did not involve extensive rounding but also to have the ability to find confidence limits for any Type I error rate, not just .05. This was accomplished by editing a Fortran program by Alan Miller (1997) called FNPROD. Given specific mean values of zα and zβ, along with their correlations (equal to zero for the indirect effect from Equations 2 and 3), and a value for zαzβ, FNPROD returns the cumulative percentile for that value of zαzβ using numerical integration based on an algorithm by Meeker and Escobar (1994) and work by Morris (1992). Rather than using trial and error to find the value of zαzβ for confidence limits, the FNPROD program was edited so that, given values for α, β, σα, σβ, ρ (the correlation between α and β), and a Type I error rate, the program iteratively finds the corresponding critical values. The program was further edited so that the values of α, β, σα, and σβ could be imported from various statistical packages, and the critical values could then be returned to the statistical software so that the asymmetric confidence limits could then be computed.
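The idea of iteratively searching for critical values can be illustrated with a small Python sketch. This is not the FNPROD or PRODCLIN algorithm: it assumes uncorrelated coefficients, integrates the product distribution with a simple midpoint rule, finds percentiles by bisection, and rescales them directly by the two standard errors instead of standardizing them first. All function names are hypothetical, and because the method differs, its output should not be expected to match published PRODCLIN limits digit for digit.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for both pdf and cdf


def _midpoint(f, lo, hi, n):
    """Midpoint-rule integral of f over [lo, hi]; the endpoints are never evaluated."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def product_cdf(q, z_a, z_b, n=1000, width=8.0):
    """P(X * Y <= q) for independent X ~ N(z_a, 1) and Y ~ N(z_b, 1)."""
    def integrand(x):
        t = q / x - z_b
        # For x > 0 the event is Y <= q / x; for x < 0 it is Y >= q / x.
        p = STD_NORMAL.cdf(t) if x > 0 else 1.0 - STD_NORMAL.cdf(t)
        return STD_NORMAL.pdf(x - z_a) * p

    lo, hi = z_a - width, z_a + width
    if lo < 0.0 < hi:
        # Split at x = 0 so that no integration cell straddles the sign change.
        return _midpoint(integrand, lo, 0.0, n) + _midpoint(integrand, 0.0, hi, n)
    return _midpoint(integrand, lo, hi, 2 * n)


def product_quantile(p, z_a, z_b, tol=1e-6):
    """Value q with P(X * Y <= q) = p, found by bisection."""
    mean = z_a * z_b
    sd = math.sqrt(z_a ** 2 + z_b ** 2 + 1.0)
    lo, hi = mean - 10.0 * sd, mean + 10.0 * sd
    while product_cdf(lo, z_a, z_b) > p:   # widen the bracket if necessary
        lo -= 10.0 * sd
    while product_cdf(hi, z_a, z_b) < p:
        hi += 10.0 * sd
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if product_cdf(mid, z_a, z_b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def product_limits(a, se_a, b, se_b, omega=0.05):
    """Asymmetric limits for a * b from percentiles of the product distribution."""
    if not 0.0 < omega < 1.0:
        raise ValueError("omega must lie strictly between 0 and 1")
    z_a, z_b = a / se_a, b / se_b
    scale = se_a * se_b  # maps the product of z values back to the a * b metric
    lower = scale * product_quantile(omega / 2.0, z_a, z_b)
    upper = scale * product_quantile(1.0 - omega / 2.0, z_a, z_b)
    return lower, upper


if __name__ == "__main__":
    # PHLAME example values reported later in this article.
    print(product_limits(0.3937, 0.1872, -0.8798, 0.1910))
```

Bisection is used here because the cumulative distribution is monotone in q, so the search cannot diverge; a production implementation would more likely rely on a tested numerical integration routine.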
PRODCLIN is presented here for the SAS macro programming language (SAS Institute, 2005), as illustrated in Figure 2, although versions for R (R Development Core Team, 2005) and SPSS (SPSS Inc., 2005) are also available from the authors at the Web site www.public.asu.edu/~davidpm/ripl/Prodclin/. To begin, the correlation between $\hat{\alpha}$ and $\hat{\beta}$ is entered into the program, as is the desired Type I error rate. For most examples, the correlation between $\hat{\alpha}$ and $\hat{\beta}$ is zero, but the correlation may be nonzero for some indirect-effect models. Next, the observed values for $\hat{\alpha}$, $\hat{\beta}$, $\hat{\sigma}_{\alpha}$, and $\hat{\sigma}_{\beta}$ are entered in the “%PRODCLIN” line in the SAS program and the program is run. SAS then exports these values to a text file, “raw.txt,” which is used by the PRODCLIN Fortran program to generate the corresponding critical values. After finding the critical values, PRODCLIN outputs the values to a text file, “critval.txt,” that SAS reads in; SAS then standardizes the critical values using Equation 14. The program then computes confidence limits for the distribution of the product (lower limit = “prodlow”; upper limit = “produp”) and the normal distribution (lower limit = “normlow”; upper limit = “normup”) from the standardized critical values (lower standardized critical value = “low,” and higher standardized critical value = “high”) and the standard error of the indirect effect using Equation 5 (see Figure 2).
PHLAME (Elliot et al., 2004) was a program designed to increase the physical fitness and health behaviors of firefighters. One part of the program targeted the mediating variable of tracking food. It was hypothesized that the act of tracking food intake would reduce body weight by drawing attention to the amount and types of food eaten. The coefficient relating program exposure to tracking food intake was .3937 with a standard error (SE) of .1872 for a t value of 2.10. The coefficient relating tracking food to body weight was equal to −.8798 with an SE of .1910 for a t value of −4.61. These values are entered in the PRODCLIN program at this line: “%prodclin(a=.3937, sea=.1872, b=-.8798, seb=.1910);”. The program yields lower and upper 95% confidence limits of −.738090 and −.028209, which do not contain zero, consistent with a statistically significant mediation effect. Interestingly, the normal-theory confidence limits were −.701230 and .008480, which do contain zero, suggesting that the mediated effect is not statistically significant.
In a classic sociology example, Duncan et al. (1972, p. 38) presented data collected during the early 1960s from a process model of achievement. One of the indirect effects found in the study was the relation of father’s education to respondent education to respondent income. The coefficient relating father’s education to respondent education was .1701 with an SE of .0156, and the coefficient from respondent education to respondent income was .1998 with an SE of .0364. The lower and upper 95% confidence limits based on the distribution of the product, .021045 and .048214, were quite similar to the normal-theory limits, .020400 and .047572. The similarity of the distribution of the product limits and the normal-theory limits is due to the large t values for the two effects (10.90 and 5.49); as one or both of the t values get larger, the distribution of the product is more similar to the normal distribution (Aroian et al., 1978).
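The effect of large t values on the shape of the product distribution can be checked numerically. The sketch below assumes uncorrelated coefficients and uses the standard skewness and excess-kurtosis formulas for the product of two independent unit-variance normal variables; the function name `product_moments` is hypothetical.

```python
def product_moments(a, se_a, b, se_b):
    """Mean, variance, skewness, and excess kurtosis of z_a * z_b.

    Assumes the two coefficient estimates are uncorrelated and each z value
    is normally distributed with unit variance.
    """
    z_a, z_b = a / se_a, b / se_b
    variance = z_a ** 2 + z_b ** 2 + 1.0
    return {
        "mean": z_a * z_b,
        "variance": variance,
        "skewness": 6.0 * z_a * z_b / variance ** 1.5,
        "excess_kurtosis": (12.0 * z_a ** 2 + 12.0 * z_b ** 2 + 6.0) / variance ** 2,
    }


if __name__ == "__main__":
    phlame = product_moments(0.3937, 0.1872, -0.8798, 0.1910)
    duncan = product_moments(0.1701, 0.0156, 0.1998, 0.0364)
    print("PHLAME:", phlame)
    print("Duncan et al.:", duncan)
```

Under these assumptions the skewness is roughly −0.42 for the PHLAME example and roughly 0.20 for the Duncan et al. example, in line with the closer agreement between the two sets of limits in the second case.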
A simulation study was conducted to compare the PRODCLIN confidence limits to the percentile and bias-corrected bootstrap confidence limits. The simulation methodology was the same as that used by MacKinnon et al. (2004), in which data for a single mediator model were generated based on zero, small (.14), medium (.39), or large (.59) population parameter values. There was evidence in the MacKinnon et al. (2004) study that statistical tests based on the bias-corrected bootstrap had excess Type I error rates for cases in which one path in the mediated effect was zero and the other path was nonzero. To investigate these results in more detail, we conducted an additional simulation study with sample sizes of 50, 100, and 200 for the four parameter combinations (zero/zero, zero/small, zero/medium, and zero/large) for the alpha and beta paths, respectively. For each combination of parameter value and sample size, 1,000 replications were obtained, and for each replication, 1,000 bootstrap samples were taken. As Table 1 shows, the PRODCLIN program returned Type I error rates comparable with those of the percentile bootstrap method and comparable with or smaller than those of the bias-corrected bootstrap method for all parameter combinations studied in this simulation.
Many research questions focus on indirect effects. Recent work on the statistical properties of estimators of indirect effects indicates that confidence limits based on the asymmetric distribution of the product have properties superior to those obtained with other methods. The PRODCLIN program computes asymmetric confidence limits based on the distribution of the product. New asymmetric confidence limits based on the distribution of the product are more exact than those based on the normal distribution. They are, therefore, more powerful and have more accurate Type I error rates, a conclusion supported by the findings of the simulation that was conducted and by prior research (MacKinnon et al., 2002; MacKinnon et al., 2004; Pituch et al., 2005). We included normal distribution confidence limits in the PRODCLIN output so that the confidence limits from the distribution of the product and those from the normal distribution could be directly compared. Resampling methods are an alternative for obtaining asymmetric confidence limits, but resampling methods require raw data that is sometimes unavailable, as was the case for the sociology study described in this article. The programming and computational demands of resampling methods may be cumbersome for some researchers. Resampling methods are included as part of covariance structure analysis programs such as EQS (Bentler, 1997), LISREL (Jöreskog & Sörbom, 1993), and Mplus (Muthén & Muthén, 2004); however, there is some evidence of inflated Type I error rates for resampling method tests of the indirect effect (MacKinnon et al., 2004). The PRODCLIN program is the only program available for computing asymmetric confidence limits for the indirect effect on the basis of the distribution of the product.
The main limitation of the PRODCLIN program is that confidence limits for indirect effects consisting of the product of more than two regression coefficients cannot yet be computed. The statistical theory for these critical values exists in several references but no statistical software is yet available to compute the confidence limits.
This research was supported by National Institute on Drug Abuse Grant DA09757.
David P. MacKinnon, Arizona State University, Tempe, Arizona.
Matthew S. Fritz, Arizona State University, Tempe, Arizona.
Jason Williams, Research Triangle Institute, Research Triangle Park, North Carolina.
Chondra M. Lockwood, Arizona State University, Tempe, Arizona.