

Article sections

- Abstract
- Consensually Meaningful Questions an Investigator Might Be Trying to Address When Using ANCOVA/APV For Cases in Which the Groups Differ on the Covariate
- Bias in ANCOVA/APV Due to Unreliability in the Service of Addressing Consensually Meaningful Questions When the Independent Variable and the Covariate are Correlated
- Omitted Covariate Bias as an Extreme Form of Underadjustment Due to Unreliability
- Potential Alternatives to Conventional ANCOVA/APV
- Minimizing the Impact of Bias Due to Unreliability on Drawing Valid Inferences When Using Conventional ANCOVA/APV and the Independent Variable Correlates with the Covariate
- Minimizing the Impact of Omitted Variable Bias on Drawing Valid Inferences
- Limitations and Conclusion
- References

J Abnorm Psychol. Author manuscript; available in PMC 2010 May 14.

PMCID: PMC2869473

NIHMSID: NIHMS187858

Correspondence concerning this article should be addressed to Richard Zinbarg, Department of Psychology, Northwestern University, 2029 Sheridan Rd., Evanston, IL 60208-2710. Email: rzinbarg@northwestern.edu

The publisher's final edited version of this article is available at J Abnorm Psychol


Miller and Chapman (2001) argued that one major class of misuse of analysis of covariance (ANCOVA) or its multiple regression counterpart, analysis of partial variance (APV), arises from attempts to use ANCOVA/APV to answer a research question that is not meaningful in the first place. Unfortunately, there is another misuse of ANCOVA/APV that arises frequently in psychopathology studies even when addressing consensually meaningful research questions. This misuse arises from inflated type I error rates in ANCOVA/APV inferential tests of the unique association of the independent variable with the dependent variable when the covariate and independent variables are correlated and measured with error. Alternatives to conventional ANCOVA/APV are discussed as are steps that can be taken to minimize the impact of this bias on drawing valid inferences when using conventional ANCOVA/APV.

Analysis of Covariance (ANCOVA) or its multiple regression (MR) counterpart, Analysis of Partial Variance (APV, Cohen & Cohen, 1983), is commonly used in psychopathology research. The most unambiguous case in which conventional ANCOVA/APV has served a legitimate and useful purpose is the one for which conventional ANCOVA was developed, in which the dependent variable (*DV*) is correlated with the covariate (*Cov*) but the main independent variable (*IV*) of interest is not. In this case, conventional ANCOVA/APV reduces error in the *DV* and thereby increases statistical power for testing the relationship between the *IV* and the *DV*. Unfortunately, there are many cases in psychopathology research in which conventional ANCOVA/APV has been used in a more controversial fashion.

In these more controversial cases, the researcher uses ANCOVA/APV to test whether a relationship between the *IV* and the *DV* is actually due to a confounder variable. Concern about possible misuses of conventional ANCOVA in these cases has stimulated numerous articles, chapters and books (e.g., Cochran, 1957; Elashoff, 1969; Fleiss & Tanur, 1973; Huitema, 1980; Lord, 1960, 1967, 1969; Maxwell, Delaney & Manheimer, 1985; Porter & Raudenbush, 1987; Reichardt, 1979; Wainer, 1991; Wildt & Ahtola, 1978). Whereas random assignment to conditions often eliminates confounds, thereby obviating the need for these more controversial uses of ANCOVA/APV, random assignment to the different levels of a psychopathology variable represented in a given study “is routinely unfeasible and/or unethical” (Miller & Chapman, 2001, p. 40). Thus, the *Cov* is correlated with the *IV* in a typical psychopathology study. Psychopathology researchers, therefore, need to have a thorough understanding of the possible misuses of conventional ANCOVA/APV and how to avoid or minimize them.

An accessible treatment of some misuses of ANCOVA/APV was provided by Miller and Chapman (2001), who articulate the problems that can arise in conventional ANCOVA when adjusting for the *Cov* may remove part of the effect of the *IV*. They assume a simple design having one *IV*/grouping variable (*Grp*), one *DV* and one *Cov*, and they frame their discussion using a MR approach. When *Cov* is entered into the regression, this removes the variance *Cov* shares with *Grp*, leaving a residual portion of *Grp*, *Grp _{res}*, that is not correlated with *Cov*.

Unfortunately, there is another problem in the use of conventional ANCOVA/APV that can arise even when it is applied to less controversial, consensually meaningful questions. For statistical reasons reviewed below, ANCOVA/APV often generates biased results. Though this bias has been discussed by several methodologists, even some of the best designed and most conceptually significant recent psychopathology studies provide little indication that psychopathology researchers are aware of it. Indeed, two of us (A.A.U. and A.R.L.) independently coded all 61 articles in three recent issues of this journal (Volume 116, Issue 4; Volume 117, Issue 4; and Volume 118, Issue 1) and agreed that 12 (19.7%) of these articles involved uses of ANCOVA/APV that were likely to be vulnerable to this bias (kappa = .60). Of these 12 articles, we agreed that in 10 (83.3%) of them the researchers provided no indication of awareness of this bias (kappa = .63).

In light of the prevalence of ANCOVA/APV in psychopathology research without indication that researchers are aware of the bias in ANCOVA/APV, this article has three aims. The first is to clarify the nature of the questions a psychopathologist might try to answer with the use of ANCOVA/APV that are more consensually meaningful than the type of question criticized by Miller and Chapman (2001). The second is to raise awareness of the bias in ANCOVA/APV when used to address such consensually meaningful questions. The final aim is to describe alternatives to conventional ANCOVA/APV and discuss how one can strengthen the validity of inferences when using conventional ANCOVA/APV.

Miller and Chapman (2001) use the example of comparing depressed patients with non-patient controls using anxiety as a *Cov* to illustrate their central contention that the questions that some investigators try to address with conventional ANCOVA/APV are not meaningful. Anxiety is higher in patients than controls, and Miller and Chapman note that if “we believe that the negative affect that depression and anxiety share is central to the concept of depression, then removing negative affect (by removing anxiety) will mean that the group variance that remains has very poor construct validity for depression” (p. 43). They contend that it is simply not a meaningful question to ask whether depression would relate to another variable if depression did not include a facet widely thought to lie at its core.

Unfortunately, Miller and Chapman (2001) could be read to imply that the only purpose an investigator might have in using anxiety as a *Cov* and depression as the *IV* is understanding the effects of “pure depression”. For example, when discussing the possibility of using a self-report anxiety measure as a *Cov* in an ANCOVA with diagnostic group as the *IV* and the brain-wave measure known as P300 as the *DV*, they state “the hope in such an analysis would be to control anxiety and thus be able to observe the relationship between pure depression (not confounded with anxiety) and P300.” Because Miller and Chapman did not consider any other possible motivations for this analysis, researchers with other motives for such an analysis might be led to wonder whether their questions are consensually meaningful.

There are indeed other questions that arise frequently in psychopathology research, beyond the form criticized by Miller and Chapman, that are consensually meaningful and to which ANCOVA/APV is frequently applied. What these questions share is that their inferential focus is not on a *Grp* latent variable (LV) from which an important facet has been removed. Rather, either the *Cov* is not thought to measure an important facet of the *Grp* LV in the first place, or the inferential focus is explicitly on the *Grp _{res}* LV, or it is not on any LV at all but rather on the observed measures themselves.

Miller and Chapman (2001) illustrate the first class of consensually meaningful questions when noting that if the comorbidity between anxiety and depression were thought to arise because of variance due to factors not central to depression, then ANCOVA might be effective in removing this variance, leaving Grp_{res} interpretable as “pure” depression. For example, imagine that anxiety and depression comorbidity arises solely from depression in some people triggering anxiety focused on the worry that their depression will never remit or will recur. In this case, the depression experienced by individuals who do not also experience an elevation in their anxiety is a valid representation of depression.

A second consensually meaningful class of questions that a researcher might try to address by ANCOVA/APV involves asking whether a specific *component* of a *Grp LV* exists that uniquely relates to the *DV* above and beyond the *Cov*. Thus, even if one believes that the negative affect that depression and anxiety share is central to the concept of depression, unless one believes that depression and anxiety are the same *LV*, then one (perhaps implicitly) believes that there is reliable, unique variance in depression and/or anxiety that provides the basis or bases for differentiating them. According to Clark and Watson (1991), for example, the variance in anxiety and depression can be decomposed into: (a) negative affect, which is common to anxiety and depression, (b) physiological hyperarousal, which is specific to anxiety, and (c) anhedonia, which is specific to depression. Thus, rather than using anxiety as a *Cov* to observe the relationship between “pure depression” and P300, an investigator might be hoping to observe the relationship between anhedonia and P300. It is important to note that this consensually meaningful question does not take Grp_{res} to be a good observed indicator of the *LV* that *Grp* is intended to measure. Rather, this question explicitly recognizes that Grp_{res} is just one component of the *LV* that *Grp* is intended to measure. Of course, if the investigator’s hope is to observe the association between anhedonia and P300, a superior design would involve measuring negative affect and anhedonia more directly.

A third class of consensually meaningful research questions frequently addressed via conventional ANCOVA/APV involves differential change in a construct occurring across two time points. For example, there is a great deal of scientific interest in whether some *IVs* (such as anxiety sensitivity) predict increases in various outcome variables (such as fear) in response to a stressor. Such questions are often tested by measuring the *IV* and the outcome prior to the stressor and then re-administering the outcome measure after the stressor. The conventional analysis of the resulting data would be an ANCOVA/APV in which the outcome measured after the stressor is treated as the *DV* and the outcome measured before the stressor is treated as the *Cov*. Here, the question is primarily focused on the residual portion of the *DV*, *DV _{res}*, that is not correlated with the *Cov*.

A final class of consensually meaningful questions that could be addressed via conventional ANCOVA/APV is whether the observed *IV* measure (e.g., a measure of cognitive vulnerability to depression) has unique effects above and beyond the effects of the observed *Cov* measure (e.g., a measure of neuroticism). Such a question might be relevant, for example, in deciding whether to include the *IV* measure in addition to the *Cov* measure in a battery designed to identify individuals for treatment or a preventive intervention. For this class of question, the inferential focus is entirely on the observed measures rather than on the *LV*s that the measures might be purported to be indicators of. (Of course, this class and the second class of questions represent classic applications of hierarchical regression.) As will be discussed in more detail below, ANCOVA/APV generates unbiased answers to this class of question.

Given that there are consensually meaningful questions that conventional ANCOVA/APV might be used to answer when the *IV* and the *Cov* are correlated (or, equivalently, when groups that serve as the levels of a categorical *IV* differ on the *Cov*), it becomes important to ask whether conventional ANCOVA/APV provides unbiased answers to such research questions. The answer to this question depends on whether (a) the inferential focus is on the observed indicators versus the *LV*s measured by those indicators, (b) the observed indicators of the *LV*s contain measurement error and (c) there are any unmeasured confounders. In particular, when drawing inferences about *LV*s, if the *Cov* is measured with error and/or there is an unmeasured confounder then the answer is almost certainly no (e.g., Carroll, Ruppert, Stefanski & Crainiceanu, 2006; Huitema, 1980; Kahneman, 1965; Kenny, 1979; Lord, 1960; Maxwell & Delaney, 2004; Sorbom, 1979; Vargha, Rudas, Delaney & Maxwell, 1996). When the *IV LV* does not have a unique effect above and beyond the effect of a correlated but less than perfectly reliable *Cov LV*, ANCOVA/APV is systematically biased toward underadjusting for the effects of the *Cov LV*. Since an unmeasured variable is equivalent to a variable that is measured with zero reliability (Judd & Kenny, 1981), the most extreme version of such underadjustment occurs when a relevant *Cov LV* is not included in the analysis (e.g., Kenny, 1979). Though Miller and Chapman (2001) addressed the issue of underadjustment due to unreliability in passing (e.g., p. 42), they were concerned primarily with contexts in which conventional ANCOVA/APV removes too much of the variance in the *IV*. In contrast, this article is concerned primarily with contexts in which ANCOVA/APV does not remove enough of the variance in the *IV* (i.e., it does not remove enough of the shared variance with the *Cov LV*).

Figure 1 shows a structural equation model (SEM) representation of this article’s main running example. Paths *a*, *b* and *f* represent the standardized loadings of the *Cov* observed indicator (i.e., anxiety), the *IV* observed indicator (i.e., depression), and the *DV* observed indicator on their respective *LV*s, and paths *a*′, *b*′ and *f*′ represent the standardized loadings of alternative, congeneric indicators that could be used to measure the *LV*s (note that, for model identification, three indicators are needed for any LV that is uncorrelated with the other LVs in a model). Thus, the reliabilities of the three observed indicators are *a*^{2}, *b*^{2} and *f*^{2} and their standardized measurement errors are 1–*a*^{2}, 1–*b*^{2} and 1–*f*^{2}. Assuming that the model in Figure 1 is valid, including that all errors (not shown in Figure 1) are independent, there are two pathways originating at the observed measure of the *IV* and ending at the observed measure of the *DV*. The first is the unique pathway that begins with the observed measure of the *IV LV* and runs through to the observed measure of the *DV LV* (path *bef* in Figure 1). The second is the pathway resulting from the *IV LV* being correlated with the *Cov LV*, which is another cause of the *DV LV* (path *bcdf* in Figure 1). The zero-order correlation between the observed measures of the *IV* and the *DV* (*r _{DV,IV}*) is the sum of these two pathways (*bef* + *bcdf*, or *bf*[*e* + *cd*]). That this correlation between the observed measures, *bf*(*e* + *cd*), is smaller in magnitude than the corresponding correlation between the *IV* and *DV LV*s themselves, *e* + *cd*, reflects the attenuation produced by measurement error in the indicators (*b*, *f* < 1).

Figure 1. Path diagram of a model with latent variables corresponding to an independent variable (*IV*; depression), a covariate (*Cov*; anxiety) and the dependent variable. Circles represent latent variables and squares represent observed indicators. Paths *a*, *b* and **...**

The separate components of the IV-DV relationship reflecting the two pathways, (1) the *IV LV* being correlated with the *Cov LV*, which is another cause of the *DV LV* (path *cd* in Figure 1), and (2) the unique association of the *IV* and *DV LV*s (path *e* in Figure 1), cannot be estimated without at least one observed measure of the *Cov LV*. Conventional ANCOVA/APV uses a single measure of the *Cov LV* and a single measure each of the *IV* and the *DV LV*s to do this. The ANCOVA/APV estimate of the unique association of the *IV LV* with the *DV LV* (path *e* in Figure 1) expressed in terms of a standardized partial regression coefficient is

$${\beta}_{DV,IV.Cov}=\frac{{r}_{DV,IV}-{r}_{DV,Cov}{r}_{Cov,IV}}{1-{r}_{Cov,IV}^{2}}.$$

In terms of the SEM representation depicted in Figure 1, it is

$$\begin{array}{c}{\beta}_{DV,IV.Cov}=\frac{(bef+bcdf)-(adf+acef)(acb)}{1-{(acb)}^{2}}\\ =bf\left[e\left(\frac{1-{a}^{2}{c}^{2}}{1-{(acb)}^{2}}\right)+cd\left(\frac{1-{a}^{2}}{1-{(acb)}^{2}}\right)\right].\end{array}$$

(1)

Equation 1 is either identical to (e.g., Bollen, 1989) or a more general expression of (e.g. Kenny, 1979) equations presented previously and shows that the conventional ANCOVA/APV estimate of the unique association of the *IV* is quite complicated and subject to multiple biasing influences. The multiple biasing influences may result in underestimation or overestimation of the unique association of the *IV* or, very infrequently, may even completely offset each other to yield an unbiased estimate (Reichardt, 1979).
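As a numerical check of this point (with arbitrary illustrative path values, not estimates from any data set), the standard partial regression formula applied to the model-implied correlations reproduces Equation 1 exactly:

```python
# Numerical check of Equation 1 (illustrative path values, not estimates
# from any data set): the standard partial regression formula applied to
# the model-implied correlations reproduces the path-based expression.
a, b, f = 0.8, 0.9, 0.85   # standardized loadings of the Cov, IV, DV indicators
c, d, e = 0.5, 0.4, 0.3    # IV-Cov LV correlation, Cov->DV path, IV->DV path

# Model-implied correlations among the observed measures (Figure 1)
r_dv_iv = b * f * (e + c * d)     # paths bef + bcdf
r_dv_cov = a * f * (d + c * e)    # paths adf + acef
r_cov_iv = a * c * b              # path acb

# Standard formula for the standardized partial regression coefficient
beta = (r_dv_iv - r_dv_cov * r_cov_iv) / (1 - r_cov_iv ** 2)

# Equation 1, written directly in terms of the path coefficients
denom = 1 - (a * c * b) ** 2
beta_eq1 = b * f * (e * (1 - a**2 * c**2) / denom + c * d * (1 - a**2) / denom)

assert abs(beta - beta_eq1) < 1e-12
print(round(beta, 4))
```

The agreement holds for any admissible path values, since the two expressions are algebraically identical.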

Fortunately, Equation 1 simplifies considerably in the special case corresponding to the use of ANCOVA/APV that is common in psychopathology research and of primary concern here. This use consists of testing the hypothesis that *r _{DV,IV}* is entirely attributable to the *Cov* (i.e., that *e* = 0). When *e* = 0, Equation 1 reduces to

$${\beta}_{DV,IV.Cov}=\mathit{bfcd}\left(\frac{1-{a}^{2}}{1-{(acb)}^{2}}\right).$$

(2)

Note that the hypothesis that *r _{DV,IV}* is entirely attributable to the *Cov* implies that *β _{DV,IV.Cov}* should equal zero. Equation 2 shows, however, that when *e* = 0 the estimate equals zero only if the *Cov* is perfectly reliable (*a* = 1) or if *b*, *c*, *d* or *f* equals zero. Thus, whenever the *Cov* is measured with error and is correlated with both the *IV* and the *DV LV*s, *β _{DV,IV.Cov}* is positively biased and the type I error rate of the test of *H*_{0} : *e* = 0 is inflated.

Examples illustrating the size of the underadjustment bias and the associated type I error rates given different values of the parameters governing this bias are given in Table 1. The effect size of the bias is small to very small (${f}_{\mathit{\text{effect}}}^{2}\le .02$, Cohen's $d\le .20$) in most of the examples. Even with small effect sizes, however, the inflation in type I error rates is large enough to be of concern in all but one example. This is especially true at the larger sample sizes. Consider Example 2 in Table 1, with reliabilities of .81 for the *Cov* measure (*a*^{2} = .900^{2}) and .90 for the *IV* and *DV* measures (*b*^{2} = *f*^{2} = .949^{2}), a correlation of .427 between the *IV* and *Cov* measures (acb = .900 × .50 × .949), and a unique association of .500 between the *Cov* and the *DV LV*s. In this case, the small bias inherent in *β _{DV,IV.Cov}* as an estimate of *e* (here, *β _{DV,IV.Cov}* = .052 when *e* = 0) produces an actual type I error rate of .181 at a sample size of 400.
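This inflation can be checked with a small Monte Carlo sketch (pure Python, with parameter values matching Example 2; 1.966 is the approximate two-tailed .05 critical *t* for df = 397):

```python
import math, random

# Monte Carlo sketch of the inflated type I error using parameter values
# matching Example 2 of Table 1: e = 0 (no unique IV effect at the latent
# level), yet the nominal .05 ANCOVA/APV test of H0: e = 0 rejects far
# more often than 5% of the time.
a, b, f = 0.900, 0.949, 0.949    # loadings of the Cov, IV and DV indicators
c, d, e = 0.50, 0.50, 0.0        # IV-Cov LV correlation; Cov->DV; IV->DV
N, NSIM = 400, 1000
random.seed(1)

def corr(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)

rejections = 0
for _ in range(NSIM):
    iv_obs, cov_obs, dv_obs = [], [], []
    for _ in range(N):
        cov_lv = random.gauss(0, 1)
        iv_lv = c * cov_lv + math.sqrt(1 - c**2) * random.gauss(0, 1)
        dv_lv = (e * iv_lv + d * cov_lv
                 + math.sqrt(1 - e**2 - d**2 - 2 * e * d * c) * random.gauss(0, 1))
        cov_obs.append(a * cov_lv + math.sqrt(1 - a**2) * random.gauss(0, 1))
        iv_obs.append(b * iv_lv + math.sqrt(1 - b**2) * random.gauss(0, 1))
        dv_obs.append(f * dv_lv + math.sqrt(1 - f**2) * random.gauss(0, 1))

    r_yi = corr(dv_obs, iv_obs)
    r_yc = corr(dv_obs, cov_obs)
    r_ic = corr(iv_obs, cov_obs)
    # partial correlation of DV with IV controlling for Cov, and its t test
    pr = (r_yi - r_yc * r_ic) / math.sqrt((1 - r_yc**2) * (1 - r_ic**2))
    t = pr * math.sqrt((N - 3) / (1 - pr**2))
    if abs(t) > 1.966:           # approximate two-tailed .05 critical t, df = 397
        rejections += 1

print(rejections / NSIM)         # well above the nominal .05
```

Despite the true *e* of zero, the rejection rate comes out far above the nominal level, consistent with the underadjustment described above.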

Table 1. Examples of underadjustment bias and actual type I error rates when the unique association of the independent variable (IV) is zero with a nominal alpha level of .05 as a function of the loadings of the covariate (a), IV (b) and dependent variable (f) **...**

Applied contexts in which inferences are focused on the observed measures, such as personnel selection or selection into an intervention program, may be conceptualized as cases in which the *IV*, *Cov* and *DV* measures are perfectly reliable (*a* = *b* = *f* = 1). Thus, in such contexts, when *e* = 0, Equation 2 reduces to *β _{DV,IV.Cov}* = 0 and there is no bias (and even when *e* ≠ 0, ANCOVA/APV yields unbiased estimates of the unique association among the observed measures).

This discussion reiterates a message that has been all too often ignored by psychopathology researchers. ANCOVA/APV fully adjusts for the *Cov* measure and thus is unbiased when inferences are focused on the observed measures. However, *when there is an association between the IV and the Cov LVs, and inferences are focused on the associations among the LVs, there will usually be underadjustment for the Cov LV and positive bias in the type I error rate of the ANCOVA/APV test of H*_{0} :*e* = 0 .

Omitted variable bias (*OVB*) is well known among methodologists (e.g., Kenny, 1979). Figure 2 shows a problematic omitted variable (*OV*) added to the simple model considered above. A problematic *OV* is one that correlates with the *IV* and is a cause of the *DV* (e.g., Judd & Kenny, 1981). In the running depression and anxiety example, a possible example of a problematic *OV* is life stress.

Figure 2. Path diagram of a model with latent variables corresponding to an independent variable (*IV*; depression), a covariate (*Cov*; anxiety), an omitted variable (*OV*; life stress) and the dependent variable. Circles represent latent variables and squares represent **...**

The higher the correlation between the *OV* and the *Cov*, the more the *Cov* measure also adjusts for the *OV*. Thus, no correlation between the *Cov* and the *OV* results in the most problematic case. For simplicity, therefore, Figure 2 depicts an *OV* that is uncorrelated with the *Cov LV*. Accordingly, Figure 2 contains two new paths (as the *OV* is, by definition, omitted, there are no observed indicators of it): the correlation between the *IV* and the *OV LV*s (*g*) and the unique association between the *OV* and *DV LV*s (*h*). Example 6 in Table 1 added an *OV* to the model underlying Example 5. Whereas Example 5 involved a trivial underadjustment bias, the bias is considerable in Example 6. *OVB* may be seen as equivalent to the case in which a confounding *Cov LV* exists but is measured with a reliability of zero (for a closely related discussion, see Judd & Kenny, 1981, p. 192). That *OVB* can be conceptualized as an extreme form of the underadjustment bias due to unreliability is illustrated in Example 7, in which the *Cov LV* has identical correlations with the *IV* and *DV LV*s as does the *OV* in Example 6. With the reliability of the Example 7 *Cov* indicator equaling only .01 (*a*^{2} = .1^{2}), the bias is almost as great as in Example 6.
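This equivalence can be sketched numerically with Equation 2: holding the other paths at illustrative values (not those of Examples 6-7 in Table 1), the bias grows as the reliability of the *Cov* indicator falls, approaching the omitted-variable limit *bfcd* at zero reliability:

```python
# Sketch of OVB as the zero-reliability limit of the Equation 2 bias.
# Path values are illustrative choices, not those of Examples 6-7.
b, f, c, d = 0.949, 0.949, 0.50, 0.50

def eq2_bias(a):
    # Equation 2 with e = 0: the whole quantity is bias, since the true
    # unique effect of the IV LV is zero
    return b * f * c * d * (1 - a**2) / (1 - (a * c * b) ** 2)

for rel in (1.00, 0.81, 0.25, 0.01, 0.00):
    a = rel ** 0.5          # loading = square root of reliability
    print(f"Cov reliability {rel:4.2f}: bias = {eq2_bias(a):.4f}")
```

At reliability .01 the bias is already nearly as large as in the fully omitted case, mirroring the Example 6 versus Example 7 comparison.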

As randomized studies generally support stronger causal inferences than do non-randomized studies (e.g., Bollen, 1989; Shadish, Cook & Campbell, 2002), analogue studies using randomized designs can make important contributions to the study of psychopathology. However, analogue studies can never entirely supplant studies of participants with clinical diagnoses or symptoms, given their limitations in terms of external validity (Sher & Trull, 1996). Therefore, it is important to consider how non-randomized psychopathology studies can minimize underadjustment bias due to unreliability.

Though researchers will never be able to entirely eliminate the effects of measurement error in their analyses, they can minimize its impact via the judicious incorporation of multiple observed indicators of their *IV*, *DV*, and (especially) *Cov LV*s. The resulting data could then be analyzed in one of two ways. The first option would be to attempt to explicitly model measurement error using SEM analyses (e.g., Huitema, 1980; Maxwell & Delaney, 2004; for an excellent example, see Aiken, Stein & Bentler, 1994). The second option would be to aggregate the multiple measures of each construct into composites and use ANCOVA/APV. The latter option might be called an Aggregated Measures ANCOVA/APV to distinguish it from a conventional ANCOVA/APV in which there is a single measure of each *LV*. The potential advantage of this approach is that if the multiple measures are properly selected then error variance would tend to be smaller in a composite measure of the *Cov* compared with a single indicator measure of the *Cov LV* (e.g., Cronbach, 1951). In terms of comparing SEM versus an Aggregated Measures ANCOVA/APV, SEM would require larger samples but, when sample size is adequate, would have the advantage of allowing a formal assessment of the goodness of fit of one’s measurement model.
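As a rough sketch of why aggregation helps, suppose the composite is formed from *k* parallel *Cov* indicators (an idealizing assumption; real indicators are at best congeneric). The Spearman-Brown formula then gives the composite's reliability, whose square root can be used as the loading *a* in Equation 2:

```python
# Sketch: aggregating k parallel Cov indicators shrinks the Equation 2 bias.
# "Parallel" is an idealizing assumption; composite reliability then follows
# the Spearman-Brown formula, and the composite's loading a is the square
# root of that reliability. Other path values are illustrative choices.
b, f, c, d = 0.949, 0.949, 0.50, 0.50
single_rel = 0.81                       # reliability of one Cov indicator

def spearman_brown(rel, k):
    return k * rel / (1 + (k - 1) * rel)

def eq2_bias(a):
    # Equation 2 with e = 0, i.e., pure underadjustment bias
    return b * f * c * d * (1 - a**2) / (1 - (a * c * b) ** 2)

for k in (1, 2, 4):
    rel_k = spearman_brown(single_rel, k)
    print(f"{k} indicator(s): composite reliability {rel_k:.3f}, "
          f"bias {eq2_bias(rel_k ** 0.5):.4f}")
```

The bias falls as indicators are added but never reaches zero unless the composite is perfectly reliable; and, as the next paragraph notes, shared method variance among the indicators would violate the parallel-indicators assumption and could leave omitted-variable bias untouched.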

Of course, incorporating multiple measures of the *LV*s will not automatically ameliorate bias (e.g., DeShon, 1998). Imagine that the multiple indicators of the *IV LV* share method variance and this shared method variance is also associated with the *DV* measure(s). In this case, the shared method variance would constitute an *OV*. Though random error in the measurement of the *Cov LV* will be reduced, *OVB* will certainly not be reduced and may even be exacerbated (given that the *OV* likely accounts for a larger proportion of the variance common to the set of *IV* indicators than in any single member of that set).

Theory should play a central role in guiding the selection of measures whenever possible (Little, Lindenberger & Nesselroade, 1999). Indeed, blind reliance on selecting those measures that are most highly correlated can increase bias under some conditions. For example, if choosing between two self-report measures of depression or one self-report and one other report measure, it is likely that the highest correlation will be the one between the two self-report measures. Thus, reliability would be maximized by using the two self-report measures. However, use of the two self-report measures would also be more likely to create what Cattell (1978) and Little et al. (1999) would call a “bloated specific” factor and possibly exacerbate *OVB* due to shared method variance. That is, if the shared method variance is also shared with the *DV* measure then that method variance would constitute an *OV*. Selecting one self-report measure and one other-report measure is likely to produce a smaller increase in the reliability of measurement of depression but might reduce the potential for *OVB* due to shared method variance. Little et al. (1999) provide a discussion of four key dimensions of indicator selection that many psychopathology researchers should find helpful. Of course, when measures are clustered (e.g., several measures of each of several methods are included), it is also important to follow DeShon’s (1998) recommendation to take this clustering into account (DeShon’s recommendation could also be generalized to an Aggregated Measures ANCOVA/APV; instead of forming a single *Cov* composite the researcher would form several *Cov* composites with one per method/cluster).

When a longitudinal design includes two time points and the research question concerns differential change over that interval, the conventional analysis is an ANCOVA/APV analysis of “regressed change”. That is, the time 2 measure of the outcome variable is entered as the *DV* and the time 1 measure of the outcome variable is entered as the *Cov* to “control” for the association between the *IV* and the time 1 outcome measure (or for group differences at time 1). When the time 1 outcome *LV* is correlated with the *IV LV*, these analyses will generate biased estimates of *e* when this path in fact equals 0 with a corresponding inflation in the type I error rate of the test of *H*_{0} : *e* = 0 .

Whereas gain scores have been much criticized, some methodologists have refuted these criticisms and/or articulated the advantages of gain scores (e.g., Allison, 2005; Rogosa, 1995; Rogosa, Brandt, & Zimowski, 1982; Willett, 1988; Williams & Zimmerman, 1996). Maxwell and Delaney (2004) conclude that an ANCOVA is often preferable to an analysis of gain scores for randomized designs; they also conclude (p. 448) that “in intact group studies, then the ANOVA of gain scores is to be preferred.” Thus, ANCOVA/APV will often be preferable in randomized psychotherapy studies. However, gain scores should be seriously considered in longitudinal, two-wave studies of psychopathology, as subtracting the time 1 outcome measure from the time 2 outcome measure rather than using the time 1 outcome measure as a *Cov* produces an unbiased estimate of true change (Willett, 1988). Of course, gain scores are only interpretable when the measures of the outcome variable demonstrate factorial temporal invariance (e.g., Horn & McArdle, 1992). Raw gain score analysis further assumes that the variance of the outcome measure also demonstrates temporal invariance, and when this assumption is violated standardized gain score analysis should be used (e.g., Judd & Kenny, 1981).
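A small simulation sketch of this contrast, under hypothetical values (loading .8 for the outcome measure at both waves, a .5 correlation between the *IV* and the stable outcome trait, and no true differential change), compares the regressed-change ANCOVA with the gain score analysis:

```python
import math, random

# Two-wave sketch with NO true differential change: ANCOVA on the unreliable
# time 1 measure inflates type I errors, while the gain score test holds the
# nominal rate. The loading (.8) and IV-trait correlation (.5) are
# illustrative choices, not values from the article.
LOAD, R_IV_T, N, NSIM = 0.8, 0.5, 400, 1000
CRIT = 1.966                      # approximate two-tailed .05 critical t
random.seed(2)

def corr(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)

ancova_rej = gain_rej = 0
for _ in range(NSIM):
    iv, pre, post = [], [], []
    for _ in range(N):
        t = random.gauss(0, 1)                       # stable outcome trait
        iv.append(R_IV_T * t + math.sqrt(1 - R_IV_T**2) * random.gauss(0, 1))
        pre.append(LOAD * t + math.sqrt(1 - LOAD**2) * random.gauss(0, 1))
        post.append(LOAD * t + math.sqrt(1 - LOAD**2) * random.gauss(0, 1))
    # "Regressed change": partial correlation of post with IV controlling pre
    r_yi, r_yc, r_ic = corr(post, iv), corr(post, pre), corr(pre, iv)
    pr = (r_yi - r_yc * r_ic) / math.sqrt((1 - r_yc**2) * (1 - r_ic**2))
    if abs(pr * math.sqrt((N - 3) / (1 - pr**2))) > CRIT:
        ancova_rej += 1
    # Gain score analysis: simple correlation of (post - pre) with IV
    g = corr([p2 - p1 for p1, p2 in zip(pre, post)], iv)
    if abs(g * math.sqrt((N - 2) / (1 - g**2))) > CRIT:
        gain_rej += 1

print(ancova_rej / NSIM, gain_rej / NSIM)   # inflated vs roughly nominal
```

With these settings the ANCOVA test rejects the true null far more often than 5% of the time, while the gain score test stays near the nominal rate.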

Among those who argue that gain scores are more appropriate than an analysis of “residualized change” for designs with two measurement waves, many recognize that such designs are limited in the first place (e.g., Rogosa, 1995; Willett, 1988). When possible, three or more waves of data should be collected when studying change, and autoregressive structural equation models, hierarchical linear modeling, conventional growth curve analysis, latent growth curve analysis or survival analysis should be used (e.g., Bollen & Curran, 2004; Hertzog & Nesselroade, 2003; Singer & Willett, 2003).

Propensity score analysis (PSA) was developed by Rosenbaum and Rubin (1983) for analyzing data from quasi-experimental research with many confounder *Cov LV*s so as to “control for naturally occurring systematic differences in background characteristics between the treatment group and the control group” (Rubin, 1997, p. 757). There are two steps to PSA. First, all available *Cov*s are used to predict group membership in a logistic regression. Plugging a participant’s values on the *Cov*s into the logistic regression equation yields that participant’s expected probability of being in the treatment (psychopathology) group rather than the control group. This expected probability is the person’s propensity score. In the second step, participants across the two groups are matched or stratified on the basis of their propensity scores. The propensity score could also be used as a *Cov* in ANCOVA. Though PSA may have some advantages over conventional ANCOVA (e.g., Rubin, 1997; Shadish et al., 2002), reliability has been a largely neglected topic in the PSA literature (Glynn, Schneeweiss & Sturmer, 2006). When the multiple *Cov*s involved in a PSA are correlated, it seems likely that PSA will minimize bias due to unreliability (analogous to an Aggregated Measures ANCOVA). However, the logic of PSA does not call for the *Cov*s to be correlated. Thus, it is not clear that PSA is, in general, less vulnerable than conventional ANCOVA/APV to underadjustment bias due to unreliability.
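The two PSA steps can be sketched on synthetic data as follows; all data-generating choices here are hypothetical, and the logistic regression is fit by plain gradient ascent purely so the sketch is self-contained:

```python
import math, random

# Sketch of the two PSA steps on synthetic data: (1) fit a logistic
# regression of group membership on the Covs, and (2) stratify participants
# into propensity-score quintiles. All data-generating choices hypothetical.
random.seed(3)
N = 300
groups, covs = [], []
for _ in range(N):
    g = random.random() < 0.5          # "psychopathology" vs control group
    covs.append([random.gauss(0.5 if g else 0.0, 1.0) for _ in range(2)])
    groups.append(1 if g else 0)

# Step 1: logistic regression of group membership on the two Covs,
# fit by gradient ascent on the log-likelihood
w = [0.0, 0.0, 0.0]                    # intercept and two slopes
for _ in range(500):
    grad = [0.0, 0.0, 0.0]
    for x, y in zip(covs, groups):
        p = 1.0 / (1.0 + math.exp(-(w[0] + w[1] * x[0] + w[2] * x[1])))
        grad[0] += y - p
        grad[1] += (y - p) * x[0]
        grad[2] += (y - p) * x[1]
    w = [wj + 0.05 * gj / N for wj, gj in zip(w, grad)]

# Each participant's propensity score: predicted probability of being in
# the psychopathology group given his or her Cov values
props = [1.0 / (1.0 + math.exp(-(w[0] + w[1] * x[0] + w[2] * x[1])))
         for x in covs]

# Step 2: stratify into propensity-score quintiles; group comparisons are
# then made within strata of participants with similar propensities
order = sorted(range(N), key=lambda i: props[i])
strata = [order[i * N // 5:(i + 1) * N // 5] for i in range(5)]
for s, idx in enumerate(strata):
    n_grp = sum(groups[i] for i in idx)
    print(f"stratum {s + 1}: n = {len(idx)}, psychopathology group = {n_grp}")
```

In practice one would use an established routine for the logistic fit and would check covariate balance within strata; the sketch only illustrates the two-step logic.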

When it is impractical to include multiple measures of the *LV*s under study, there is often little choice but to conduct a conventional ANCOVA/APV. At least eight recommendations can be offered to minimize the impact of bias due to unreliability on the validity of the inferences drawn from such results.

The first recommendation is that one could estimate each of the parameters in Equation 2 to evaluate by how much one’s estimate of the unique association between the *IV* and the *DV LV*s and the type I error rate are inflated given the sample size and parameter estimates. That is, based on the estimated value of the regression coefficient and assuming normal distributions, one could compute an effect size estimate and (using power tables or calculators) estimate the probability of rejecting the null hypothesis given that effect size and the sample size in a given study. In the case of non-normal distributions, one could conduct a Monte Carlo study to estimate the type I error rate. One could then use this information to adjust the regression coefficient and the nominal type I error rate of the test of *H*_{0} :*e* = 0 such that the actual type I error rate equaled the desired level. For instance, in Example 2 in Table 1, if the nominal type I error rate in a study with a sample size of 400 were set to .00719, the actual type I error rate would be .05 (rather than the actual type I error rate of .181 that results from using a nominal type I error rate of .05). Of course, the extent to which this approach would succeed depends on the accuracy of the estimates of the parameters in Equation 2. It is likely that the reliability of the *IV*, *Cov* and *DV* measures can be readily estimated in one’s sample (and, when that is not the case, may be available from other studies), and one could then use these reliability estimates to estimate the correlation between the *IV* and *Cov LV*s. However, empirical estimates of the unique association between the *Cov* and *DV LV*s may not be readily available.
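Under the assumptions of Equation 2 and a Fisher-z normal approximation to the sampling distribution of the partial correlation, this adjustment can be sketched for Example 2 (parameter values as in the text; the approximation is illustrative, not the article's own computation):

```python
import math

# Sketch of the alpha adjustment for Example 2 of Table 1
# (a = .900, b = f = .949, c = d = .50, e = 0, N = 400), using a Fisher-z
# normal approximation to the sampling distribution of the partial
# correlation of the DV with the IV controlling for the Cov.
a, b, f, c, d, N = 0.900, 0.949, 0.949, 0.50, 0.50, 400

# Model-implied correlations among the observed measures when e = 0
r_yi, r_yc, r_ic = b * f * c * d, a * f * d, a * c * b
pr = (r_yi - r_yc * r_ic) / math.sqrt((1 - r_yc**2) * (1 - r_ic**2))

phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))   # standard normal CDF
ncp = math.atanh(pr) * math.sqrt(N - 4)   # approximate shift of the z statistic

def z_crit(alpha):            # two-sided critical value, found by bisection
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if 2 * (1 - phi(mid)) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def actual_rate(nominal):     # actual type I error of a nominal-level test
    zc = z_crit(nominal)
    return (1 - phi(zc - ncp)) + phi(-zc - ncp)

print(round(actual_rate(0.05), 3))   # approximately .181, not .05

# Solve for the nominal alpha whose actual type I error rate is .05
lo, hi = 1e-6, 0.05
for _ in range(100):
    mid = (lo + hi) / 2
    if actual_rate(mid) > 0.05:
        hi = mid
    else:
        lo = mid
print(round((lo + hi) / 2, 5))       # approximately .0072
```

The two printed values closely reproduce the .181 actual rate and the .00719 adjusted nominal alpha discussed in the text for this example.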
Relatedly, one could use the reliability estimates of the *IV*, *Cov* and *DV* measures to fix their error variances in an SEM with single indicators and simultaneously estimate each of the parameters in the model, including the unique association between the *Cov* and *DV LV*s (e.g., Hayduk, 1987; McDonald, Behson & Seifert, 2005; also see the somewhat related approach of simulation extrapolation, Carroll, Ruppert, Stefanski & Crainiceanu, 2006).

A drawback to the use of reliability estimates in either Equation 2 or a single-indicator SEM is that these approaches are likely to be very sensitive to the reliability estimates, and such estimates are known to under-estimate reliability in some conditions and over-estimate it in others (e.g., Zinbarg, Revelle, Yovel & Li, 2005). It is also well known that reliability estimates are sample specific. Therefore, reliability estimates obtained from other studies may fail to accurately represent the reliabilities of the measures in the sample at hand. If the IV, Cov and DV measures are composite scores derived from multiple items, these issues could be addressed by conducting item-level SEMs with careful measurement modeling. As item-level SEM can be problematic when items have few response options and non-linear relationships with their factors (e.g., Bernstein & Teng, 1989; Little, Cunningham, Shahar & Widaman, 2002; Waller, Tellegen, McDonald & Lykken, 1996), such analyses will often benefit from using SEM approaches for categorical data (e.g., Muthén, 1984; also see Bauer & Curran, 2004) or from grouping items into parcels (e.g., Little et al., 2002). Given our earlier discussion of indicator selection in SEM and the fact that the most commonly used reliability estimates are often inflated by correlated residual variance (e.g., Judd & Kenny, 1981), a limitation common to using reliability estimates in Equation 2, single-indicator SEM, item-level SEM and parcel-level SEM is that each of these approaches will typically be vulnerable to bias arising from correlated residuals. Thus, thoughtful design involving multiple indicators carefully chosen to be heterogeneous with respect to residual variance should often lead to greater bias reduction than will the choice of data analytic approach.
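The closed-form counterpart of the single-indicator correction can be sketched as follows. The function name and parameter values are ours, the sketch assumes the classical error model used throughout this article, and (as noted above) the result is only as good as the reliability estimates fed into it.

```python
import numpy as np

def disattenuated_partial(r_xy, r_xz, r_zy, rel_x, rel_z, rel_y):
    """Disattenuate the observed correlations using reliability estimates,
    then return the standardized unique association of the IV (x) with the
    DV (y) holding the Cov (z) constant at the latent level -- the
    closed-form analogue of a single-indicator SEM with fixed error
    variances."""
    # classical disattenuation: rho = r_obs / sqrt(rel_1 * rel_2)
    rho_xy = r_xy / np.sqrt(rel_x * rel_y)
    rho_xz = r_xz / np.sqrt(rel_x * rel_z)
    rho_zy = r_zy / np.sqrt(rel_z * rel_y)
    # standardized partial regression coefficient among the latent variables
    return (rho_xy - rho_xz * rho_zy) / (1 - rho_xz**2)
```

For example, if the latent correlations were *IV*–*Cov* = .50 and *Cov*–*DV* = .60 with no unique *IV* effect (so the latent *IV*–*DV* correlation is .30), and every measure had reliability .80, the observed correlations would be .40, .48 and .24, and the corrected partial coefficient recovers zero rather than the positively biased conventional estimate.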

A second recommendation is to conduct a sensitivity analysis to assess the extent to which biases of various sizes would change the results of the study when empirical estimates of the unique association between the *Cov* and *DV LV*s are not available (e.g., Marcus, 1997; Rosenbaum, 2002; Rosenbaum & Rubin, 1983). That is, one can determine by how much the nominal type I error rate would need to be adjusted to achieve an actual type I error rate of .05 for each of a plausible range of values of the unique association between the *Cov* and *DV LV*s (and/or for each of a plausible range of values of the reliabilities as suggested by Judd & Kenny, 1981, p. 114). The observed result might remain significant at the adjusted levels for all but the most extreme estimates of the unique association between the *Cov* and *DV LV*s. This would indicate that a conclusion that the *IV* has a unique association with the *DV* would not be biased by underadjustment for the *Cov* unless the unique association between the *Cov* and *DV LV*s is very large. Alternatively, the observed result might only remain significant at the adjusted levels associated with small estimates of the unique association between the *Cov* and *DV LV*s. This pattern would indicate that a conclusion that the *IV* is uniquely related to the *DV* would be warranted at conventional type I error rates only if the unique association between the *Cov* and *DV LV*s is small.
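A sensitivity analysis of this kind can be sketched numerically. The function below assumes Equation 2 has the form implied by the *DV2* derivation later in this section, bias = *bfcd*(1 − *a*²)/(1 − (*acb*)²); the grid of *d* values and the parameter values (matching the hypothetical Table 3 study) are illustrative.

```python
import numpy as np

def equation2_bias(a, b, f, c, d):
    """Bias in the standardized ANCOVA/APV coefficient for the IV when
    H0: e = 0 is true, with a, b, f the square roots of the Cov, IV and
    DV reliabilities and c, d the IV-Cov and Cov-DV latent associations
    (assumed form of Equation 2, reconstructed from this section)."""
    return b * f * c * d * (1 - a**2) / (1 - (a * c * b) ** 2)

# Sensitivity grid for the Table 3 study (a = .850, b = f = .949, c = .50):
# how large is the bias for each plausible value of d?
for d in (0.1, 0.3, 0.5, 0.7):
    print(d, round(equation2_bias(a=0.850, b=0.949, f=0.949, c=0.5, d=d), 3))
```

The bias grows linearly in *d* and vanishes when the *Cov* is perfectly reliable (*a* = 1), which is why the plausible range of the *Cov*–*DV* unique association drives the adjusted significance levels.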

The pattern of adjusted significance levels, and the size of the unique association between the *Cov* and *DV LV*s required to produce them, will vary over studies. This is illustrated in Table 2 and Table 3, which provide examples of sensitivity analyses of the results from two hypothetical studies. In both studies, the reliabilities of the *IV* and the *DV* measures equal .90 and the correlation between the *IV* and *Cov LV*s equals .50. In the hypothetical study presented in Table 2, the reliability of the *Cov* measure equals .88, whereas it equals .72 in the hypothetical study presented in Table 3 (the observed correlation between the *IV* and *Cov* measures thus equals approximately .45 in the former study and .40 in the latter). In addition, the sample size in the hypothetical study presented in Table 2 is 140 whereas it equals 300 in the one presented in Table 3. Clearly the study presented in Table 3 is much more sensitive to underadjustment bias than the one presented in Table 2.

Table 2. Hypothetical results of a sensitivity analysis of a study in which a = .938, b = f = .949, c = .50 and sample size is 140.

Table 3. Hypothetical results of a sensitivity analysis of a study in which a = .850, b = f = .949, c = .50 and sample size is 300.

To make these examples even more concrete, imagine that the test of the regression coefficient of the unique association between the *IV* and the *DV* at the nominal type I error rate of .05 is associated with a *p* value of .020 in both studies. From the results in Table 2 we would infer that the test of the unique association between the *IV* and the *DV* in that study would remain significant unless the unique association between the *Cov* and the *DV LV*s is larger than .7. This does not entirely rule out underadjustment bias as an explanation for the significant result at the nominal type I error rate of .05; bias would be present if the unique association between the *Cov* and *DV LV*s were greater than .7. If such a large unique association would be implausible, however, underadjustment bias would be implausible as an explanation for the significant result obtained at the unadjusted type I error rate of .05. In contrast, from the results in Table 3 we would infer that the test of the unique association between the *IV* and the *DV* in the study presented in that table would remain significant only if the unique association between the *Cov* and *DV LV*s were smaller than .3. Unless one could compellingly argue that the unique association between the *Cov* and *DV LV*s is plausibly smaller than .3, underadjustment bias would remain a plausible alternative explanation for the result that was significant at the unadjusted type I error rate of .05.

A third recommendation is specific to ANCOVA: always closely examine the group means and standard deviations (sds) on the *Cov* and the *DV*. In the classic example presented by Lord (1967), ANCOVA indicated a sex difference in residualized post-test weight even though the sexes did not differ in average weight gain from pre-test to post-test. An examination of the group means and sds would have made very clear to the analyst that the sex difference in weight was no greater at post-test than at pre-test and that, for both sexes, the mean weight at post-test was the same as the mean weight at pre-test. It would therefore have been clear that there could not have been a sex difference in average weight change over time, and thus that the ANCOVA result was misleading.
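Lord's example can be reproduced schematically. The numbers below (group means, sds, error variances) are invented for illustration; the point is only that the recommended inspection of means reveals no change in either group while the ANCOVA group coefficient is clearly nonzero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000  # per group

def sample(mu):
    """Pre- and post-test weights with no average change (illustrative sds)."""
    true_w = mu + 10 * rng.standard_normal(n)
    pre = true_w + 5 * rng.standard_normal(n)
    post = true_w + 5 * rng.standard_normal(n)  # same expected mean as pre
    return pre, post

pre_m, post_m = sample(160.0)   # "men"
pre_f, post_f = sample(130.0)   # "women"

# The recommended first step: inspect group means (and sds).
print("men   pre/post means:", pre_m.mean().round(1), post_m.mean().round(1))
print("women pre/post means:", pre_f.mean().round(1), post_f.mean().round(1))

# ANCOVA: regress post on pre plus a group dummy.
pre = np.concatenate([pre_m, pre_f])
post = np.concatenate([post_m, post_f])
g = np.concatenate([np.ones(n), np.zeros(n)])
X = np.column_stack([np.ones(2 * n), pre, g])
beta = np.linalg.lstsq(X, post, rcond=None)[0]
print("ANCOVA group coefficient:", beta[2].round(2))  # nonzero despite no change
```

The group coefficient is nonzero because the within-group regression of post on pre has a slope below 1, so the adjusted group difference does not vanish even though mean change is zero in both groups.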

The fourth recommendation is to use great care when selecting observed indicators of the *LV*s under study (Little et al., 1999). This point underscores the importance of careful psychometric assessment of our measures. Errors in the *IV* and *DV* measures will attenuate the estimate of the zero-order relation between their *LV*s. Such attenuation obviously reduces statistical power, which is already typically rather poor in psychological research (Cohen, 1962; Rossi, 1990; Sedlmeier & Gigerenzer, 1989). Even when we have sufficient power to detect a zero-order association between the *IV* and the *DV*, however, error in the *Cov* measure will positively bias tests of *H*_{0}:*e* = 0. Thus, when using conventional ANCOVA/APV to test *H*_{0}:*e* = 0, it is especially important to select a highly reliable *Cov* measure that does not share method variance with the *DV* (see example 5 in Table 1).

A fifth recommendation is to not dichotomize a continuous *Cov*. Dichotomization produces underadjustment bias (Vargha et al., 1996), a result consistent with the effects of unreliability focused on here because dichotomization is a source of measurement error.
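A small simulation illustrates the point; all parameter values are illustrative. The continuous *Cov* fully accounts for the *IV*–*DV* association, yet adjusting for a median split of the *Cov* leaves a clearly positive "unique" *IV* coefficient.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
cov = rng.standard_normal(n)
iv = 0.6 * cov + 0.8 * rng.standard_normal(n)
dv = 0.6 * cov + 0.8 * rng.standard_normal(n)   # no unique IV effect (e = 0)

def partial_beta(y, x, z):
    """Coefficient on x in the OLS regression of y on an intercept, x and z."""
    X = np.column_stack([np.ones(len(y)), x, z])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

b_cont = partial_beta(dv, iv, cov)                       # approximately 0
b_dich = partial_beta(dv, iv, (cov > 0).astype(float))   # positively biased
print(round(b_cont, 3), round(b_dich, 3))
```

Dichotomization behaves like measurement error here: the median split correlates imperfectly with the continuous *Cov*, so the residual confounding reappears as a spurious *IV* coefficient.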

A sixth recommendation concerns a design feature that strengthens the validity of inferences based on a conventional ANCOVA/APV: the nonequivalent dependent variable (Shadish et al., 2002). A great deal of inferential leverage can be gained by incorporating a second *DV* (*DV2*) that is expected to show a unique association with the *Cov* but not with the *IV*, as depicted in Figure 3. If the expected pattern of results is obtained, in which the *IV* shows a reliable unique association with the first *DV* but not with *DV2* and the zero-order correlations of the *Cov* with the two *DV*s are comparable in magnitude, then one can be more confident that the unique association of the *IV* with the first *DV* is not merely the result of underadjustment due to unreliability of the *Cov*. That is, the *Cov* is shown to be reliable enough in this population to account fully for a relationship (that of the *IV* with *DV2*) whose estimate should be at least as biased by underadjustment as the estimate of the unique association of the *IV* with the first *DV*. Thus, it could be concluded with more confidence that the *IV* does have a unique association with the first *DV* above and beyond the effects of the *Cov* than would be the case in a study that did not include *DV2*.

Path diagram of a model with latent variables corresponding to an independent variable (*IV*; depression), a covariate (*Cov*; anxiety) and first dependent variable hypothesized to have a unique association with the *IV* (DV1) and a second dependent variable **...**

To make this inference, it is crucial that the zero-order correlation between the *Cov* and *DV2* is at least as great as that between the *Cov* and the first *DV*, so that the estimated unique association between the *IV* and *DV2 LV*s is at least as vulnerable to underadjustment bias as that between the *IV* and first *DV LV*s. That is, assuming that there is no unique association between the *IV* and *DV2 LV*s, the bias in the estimate of this association would equal $bhcg\frac{1-a^{2}}{1-(acb)^{2}}$. Dividing this quantity by Equation 2 to compare the size of the bias in the two estimates, assuming that both unique associations equal zero, yields $\frac{hg}{fd}$. Further, if *DV2* correlates at least as highly with the *Cov* as does the first *DV*, then *hg* ≥ *fd*. That is, if *DV2* correlates at least as highly with the *Cov* as does the first *DV*, then the estimate of *DV2*'s unique association with the *IV* would be associated with an even more positively biased type I error rate. Thus, the lack of a significant unique association with *DV2* cannot be attributed to the test of this association having a smaller inflation in its type I error rate. Of course, the conclusion would be strengthened further by showing that the *IV*'s unique association is significantly stronger with the first *DV* than with *DV2*, such that the conclusion is not dependent on accepting a null hypothesis. That is, it would then have been demonstrated that the unique association of the *IV* with the first *DV* is significantly larger than an association with at least as much underadjustment bias, allowing one to rule out the possibility that underadjustment bias entirely accounts for the unique association of the *IV* with the first *DV*. A strength of the *DV2* approach is that it can be used when *a*, *b* and *f* cannot be estimated with confidence (such as when using single-item measures).
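The algebra above can be checked numerically; the parameter values below are illustrative, chosen so that *hg* ≥ *fd* as the design requires.

```python
import numpy as np

def bias(a, b, c, loading, unique):
    """Bias in the IV's partial coefficient for a DV with the given loading
    (square root of reliability) and Cov->DV unique association, when the
    IV's unique association with that DV is zero."""
    return b * loading * c * unique * (1 - a**2) / (1 - (a * c * b) ** 2)

a, b, c = 0.9, 0.95, 0.5   # Cov loading, IV loading, IV-Cov latent correlation
f, d = 0.90, 0.4           # first DV: loading f, Cov->DV1 association d
h, g = 0.85, 0.5           # DV2: chosen so that h*g >= f*d
ratio = bias(a, b, c, h, g) / bias(a, b, c, f, d)
print(round(ratio, 4))      # the common factors cancel, leaving hg/fd > 1
```

The shared factor *bc*(1 − *a*²)/(1 − (*acb*)²) cancels in the ratio, which is why the comparison of the two biases reduces to *hg*/*fd*.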

A seventh recommendation stems from the recognition that although the ANCOVA/APV estimate will be positively biased, it will be less biased than the zero-order correlation as an estimate of the unique association of the *IV* latent variable with the *DV* latent variable when in fact no such unique association exists. In the case in which there is no unique association of the *IV* with the *DV* (e = 0), the zero-order correlation equals *bfcd*, and Equation 2 shows that in this case *bfcd* **≥** *β _{DV,IV.Cov}*, with equality holding only in the unrealistic case in which the reliability of the *Cov* measure equals zero.

To illustrate this point, consider Example 3 in Table 1, in which the ANCOVA/APV estimate of the unique association of the IV with the DV is positively biased with a type I error rate of nearly 50% when sample size equals 400. The zero-order correlation between the IV and DV measures (.225) in this example would be more than twice as positively biased (Cohen's d = .46), with a type I error rate of 100% when sample size equals 400, if it were taken as an estimate of the unique association of the IV latent variable with the DV latent variable. That is, the ANCOVA/APV estimate in this case does lead to substantial bias reduction and could be useful. To increase the usefulness of ANCOVA/APV in such cases, however, we recommend focusing less on the significance of the ANCOVA/APV estimate and more on whether inclusion of the Cov resulted in a substantial reduction in the effect size estimate. Along these lines, it might be useful to test the significance of the Cov in accounting for at least a portion of the association between the IV and DV using techniques developed for the testing of mediation, such as those developed by Mackinnon and colleagues (e.g., Mackinnon, Lockwood, Hoffman, West & Sheets, 2002), Preacher and Hayes (2004) or Shrout and Bolger (2002). When ANCOVA/APV results in a substantial reduction in effect size, with the Cov accounting for a significant portion of the association between the IV and DV measures, and sensitivity analyses suggest that underadjustment bias remains a plausible explanation for the significant ANCOVA/APV result, we recommend that researchers acknowledge that the results might be taken as evidence that the zero-order correlation between the IV and DV measures is spurious and due to the confound of the IV with the Cov.
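A percentile-bootstrap test of the Cov's role, in the spirit of the cited mediation methods, can be sketched as follows. This is our own minimal implementation, not the published SPSS/SAS macros of Preacher and Hayes (2004).

```python
import numpy as np

def boot_indirect(iv, cov, dv, n_boot=2000, seed=0):
    """Percentile-bootstrap 95% CI for the portion of the IV-DV association
    carried by the Cov: the product of the IV->Cov path and the Cov->DV
    path with the IV partialed out (a minimal sketch of the bootstrap
    logic used in mediation testing)."""
    rng = np.random.default_rng(seed)
    n = len(iv)
    est = np.empty(n_boot)
    for k in range(n_boot):
        i = rng.integers(0, n, n)           # resample cases with replacement
        x, z, y = iv[i], cov[i], dv[i]
        a_path = np.polyfit(x, z, 1)[0]                    # IV -> Cov slope
        X = np.column_stack([np.ones(n), x, z])
        b_path = np.linalg.lstsq(X, y, rcond=None)[0][2]   # Cov -> DV | IV
        est[k] = a_path * b_path
    return np.percentile(est, [2.5, 97.5])
```

A confidence interval that excludes zero supports the claim that the Cov accounts for a significant portion of the IV–DV association.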

A final recommendation is to refrain from using the language of control – such as claiming an effect of the *IV* after “controlling for” the *Cov* - when discussing ANCOVA/APV results (Miller & Chapman, 2001). Phrases such as “after partialing” the *Cov* measure or “after covarying” it have less potential to foster overconfidence in ANCOVA/APV results.

In practice *OVB* may often be unavoidable as all of the relevant variables in many areas are not known and the inclusion of some, but not all, relevant variables does not necessarily reduce *OVB* (Clarke, 2005; Rubin, 2006). Thus, there is no simple solution to *OVB* when randomization is unfeasible or unethical, as is often the case in psychopathology research. Rather, minimizing *OVB* is facilitated by the iterative process of articulating specific *OV*s that might have confounded a given result and then designing studies less vulnerable to that confound or that otherwise allow predictions derived from the original explanation to be pitted against those derived from the confounder explanation. As noted earlier, even the inclusion of a relatively unreliable *Cov* can result in a substantial reduction in bias compared with omission of the *Cov*. To illustrate this point, again consider Example 7 in Table 1, in which the reliability of the *Cov* is nearly zero and therefore approximately equivalent to the case in which the *Cov* had been omitted. Inclusion of a *Cov* with a reliability of .50 would have resulted in substantial bias reduction (*β _{DV,IV.Cov}* = .076 with Type I error rates ranging from .074 when n equals 60 to .239 when n equals 400).
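The claim that even a modestly reliable *Cov* buys substantial bias reduction can be illustrated with the Equation-2-style bias formula; the parameter values below are ours and do not reproduce Example 7 in Table 1.

```python
import numpy as np

def biased_beta(rel_cov, b=0.9, f=0.9, c=0.7, d=0.7):
    """Value of the ANCOVA/APV coefficient for the IV when e = 0, as a
    function of the Cov measure's reliability (Equation-2-style formula;
    all other parameters are illustrative defaults)."""
    a = np.sqrt(rel_cov)
    return b * f * c * d * (1 - a**2) / (1 - (a * c * b) ** 2)

# Reliability 0 is equivalent to omitting the Cov altogether; even
# reliability .50 removes a large share of the bias.
for rel in (0.0, 0.25, 0.50, 0.75, 1.0):
    print(rel, round(biased_beta(rel), 3))
```

The bias shrinks monotonically as the *Cov*'s reliability rises, reaching zero only when the *Cov* is measured without error.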

Another practice that might help to minimize the impact of *OVB* on drawing valid inferences is to follow the recommendation of Blalock (1964) and Bollen (1989) to restrict the language of unique effects to those variables within a specific model. As *OV*s are identified and included in subsequent studies as additional covariates, the results would begin to clarify whether uniqueness could then be claimed with respect to the expanded set of covariates (Rosenbaum, 1999; Shadish & Cook, 1999).

Finally, SEM fit indices are sensitive to OVB in many cases (Tomarken & Waller, 2003). There are also tests of model misspecification (e.g., Long & Trivedi, 1993) and sensitivity analyses (e.g., Marcus, 1997; Rosenbaum, 2002; Rosenbaum & Rubin, 1983) that can be helpful in suggesting the extent of *OVB*.

One limitation of the approach taken here is that there are potentially important problems with ANCOVA/APV that were not addressed here including nonlinear associations among the *LV*s and heteroscedasticity. Another limitation is that our approach assumes a classical measurement error model. Techniques for handling other error models such as multiplicative error have been developed and may prove useful in some areas of psychopathology research (e.g., Browne, 1984; Carroll, Ruppert, Stefanski & Crainiceanu, 2006; Marsh, 1989). Almost all extant measurement error models, however, make the assumption that shared method variance exerts a positive bias on correlations. In contrast, Campbell and O’Connell (1982) have raised the provocative possibility that hetero-method correlations may have an attenuating effect and mono-method correlations may be unbiased (in a fashion analogous to that in which differential skew attenuates associations relative to associations among measures that are similarly skewed). This possibility warrants further study. In addition, we only considered type I error rate inflation when there is no unique association between the *IV* and *DV LV*s. When there is a unique association between the *IV* and *DV LV*s, underadjustment for the *Cov LV* can lead to underestimation of this unique association and inflated type II error rates (e.g., Reichardt, 1979). Such effects will arise under different circumstances than those confronting the psychopathologist concerned that a simple correlation between the *IV* and *DV* measures is due to a confounder. These circumstances may be relevant to some psychopathology research, however, and type II error rate inflation in ANCOVA/APV also warrants greater attention than it has received by psychopathologists. Finally, designs in psychopathology studies are often more complex than those involving a single *IV*, a single *Cov* and a single *DV*. 
The potential for bias in more complex designs is at least as great as in the simpler design considered here and at least as much caution is therefore required for interpreting ANCOVA/APV analyses of more complex designs.

Kenny (1975, p. 360), in likening the difference between true experiments and quasi-experiments to that between testimony from a sighted person and a blind person, wisely noted that “when we have only the blind man, we would not dismiss his testimony, especially if he were aware of his biases and had developed faculties of touch and hearing that the sighted man could have developed but has neglected.” Unfortunately, psychopathologists rarely give evidence of awareness of the underadjustment bias in ANCOVA/APV, let alone of having made use of approaches that might help to compensate for that bias. Thus, we do not propose to “dismiss testimony” from an ANCOVA/APV. Rather, we hope to raise awareness that as psychopathologists we experience partial blindness due to our inevitable reliance on non-random assignment, and we further hope that our recommendations might encourage more widespread use of strategies that can help compensate for our partial blindness.

We thank J. Michael Bailey, Emily Durbin, Lewis R. Goldberg, Michael B. Gurtman, Lynne M. Knobloch-Fedders, William Revelle, and the students in Zinbarg’s graduate seminar in clinical research methods for their comments on earlier drafts of this article and/or their discussion of the ideas contained in this article. Preparation of this article was supported by the Patricia M Nielsen Research Chair of the Family Institute at Northwestern University and by National Institutes of Health Grants R01- MH65652-01 to Richard E. Zinbarg and R01- EY014110 and EY018197 to Satoru Suzuki, and National Science Foundation grant BCS0643191 to Satoru Suzuki.

The following manuscript is the final accepted manuscript. It has not been subjected to the final copyediting, fact-checking, and proofreading required for formal publication. It is not the definitive, publisher-authenticated version. The American Psychological Association and its Council of Editors disclaim any responsibility or liabilities for errors or omissions of this manuscript version, any version derived from this manuscript by NIH, or other third parties. The published version is available at www.apa.org/pubs/journals/abn.

- Aiken LS, Stein JA, Bentler PM. Structural equation analyses of clinical subpopulation differences and comparative treatment outcomes: Characterizing the daily lives of drug addicts. Journal of Consulting and Clinical Psychology. 1994;62:488–499. [PubMed]
- Allison PD. Fixed effects regression methods for longitudinal data using SAS. Cary, NC: SAS Institute; 2005.
- Bauer DJ, Curran PJ. The integration of continuous and discrete latent variable models: Potential problems and promising opportunities. Psychological Methods. 2004;9:3–29. [PubMed]
- Bernstein IH, Teng G. Factoring items and factoring scales are different: Spurious evidence for multidimensionality due to item categorization. Psychological Bulletin. 1989;105:467–477.
- Blalock HM. Causal inferences in nonexperimental research. Chapel Hill: University of North Carolina; 1964.
- Bollen KA. Structural equations with latent variables. New York: Wiley; 1989.
- Bollen KA, Curran PJ. Autoregressive latent trajectory (ALT) models: A synthesis of two traditions. Sociological Methods and Research. 2004;32:336–383.
- Browne MW. The decomposition of multitrait-multimethod matrices. British Journal of Mathematical and Statistical Psychology. 1984;37:1–21. [PubMed]
- Campbell DT, O’Connell EJ. Methods as diluting trait relationships rather than adding irrelevant systematic variance. In: Brinberg D, Kidder L, editors. New Directions for methodology of social and behavioral science: Forms of validity in research, no. Vol. 12. San Francisco: Jossey-Bass; 1982. pp. 93–111.
- Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement error in nonlinear models: A modern perspective. 2nd edition. Boca Raton: Taylor & Francis; 2006.
- Cattell RB. The scientific use of factor analysis in behavioral and life sciences. New York: Plenum; 1978.
- Clark LA, Watson D. Tripartite model of anxiety and depression: Psychometric evidence and taxonomic implications. Journal of Abnormal Psychology. 1991;100:316–336. [PubMed]
- Clarke KA. The phantom menace: Omitted variable bias in econometric research. Conflict Management and Peace Science. 2005;22:341–352.
- Cochran WG. Analysis of covariance: Its nature and uses. Biometrics. 1957;13:261–281.
- Cohen J. The statistical power of abnormal-social psychological research: A review. Journal of Abnormal & Social Psychology. 1962;65:145–153. [PubMed]
- Cohen J, Cohen P. Applied multiple regression/correlation analysis for the behavioral sciences. 2nd edition. Hillsdale, NJ: Erlbaum; 1983.
- Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16:297–334.
- DeShon RP. A cautionary note on measurement error corrections in structural equation models. Psychological Methods. 1998;3:412–423.
- Elashoff JD. Analysis of covariance: A delicate instrument. American Educational Research Journal. 1969;6:383–401.
- Fleiss JL, Tanur JM. The analysis of covariance in psychopathology. In: Hammer M, Salzinger K, Sutton S, editors. Psychopathology: Contributions from the social, behavioral and biological sciences. New York: Wiley; 1973. pp. 509–527.
- Glynn RJ, Schneeweiss S, Sturmer T. Indications for propensity scores and review of their use in pharmacoepidemiology. Basic & Clinical Pharmacology & Toxicology. 2006;98:253–259. [PMC free article] [PubMed]
- Hayduk LA. Structural equation modeling with LISREL: Essentials and advances. Baltimore, MD: Johns Hopkins Press; 1987.
- Hertzog C, Nesselroade JR. Assessing psychological change in adulthood: An overview of methodological issues. Psychology and Aging. 2003;18:639–657. [PubMed]
- Horn JL, McArdle JJ. A practical and theoretical guide to measurement invariance in aging research. Experimental Aging Research. 1992;18:117–144. [PubMed]
- Huitema B. Analysis of covariance and alternatives. New York: Wiley; 1980.
- Judd CM, Kenny DA. Estimating the effects of social interventions. New York: Cambridge University Press; 1981.
- Kahneman D. Control of spurious association and the reliability of the controlled variable. Psychological Bulletin. 1965;64:326–329. [PubMed]
- Kenny DA. A quasi-experimental approach to assessing treatment effects in the nonequivalent control group design. Psychological Bulletin. 1975;82:345–362.
- Kenny DA. Correlation and causality. New York: Wiley; 1979.
- Little TD, Cunningham WA, Shahar G, Widaman KF. To parcel or not to parcel: Exploring the question, weighing the merits. Structural Equation Modeling. 2002;9:151–173.
- Little TD, Lindenberger U, Nesselroade JR. On selecting indicators for multivariate measurement and modeling with latent variables: When “good” indicators are bad and “bad” indicators are good. Psychological Methods. 1999;4:192–211.
- Long JD, Trivedi PK. Some specification tests for the linear regression model. In: Bollen KA, Long JS, editors. Testing structural equation models. Newbury Park, CA: Sage; 1993.
- Lord FM. Large-sample covariance analysis when the control variable is fallible. Journal of the American Statistical Association. 1960;55:307–321.
- Lord FM. A paradox in the interpretation of group comparisons. Psychological Bulletin. 1967;68:304–305. [PubMed]
- Lord FM. Statistical adjustments when comparing preexisting groups. Psychological Bulletin. 1969;72:336–337.
- Mackinnon DP, Lockwood CM, Hoffman J, West SG, Sheets V. A comparison of methods to test mediation and other intervening variable effects. Psychological Methods. 2002;7:83–104. [PMC free article] [PubMed]
- Marcus SM. Using omitted variable bias to assess uncertainty in the estimation of an AIDS education treatment effect. Journal of Educational and Behavioral Statistics. 1997;22:193–201.
- Maxwell SE, Delaney HD. Designing experiments and analyzing data: A model comparison approach. 2nd edition. Mahwah, N.J: Erlbaum; 2004.
- Maxwell SE, Delaney HD, Manheimer JM. ANOVA of residuals and ANCOVA: Correcting an illusion by using model comparisons and graphs. Journal of Educational Statistics. 1985;10:197–209.
- McDonald RA, Behson SJ, Seifert C. Strategies for Dealing with Measurement Error in Multiple Regression. Journal of Academy of Business and Economics. 2005;53:80–97.
- Miller GA, Chapman JP. Misunderstanding analysis of covariance. Journal of Abnormal Psychology. 2001;110:40–48. [PubMed]
- Muthén BO. A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika. 1984;49:115–132.
- Porter AC, Raudenbush SW. Analysis of covariance: Its model and use in psychological research. Journal of Counseling Psychology. 1987;34:383–392.
- Preacher KJ, Hayes AF. SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers. 2004;36:717–731. [PubMed]
- Reichardt CS. The statistical analysis of data from nonequivalent group designs. In: Cook T, Campbell D, editors. Quasi-Experimentation: Design & analysis issues for field settings. Boston, MA: Houghton Mifflin; 1979. pp. 147–205.
- Rogosa D. Myths and methods: "Myths about longitudinal research" plus supplemental questions. In: Gottman J, editor. The analysis of change. Mahwah, NJ: Erlbaum; 1995. pp. 3–65.
- Rogosa D, Brandt D, Zimowski M. A growth curve approach to the measurement of change. Psychological Bulletin. 1982;90:726–748.
- Rosenbaum PR. Choice as an alternative to control in observational studies. Statistical Science. 1999;14:259–278.
- Rosenbaum PR. Observational studies. 2nd edition. New York: Springer; 2002.
- Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55.
- Rossi J. Statistical power of psychological research: What have we gained in 20 years? Journal of Consulting & Clinical Psychology. 1990;58:646–656. [PubMed]
- Rubin DB. Estimating causal effects from large data sets using propensity scores. Annals of Internal Medicine. 1997;127:757–763. [PubMed]
- Rubin D. Matched Sampling for Causal Effects. New York: Cambridge Univ. Press; 2006.
- Sedlmeier P, Gigerenzer G. Do studies of statistical power have an effect on the power of studies? Psychological Bulletin. 1989;105:309–316.
- Shadish W, Cook T. Comment-design rules: More steps toward a complete theory of quasi-experimentation. Statistical Science. 1999;14:294–300.
- Shadish W, Cook T, Campbell D. Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin; 2002.
- Sher KJ, Trull TJ. Methodological issues in psychopathology research. Annual Review of Psychology. 1996;47:371–400. [PubMed]
- Shrout PE, Bolger N. Mediation in experimental and nonexperimental studies: New procedures and recommendations. Psychological Methods. 2002;7:422–445. [PubMed]
- Singer JD, Willett JB. Applied longitudinal data analysis: Modeling change and event occurrence. New York: Oxford University Press; 2003.
- Sorbom D. An alternative to the methodology for analysis of covariance. In: Joreskog KG, Sorbom D, editors. Advances in factor analysis and structural equation models. Lanham, MD: University Press of America; 1979.
- Tomarken AJ, Waller NG. Potential problems with “well fitting” models. Journal of Abnormal Psychology. 2003;112:578–598. [PubMed]
- Vargha A, Rudas T, Delaney HD, Maxwell SE. Dichotomization, partial correlation, and conditional independence. Journal of Educational and Behavioral Statistics. 1996;21:264–282.
- Wainer H. Adjusting for differential base rates: Lord’s Paradox again. Psychological Bulletin. 1991;109:147–151. [PubMed]
- Waller NG, Tellegen A, McDonald RP, Lykken DT. Exploring nonlinear models in personality assessment: Development and preliminary validation of a negative emotionality scale. Journal of Personality. 1996;64:545–576.
- Wildt AR, Ahtola OT. Analysis of covariance. Beverly Hills, CA: Sage; 1978.
- Willett J. Questions and answers in the measurement of change. Review of Research in Education. 1988;15:345–422.
- Williams RH, Zimmerman DW. Are simple gain scores obsolete? Applied Psychological Measurement. 1996;20:59–69.