Structural Equation Modeling (SEM) has been widely used in sociological, psychological, and social science research. One of the appealing attributes of SEM is that it allows for tests of theoretically derived models against empirical data. For researchers using SEM techniques, evaluation of the fit of a hypothesized model to sample data is crucial to the analysis. A key feature of SEM is the test of the null hypothesis **∑** = **∑**(**θ**), also known as the test of exact fit, where **∑** is the population covariance matrix, **∑**(**θ**) is the covariance matrix implied by a specific model, and **θ** is a vector of free parameters defined by the model. The model test statistic *T* enables an asymptotic test of the null hypothesis H_{0}: **∑** = **∑**(**θ**). A significant *T*, often reported as the model chi-square, suggests misspecification of the model. However, such a test of exact fit of the proposed model is generally unrealistic, as hardly any model using real data is without error (e.g., Browne and Cudeck 1993). A trivial misspecification, particularly with a large sample size, can lead to rejection of the model even when it may otherwise adequately reproduce the population covariance matrix. As a result, a variety of goodness-of-fit measures have been developed to augment the *T* statistic.

Steiger, Shapiro, and Browne (1985) demonstrated that the *T* statistic does not follow a central χ^{2} distribution under misspecification. Instead, it follows a noncentral χ^{2} distribution, with the noncentrality parameter λ (estimated by *T* − *df*) denoting the degree of misfit in the model. Several baseline fit indices make use of the noncentrality parameter (e.g., the Tucker-Lewis index [TLI], relative noncentrality index [RNI], and comparative fit index [CFI]) (Bentler 1990; Goffin 1993). These indices are essentially comparisons between λ_{T} and λ_{B}, where λ_{T} measures the amount of misfit of the target model and λ_{B} that of the baseline model. However, such measures are heavily dependent on the baseline null model. In addition, these measures were found to be particularly susceptible to the influence of estimation methods (Sugawara and MacCallum 1993), and they do not exploit the fact that the known distribution of the *T* statistic allows for hypothesis testing through construction of confidence intervals around λ.
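The logic behind estimating λ by *T* − *df* can be checked numerically: the mean of a noncentral χ^{2} variate with *df* degrees of freedom and noncentrality λ is *df* + λ, so *T* − *df* recovers λ on average. A minimal sketch, assuming SciPy is available:

```python
from scipy.stats import ncx2

# The mean of a noncentral chi-square with df degrees of freedom and
# noncentrality lam is df + lam, so T - df estimates lam in expectation.
df, lam = 50.0, 20.0
print(ncx2.mean(df, lam))  # df + lam = 70.0
```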

Originally presented by Steiger and Lind (1980) and popularized by Browne and Cudeck (1993), the root mean square error of approximation (RMSEA) measure is closely tied to the noncentrality parameter λ, which is estimated in a sample as λ̂ = *T* − *df*, reflecting the degree of misfit in the proposed model. If *T* − *df* is less than zero, then λ̂ is set to zero. The estimate of RMSEA (ε̂) uses λ̂ and is given as follows:

ε̂ = √[λ̂ / (*df*(*N* − 1))],

where *N* is the sample size. It ranges from zero to positive infinity, with a value of zero indicating exact model fit and larger values reflecting poorer model fit.
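The computation can be sketched in a few lines (the function name is ours, and the *N* − 1 scaling follows the Browne-Cudeck convention; treat this as an illustration rather than a definitive implementation):

```python
import math

def rmsea_point_estimate(T, df, N):
    """RMSEA point estimate from the model chi-square T, its degrees
    of freedom df, and the sample size N."""
    lam_hat = max(T - df, 0.0)  # estimated noncentrality, floored at zero
    return math.sqrt(lam_hat / (df * (N - 1)))

print(round(rmsea_point_estimate(85.0, 50, 200), 3))  # 0.059
```

When *T* ≤ *df*, the floor on λ̂ yields an estimate of exactly zero, matching the definition above.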

A key advantage of the RMSEA is that confidence intervals can be constructed around the point estimate, because the RMSEA asymptotically follows a rescaled noncentral χ^{2} distribution for a given sample size, degrees of freedom, and noncentrality parameter λ. The confidence interval is as follows:

( √[λ_{L} / (*df*(*N* − 1))], √[λ_{U} / (*df*(*N* − 1))] ),

where λ_{L} and λ_{U} are the specific lower and upper noncentrality values that define the limits of the desired interval (Browne and Cudeck 1993, Equation 14).
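Numerically, λ_{L} and λ_{U} are found by inverting the noncentral χ^{2} CDF at the observed *T*. A sketch, assuming SciPy (the helper names are ours, not from the original sources):

```python
import math
from scipy.stats import chi2, ncx2
from scipy.optimize import brentq

def rmsea_ci(T, df, N, level=0.90):
    """Confidence interval for the RMSEA: solve for the noncentrality
    values lam_L and lam_U at which the observed T sits at the
    1 - alpha/2 and alpha/2 quantiles of a noncentral chi-square,
    then rescale each to the RMSEA metric."""
    alpha = 1.0 - level

    def cdf(lam):
        # ncx2 reduces to the central chi-square when lam = 0
        return chi2.cdf(T, df) if lam == 0.0 else ncx2.cdf(T, df, lam)

    def solve(target):
        if cdf(0.0) < target:  # even lam = 0 puts T below the target quantile
            return 0.0
        return brentq(lambda lam: cdf(lam) - target, 0.0, 10.0 * T + 100.0)

    lam_L = solve(1.0 - alpha / 2.0)
    lam_U = solve(alpha / 2.0)
    scale = df * (N - 1)
    return math.sqrt(lam_L / scale), math.sqrt(lam_U / scale)
```

For example, with *T* = 85, *df* = 50, and *N* = 200, the 90% interval brackets the point estimate of about 0.059; when *T* ≤ *df*, the lower limit is zero.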

Researchers can use the RMSEA in two ways to assess model fit. The first is simply to examine the point estimate and compare it with an arbitrary fixed cutoff point. The second is to conduct a more formal hypothesis test by jointly considering the point estimate and its associated confidence interval (CI). There are three such types of hypothesis tests available (MacCallum, Browne, and Sugawara 1996). The first is the test of exact fit, with the null hypothesis H_{0}: ε = 0, where ε is the population value of the RMSEA. The null hypothesis is rejected if the lower limit of the CI is greater than zero; this corresponds to the standard χ^{2} test given above. The second is the test of close fit, with the null hypothesis H_{0}: ε ≤ *c*, where *c* is an arbitrary constant. The null hypothesis is rejected if the test statistic exceeds the value that cuts off an area α in the upper tail of the noncentral chi-square distribution (i.e., if the lower limit of the CI is greater than *c*). Retention of the null hypothesis supports the proposed model, while rejection suggests a poor fit of the model. The test of close fit is sometimes considered more realistic than the test of exact fit (MacCallum et al. 1996).

MacCallum et al. (1996) proposed a third type of test, the test of not close fit, with the null hypothesis H_{0}: ε ≥ *c*. They contended that it was not easy to argue for support of a model with either the test of exact fit or the test of close fit, because failure to reject the null hypothesis (indicating a good model fit) merely suggests the absence of strong evidence against it. In this test, the null hypothesis is rejected if the test statistic falls below the value that cuts off an area α in the lower tail of the noncentral chi-square distribution (i.e., if the upper limit of the CI is less than or equal to the arbitrary constant *c*).
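The three decision rules reduce to simple comparisons against the CI limits. A sketch (the function name is ours, and the default cutoff merely stands in for the arbitrary constant *c*):

```python
def rmsea_hypothesis_tests(lo, hi, c=0.05):
    """Decision rules for the three RMSEA-based tests, given the lower
    and upper limits (lo, hi) of the RMSEA confidence interval and an
    arbitrary cutoff c. True means the null hypothesis is rejected."""
    return {
        "exact_fit": lo > 0.0,     # H0: eps = 0; reject if lower limit > 0
        "close_fit": lo > c,       # H0: eps <= c; reject if lower limit > c
        "not_close_fit": hi <= c,  # H0: eps >= c; reject if upper limit <= c
    }

print(rmsea_hypothesis_tests(0.03, 0.07))
# exact fit rejected; close fit and not close fit retained
```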

Whether the researcher uses the point estimate alone or adopts the hypothesis-testing framework by jointly considering the point estimate and its related CI, choosing the optimal cutoff point (*c*) is of utmost importance to the success of the RMSEA as a measure of goodness-of-fit. Browne and Cudeck (1993:144) recommended that “a value of the RMSEA of about 0.05 or less would indicate a close fit of the model in relation to the degrees of freedom,” that a value of about 0.08 or less would indicate a reasonable error of approximation, and that one would not want to employ a model with an RMSEA greater than 0.1. Similar guidelines were recommended by Steiger (1989). However, both Browne and Cudeck (1993) and Steiger (1989) warned researchers that these cutoff points were subjective, based on their own substantial experience. Similarly, in their power analysis of different hypothesis tests using the RMSEA, MacCallum et al. (1996) used 0.01, 0.05, and 0.08 to indicate excellent, good, and mediocre fit, respectively, but clearly emphasized the arbitrariness of the choice of cutoff points. Hu and Bentler (1999) recommended the test of RMSEA > .05 (or .06) as one of the alternative tests for detecting model misspecification, although they noted that the test tended to over-reject at small sample sizes. Marsh, Hau, and Wen (2004) further cautioned researchers against using the cutoff criteria provided by Hu and Bentler (1999) as “golden rules of thumb,” particularly because of their limited generalizability to mildly misspecified models. Other researchers echoed this point, suggesting that precise numerical cutoff points for the RMSEA should not be taken too seriously (Hayduk and Glaser 2000; Steiger 2000).

Nonetheless, the cutoff point of 0.05 has been widely adopted as the “gold standard” in applied research settings. Indeed, SEM computer programs such as LISREL, AMOS, Mplus, and PROC CALIS (SAS) now offer a test of close fit based on the probability of ε ≤ 0.05. In addition, despite the known imprecision of using the point estimate alone, it remains a popular measure of fit widely adopted by researchers. How reasonable are these current practices? It is against this backdrop that we conduct the current study. We contend that an empirical evaluation of the choice of fixed cutoff points is essential in assessing the success of the RMSEA as a measure of goodness-of-fit. Using data from a large simulation experiment, we first examine whether there is any empirical evidence for the use of a universal cutoff, whether it be 0.05 or any other value. We want to stress that our goal is *not* to develop a new recommended cutoff point; instead, we wish to highlight ranges of values that can be consulted in practice and to provide empirically based information to applied researchers for the valid and thoughtful use of the RMSEA in practice.

We also examine whether the limitations of using a point estimate are overcome by considering the CI in addition to the fixed cutoff point. While the theoretical imprecision of using a point estimate alone has been well argued by Browne and Cudeck (1993) and MacCallum et al. (1996), its attraction for researchers is easily understood because of its parsimony. Using the point estimate together with its CI is more complicated. For example, one almost always must consider the power of the test within the hypothesis-testing framework, making it necessary to take into account sample size, degrees of freedom, and other model characteristics (see MacCallum et al. [1996], Nevitt and Hancock [2000], and Hancock and Freeman [2001] for discussions of the power assessment of different tests as well as recommendations for applied researchers). Hence, it is extremely useful for applied researchers to know empirically the extent of the difference between these two approaches in terms of their success in model fit assessment. By directly comparing these two practices, we hope to provide practical guidance to applied SEM researchers for the optimal use of this fit statistic.

There are a number of well-designed Monte Carlo simulation studies examining the performance of SEM fit indices, including the RMSEA (Hu and Bentler 1998; Fan, Thompson, and Wang 1999; Kenny and McCoach 2003; Nasser and Wisenbaker 2003). However, much attention has been paid to the finite sampling behavior of the point estimate, but not to the corresponding CI or to the use of the RMSEA in hypothesis testing. An exception is a study by Nevitt and Hancock (2000), which compared the rejection rates of tests of exact fit, close fit, and not close fit using the RMSEA under nonnormal conditions in SEM. While that study provided important information on comparisons of the three hypothesis tests using the RMSEA, our study departs from it in several major ways.

First, Nevitt and Hancock (2000) used the critical value of 0.05 throughout their hypothesis tests. As we argued earlier, we consider this specific cutoff value arbitrary and in need of further empirical investigation. In particular, we are interested in how the choice of the cutoff point *c* affects the performance of the RMSEA, whether it is used as a point estimate alone or jointly with its related CI. Second, Nevitt and Hancock (2000) used an oblique confirmatory factor analysis (CFA) model as the underlying population model in their Monte Carlo study. To improve external validity, we incorporate into our simulation study a range of general SEM model types that are commonly used in social science research. Third, Nevitt and Hancock (2000) considered one properly specified and one misspecified model. We are interested in how the degree of misspecification affects the power of the tests; thus, we study three types of properly specified models and nine types of misspecified models. In addition, as Nevitt and Hancock (2000) acknowledged, it is difficult to evaluate the hypothesis tests for misspecified models under nonnormal conditions, because the true lack of fit in the population is then due to both misspecification and nonnormality. To avoid the confounding effects of nonnormality, we generate our variables from a multivariate normal distribution. Violation of the normality assumption is obviously an important question that arises often in research, but it is beyond the scope of the current study.

Most important, it is not our goal to directly compare the performance of the tests of close fit and not close fit. We are interested in comparing the practice of using the point estimate *alone* versus that of using the point estimate *jointly* with its related CI, when a fixed cutoff point is used in the test. We believe it would be unnecessarily confusing to compare the performance of the tests of close fit and not close fit when we are considering both properly specified and misspecified models in the analysis. The meaning of model rejection is opposite for the two tests: rejection under the test of close fit indicates a poor fit of the model, whereas rejection under the test of not close fit suggests a good fit. In addition, the test of close fit is the one readily available in various SEM packages, making the investigation most relevant to practicing researchers. So, in this article, we propose two tests that make use of the CI in a consistent way, using the criteria of lower bound of the CI ≤ 0.05 and upper bound of the CI ≤ 0.1 as two candidate cutoff rules. Rejection of the model thus suggests a poor fit in both tests.
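The two proposed decision rules can be sketched as follows (the function name is ours; in both rules, a True result means the model is rejected as poorly fitting):

```python
def proposed_ci_tests(lo, hi):
    """Two CI-based rejection rules: the model is retained only if the
    lower CI bound is at most 0.05 (rule 1) or the upper CI bound is at
    most 0.10 (rule 2). True means the model is rejected."""
    return {
        "reject_lower_rule": lo > 0.05,  # violates lower bound of CI <= 0.05
        "reject_upper_rule": hi > 0.10,  # violates upper bound of CI <= 0.10
    }

print(proposed_ci_tests(0.04, 0.12))
# retained under the lower-bound rule, rejected under the upper-bound rule
```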

We also note that the focus of this article is exclusively on how well sample estimates of the RMSEA and its related CIs perform in applied social science research settings. Although the sampling distribution of the RMSEA is known asymptotically, the assumptions of no excess multivariate kurtosis, adequate sample size, and errors of approximation that are not “great” relative to errors of estimation are often violated in applied research. Therefore, it is critical to understand the sampling characteristics of RMSEA point estimates and CIs when these conditions are not met. Indeed, the success of the RMSEA as a measure of model fit also depends on whether the test statistic *T* indeed follows a noncentral chi-square distribution. Recent research shows that the noncentral χ^{2} approximation is conditioned on factors such as sample size, degrees of freedom, the distribution of the variables, and the degree of misspecification (Olsson, Foss, and Breivik 2004; Yuan 2005; Yuan, Hayashi, and Bentler 2007). Findings from our own research team have indicated that the noncentral chi-square distribution is generally well approximated and that sample RMSEA values and CIs appear to be unbiased estimates of the corresponding population values, at least for models with small to moderate misspecification and when the sample size is reasonably large (Curran et al. 2002; Curran et al. 2003). Future Monte Carlo studies are needed to investigate specifically the conditions under which the noncentral chi-square distribution is not followed.