A demographic measure is often expressed as a deterministic or stochastic function of multiple variables (covariates), and a general problem (the decomposition problem) is to assess contributions of individual covariates to a difference in the demographic measure (dependent variable) between two populations. We propose a method of decomposition analysis based on an assumption that covariates change continuously along an actual or hypothetical dimension. This assumption leads to a general model that logically justifies the additivity of covariate effects and the elimination of interaction terms, even if the dependent variable itself is a nonadditive function. A comparison with earlier methods illustrates other practical advantages of the method: in addition to an absence of residuals or interaction terms, the method can easily handle a large number of covariates and does not require a logically meaningful ordering of covariates. Two empirical examples show that the method can be applied flexibly to a wide variety of decomposition problems. This study also suggests that when data are available at multiple time points over a long interval, it is more accurate to compute an aggregated decomposition based on multiple subintervals than to compute a single decomposition for the entire study period.
Demographers often wish to compare two populations—either two populations at the same moment or the same population at two points in time—in terms of some variable of interest. Measures of most demographic processes (e.g., fertility, mortality, migration, marriage) have changed significantly over time and show considerable variation across populations or groups (e.g., race/ethnicity, nationality, sex, and region of residence). Such a measure can often be expressed as a function of several covariates (with or without an error term) and is thus regarded as the dependent variable. A general problem (the decomposition problem) is to assess contributions of changes or differences in the covariates between the two populations to the corresponding change or difference in the dependent variable. For example, women tend to live longer than men, and we may inquire about the relative contribution of differences in death rates by age and cause to the sex difference in life expectancy (Arriaga 1984; Pollard 1982, 1988). Likewise, the mean completed parity has increased or decreased in various contexts, and we may wish to express these temporal changes as functions of trends in parity progression ratios (Pullum, Tedrow, and Herting 1989).
As Das Gupta (1991) noted, there are two fundamentally different types of decomposition problems, depending on whether the populations involved are treated as homogeneous or heterogeneous with respect to the dependent variable of interest and its covariates. In the first type of decomposition problem, the dependent variable is described as a function of covariates; only a single set of values for the dependent variable and its covariates is known for a population at any given moment, and thus the change in the dependent variable is decomposed into effects due to the change in each covariate for the population as a whole. In the second type, the dependent variable takes on different values for population subgroups; these subgroups are defined by the associated set of covariate values, and thus the change in the population mean of the dependent variable is decomposed into effects due to changes in the dependent variable within the various subgroups and effects due to changes in the population distribution across the subgroups. The distinction between these two types will be presented more formally later in the article (Eqs. (13)–(15)). Various decomposition methods of these two types are reviewed by Canudas Romo (2003).
This article focuses on the first type, in which the difference in a dependent variable is expressed as a sum of the effects of differences in its covariates. (Hereafter, “decomposition” means the first type only, unless specified otherwise.) Decomposing the difference in the dependent variable is straightforward if it can be written as an additive function of its covariates. For example, the total fertility rate (TFR) is the sum of age-specific birth rates. Therefore, the contribution to a change in the TFR that is attributable to a change in the birth rate at some age is merely the change in the age-specific birth rate itself. Likewise, a difference in the dependent variable of any standard linear regression equation can be expressed as an additive function of differences in the covariates and the error term. However, many demographic measures cannot be expressed in a simple additive format. This problem has stimulated the development of various decomposition methods that deal with nonadditive relationships.
Previous approaches to decomposition problems have been based on some manner of discrete change in the value of each covariate from the first population to the second. In this article, we propose a method of decomposition analysis relying on an assumption that values of the covariates change continuously, or gradually, along an actual or hypothetical dimension. This assumption seems like the natural choice for time-trend analyses because many variables change gradually over time. As we discuss later, this assumption provides a reasonable justification for the additivity of covariate effects, which is a fundamental condition for decomposition analysis, and also for the elimination of interaction effects, which was an important issue in many previous decomposition studies.
The proposed method also requires an assumption about the relationship between covariates as they change gradually between two observation points. A convenient assumption is that changes in the covariates are proportional to one another. In other words, equal proportions of the total change in each covariate are assumed to occur simultaneously. This assumption has a precise mathematical specification, as described later in the article.
Although a number of previous methods were developed for specific dependent variables (e.g., life expectancy, mean completed parity, proportion of population in old age), this method can be applied to any type of dependent variable and its covariates, so long as the former is a differentiable function of the latter. Thus, the method is not limited to demography; it is applicable in any scientific field concerned with the difference between two observations of a function of multiple variables. The relationship between the dependent variable and its covariates can be deterministic or stochastic.
This method was devised as a generalization of two methods that had been independently developed for some specific dependent variables (Pletcher, Khazaeli, and Curtsinger 2000; Wilmoth and Horiuchi 1999). We and our collaborators used this method in a few recent studies (Glei and Horiuchi 2007; Wilmoth et al. 2000), but their reports described the method very briefly and cited an early, unpublished version of this article for methodological details. In what follows, we describe the method, present two examples of its application, and compare the method with previous methods.
Let us start by clarifying the meanings of some key terms. In this article, x is called a covariate of y if y can be expressed as a mathematical function of x (and some other variables), regardless of whether x is associated with y through some causal pathway. An effect of x on y is a change or difference in y produced by a change or difference in x. (In some cases, we write “effect of the change (difference) in x” instead of “effect of x” to emphasize the variation in x.) A change or difference in y is decomposed by expressing it as the sum of effects of its covariates (and in some cases, we include additional terms such as interaction effects and residuals).
In decomposition analysis, effects of the covariates are assumed to be additive, even though the dependent variable is not usually an additive function of the covariates. (If it is an additive function, decomposition is simple and no special method is needed.) In the previous literature, it has been unclear whether this apparent paradox (a nonadditive function of covariates, yet additive covariate effects) is justifiable, or whether the decomposition is simply a computational trick without a firm theoretical foundation. The method proposed here is based on a mathematical model that logically justifies the additivity of covariate effects.
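Suppose that a population is described by a numerical characteristic y, which is a differentiable function f of n covariates denoted by x = [x1, x2, . . . , xn]. Assume that both y and x depend on an underlying dimension t, which is typically time, and that observations of y and x are available at two points, t1 and t2. Assume also that x is a differentiable vector function of t between t1 and t2. Then, since y(t) = f(x(t)),
$$y(t_2) - y(t_1) = \int_{t_1}^{t_2} \frac{dy(t)}{dt}\,dt \qquad (1)$$
by the fundamental theorem of calculus. By applying the chain rule for partial derivatives of a composite function, we obtain
$$y(t_2) - y(t_1) = \int_{t_1}^{t_2} \sum_{i=1}^{n} \frac{\partial y}{\partial x_i}\,\frac{dx_i(t)}{dt}\,dt. \qquad (2)$$
Exchanging the integration and the summation and applying the substitution rule for definite integrals lead to the following equation:
$$y(t_2) - y(t_1) = \sum_{i=1}^{n} \int_{x_i(t_1)}^{x_i(t_2)} \frac{\partial y}{\partial x_i}\,dx_i. \qquad (3)$$
Writing y(t1), y(t2), xi(t1), and xi(t2) as y1, y2, xi1, and xi2, respectively, and dropping the t in y(t) and xi(t), we can express the difference in y between t1 and t2 as follows:
$$y_2 - y_1 = \sum_{i=1}^{n} \int_{x_{i1}}^{x_{i2}} \frac{\partial y}{\partial x_i}\,dx_i \qquad (4)$$
$$y_2 - y_1 = \sum_{i=1}^{n} c_i, \qquad c_i = \int_{x_{i1}}^{x_{i2}} \frac{\partial y}{\partial x_i}\,dx_i. \qquad (5)$$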
In this notation, ci is the total change in y produced by changes in the i-th covariate, xi. Thus ci can be considered the effect of xi on y.
The preceding discussion provides a general theoretical foundation for decomposition analysis because it implies that even if a dependent variable is not an additive function of its covariates, a change in the dependent variable can be expressed as a sum of effects of the covariates. Geometrically, the vector x(t) is a point in an n-dimensional space, and the difference between y(t1) and y(t2) is an integral of the change in y along a curve on which the point moves from x(t1) to x(t2). The integral can be split into n additive components, as shown in Eq. (3). Integrals of this type are called line integrals and are widely used in mechanics (Williamson, Crowell, and Trotter 1968). In this sense, the mathematical framework represented by Eqs. (1)–(5) may be labeled the line integral model of decomposition.
It is important to note that regression analyses and decomposition analyses are based on different notions of the “effect” of a covariate on a dependent variable. The linear regression coefficient for xi shows the change in y produced by a unit change in xi, ∂y/∂xi. A decomposition analysis estimates the change in y produced by a particular change in xi from t1 to t2, which can be expressed as $c_i = \int_{x_{i1}}^{x_{i2}} \frac{\partial y}{\partial x_i}\,dx_i$.
The computational procedure of this method is essentially a combination of the delta method (i.e., the approximation of small finite changes by derivatives) and numerical integration.1 A change in the dependent variable can be considered an accumulation of many small changes. Each of these small finite changes in the dependent variable can be approximated by a linear combination of n partial derivatives of the dependent variable with respect to the covariates. The additive terms of these linear combinations can then be aggregated for each of the n covariates over the entire path of change.
To compute these partial derivatives, we need some information about the trajectory of the curve in the n-dimensional space (i.e., the joint patterns of change in the xi’s between t1 and t2). For a given i, the partial derivative within the integral of Eq. (4) depends not only on xi but also on xj for all j ≠ i. Therefore, in order to perform the desired calculations, we need some means of specifying the intermediate values of xj between xj(t1) and xj(t2) that are associated with a given level of xi. This necessity leads us to consider possible assumptions about connections among changes in the various covariates.
Probably the simplest assumption is that xi(t) changes linearly as a function of t:
$$x_i(t) = x_i(t_1) + \frac{t - t_1}{t_2 - t_1}\,\left[x_i(t_2) - x_i(t_1)\right] \qquad (6)$$
for all i and any t between t1 and t2. With this assumption, it is possible to compute ∂y/∂xi over the range from xi1 to xi2 and then to obtain values of ci by numerical integration. However, a more general assumption is that changes in xi(t) are merely proportional to changes in the other covariates. Formally, assume there exists a continuous function g(t) such that
$$x_i(t) = x_i(t_1) + g(t)\,\left[x_i(t_2) - x_i(t_1)\right] \qquad (7)$$
for all i and t ∈ [t1, t2]. Note that g(t1) = 0 and g(t2) = 1. With this assumption, given some intermediate value of xi (i.e., xi between xi1 and xi2), the corresponding values of xj, for j ≠ i, are known as well by Eq. (7). Thus, based on nothing more than an assumption of proportional changes in the covariates, it is possible to compute values of ci. Furthermore, the exact form of g(t) does not matter, so long as it changes from 0 to 1 in a continuous (but not necessarily monotonic) fashion. This is true because in Eq. (4) y is differentiated with respect to xi, not with respect to t, and because, for any function g(t), the relationships among the xi’s are identical.
The proportionality assumption, presented as Eq. (7), is equivalent to assuming that the curve between x(t1) and x(t2) is a straight line. In the absence of information about the true path between the two data points, this linear path is justified by the principle of Occam’s razor.2
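A small worked case shows how the straight-line path removes the interaction term. The function below is a toy chosen for illustration, not one used in the article: let y = x1 x2 and let the covariates follow the straight line x_i(s) = x_{i1} + s Δx_i for s ∈ [0, 1], with Δx_i = x_{i2} − x_{i1}. Eq. (5) then gives closed-form effects:

$$c_1 = \int_{x_{11}}^{x_{12}} x_2\,dx_1 = \int_0^1 \left(x_{21} + s\,\Delta x_2\right)\Delta x_1\,ds = \Delta x_1\,\frac{x_{21} + x_{22}}{2}, \qquad c_2 = \Delta x_2\,\frac{x_{11} + x_{12}}{2},$$
$$c_1 + c_2 = \tfrac{1}{2}\left[(x_{12} - x_{11})(x_{21} + x_{22}) + (x_{22} - x_{21})(x_{11} + x_{12})\right] = x_{12}x_{22} - x_{11}x_{21} = y_2 - y_1.$$

Each covariate's change is weighted by the average level of the other covariate along the path, so the cross-product term is shared between the two effects and nothing is left over as an interaction.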
Eq. (7) can be easily adapted to specific decomposition problems. The assumption of proportionality can be applied to the covariates in their original scale or to some transformations thereof. For example, we may assume proportional changes in xi and log xj, rather than xi and xj.
With the simple assumption of Eq. (7), the ci’s in Eq. (5) can be found by numerical integration (i.e., by dividing xi2 − xi1 into N intervals, evaluating ∂y/∂xi at the midpoint of each interval, and summing as appropriate; see the Appendix for a more detailed description of this procedure). Let ĉi denote an estimate of ci found by numerical integration. As N increases, each ĉi converges to ci, so the sum of the ĉi’s approaches y2 − y1, which by Eq. (5) is what the sum of the ci’s must equal. Because y2 − y1 is a known quantity, the accuracy of the numerical integration can be checked directly. The proportional error is computed as
$$\varepsilon = \frac{\left|\sum_{i=1}^{n} \hat{c}_i - (y_2 - y_1)\right|}{\left|y_2 - y_1\right|} \qquad (8)$$
and we can select a value of N that makes ε practically zero.
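The midpoint procedure and the error check of Eq. (8) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' published code: the function names `continuous_decomp`, `proportional_error`, and `product` are hypothetical, and approximating each ∂y/∂xi · Δxi on an interval by a central difference whose width equals that interval is a choice made here for illustration.

```python
from typing import Callable, Sequence

import numpy as np


def continuous_decomp(
    f: Callable[[np.ndarray], float],
    x1: Sequence[float],
    x2: Sequence[float],
    n_intervals: int = 20,
) -> np.ndarray:
    """Estimate the effects c_i of Eq. (5) along a straight-line path.

    Covariates are assumed to change proportionally (Eq. (7)), i.e. along
    x(s) = x1 + s * (x2 - x1) for s in [0, 1]. The path is cut into
    n_intervals equal pieces. On each piece, the contribution of x_i is the
    change in f when x_i alone moves across the piece while all covariates
    sit at the piece's midpoint; this central difference approximates
    (dy/dx_i) * dx_i evaluated at the midpoint.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    step = (x2 - x1) / n_intervals          # change in each x_i per interval
    effects = np.zeros_like(x1)
    for k in range(n_intervals):
        mid = x1 + (k + 0.5) * step         # midpoint of the k-th interval
        for i in range(x1.size):
            half = np.zeros_like(x1)
            half[i] = step[i] / 2.0         # move x_i only, by half a step
            effects[i] += f(mid + half) - f(mid - half)
    return effects


def proportional_error(f, x1, x2, effects: np.ndarray) -> float:
    """Eq. (8): |sum of estimated c_i - (y2 - y1)| / |y2 - y1|."""
    dy = f(np.asarray(x2, dtype=float)) - f(np.asarray(x1, dtype=float))
    return abs(effects.sum() - dy) / abs(dy)


def product(x: np.ndarray) -> float:
    """Toy non-additive dependent variable: y = x1 * x2 * ... * xn."""
    return float(np.prod(x))


c = continuous_decomp(product, [1.0, 2.0], [3.0, 5.0])
print(c)                                                        # close to [7, 6]
print(proportional_error(product, [1.0, 2.0], [3.0, 5.0], c))  # close to 0
```

For y = x1 x2 the sketch reproduces Δx1(x21 + x22)/2 = 7 and Δx2(x11 + x12)/2 = 6, which sum to y2 − y1 = 13. For functions whose partial derivatives are not linear along the path (for example, a product of three covariates), ε is small but nonzero and shrinks as `n_intervals` grows.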
Additional information on the trajectory, if available, will help to improve the accuracy of the assumed trajectory and, in turn, the accuracy of decomposition results. In some cases, we have observations at several intermediate points on the curve between the initial and end points. For example, if we want to decompose some demographic change between the 1950 and 2000 censuses in a country with decennial censuses, we should decompose the change in each of the five decades between 1950 and 2000 by assuming proportional changes during the 10-year period and then aggregate the five sets of decomposition results, instead of decomposing the change in the entire 50-year period directly by assuming proportional changes from 1950 to 2000. (This will be illustrated later with some empirical examples.)
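The advantage of aggregating over subintervals can be seen with a hypothetical trajectory in which x1 changes first and x2 changes afterward. The sketch below (function names hypothetical; same straight-line midpoint scheme, applied to each pair of successive observations) compares the aggregated result with a single decomposition over the whole period for the toy function y = x1 x2.

```python
import numpy as np


def continuous_decomp(f, x1, x2, n_intervals=20):
    """Straight-line (proportional-change) decomposition with a midpoint rule."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    step = (x2 - x1) / n_intervals
    effects = np.zeros_like(x1)
    for k in range(n_intervals):
        mid = x1 + (k + 0.5) * step
        for i in range(x1.size):
            half = np.zeros_like(x1)
            half[i] = step[i] / 2.0
            effects[i] += f(mid + half) - f(mid - half)
    return effects


def aggregated_decomp(f, observations, n_intervals=20):
    """Decompose each successive pair of observations and sum the effects.

    `observations` is a time-ordered sequence of covariate vectors, e.g. the
    covariates at each census between the first and last dates.
    """
    obs = [np.asarray(x, dtype=float) for x in observations]
    total = np.zeros_like(obs[0])
    for a, b in zip(obs[:-1], obs[1:]):
        total += continuous_decomp(f, a, b, n_intervals)
    return total


def product(x):
    return float(np.prod(x))


# Hypothetical trajectory: x1 rises from 1 to 3 first, then x2 rises from 1 to 3.
path = [[1.0, 1.0], [3.0, 1.0], [3.0, 3.0]]
print(aggregated_decomp(product, path))               # close to [2, 6]
print(continuous_decomp(product, path[0], path[-1]))  # close to [4, 4]
```

Both results sum to y2 − y1 = 8, but only the aggregated one reflects the observed order of change: x1 grew while x2 was still low, so its effect is smaller than a single straight-line path from the first to the last observation would suggest.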
A few words are needed about interaction effects. If a dependent variable is not an additive function of its covariates, the effect of an individual covariate often depends on values of the other covariates. In regression analysis, this interdependency among the effects of different covariates is called interaction. Similarly, in some previous decomposition studies, if the sum of covariate effects (also called main effects) did not match y2 − y1, the discrepancy was called an interaction effect. However, such interaction effects are more difficult to interpret than simple main effects; furthermore, they represent an incomplete separation of the contributions of individual covariates to the overall change (or difference) in a dependent variable. For these reasons, it has usually been considered desirable to reallocate the interaction effect among the main effects (Das Gupta 1993: chap. 1).
In previous studies, decomposition was based on a discrete change of each covariate from the first population to the second, while holding constant the other covariates at certain levels. In order to avoid interaction effects, these constant values must be selected in such a way that the main effects add up exactly to y2 − y1. However, the method proposed here relies on an assumption of gradual changes in the covariates, which makes it impossible for any interaction effect to enter the decomposition equation. From this viewpoint, interaction effects in decomposition studies are merely the result of insufficient information. If we know all details of the (continuous) transition process between the two populations, the change from y1 to y2 can be described in the additive format of Eq. (5), which fully separates the effects of individual covariates without any interaction component.
A distinction can be made between two kinds of decomposition problems to which the present method can be applied. In the first case, the dependent variable and its covariates change gradually between two sets of observations, typically as a function of time, and thus the variable t refers to some real dimension of change. In the second case, however, the two sets of observations refer to populations that are qualitatively different (e.g., males and females), and thus t is merely a hypothetical underlying dimension. In applying this method to the latter case, we assume implicitly that y and x change gradually between two qualitatively different populations—as if they were changing over time—even though actual changes are discrete, not continuous.
Furthermore, decomposition analyses (here and in general) can be classified according to the form of the underlying functional relationship. Although deterministic relationships are assumed in Eqs. (1)–(5) and (8), the proposed method can easily be extended to probabilistic models. For example, if y = ŷ + e, where ŷ is some function of x and e is an observed value of some random variable, then a change or difference in y can be decomposed into ci’s and the change or difference in e. In this case, the y’s in Eqs. (1)–(5) and (8) are replaced by the corresponding ŷ’s.
This section presents two applications of the proposed method to show that it can be used in different ways for different purposes. In Example 1, we decompose changes over time in three summary measures of mortality (the median, mean, and standard deviation of ages at death in the life table) into effects attributable to changes in death rates at various ages. In most previous decompositions of life table quantities, the sole variable of interest was the mean age at death, or life expectancy at birth e0 (Arriaga 1984; Carlson 2006; Pollard 1982, 1988; Ponnapalli 2005; Vaupel and Canudas Romo 2003). Example 1 shows that the proposed method can be applied not only to the mean but also to the median, the standard deviation, and other summary measures, such as the interquartile range. (However, it would not be appropriate to apply this method in a similar way to the modal age at death, which is not a differentiable function of age-specific death rates.)
In Example 2, the regional difference in self-reported health between Minnesota and Mississippi is decomposed into effects of some socioeconomic and behavioral characteristics. The dimension t is time in Example 1 but a hypothetical dimension in Example 2. Example 1 deals with deterministic relationships between the dependent variable and its covariates, but Example 2 illustrates an application of the proposed method to a relationship that includes a stochastic error term.3
After World War II, the level of mortality in Japan declined at an unprecedented pace. The age distribution of deaths in the life table has shifted to older ages, raising the median and mean ages at death considerably. In addition, the mortality decline in Japan reduced the standard deviation of ages at death by lowering the proportion of deaths at young ages and concentrating deaths into old ages (Wilmoth and Horiuchi 1999).
However, the increases in the median and mean and the decrease in the standard deviation proceeded differently (Figure 1). Although changes in these measures generally slowed, the deceleration was most pronounced for the decline of the standard deviation, followed by the rise of the mean, but was modest for the rise of the median age.
We used the decomposition method in order to investigate reasons for these somewhat different trends among the median, mean, and standard deviation. Some methods were developed previously for decomposing changes or differences in life expectancy, but to our knowledge no comparable technique has been proposed for the median or standard deviation. The present method can be used flexibly to decompose various summary measures of the life table into effects of age-specific, or age- and cause-specific death rates.
We decomposed changes in these three measures into effects due to changes in death rates by single years of age (0, 1, 2, . . . , 102, 103, and 104+). The method was applied to each of the 54 pairs of successive years between 1950 and 2004. Changes in the logarithms of the 105 age-specific death rates between two successive years were assumed to be proportional to each other.4
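Thus, the decomposition of changes in life expectancy, for example, was based on the following equations. The effect of the death rate for the i-th age group on the change in life expectancy e0(t) from the period t1 to the next period t2 can be calculated as
$$c_i = \int_{M_i(t_1)}^{M_i(t_2)} \frac{\partial e_0(t)}{\partial M_i(t)}\,dM_i(t), \qquad (9)$$
where Mi(t) is the death rate for the i-th age group at time t (i = 1, 2, . . . , n). The partial derivative in Eq. (9) can be obtained numerically from
$$\frac{\partial e_0(t)}{\partial M_i(t)} \approx \frac{f\left[M_1(t), \ldots, M_i(t) + \tfrac{\delta}{2}, \ldots, M_n(t)\right] - f\left[M_1(t), \ldots, M_i(t) - \tfrac{\delta}{2}, \ldots, M_n(t)\right]}{\delta}, \qquad (10)$$
where f indicates an algorithm that transforms the vector of death rates into the value of life expectancy at birth, and δ is a small increment in Mi(t). The numerical integration relies on the following assumption:
$$\frac{\ln M_i(t) - \ln M_i(t_1)}{\ln M_i(t_2) - \ln M_i(t_1)} = \frac{\ln M_j(t) - \ln M_j(t_1)}{\ln M_j(t_2) - \ln M_j(t_1)} \qquad (11)$$
for any pair of age groups i and j and for any t between t1 and t2. Changes in the median age and in the standard deviation were decomposed in a similar manner.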
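The life-table application can be sketched as follows. Everything here is an assumption for illustration rather than the article's implementation: the function names are hypothetical, the life table is simplified (deaths at mid-year within each closed age, person-years l/m in the open group 104+), the death rates are hypothetical Gompertz-like values rather than Japanese data, and the path is straight in log death rates, in the spirit of Eq. (11), with one central difference per interval.

```python
import numpy as np


def life_expectancy(mx: np.ndarray) -> float:
    """Period e0 from single-year death rates; the last rate is the open group.

    Simplified life table assumed for this sketch (not the article's):
    deaths fall on average at mid-year (a_x = 0.5), and person-years in the
    open age group are l / m.
    """
    qx = mx[:-1] / (1.0 + 0.5 * mx[:-1])                 # probabilities of dying
    lx = np.concatenate(([1.0], np.cumprod(1.0 - qx)))   # survivors, radix 1
    Lx = np.empty_like(lx)
    Lx[:-1] = 0.5 * (lx[:-1] + lx[1:])                   # person-years, closed ages
    Lx[-1] = lx[-1] / mx[-1]                             # open age group
    return float(Lx.sum())                               # T0 / l0 with l0 = 1


def log_path_decomp(f, m1, m2, n_intervals=20):
    """Decompose f(m2) - f(m1) assuming proportional changes in log m.

    On each interval, one log rate moves half an interval on either side of
    the interval midpoint; the change in f approximates that rate's
    contribution on the interval.
    """
    z1, z2 = np.log(m1), np.log(m2)
    step = (z2 - z1) / n_intervals
    effects = np.zeros_like(z1)
    for k in range(n_intervals):
        mid = z1 + (k + 0.5) * step
        for i in range(z1.size):
            half = np.zeros_like(z1)
            half[i] = step[i] / 2.0
            effects[i] += f(np.exp(mid + half)) - f(np.exp(mid - half))
    return effects


# Hypothetical death rates for ages 0..103 and the open group 104+.
ages = np.arange(105)
m_early = 0.0002 + 0.00003 * np.exp(0.1 * ages)
m_early[0] = 0.03                      # infant mortality
m_late = 0.9 * m_early                 # 10% decline at every age
m_late[0] = 0.02

c = log_path_decomp(life_expectancy, m_early, m_late)
groups = {"0": slice(0, 1), "1-14": slice(1, 15),
          "15-64": slice(15, 65), "65+": slice(65, 105)}
for name, ages_in_group in groups.items():
    print(name, round(float(c[ages_in_group].sum()), 3))
print(c.sum(), life_expectancy(m_late) - life_expectancy(m_early))
```

The broad-group sums mirror the aggregation used in Table 1, and the last line prints the sum of the 105 effects next to the actual change in e0, which is the check behind Eq. (8).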
The number of intervals (N) used for numerical integration was set at 20 for each pair of successive years. The proportional errors of the various decompositions (ε’s) were very small: the maximum ε for the 54 pairs of period life tables was 0.1% for the median, 0.005% for the mean, and 0.001% for the standard deviation.
In Table 1, the decomposition results are aggregated for three 18-year time periods and four broad age categories: 0 (infants), 1–14 (children), 15–64 (adults), and 65 and older (the elderly). The 12 (3 × 4) effects for each measure in Table 1 are actually a summary of 5,670 (54 × 105) computed effects. The major findings of this analysis can be summarized as follows: (1) for each of the three measures, the effects of changing infant and child mortality diminished over time, resulting in decelerating rates of change in the summary measure; (2) nevertheless, the median and mean ages continued to rise noticeably, thanks to the growing significance of mortality reduction at older ages; (3) in contrast, the trend in the standard deviation virtually leveled off, since the effect of old-age mortality reductions on the standard deviation was small or even positive (because mortality reduction at older ages stretches out the upper tail of the distribution of ages at death); and (4) the rise of the median age in earlier periods was less pronounced than the rise of the mean age, since reductions in infant and child mortality affected the median much less than the mean. In summary, the decomposition analysis shows that age-specific death rates affected trends in these three measures of mortality in noticeably different ways.
Health conditions differ substantially by region. For instance, the proportion of residents whose health conditions are reported as “fair” or “poor” varies among U.S. states. The age-adjusted proportion for adults above age 18 in 2003–2005 ranges from 10.9% in Minnesota and New Hampshire to 23.1% in Mississippi and West Virginia (National Center for Health Statistics 2007). What factors account for the difference between, for example, Minnesota and Mississippi?
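In order to investigate regional differences in health status, we assume that the age-adjusted proportion of population in the state whose health conditions are reportedly fair or poor, denoted by θ, can be expressed as
$$\theta = \frac{\exp(\mathbf{x}\boldsymbol{\beta} + \varepsilon)}{1 + \exp(\mathbf{x}\boldsymbol{\beta} + \varepsilon)} \qquad (12)$$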
where x is a row vector of covariates including a constant of 1, β is a column vector of their coefficients, and ε is an independent random variable that is normally distributed with a mean of 0 and the same variance for each state.
State-level data on self-reported health as well as some socioeconomic and lifestyle characteristics were downloaded from the Web sites of the U.S. Census Bureau (2007) and the National Center for Health Statistics (2007). The regression coefficients were estimated from data for the 50 states and the District of Columbia around 2005 by minimizing the squared errors of the following model: logit(θ) = xβ + e. (Although this estimation procedure appears similar to the usual form of logistic regression, it is fundamentally different because the dependent variable here is not binary but continuous between 0 and 1, and the coefficients are not estimated on the basis of maximum likelihood and binomial distributions.)
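The estimation step described above, ordinary least squares on the logit scale rather than maximum-likelihood logistic regression, can be sketched with NumPy's `np.linalg.lstsq`. The function name `fit_logit_ls` is hypothetical, and the data are synthetic and noise-free (not the state data) so that the fit can be checked against known coefficients.

```python
import numpy as np


def fit_logit_ls(X: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Least-squares fit of logit(theta) = X @ beta + e.

    X already contains a leading column of ones; theta must lie strictly
    between 0 and 1. This minimizes squared errors on the logit scale and
    does not use binomial likelihoods.
    """
    z = np.log(theta / (1.0 - theta))            # logit of the proportions
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta


# Hypothetical, noise-free illustration with 51 units and 2 covariates.
rng = np.random.default_rng(0)
n_units = 51
covariates = rng.uniform(0.1, 0.9, size=(n_units, 2))
X = np.column_stack([np.ones(n_units), covariates])
true_beta = np.array([-1.0, -2.0, 1.5])
theta = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
print(fit_logit_ls(X, theta))                    # recovers true_beta up to rounding
```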
Nine covariates were included in the initial model of Eq. (12), but the correlation matrix of those covariates included a number of notably high values. After stepwise removal of variables whose coefficients seemed to be strongly affected by the multicollinearity problem, four covariates remained in the final model (R² = .89): the proportion of persons aged 25 years and older who completed high school (including equivalency), the proportion of persons aged 18–64 who are not covered by health insurance, the age-adjusted proportion of those aged 18 and older who are currently smoking, and the age-adjusted proportion of those aged 20 and older who are obese.
Results of the regression analysis are shown in the rightmost column of Table 2. In terms of the p value, the strongest among the four factors is the proportion of adults who completed high school. This probably reflects substantial impacts of socioeconomic status on health through various pathways (other than health insurance coverage, smoking, and obesity) as well as contextual effects on the health of residence in well-to-do states.
The decomposition analysis was applied to the difference in θ between Minnesota and Mississippi using the four-covariate model of Eq. (12). Splitting the difference into six intervals was sufficient to make the proportional error as low as 0.001%. Table 2 shows that about 95% of the difference is “explained” by the four factors. More than half of the difference is attributed to the proportion who completed high school, partly because of the large difference in the proportion between the two states (90.9% in Minnesota and 78.5% in Mississippi) and partly because of its relatively large regression coefficient. This analysis confirms the well-known socioeconomic effects on health and indicates that the difference between Minnesota and Mississippi is no exception.
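For a model with an error term, the continuous-change method is applied to the systematic part of Eq. (12) with the error set to zero; the observed difference then equals the sum of the covariate effects plus the difference between the two units' residuals, which is the unexplained share. The sketch below uses hypothetical coefficients and covariate values (not the Minnesota and Mississippi data), hypothetical function names, and six integration intervals as in the text.

```python
import numpy as np


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def continuous_decomp(f, x1, x2, n_intervals=6):
    """Straight-line decomposition of f(x2) - f(x1) with a midpoint rule."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    step = (x2 - x1) / n_intervals
    effects = np.zeros_like(x1)
    for k in range(n_intervals):
        mid = x1 + (k + 0.5) * step
        for i in range(x1.size):
            half = np.zeros_like(x1)
            half[i] = step[i] / 2.0
            effects[i] += f(mid + half) - f(mid - half)
    return effects


# Hypothetical coefficients and covariates; the first entry is the constant 1.
beta = np.array([0.5, -3.0, 2.0, 1.5])
x_state_a = np.array([1.0, 0.90, 0.10, 0.15])
x_state_b = np.array([1.0, 0.80, 0.20, 0.25])


def fitted_theta(x):
    """Systematic part of Eq. (12), with the error term set to zero."""
    return float(expit(x @ beta))


c = continuous_decomp(fitted_theta, x_state_a, x_state_b)
print(c)   # the constant does not change, so its effect is exactly 0
print(c.sum(), fitted_theta(x_state_b) - fitted_theta(x_state_a))
```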
We now try to clarify characteristics of the proposed method through a comparison with previous ones. First, as mentioned earlier, there are two fundamentally different types of decomposition analysis. The distinction, originally made descriptively by Das Gupta (1991, 1993), can be expressed more formally as follows. In the first type, the variable of interest is a function of multiple variables, that is,
$$y = f(x_1, x_2, \ldots, x_n), \qquad (13)$$
and the decomposition analysis expresses a change of the dependent variable (y) as the sum of effects of its covariates (xi’s). In the second type, the variable of interest is the mean of a function y = f(x1, …, xn; t). The variables x1, . . . , xn have a joint frequency distribution in the population at t.
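If the covariates are continuous variables, the mean of y can be expressed formally as
$$\bar{y}(t) = \int \cdots \int f(x_1, \ldots, x_n; t)\, w(x_1, \ldots, x_n; t)\, dx_1 \cdots dx_n, \qquad (14)$$
where w(x1, . . . , xn; t) is the probability density function for the joint distribution of x1, . . . , xn at t, such that $\int \cdots \int w(x_1, \ldots, x_n; t)\, dx_1 \cdots dx_n = 1$ for any t (see also Vaupel and Canudas Romo 2002). If the covariates are discrete variables, the mean value for the entire population is given by
$$\bar{y}(t) = \sum_{j_1=1}^{k_1} \cdots \sum_{j_n=1}^{k_n} f_{j_1 \ldots j_n}(t)\, w_{j_1 \ldots j_n}(t), \qquad (15)$$
where $f_{j_1 \ldots j_n}(t)$ is the value of y at t for the group defined by categories j1, . . . , jn of the n categorical attributes, and ki is the number of categories of the i-th attribute. Likewise, $w_{j_1 \ldots j_n}(t)$ is the proportion of the population in that group at t, and thus $\sum_{j_1=1}^{k_1} \cdots \sum_{j_n=1}^{k_n} w_{j_1 \ldots j_n}(t) = 1$.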
The goal of the second type of decomposition analysis is to separate the change (or difference) in the population mean of y, ȳ, into two distinct parts: a component due to changes in the functional relationship, f, and another one due to changes in the joint distribution of xi’s or ji’s. (Usually the second component is divided further into n or more subcomponents.) Thus, the second type of decomposition is fundamentally different from the first type (the case considered here) for at least two reasons. First, at a given moment t, each covariate takes a certain value in the first type but has a frequency distribution in the second type.5 Second, in the first type, the functional relationship, f, is both known and unchanging; however, in the second type, the mathematical form of the relationship is usually unknown, and the relationship between y and xi’s may vary with t. Methods of the second type of decomposition include those developed by Clogg (1978), Das Gupta (1994), Liao (1989), Vaupel and Canudas Romo (2002), and Xie (1989).
Since the proposed method belongs to the first type, we should compare it only with others of the same type. As described earlier, the method is based on the assumption that the change in y is produced by gradual changes of its covariates, but in all previous methods, the effect of a covariate is calculated as the change in y produced by a discrete change of the covariate from t1 to t2, while holding constant the other covariates at certain values. Thus, different choices of constant values of the other covariates lead to different methods, which may be grouped as discrete-change methods as opposed to the continuous-change method proposed here. We will discuss four different discrete-change approaches (labeled here as Methods A, B, C, and D) adopted in previous decomposition studies. All of them are widely applicable methods, and methods that are limited to particular dependent variables are not considered here.
In Method A (Kitagawa’s method), one of the two populations is chosen as the reference population. The effect of the i-th covariate, ci, is calculated as the change in y produced by the change in xi, while the other covariates are held at their values in the reference population. This is one of two versions of Kitagawa’s method (1955).6 For a given data set, this method can have three versions: the population at t1 (Method A1) or that at t2 (Method A2) may be selected as the reference, or results of these two decompositions may be averaged (Method A3). Usually the ci’s do not add up to y2 − y1, and the discrepancy is considered an interaction effect. Keyfitz (1968) used Method A1 for decomposing a change of the intrinsic growth rate into effects of changes in age-specific birth rates and death rates.
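Method A can be sketched in a few lines to show where the interaction term comes from. The function name `kitagawa_effects` is hypothetical, and the toy function y = x1 x2 with the values below is chosen only for illustration.

```python
import numpy as np


def kitagawa_effects(f, x1, x2, reference):
    """Method A: move x_i alone, holding the others at the reference population.

    reference=1 gives Method A1 (population at t1); reference=2 gives A2.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    ref = x1 if reference == 1 else x2
    effects = np.empty_like(x1)
    for i in range(x1.size):
        before, after = ref.copy(), ref.copy()
        before[i], after[i] = x1[i], x2[i]
        effects[i] = f(after) - f(before)
    return effects


def product(x):
    return float(np.prod(x))


xa, xb = [1.0, 2.0], [3.0, 5.0]
dy = product(xb) - product(xa)                    # y2 - y1 = 13
a1 = kitagawa_effects(product, xa, xb, 1)
a2 = kitagawa_effects(product, xa, xb, 2)
for name, c in [("A1", a1), ("A2", a2), ("A3", (a1 + a2) / 2)]:
    print(name, c, "interaction:", dy - c.sum())
```

Here A1 gives effects of 4 and 3 with an interaction of 6, and A2 gives 10 and 9 with an interaction of −6. Averaging (A3) cancels the interaction for this two-covariate product, but that need not happen for other functions.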
However, interaction effects are not only difficult to interpret, they also make the exercise unsatisfying because the purpose of a decomposition is to separate the effects of individual covariates. Method B avoids an interaction effect by changing values of covariates in a certain order. This idea is called stepwise replacement (Andreev, Shkolnikov, and Begun 2002). Effects of covariates are estimated in the order of x1, x2, . . . , xn, and unlike Method A, once the value of xi is shifted from xi(t1) to xi(t2), it remains as xi(t2) when effects of covariates that are later in the sequence are calculated. Thus, the effect on y of the first covariate, c1, is computed by changing x1 from x1(t1) to x1(t2), keeping values of the others at t1. Next, c2 is obtained by changing x2 from x2(t1) to x2(t2), keeping the first covariate now at t2 (i.e., at x1(t2)), but the others (x3, . . . , xn) still at t1. Then c3 is calculated, keeping x1 and x2 at t2 but x4, . . . , xn at t1. Proceeding in this manner, the effect of the last covariate, cn, is obtained by changing xn from xn(t1) to xn(t2), keeping values of all other covariates at t2. Thus, the value of the dependent variable is initially y1, changes in n steps, and finally reaches y2, assuring that ci’s add up exactly to y2 − y1 without any interaction term.
For a given data set, this method also has three versions: ascending order from x1 to xn (Method B1), descending order from xn to x1 (Method B2), and the average of B1 and B2 results (Method B3). This method is applicable only if the covariates can be meaningfully ordered. For example, if the covariates are age-specific death rates, they may be ordered from young to old. Thus, Arriaga’s (1984) technique for decomposing changes in life expectancy into effects of age-specific death rates follows the ascending order of age (Method B1), and Pollard’s (1988) method takes the average of young-to-old and old-to-young decompositions (Method B3).
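Stepwise replacement and its three versions can be sketched as follows. The sketch reuses the hypothetical parity example, with made-up values, and is not the authors' code. `stepwise_effects` is an illustrative name. Because each step starts where the previous one ended, the effects telescope to y2 − y1 for any order. The ascending (B1) and descending (B2) orders still assign different amounts to individual covariates.

```python
from typing import Callable, Iterable, List, Sequence


def mean_parity(ppr: Sequence[float]) -> float:
    """Hypothetical dependent variable: a0 + a0*a1 + a0*a1*a2 + ..."""
    total, cum = 0.0, 1.0
    for a in ppr:
        cum *= a
        total += cum
    return total


def stepwise_effects(f: Callable[[Sequence[float]], float],
                     x1: Sequence[float], x2: Sequence[float],
                     order: Iterable[int]) -> List[float]:
    """Method B: replace covariates one at a time in the given order;
    a replaced covariate stays at its t2 value for all later steps."""
    x = list(x1)
    effects = [0.0] * len(x)
    y_prev = f(x)
    for i in order:
        x[i] = x2[i]              # shift x_i to t2 and leave it there
        y_new = f(x)
        effects[i] = y_new - y_prev
        y_prev = y_new
    return effects


x1 = [0.90, 0.80, 0.50]   # hypothetical ratios at t1
x2 = [0.85, 0.60, 0.30]   # hypothetical ratios at t2
n = len(x1)
b1 = stepwise_effects(mean_parity, x1, x2, range(n))            # Method B1
b2 = stepwise_effects(mean_parity, x1, x2, reversed(range(n)))  # Method B2
b3 = [(u + v) / 2 for u, v in zip(b1, b2)]                      # Method B3
total = mean_parity(x2) - mean_parity(x1)
print([round(c, 4) for c in b1], [round(c, 4) for c in b2])
```

In this toy case, the effect of the first ratio is −0.11 under B1 and −0.089 under B2. Each version still sums exactly to the total change.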
However, it is not always possible to arrange covariates in a meaningful order. If it is impossible to select one particular sequence of the covariates, Method C (Das Gupta’s method) seems more appropriate. The stepwise replacement is carried out for every mathematically possible sequence (permutation), and the average of the results is taken as the final decomposition. Andreev et al. (2002) found that this algorithm is equivalent to the method developed by Das Gupta (1999).
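Method C can be sketched by running stepwise replacement over every ordering and averaging. This is an illustrative brute-force version under the assumption that n is small. It is not Das Gupta's own formulation, and the function names are hypothetical. For a bilinear toy function y = u·v, the averaged effects can be checked by hand: the effect of u is (u2 − u1)(v1 + v2)/2.

```python
from itertools import permutations
from math import factorial
from typing import Callable, List, Sequence


def stepwise_effects(f: Callable[[Sequence[float]], float],
                     x1: Sequence[float], x2: Sequence[float],
                     order: Sequence[int]) -> List[float]:
    """Stepwise replacement (Method B) along one given ordering."""
    x = list(x1)
    effects = [0.0] * len(x)
    y_prev = f(x)
    for i in order:
        x[i] = x2[i]
        y_new = f(x)
        effects[i] = y_new - y_prev
        y_prev = y_new
    return effects


def das_gupta_effects(f: Callable[[Sequence[float]], float],
                      x1: Sequence[float], x2: Sequence[float]) -> List[float]:
    """Method C: average stepwise-replacement effects over all n! orderings.
    The cost grows factorially, so this is feasible only for small n."""
    n = len(x1)
    sums = [0.0] * n
    for order in permutations(range(n)):
        for i, c in enumerate(stepwise_effects(f, x1, x2, order)):
            sums[i] += c
    return [s / factorial(n) for s in sums]


# Bilinear toy example: y = u * v, moving from (2, 3) to (4, 7)
c = das_gupta_effects(lambda x: x[0] * x[1], [2.0, 3.0], [4.0, 7.0])
print(c)  # [10.0, 12.0]
```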
Method D (the delta method) implicitly assumes continuous changes but uses a single discrete change for the actual calculation. Although the partial derivative of y with respect to xi varies between t1 and t2, it may be approximated by a constant partial derivative evaluated at a certain point (typically the midpoint) between t1 and t2. The effect of xi is then estimated as the product of this value of ∂y/∂xi and (xi(t2) − xi(t1)). Because this is an approximation, the effects do not add up to y2 − y1, and the discrepancy may be called a residual. Pullum et al. (1989) adopted this approach to decompose a change in the mean completed parity into effects of parity progression ratios.
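A minimal sketch of the delta method evaluates ∂y/∂xi at the midpoint between t1 and t2 and multiplies it by the change in xi. The sketch assumes the derivative is obtained by a central finite difference. An analytic derivative could be used instead. The parity function and the values are the same hypothetical toy used for Methods A and B.

```python
from typing import Callable, List, Sequence


def mean_parity(ppr: Sequence[float]) -> float:
    """Hypothetical dependent variable: a0 + a0*a1 + a0*a1*a2 + ..."""
    total, cum = 0.0, 1.0
    for a in ppr:
        cum *= a
        total += cum
    return total


def partial_derivative(f: Callable[[Sequence[float]], float],
                       x: Sequence[float], i: int, h: float = 1e-6) -> float:
    """Central finite-difference estimate of dy/dx_i at the point x."""
    up, dn = list(x), list(x)
    up[i] += h
    dn[i] -= h
    return (f(up) - f(dn)) / (2 * h)


def delta_effects(f: Callable[[Sequence[float]], float],
                  x1: Sequence[float], x2: Sequence[float]) -> List[float]:
    """Method D: dy/dx_i at the midpoint times the change in x_i."""
    mid = [(a + b) / 2 for a, b in zip(x1, x2)]
    return [partial_derivative(f, mid, i) * (b - a)
            for i, (a, b) in enumerate(zip(x1, x2))]


x1 = [0.90, 0.80, 0.50]   # hypothetical ratios at t1
x2 = [0.85, 0.60, 0.30]   # hypothetical ratios at t2
d = delta_effects(mean_parity, x1, x2)
residual = (mean_parity(x2) - mean_parity(x1)) - sum(d)
print(round(residual, 6))  # -0.0005
```

For a bilinear function, the midpoint derivative happens to give an exact decomposition. The cubic term a0a1a2 in this toy function is what leaves a small residual.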
In order to understand differences among these methods further, we applied them to the same data set and compared the decomposition results. The mortality data for Japanese women in Example 1 were used, and the e0 change between 1950 and 2004 was decomposed using each method in two different ways: by examining changes in each pair of successive calendar years and then aggregating 54 decompositions across the entire period (Table 3), and by examining the difference between 1950 and 2004 without using data for calendar years between them (Table 4). In each case, effects of 105 single-year age groups were calculated and then aggregated for four broad age categories as in Table 1.7
Table 3 shows that the aggregated results of 54 annual-change decompositions using Methods A3, B3, C, D, and the continuous-change method are nearly identical. The selection of the reference population in Method A (A1 versus A2) and the reversal of the order of stepwise replacement in Method B (B1 versus B2) make some difference. Averaging out those differences (A3 and B3), however, brings the results of Methods A and B very close to those of the other methods. Table 3 thus suggests that when these methods are used to decompose a small change, they tend to produce similar results.
Table 4 shows results of a single decomposition (of the difference between 1950 and 2004) and compares them with the average of the five nearly identical results in Table 3, which may be considered good proxies of “true” effects. Differences among the methods in Table 4 are larger than those in Table 3, suggesting that the choice of decomposition method may make nonnegligible differences if applied to relatively large changes in a long period. The estimated effects differ notably between A1 and A2, and also between B1 and B2, indicating that the decomposition result may be sensitive to the selection of the reference population for Method A and the order of stepwise replacement for Method B. In terms of the index of dissimilarity, the most accurate results were produced by the continuous-change method, but the results of Methods B1, B3, C, and D seem fairly close to the “true” result as well.8
However, the comparison in Table 4 does not necessarily imply that the continuous-change method always produces the most accurate results. For example, suppose actual changes follow the stepwise-replacement scenario more closely than the proportional-change scenario. Then the results of Method B should be more accurate than those of the continuous-change method, provided that the right sequence of covariates is chosen.
The discrete-change decomposition methods can be interpreted in terms of the line integral model. Method B can be considered a special case of the continuous-change decomposition method. In this case, the point in n-dimensional space (as defined by vector x) follows a stepwise trajectory with n − 1 orthogonal turns. The point first moves from its initial location along the x1 axis, then turns perpendicularly and moves along the x2 axis, and so on, until it reaches its final location by moving along the xn axis. Method C is the average of results for all possible stepwise paths. Methods A and D can be considered attempts to evaluate the line integral by replacing the function that yields the dependent variable with an additive and a linear approximation, respectively. The error of approximation is regarded as the interaction effect or the residual. Methods A and D need no assumption about the trajectory, because if the function is additive or linear, the n components of the line integral can be evaluated from information on the two end points only.
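The claim that Method B is a line integral along a stepwise path can be checked in a few lines. Let $\mathbf{z}^{(i)}$ (notation introduced here, not the source's) denote the point at which x1, . . . , xi are at their t2 values and xi+1, . . . , xn are still at their t1 values. Then $\mathbf{z}^{(0)} = \mathbf{x}(t_1)$ and $\mathbf{z}^{(n)} = \mathbf{x}(t_2)$, and the i-th leg of the path runs along the xi axis from $\mathbf{z}^{(i-1)}$ to $\mathbf{z}^{(i)}$:

```latex
c_i = \int_{\text{leg } i} \frac{\partial y}{\partial x_i}\, dx_i
    = y\bigl(\mathbf{z}^{(i)}\bigr) - y\bigl(\mathbf{z}^{(i-1)}\bigr),
\qquad
\sum_{i=1}^{n} c_i = y\bigl(\mathbf{z}^{(n)}\bigr) - y\bigl(\mathbf{z}^{(0)}\bigr) = y_2 - y_1 .
```

On every other leg xi is fixed, so dxi = 0 there and that leg contributes nothing to ci. The first equality is the fundamental theorem of calculus applied along the single leg where xi varies. The result is exactly the Method B1 effect, and the sum telescopes without any interaction term.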
In this article, we proposed a method for decomposing a change or difference in a function of multiple variables. The method relies on the assumption that covariates change gradually along an actual or hypothetical dimension and the dependent variable is a differentiable function of the covariates. It has a few major theoretical and practical advantages, as summarized below.
The proposed method is based on a mathematical model (the line integral model of decomposition) that justifies the additivity of covariate effects and the elimination of interaction effects. In decomposition analysis, the effects are assumed to be additive, even though the dependent variable is usually a nonadditive function of its covariates. In the previous literature, it was not fully clear whether this apparent paradox was logically justifiable. The line integral model provides a theoretical foundation for decomposition analysis.
The model also implies that interaction terms should be eliminated, not merely because they complicate decomposition results, but because in a model of continuous change, they do not exist. From this viewpoint, interaction effects in previous decomposition studies may be regarded as the result of incomplete information about patterns of change between observation points.
A few general methods for decomposing a change or difference in a multivariate function were developed previously, but they have some practical limitations. Method A (Kitagawa’s method) produces an interaction term and Method D (the delta method) produces a residual term, and neither term is easily interpretable. Method B (stepwise replacement) should not be used if covariates cannot be ordered in a meaningful sequence. Furthermore, a logically meaningful sequence is not necessarily justifiable as an appropriate order of stepwise replacement. For example, vital rates at younger ages do not necessarily tend to change earlier (or later) than those at older ages. Method C (Das Gupta’s method) is based on permutations of the covariates, which may require an astronomical amount of memory and computation if the number of covariates is large.9
The proposed method has none of these limitations. It does not have an interaction term or a nonnegligible residual term, nor does it require a meaningful ordering of covariates. It can easily handle data with many covariates because the amount of computation increases linearly with the number of covariates, not geometrically or in proportion to the number of their permutations (see the Appendix for more details of computation amount).
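A rough count of how many times y must be evaluated makes this scaling argument concrete. The counts are assumptions about straightforward implementations, not the authors' programs. They cover three cases: a midpoint-rule sketch of the continuous-change method that evaluates y at both ends of each of N steps per covariate, Method C run naively over all n! orderings, and Method C with repeated intermediate points cached, where each point corresponds to a subset of covariates already switched to t2. The function names are hypothetical.

```python
from math import factorial


def evals_continuous(n: int, N: int) -> int:
    """Midpoint-rule sketch: y at the start and end of each of N steps,
    for each of n covariates. Grows linearly in n."""
    return 2 * N * n


def evals_das_gupta_naive(n: int) -> int:
    """Stepwise replacement over all n! orderings, n + 1 evaluations each."""
    return factorial(n) * (n + 1)


def evals_das_gupta_cached(n: int) -> int:
    """Distinct intermediate points: one per subset of switched covariates."""
    return 2 ** n


for n in (5, 10, 20):
    print(n, evals_continuous(n, 20), evals_das_gupta_naive(n),
          evals_das_gupta_cached(n))
```

The exact constant for the continuous-change method depends on the implementation. What matters for the argument is that it stays linear in n, whereas even the cached version of Method C doubles with each added covariate.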
Its major difference from previous methods is the assumption that covariates change gradually along an actual or hypothetical dimension. This assumption fits some decomposition problems very naturally. For example, the method seems highly appropriate for decomposing time trends if the relevant variables can reasonably be assumed to change gradually over time. On the other hand, if some covariates actually change in a noticeably discrete manner, the assumption is not compatible with reality, which could limit the proposed method in certain cases. However, although vital events (such as births and deaths) are discrete changes at the individual level, many of the corresponding aggregate measures (such as birth rates and death rates) can reasonably be approximated as continuous variables.
The method requires an additional assumption about the trajectory of changes between the two data points. The recommended “default” is the straight-line path, that is, the assumption that the increments of the covariates are proportional to each other. The validity of this assumption is likely to vary across research subjects and data.
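Under the straight-line default, the path can be parametrized by a variable s that runs from 0 at t1 to 1 at t2 (s is notation introduced here, not taken from the source). Every covariate then moves by the same fraction of its total change, which is what proportional increments mean. The effect of xi reduces to a one-dimensional integral:

```latex
x_i(s) = x_i(t_1) + s\,\bigl[x_i(t_2) - x_i(t_1)\bigr], \quad 0 \le s \le 1,
\qquad
c_i = \bigl[x_i(t_2) - x_i(t_1)\bigr] \int_0^1
      \frac{\partial y}{\partial x_i}\bigl(\mathbf{x}(s)\bigr)\, ds .
```

Because dxi/ds is the constant xi(t2) − xi(t1), the change in xi factors out of the integral. What remains is the average of ∂y/∂xi along the path.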
In addition, our empirical results (Tables 3 and 4) suggest (as seems logically reasonable) that it is better to aggregate decomposition results for relatively short time intervals than to carry out one decomposition for the entire period if data for some intermediate time points in the decomposition period are available.
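The benefit of aggregating over subintervals can be illustrated with the hypothetical parity toy used above. This is a sketch under stated assumptions and does not reproduce the Table 3 and Table 4 analysis. It assumes, purely for illustration, that intermediate observations lie on a straight line between the end points. It uses Method A1 because A1's interaction term makes the approximation error easy to see. The function names are hypothetical.

```python
from typing import Callable, List, Sequence


def mean_parity(ppr: Sequence[float]) -> float:
    """Hypothetical dependent variable: a0 + a0*a1 + a0*a1*a2 + ..."""
    total, cum = 0.0, 1.0
    for a in ppr:
        cum *= a
        total += cum
    return total


def kitagawa_a1(f: Callable[[Sequence[float]], float],
                x1: Sequence[float], x2: Sequence[float]) -> List[float]:
    """Method A1: one-at-a-time changes with t1 as the reference."""
    effects = []
    for i in range(len(x1)):
        moved = list(x1)
        moved[i] = x2[i]
        effects.append(f(moved) - f(x1))
    return effects


def aggregated_a1(f: Callable[[Sequence[float]], float],
                  points: List[List[float]]) -> List[float]:
    """Sum Method A1 decompositions over successive pairs of observations."""
    totals = [0.0] * len(points[0])
    for start, end in zip(points, points[1:]):
        for i, c in enumerate(kitagawa_a1(f, start, end)):
            totals[i] += c
    return totals


x_start = [0.90, 0.80, 0.50]   # hypothetical first observation
x_end = [0.85, 0.60, 0.30]     # hypothetical last observation
M = 10                         # hypothetical number of subintervals
points = [[a + (b - a) * m / M for a, b in zip(x_start, x_end)]
          for m in range(M + 1)]
total = mean_parity(points[-1]) - mean_parity(points[0])
gap_single = total - sum(kitagawa_a1(mean_parity, points[0], points[-1]))
gap_aggregated = total - sum(aggregated_a1(mean_parity, points))
print(round(gap_single, 4), round(gap_aggregated, 4))
```

Here the unexplained interaction shrinks roughly tenfold when the ten subinterval decompositions are summed. For this smooth toy function, the interaction in each short step is of second order in the step size.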
We are grateful to Juha Alho, Ronald Lee, and the anonymous reviewers for comments on earlier versions of this article. Joel E. Cohen gave us a useful technical suggestion. Supplementary documents for this article (including sensitivity analyses, additional examples, and a MATLAB program) are available online at http://www.demog.berkeley.edu/~jrw/Papers/decomp.suppl.pdf.
The right side of Eq. (5) can be approximated by numerical integration. For each covariate xi, the range between xi1 and xi2 is divided into N intervals of equal length:

$$\Delta x_i = \frac{x_{i2} - x_{i1}}{N}.$$
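To change the value of xi in the k-th interval while keeping the others constant, we define two vectors,

$$\mathbf{x}_{(i,k)}^{+} = \left(x_{1k}^{\bullet}, \ldots, x_{i-1,k}^{\bullet},\; x_{ik}^{+},\; x_{i+1,k}^{\bullet}, \ldots, x_{nk}^{\bullet}\right),$$
$$\mathbf{x}_{(i,k)}^{-} = \left(x_{1k}^{\bullet}, \ldots, x_{i-1,k}^{\bullet},\; x_{ik}^{-},\; x_{i+1,k}^{\bullet}, \ldots, x_{nk}^{\bullet}\right),$$

where $x_{ik}^{+} = x_{i1} + k\,\Delta x_i$, $x_{ik}^{\bullet} = x_{i1} + (k - 0.5)\,\Delta x_i$, and $x_{ik}^{-} = x_{i1} + (k - 1)\,\Delta x_i$. These are the values of xi at the end, midpoint, and beginning, respectively, of the k-th interval. Note that a change from $\mathbf{x}_{(i,k)}^{-}$ to $\mathbf{x}_{(i,k)}^{+}$ means that xi moves from the beginning to the end of the k-th interval, while the other covariates remain at the midpoint of the interval.

If N is large, we have

$$c_i \approx \sum_{k=1}^{N} \left[ y\!\left(\mathbf{x}_{(i,k)}^{+}\right) - y\!\left(\mathbf{x}_{(i,k)}^{-}\right) \right].$$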
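The numerical scheme above can be sketched in Python. This is a minimal sketch, assuming the straight-line path and a generic dependent-variable function `f`. It is not the authors' MATLAB function, and the names are hypothetical. In each interval, xi steps from its beginning value to its end value while every other covariate sits at the interval midpoint, and the resulting changes in y are summed over the N intervals.

```python
from typing import Callable, List, Sequence


def mean_parity(ppr: Sequence[float]) -> float:
    """Hypothetical dependent variable: a0 + a0*a1 + a0*a1*a2 + ..."""
    total, cum = 0.0, 1.0
    for a in ppr:
        cum *= a
        total += cum
    return total


def continuous_change_effects(f: Callable[[Sequence[float]], float],
                              x1: Sequence[float], x2: Sequence[float],
                              N: int = 20) -> List[float]:
    """Midpoint-rule approximation of c_i along the straight-line path."""
    n = len(x1)
    dx = [(b - a) / N for a, b in zip(x1, x2)]            # Delta x_i
    effects = [0.0] * n
    for k in range(1, N + 1):
        mid = [a + (k - 0.5) * d for a, d in zip(x1, dx)]  # midpoint values
        for i in range(n):
            plus, minus = list(mid), list(mid)
            plus[i] = x1[i] + k * dx[i]           # end of interval k
            minus[i] = x1[i] + (k - 1) * dx[i]    # beginning of interval k
            effects[i] += f(plus) - f(minus)
    return effects


x1 = [0.90, 0.80, 0.50]   # hypothetical values at t1
x2 = [0.85, 0.60, 0.30]   # hypothetical values at t2
c = continuous_change_effects(mean_parity, x1, x2, N=20)
residual = (mean_parity(x2) - mean_parity(x1)) - sum(c)
print([round(v, 4) for v in c], residual)
```

With N = 20, the residual in this toy case is on the order of 10⁻⁶. It comes only from the three-way product term and shrinks roughly as 1/N² when N is increased. For a bilinear function such as y = u·v, the scheme reproduces the Method C result exactly.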
This method is computationally intensive. For Example 1, with N = 20, a complete life table had to be constructed 113,455 times ((20 intervals × 105 variables × 54 period-pairs) + 55 periods). Nevertheless, the MATLAB 6.1 program carried out the entire calculation in only about 8 minutes of CPU time on a 1.8 GHz PC with 256 MB of RDRAM and 384 MB of virtual memory. The MATLAB function implementing the proposed method is available from the first author upon request.
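The life-table count quoted above can be checked directly. There are 54 consecutive-year pairs from 1950 to 2004, plus one life table for each of the 55 calendar years:

```latex
\underbrace{20 \times 105 \times 54}_{113{,}400} + 55 = 113{,}455
```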
This research was supported by Grant R01-AG11552 from the National Institute on Aging.
1. The Divisia decomposition (Divisia 1925), which is widely used in economics for analyzing changes in monetary aggregates, may be considered a simple case of this approach.
2. We conducted sensitivity analyses with two sets of empirical data and found that the decomposition results were reasonably insensitive to deviations from the proportionality assumptions. Details of the sensitivity analyses are given online at http://www.demog.berkeley.edu/~jrw/Papers/decomp.suppl.pdf.
3. Two other examples are shown online at http://www.demog.berkeley.edu/~jrw/Papers/decomp.suppl.pdf. The first decomposes changes in the intrinsic growth rate in Sweden into the effects of changes in age-specific death rates. The second decomposes the sex difference in the life expectancy of fruit flies into the effects of the logistic model parameters.
4. Lee and Carter (1992) adopted the same assumption in their model of mortality change.
5. The original method by Kitagawa (1955), which decomposes a difference in the proportion of those who have a characteristic of interest, belongs to both types. It is a special case of the second type in which there is only one covariate, which has a frequency distribution among n categories at t. Thus, Eq. (15) takes the simple form $y(t) = \sum_{j=1}^{n} f_j(t)\, w_j(t)$. Here $f_j(t)$ is the proportion with the characteristic in the j-th group, and $w_j(t)$ is the proportion of the population in the j-th group. This can also be viewed as Eq. (13) with 2n covariates, $y(t) = g(f_1(t), \ldots, f_n(t), w_1(t), \ldots, w_n(t))$, that is, as a special case of the first type.
6. The other version, which does not include an interaction term, can be considered a special case of Method B3 and also of Method C.
7. An exception was necessary in the case of Method C. Applying this method in the standard fashion would have required computing life tables while making sequential changes in all possible permutations of 105 single-year age groups. Clearly, the computational demands of such an exercise are overwhelming. As a practical alternative, changes were introduced for all ages simultaneously within one of the four broad age groups, and life tables were computed (as for the other methods) using single-year data. Thus, this adaptation of the method took into consideration all possible orderings of changes for four broad age groups, rather than for 105 single-year age groups.
8. The index of dissimilarity was calculated using the broad age categories in Table 4.
9. Our MATLAB 6.1 program of Method C, run on a 1.8 GHz PC with 256 MB of RDRAM and 384 MB of virtual memory, worked with nine or fewer covariates but not with 10 or more because of insufficient memory. More sophisticated programming and enlarged virtual memory would increase the possible number of covariates. Even so, it may be difficult to use Method C for a problem of modest size, such as decomposing a difference in an overall demographic measure into the effects of about 20 vital rates for five-year age groups.