
**|**Demography**|**v.46(3); 2009 August**|**PMC2831344


Demography. 2009 August; 46(3): 429–449.

PMCID: PMC2831344

ALBERT CHEVAN, University of Massachusetts, Amherst, MA 01003; e-mail: chevan@soc.umass.edu.

Copyright © 2009 Population Association of America


Standardization and decomposition are established and widely used demographic techniques for comparing rates and means between groups that differ in composition. Until now, the difference in rates and means has been resolved into a compositional effect for each variable and a single overall rate effect. This study demonstrates that differences can also be resolved at the categorical level, for both compositional effects and rate effects. Refinements to Das Gupta’s equations yield a complete decomposition because categorical compositional and rate effects are additive. Other refinements allow the decomposition of polytomous variables. Extensions to the method provide for the decomposition of the standard deviation and the multivariate index of dissimilarity.

Standardization has traditionally been applied when rate comparisons for groups are confounded by substantial group differences in composition. It is an axiom of demography that what appears to be a difference in rates between groups may be due to differences in population structure. Some form of standardization has been the method of choice used for adjusting rates in such a situation. Decomposition takes standardization a step further by allocating the rate difference into components. The refinements and extensions we propose are based on, and dovetail with, Das Gupta’s (1989, 1991, 1993, 1994) method of decomposing a difference in rates between two or more groups. Das Gupta developed three decomposition models. Only the model using cross-classified data is of concern here. The other two models control for scalar factors and for vectors of continuous variables, respectively. The cross-classification model is specifically designed to decompose the difference in rates between groups into the effects of rate and composition differences. Until now, the effects identified with the cross-classification model have extended only to variables as a whole. This article refines the expression of effects to reveal the contribution of categories of compositional variables to the difference between groups in the measure undergoing decomposition.

Our refinements to the cross-classification model make explicit what is implicit in Das Gupta’s formulation and thereby reveal considerably more about the source of a difference in rates than has previously been available. The proposed refinements add depth to the decomposition of a difference. Additionally, we expand the scope of decomposable measures by applying the cross-classification model to the decomposition of the standard deviation and the multivariate index of dissimilarity. These extensions allow questions to be addressed that have been difficult, if not impossible, to otherwise entertain.

Das Gupta’s model for cross-classified data is based on several incremental methodological developments in standardization and decomposition. Decomposition of cross-classified data has been formally linked to rate standardization since Kitagawa’s (1955) seminal paper showed how a difference between the rates of two groups could be divided into a rate component and a composition component. Prior to Kitagawa, there were several earlier attempts at rate decomposition. Wolfbein and Jaffe (1946) and Jaffe (1951) demonstrated how standardization could be used to remove the influence of differences in occurrence rates by standardizing for differences in such rates rather than for differences in composition. Their efforts included an attempt at decomposition using standardization for compositional differences and rate differences together. Kitagawa’s signal contribution was to combine standardization for compositional differences with standardization for rate differences into a single equation. This equation allowed the simultaneous identification of separate but additive composition and rate components that summed to the rate difference.

When used with more than one compositional variable, Kitagawa’s method generates an awkward joint or interaction component. The method is untenable for more than two variables because of the proliferation of interactions with categorical data. Durand (1948) used a decomposition, attributed to Edwin Goldfield, in which multiple standardization was followed by the allocation of the interaction component to variables. The resolution of the interaction problem lay in finding appropriate equations that did not involve interaction terms. Das Gupta took a symmetric approach to the interaction component and developed an algebraic solution that distributed interactions equally among the cross-classified variables. Cho and Retherford (1973) and Kim and Strobino (1984) provided alternative decompositions. Both decompositions are based on hierarchical strategies in which the results are conditioned by the order of entry of the variables into the decomposition. A nonsymmetric approach may be used with data in which one variable is logically prior to the other. However, this approach does not yield unambiguous decompositions when more than two variables are involved. Under some circumstances, nonsymmetric decompositions may be an alternative to symmetric decompositions. Kim and Strobino anticipated our efforts and demonstrated that their decomposition could be used to attribute effects to categories of variables, but Das Gupta’s work did not benefit from this insight.

Vaupel and Canudas-Romo (2002) and Canudas-Romo (2003) developed an elegant method that decomposes the rate of change for demographic measures. With its focus on the rate of change rather than the absolute change, their method has some of the same capabilities as the method we propose and yields similar, although not equal, results. Those features include the control of composition factors and multidimensional and categorical decomposition. However, because their method uses calculus to solve for the rate of change, the solutions are approximations and there is a slight closure problem: the rate and composition components don’t always sum to the exact difference in rates between groups. The method was proposed and demonstrated for the decomposition of change over time, but it could be adapted to the difference between groups. Wang et al. (2000) contributed to the enhancement of decomposition methods stemming from Das Gupta’s work by developing tests of significance for decomposed rates using bootstrapping techniques to estimate standard errors.

By taking as weights the average cell composition and the average cell rate, Das Gupta achieved a method that yielded standardized rates for each group. These weights were applied in combinations of categories that standardize the data and isolate the effect of each variable in a multivariate decomposition. Composition coefficients are central to the calculation of effects. The number of equations to be solved differs with the number of compositional variables being considered. As variables are added, the equations become progressively more complex because of the proliferation of relationships between variables. When the standardized rates of one group are subtracted from the standardized rates for the other and these differences are summed, the resulting sum matches the difference between the crude rates of each group.

Das Gupta’s cross-classification model provides a decomposition of a measure into two components: a composition effect and a rate effect. A composition effect is developed for each variable, and a single rate effect is developed for all variables taken together. Das Gupta’s symmetric approach to interactions makes possible the refinements we propose. Decomposing a measure’s difference between two groups into composition and rate effects by category requires a slight but critical modification to several of Das Gupta’s equations.

Two refinements are proposed: decomposition by categories of the composition variables and decomposition of polytomous response variables.

For illustrative purposes, we use two composition variables, *I* and *J*, and assume we are decomposing the difference in rates between two groups. Cross-classified rates and population counts for the cells of the cross-classification are necessary for this task. We adopt Das Gupta’s convention of using upper- and lowercase letters to identify each group. Eq. (1), taken from Das Gupta (1993), expresses the difference between the crude rates of two groups, *t*.. and *T*.., as the sum of a rate effect and the composition effect for each variable:

$$t\mathrm{..}\hspace{0.17em}-\hspace{0.17em}T\mathrm{..}\hspace{0.17em}=\hspace{0.17em}R\text{-effect}\hspace{0.17em}+\hspace{0.17em}I\text{-effect}\hspace{0.17em}+\hspace{0.17em}J\text{-effect}$$

(1)

*I*-effect and *J*-effect are the composition effects for variables *I* and *J*, and *R*-effect is the single rate effect, which applies equally to both variables.

The *I* and *J* composition effects of (1) are developed from differences in rates standardized with composition coefficients.

$$t\mathrm{..}\hspace{0.17em}-\hspace{0.17em}T\mathrm{..}\hspace{0.17em}=\hspace{0.17em}\left[R\left(\overline{t}\right)-R\left(\overline{T}\right)\right]\hspace{0.17em}+\hspace{0.17em}\left[I\left(\overline{a}\right)\hspace{0.17em}-\hspace{0.17em}I\left(\overline{A}\right)\right]\hspace{0.17em}+\hspace{0.17em}\left[J\left(\overline{b}\right)\hspace{0.17em}-\hspace{0.17em}J\left(\overline{B}\right)\right]$$

(2)

The composition coefficients in (2), *ā*, *Ā*, *b̄*, and *B̄*, are from Das Gupta (1993). These coefficients are specific to joint categories of the variables in the decomposition and, together with the average weights, accomplish the standardization. Our modification is to add subscripts to each effect. These subscripts allow us to track not only the composition effect of each variable, in the manner of Das Gupta, but also the composition effect of each variable’s categories. For example, the standardized $I(\overline{A})$ effect in (2) becomes $I(\overline{A})_{i.}$ for each category of variable *I*. Without subscripts, the term is

$$I\left(\overline{A}\right)\hspace{0.17em}=\hspace{0.17em}\sum _{ij}\frac{{t}_{ij}\hspace{0.17em}+\hspace{0.17em}{T}_{ij}}{2}\hspace{0.17em}\frac{{b}_{ij}\hspace{0.17em}+\hspace{0.17em}{B}_{ij}}{2}\hspace{0.17em}{A}_{ij}$$

(3)

With the addition of subscripts (3) becomes

$$I{\left(\overline{A}\right)}_{i.}=\hspace{0.17em}\sum _{j}\frac{{t}_{ij}\hspace{0.17em}+\hspace{0.17em}{T}_{ij}}{2}\hspace{0.17em}\frac{{b}_{ij}\hspace{0.17em}+\hspace{0.17em}{B}_{ij}}{2}\hspace{0.17em}{A}_{ij}$$

(4)

while the $J(\overline{B})$ effect in (2) is expressed as

$$J{\left(\overline{B}\right)}_{.j}\hspace{0.17em}=\hspace{0.17em}\sum _{i}\frac{{t}_{ij}\hspace{0.17em}+\hspace{0.17em}{T}_{ij}}{2}\hspace{0.17em}\frac{{a}_{ij}\hspace{0.17em}+\hspace{0.17em}{A}_{ij}}{2}\hspace{0.17em}{B}_{ij}$$

(5)

In a sense, the introduction of subscripts to Das Gupta’s formulation creates a secondary decomposition because

$$I\left(\overline{A}\right)\hspace{0.17em}=\hspace{0.17em}\sum _{i.}I{\left(\overline{A}\right)}_{i.}\hspace{0.17em}\text{and}\hspace{0.17em}J\left(\overline{B}\right)\hspace{0.17em}=\hspace{0.17em}\sum _{.j}J{\left(\overline{B}\right)}_{.j}$$

(6)
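The secondary decomposition in (4)–(6) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions: the helper `coefficients` uses one hypothetical factorization of each cell proportion into two coefficients with $a_{ij} b_{ij} = n_{ij}/n\mathrm{..}$, chosen only because it satisfies that identity, and it is not necessarily the definition given in Das Gupta (1993). The function names and the 2 × 2 tables are likewise hypothetical. Lowercase arrays belong to the first group and uppercase arrays to the second.

```python
import numpy as np


def coefficients(counts):
    """Split cell proportions N_ij/N.. into a_ij * b_ij.

    Hypothetical factorization chosen for illustration only; it assumes
    every row total N_i. and column total N_.j is positive.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)  # N_i.
    col = counts.sum(axis=0, keepdims=True)  # N_.j
    a = np.sqrt(row / total * counts / col)
    b = np.sqrt(col / total * counts / row)
    return a, b


def category_composition_terms(t, n, T, N):
    """Standardized terms of Eqs. (4) and (5) for both groups.

    Returns two dicts keyed by "t" (first group) and "T" (second group):
    I-terms indexed by the categories of I, J-terms by the categories of J.
    """
    rate_avg = (np.asarray(t, dtype=float) + np.asarray(T, dtype=float)) / 2
    a, b = coefficients(n)
    A, B = coefficients(N)
    i_terms = {"t": np.sum(rate_avg * (b + B) / 2 * a, axis=1),  # sum over j
               "T": np.sum(rate_avg * (b + B) / 2 * A, axis=1)}
    j_terms = {"t": np.sum(rate_avg * (a + A) / 2 * b, axis=0),  # sum over i
               "T": np.sum(rate_avg * (a + A) / 2 * B, axis=0)}
    return i_terms, j_terms


# Hypothetical 2 x 2 tables: rows are categories of I, columns categories of J.
t = [[0.20, 0.40], [0.30, 0.50]]
n = [[100, 50], [30, 20]]
T = [[0.10, 0.30], [0.25, 0.45]]
N = [[80, 70], [40, 60]]

i_terms, j_terms = category_composition_terms(t, n, T, N)
i_category_effects = i_terms["t"] - i_terms["T"]  # one effect per category of I
j_category_effects = j_terms["t"] - j_terms["T"]  # one effect per category of J
```

Summing the per-category terms over the categories of a variable returns the unsubscripted term of (3), which is the content of (6).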

Das Gupta (1993) calculated the standardized rate as

$$R\left(\overline{T}\right)\hspace{0.17em}=\hspace{0.17em}\sum _{ij}\frac{\frac{{n}_{ij}}{n\mathrm{..}}+\frac{{N}_{ij}}{N\mathrm{..}}}{2}{T}_{ij}$$

(7)

As in (4) and (5), we add subscripts but now use them to track the standardized rates of categories for variables *I* and *J*:

$$R{\left(\overline{T}\right)}_{i.}\hspace{0.17em}=\hspace{0.17em}\sum _{j}\frac{\frac{{n}_{ij}}{n\mathrm{..}}+\frac{{N}_{ij}}{N\mathrm{..}}}{2}{T}_{ij}\frac{1}{NV}$$

(8)

and

$$R{\left(\overline{T}\right)}_{.j}\hspace{0.17em}=\hspace{0.17em}\sum _{i}\frac{\frac{{n}_{ij}}{n\mathrm{..}}+\frac{{N}_{ij}}{N\mathrm{..}}}{2}{T}_{ij}\frac{1}{NV}$$

(9)

Within each group, the crude (unstandardized) rate is the same whether it is computed from the categories of *I*, from the categories of *J*, or from the joint cells, because

$$T\mathrm{..}\hspace{0.17em}=\hspace{0.17em}\sum _{i.}\frac{{T}_{i.}{N}_{i.}}{N\mathrm{..}}\hspace{0.17em}=\hspace{0.17em}\sum _{.j}\frac{{T}_{.j}{N}_{.j}}{N\mathrm{..}}\hspace{0.17em}=\hspace{0.17em}\sum _{ij}\frac{{T}_{ij}{N}_{ij}}{N\mathrm{..}}$$

(10)

Hence, the sums of the standardized category rates in (8) and (9) are also equal:

$$\sum _{i}R{\left(\overline{T}\right)}_{i.}\hspace{0.17em}=\hspace{0.17em}\sum _{j}R{\left(\overline{T}\right)}_{.j}$$

(11)

In words, standardized category rates are generally unequal, but (11) says that their sums are equal for variables *I* and *J*. Within each variable, the standardized rate for each category reflects the effect of that category relative to the other categories of the variable. The effect of a category on the overall rate is established by scaling its relative effect by the reciprocal of the number of composition variables (*NV*) in the decomposition. Dividing the rate effect equally between variables is comparable to Das Gupta’s practice of dividing Kitagawa’s joint effect equally among variables. Composition and rate effects for categories are additive and yield the total category effect (*CE*) in (12) and (13).

$$C{E}_{i.}\hspace{0.17em}=\hspace{0.17em}I{\left(\overline{A}\right)}_{i.}\hspace{0.17em}+\hspace{0.17em}R{\left(\overline{T}\right)}_{i.}$$

(12)

$$C{E}_{.j}\hspace{0.17em}=\hspace{0.17em}J{\left(\overline{B}\right)}_{.j}\hspace{0.17em}+\hspace{0.17em}R{\left(\overline{T}\right)}_{.j}$$

(13)

The sums of the category effects across all variables equal the difference in rates between groups, as shown in (14).

$$t\mathrm{..}\hspace{0.17em}-\hspace{0.17em}T\mathrm{..}\hspace{0.17em}=\hspace{0.17em}\left(\sum _{i.}c{e}_{i.}\hspace{0.17em}+\hspace{0.17em}\sum _{.j}c{e}_{.j}\right)\hspace{0.17em}-\hspace{0.17em}\left(\sum _{i.}C{E}_{i.}\hspace{0.17em}+\hspace{0.17em}\sum _{.j}C{E}_{.j}\right)$$

(14)
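A self-contained sketch of (7)–(14) follows, assuming NumPy, a hypothetical `coefficients` factorization with $a_{ij} b_{ij} = n_{ij}/n\mathrm{..}$ (an illustrative choice, not necessarily Das Gupta’s), a hypothetical parameter `nv` for *NV*, and made-up 2 × 2 tables. It adds the category rate terms of (8) and (9), each scaled by 1/*NV*, to the category composition terms of (4) and (5), and then checks the closure property in (14): the category effects of the two groups differ by exactly the difference in crude rates.

```python
import numpy as np


def coefficients(counts):
    """Hypothetical split of N_ij/N.. into a_ij * b_ij (illustration only).

    Assumes all row and column totals are positive.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    return (np.sqrt(row / total * counts / col),
            np.sqrt(col / total * counts / row))


def category_effects(t, n, T, N, nv=2):
    """Total category effects of Eqs. (12)-(13) for each group.

    Returns {"t": (ce_i, ce_j), "T": (CE_i, CE_j)}.
    """
    t, T = np.asarray(t, dtype=float), np.asarray(T, dtype=float)
    n, N = np.asarray(n, dtype=float), np.asarray(N, dtype=float)
    a, b = coefficients(n)
    A, B = coefficients(N)
    rate_avg = (t + T) / 2
    prop_avg = (n / n.sum() + N / N.sum()) / 2  # weight used in Eqs. (7)-(9)
    effects = {}
    for label, rate, ca, cb in (("t", t, a, b), ("T", T, A, B)):
        comp_i = np.sum(rate_avg * (b + B) / 2 * ca, axis=1)  # Eq. (4)
        comp_j = np.sum(rate_avg * (a + A) / 2 * cb, axis=0)  # Eq. (5)
        rate_i = np.sum(prop_avg * rate, axis=1) / nv          # Eq. (8)
        rate_j = np.sum(prop_avg * rate, axis=0) / nv          # Eq. (9)
        effects[label] = (comp_i + rate_i, comp_j + rate_j)    # Eqs. (12)-(13)
    return effects


def crude_rate(rate, counts):
    """Crude rate: cell rates weighted by each cell's share of the group."""
    counts = np.asarray(counts, dtype=float)
    return float(np.sum(np.asarray(rate, dtype=float) * counts) / counts.sum())


# Hypothetical 2 x 2 tables: rows are categories of I, columns categories of J.
t = [[0.20, 0.40], [0.30, 0.50]]
n = [[100, 50], [30, 20]]
T = [[0.10, 0.30], [0.25, 0.45]]
N = [[80, 70], [40, 60]]

eff = category_effects(t, n, T, N)
ce_i, ce_j = eff["t"]
CE_i, CE_j = eff["T"]
gap = (ce_i.sum() + ce_j.sum()) - (CE_i.sum() + CE_j.sum())  # Eq. (14)
```

Closure holds here because the rate part uses average composition and the composition part uses average rates, so the cross terms cancel cell by cell; `gap` should match `crude_rate(t, n) - crude_rate(T, N)` up to rounding.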

For the period 1970–1985, Lichter and Costanzo (1987) used a decomposition of change in the labor force participation rates among the female population^{1} to establish the extent to which compositional shifts in four variables accounted for the change in crude rates. The four variables were fertility as measured by the number of children under age 18 present in the family, marital status, educational attainment, and age structure. Lichter and Costanzo argued that compositional changes, such as declines in the number of children in families, the rise in never-married women, increased educational attainment, and the aging of the baby boomers, lay behind much of the increase in the labor force rate for women ages 25–49. Based on their results, they conjectured that the female labor force rate would remain high because it would be supported by future demographic changes. Table 1 displays the percentage distribution for the categories of each variable and the marginal labor force participation rates for the categories. Data for the decomposition consist of 3 × 3 × 3 × 3 tables of jointly classified population counts and rates in each year.

Table 1. Percentage Distribution and Labor Force Participation Rates of Women for Selected Characteristics, 1970 and 1985

Labor force participation rates underwent large increases in all categories at the same time as the distribution of women within each variable shifted to categories that had the highest levels of participation. Table 2 incorporates the proposed categorical refinements into a replication and categorical extension of the Lichter and Costanzo decomposition.

Rate increases and compositional changes both contributed to the overall increase in the labor force participation rate, with rate increases contributing somewhat more than compositional changes. The change in the distribution of the number of children under age 18 had the largest compositional effect among the four variables, and changes in age distribution had the smallest effect. These results were noted by Lichter and Costanzo. Decomposition by category reveals many more details about the sources of change than are gained through attribution by variables as a whole. The decomposition in Table 2 may be considered a transformation of the changes shown in Table 1 into a different and consistent metric. Thus, category increases in Table 1 are represented by positive effects in Table 2, and category declines are represented by negative effects. For example, the very large increase in the percentage of women with more than 12 years of education is realized as the largest positive effect, 12.86, in Table 2. Similarly, the large decrease in the percentage of married women is realized as a large negative effect, −9.33. Unless there is no change in the distribution of a variable’s categories, there will usually be positive and negative composition effects in the categories of the variable. Standardized composition values for a category are obtained by holding constant the labor force participation rates at the average rate for 1970 and 1985 and the average distribution of all variables other than the variable to which a category belongs. This means that each standardized category value in Table 2 is obtained from a summation over 27 cross-classified data cells. The interpretation of category effects is similar to the interpretation of variable effects: category effects are the amount by which the difference in crude rates is increased or decreased from an initial observed value.

Standardized rate values for a category are obtained by holding constant the distribution of all variables while allowing the rates to vary. Category effects for rates are most useful when employed in a comparative sense, principally because their absolute size is partly determined by the number of variables in the decomposition. After standardization, married women made the largest contribution to the increase in labor force participation rates stemming from a rate or behavioral change, as shown in Table 2. Without the categorical refinement, we would probably have chosen as most influential one of the categories in Table 1 that had a large crude rate increase between 1970 and 1985. Rate effects and composition effects for variables and categories are additive. Together, the increase in the number of women with more than 12 years of schooling and the increase in the participation rate of these women accounted for more than 60% of the change in the crude rate, (12.86 + 1.39) / 23.11 × 100.
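Written out with the two Table 2 effects for women with more than 12 years of schooling and the total change in the crude rate, the share is

$$\frac{12.86 + 1.39}{23.11}\times 100 = \frac{14.25}{23.11}\times 100 \approx 61.7\%$$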

Das Gupta (1991) provided a method of decomposing differences in rates among three or more groups. His approach resolved the problem of internal inconsistency that arises when taking pairwise group decompositions. These inconsistencies appear when decomposing time-series data. The method outlined above may be applied to tracking the effects of categories among more than two groups.

Decomposition has been restricted to rates, means, and percentages with the methods of Kitagawa and Das Gupta. Polytomous response variables have been overlooked as candidates for decomposition. The focus on response variables as indivisible entities in the form of rates and means parallels the focus on composition variables as a whole. Standardization and decomposition of a polytomous response variable is a refinement that can be readily accomplished within the Das Gupta framework and requires little more than the simultaneous use of all categories of the polytomous variable in the decomposition. The goal of such a decomposition is to demonstrate how the distribution of the response variable changed over time or differed between two groups. Results from the decomposition of a polytomous response variable are couched in terms of the effects of composition changes or differences across groups and the effects of shifts or differences in the propensity to occupy the various categories of the response variable. In proposing the decomposition of polytomous variables, we offer an orientation to data analysis rather than a new method.

Standardization and decomposition of a polytomous variable is based on the percentage that each data cell represents of all cases in the group with the same composition characteristics. In addition to the *i* and *j* subscripts representing the composition variables, a subscript, *k*, is needed to represent the categories of the response variable. For each data cell, $N_{ijk}$ and

$${T}_{ijk}\hspace{0.17em}=\hspace{0.17em}\frac{{N}_{ijk}}{{N}_{ij.}}\times 100,\qquad {N}_{ij.}\hspace{0.17em}=\hspace{0.17em}\sum _{k}{N}_{ijk}$$

(15)

These percentages are substituted for rates in (4), (5), (8), and (9). The number of distributions generated with the same coding on variables *I* and *J* is the product of the number of categories in each variable. A separate standardization and decomposition is conducted for each of the *k* categories of the polytomous response variable. An estimate of the contribution of a composition variable category to the difference between groups is obtained from the standardization and decomposition of the response variable’s categories into composition effects and percentage distribution effects. Composition effects for a category are found by summing across the decomposed response categories. Some categories of a composition variable will have positive composition effects, and others will have negative composition effects. Within each composition variable, the composition effects summed across response categories always total zero because the percentage distributions in each group, the $T_{ijk}$ and
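The procedure for a polytomous response can be sketched by looping a categorical decomposition over the response categories, with the within-cell percentages of (15) in place of rates. In this hedged sketch, the helper names and the 2 × 2 × 3 count tables are hypothetical, `coefficients` uses an illustrative factorization of each cell’s share of its group total (not necessarily Das Gupta’s), and the cell totals $n_{ij.}$ and $N_{ij.}$ carry the composition. For each response category *k*, the composition and percentage distribution effects should sum to the observed difference between groups in the percentage of cases in that category.

```python
import numpy as np


def coefficients(counts):
    """Hypothetical split of cell proportions N_ij/N.. into a_ij * b_ij.

    Illustrative only; assumes all row and column totals are positive.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    return (np.sqrt(row / total * counts / col),
            np.sqrt(col / total * counts / row))


def polytomous_effects(n3, N3, nv=2):
    """Decompose every response category k of two I x J x K count tables.

    Returns four arrays indexed [k, category]: composition effects for I
    and J, and percentage distribution effects for I and J (first group
    minus second group).
    """
    n3, N3 = np.asarray(n3, dtype=float), np.asarray(N3, dtype=float)
    n, N = n3.sum(axis=2), N3.sum(axis=2)       # cell totals n_ij., N_ij.
    pct_t = 100 * n3 / n[:, :, None]            # Eq. (15), first group
    pct_T = 100 * N3 / N[:, :, None]            # Eq. (15), second group
    a, b = coefficients(n)
    A, B = coefficients(N)
    prop_avg = (n / n.sum() + N / N.sum()) / 2  # average cell composition
    comp_i, comp_j, dist_i, dist_j = [], [], [], []
    for k in range(n3.shape[2]):
        t, T = pct_t[:, :, k], pct_T[:, :, k]   # percentages replace rates
        rate_avg = (t + T) / 2
        comp_i.append(np.sum(rate_avg * (b + B) / 2 * (a - A), axis=1))
        comp_j.append(np.sum(rate_avg * (a + A) / 2 * (b - B), axis=0))
        dist_i.append(np.sum(prop_avg * (t - T), axis=1) / nv)
        dist_j.append(np.sum(prop_avg * (t - T), axis=0) / nv)
    return tuple(np.array(x) for x in (comp_i, comp_j, dist_i, dist_j))


# Hypothetical counts: 2 categories of I, 2 of J, 3 response categories.
n3 = [[[50, 30, 20], [10, 25, 15]], [[40, 40, 20], [5, 10, 35]]]
N3 = [[[30, 50, 20], [20, 20, 10]], [[25, 45, 30], [10, 15, 25]]]
comp_i, comp_j, dist_i, dist_j = polytomous_effects(n3, N3)
```

Each row of the returned arrays is one response category, so a table like Table 5 corresponds to reading the arrays row by row and summing across composition categories.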

In Tables 3, 4, and 5, we demonstrate the decomposition of the substantial changes in the distribution of marital status in the United States between 1950 and 2000. We use six categories of marital status cross-classified by two compositional variables, age and sex. Data entering the decomposition are shown in Table 3, along with tabulations of the age and sex distribution for 1950 and 2000. Change in the sex distribution was slight compared with change in the age distribution, which reflected the drop in fertility and the aging of the population. Panel a of Table 4 contains a tabulation of the distribution of marital status in 1950 and 2000. There was a sharp decline in the percentage married and a smaller decline in the percentage widowed. The other marital statuses experienced increases, particularly divorced and never married. Panel b of Table 4 provides a complete standardization and decomposition of the married category.

In comparison with the total percentage distribution effects (−12.64), the total compositional effects are small (0.02). These are the only effects that would be observed in a customary decomposition analysis. However, a finding of an overall small compositional effect is misleading because the compositional age effects for two categories are quite large. Negative effects for those below age 30 are balanced by positive effects at other ages. Percentage distribution effects are also large at the younger ages and indicate that the percentage married declined primarily among the young. Increases in joint survivorship probably account for the positive effects at ages 60 and older. Compositional contributions by males and females to the change in the percentage married are moderate and opposite in sign, while percentage distribution effects are equal. The decomposition in panel b of Table 4 is abstracted to Table 5, along with parallel decompositions of the other five marital statuses.

Decomposition can provide a comprehensive view of the sources of change in the categories of a polytomous variable. The column labeled “total” for composition effects for age in Table 5 indicates that although all age groups contributed to changes in the marital status distribution, the lower and upper extremes of the age distribution accounted for a major share of changes attributable to compositional shifts. The graying of the population influenced all categories of the marital status distribution, but most particularly the married and widowed.

Composition and percentage distribution effects may be summed across variables. These summations are shown as sources of change in Table 5 and indicate the contribution of each source to the shift away from marriage to never marrying or divorcing. Change in each response category is equal to the sum of the composition and percentage distribution effects. Changes due to shifts in the percentage distribution made a larger contribution than changes in composition for all marital statuses.

Decomposition of measures other than rates, means, and percentages is feasible when the measures at the group level are the weighted sum of a measure’s cross-classified values. It is this attribute that allows rates, means, and percentages to be decomposed. Shorrocks (1980) described a similar decomposition criterion for measures of inequality. The formal definition of a decomposable measure is given in (16). *T* represents the measure, *P* is the weight, and *i* and *j* are subscripts that identify the cross-classified data cells for *P* and *T*.

$$T\mathrm{..}\hspace{0.17em}=\hspace{0.17em}\sum _{ij}{P}_{ij}{T}_{ij}$$

(16)

$P_{ij}$ is commonly a proportion of the total group population and carries the composition component of a decomposition. It may sometimes be an average weight across the two groups involved in the decomposition. Measures consisting of the sum of more than one term are decomposable provided that

The difference between two standard deviations has two components: the customary composition effect and a dispersion effect. Decomposition is accomplished by working not with the standard deviation itself but with its building blocks: population counts, which are the source of the composition effect, and sums of squares, which are the source of the dispersion effect. Decomposition as stated in (16) requires additivity of the decomposed measure; although the standard deviation is not additive, sums of squares and population counts are. Each cross-classified cell contributes data based on three ratio scale measures: the sum of squares within the cell ($SSW_{ij}$), the cell mean (

The size of a sum of squares is determined by the dispersion of cases about the mean and the number of cases in a population. When modified versions of Eqs. (4) and (5) are used in the decomposition, sums of squares are averaged across groups. To avoid giving undue influence to one group when this averaging occurs, the number of cases is made equal for each group. This may be accomplished either by creating an average *N* or by using the size of one group as the standard and adjusting the other group to that standard. The former procedure requires adjusting the cell values for both groups, while the latter requires adjusting the cell values of only the nonstandard group. A constant (*M*) is created from the ratio of the standard group size (*N*..) to the nonstandard group size (*n*..):

$$M\hspace{0.17em}=\hspace{0.17em}\frac{N\mathrm{..}}{n\mathrm{..}}$$

(17)

All $ssw_{ij}$ and

Within each group, values for the $SSW_{ij}$ are used to calculate an estimate of the total sum of squares (

$$SS{T}_{ij}\hspace{0.17em}=\hspace{0.17em}SS{W}_{ij}\hspace{0.17em}+\hspace{0.17em}SS{B}_{ij}$$

(18)

The between sum of squares for a cell is found by evaluating (19):

$$SS{B}_{ij}\hspace{0.17em}=\hspace{0.17em}{\left(\overline{X}\mathrm{..}\hspace{0.17em}-\hspace{0.17em}{\overline{X}}_{ij}\right)}^{2}\hspace{0.17em}\times \hspace{0.17em}{N}_{ij}$$

(19)
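The cell-level quantities in (17)–(19) can be computed from raw observations as in the following NumPy sketch. The function name, the cell labels, and the toy data are hypothetical. We assume that the constant *M* of (17) multiplies both the within-cell sums of squares and the cell counts of the nonstandard group; this leaves that group’s cell means, grand mean, and variance unchanged while equalizing its size with the standard group.

```python
import numpy as np


def cell_sums_of_squares(cells, scale=1.0):
    """Return cell labels, counts N_ij, and total sums of squares SST_ij.

    `cells` maps a hypothetical cell label (i, j) to that cell's raw values.
    `scale` is the constant M of Eq. (17); multiplying both the counts and
    the within-cell sums of squares by it is an assumption of this sketch.
    """
    labels = sorted(cells)
    values = [np.asarray(cells[c], dtype=float) for c in labels]
    counts = np.array([v.size for v in values], dtype=float)
    means = np.array([v.mean() for v in values])                   # cell means
    ssw = np.array([np.sum((v - v.mean()) ** 2) for v in values])  # SSW_ij
    counts, ssw = counts * scale, ssw * scale
    grand_mean = np.sum(means * counts) / counts.sum()  # group mean X..
    ssb = (grand_mean - means) ** 2 * counts            # Eq. (19)
    sst = ssw + ssb                                     # Eq. (18)
    return labels, counts, sst


# Hypothetical raw values for a standard group and a nonstandard group.
standard = {(0, 0): [10, 12, 14], (0, 1): [20, 22],
            (1, 0): [5, 7, 9, 11], (1, 1): [30]}
other = {(0, 0): [11, 13], (0, 1): [25, 27, 29],
         (1, 0): [6, 8], (1, 1): [31, 35]}

N_total = sum(len(v) for v in standard.values())  # N..
n_total = sum(len(v) for v in other.values())     # n..
M = N_total / n_total                             # Eq. (17)

labels, N_ij, SST_ij = cell_sums_of_squares(standard)
_, n_ij, sst_ij = cell_sums_of_squares(other, scale=M)
```

Because $SST_{ij}$ adds each cell’s within and between pieces, the cell values sum to the group’s ordinary total sum of squares about its grand mean.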

The total sum of squares is standardized for the composition effects of variables *I* and *J* with equations analogous to (4) and (5):

$$I{\left(\overline{SST\left(A\right)}\right)}_{i.}\hspace{0.17em}=\hspace{0.17em}\sum _{j}\frac{\frac{SS{T}_{ij}}{SST\mathrm{..}}\hspace{0.17em}+\hspace{0.17em}\frac{ss{t}_{ij}}{sst\mathrm{..}}}{2}\hspace{0.17em}\frac{{b}_{ij}\hspace{0.17em}+\hspace{0.17em}{B}_{ij}}{2}\hspace{0.17em}{A}_{ij}$$

(20)

$$J{\left(\overline{SST\left(B\right)}\right)}_{.j}\hspace{0.17em}=\hspace{0.17em}\sum _{i}\frac{\frac{SS{T}_{ij}}{SST\mathrm{..}}\hspace{0.17em}+\hspace{0.17em}\frac{ss{t}_{ij}}{sst\mathrm{..}}}{2}\hspace{0.17em}\frac{{a}_{ij}\hspace{0.17em}+\hspace{0.17em}{A}_{ij}}{2}\hspace{0.17em}{B}_{ij}$$

(21)

Standardization for dispersion effects for *I* and *J* is realized with equations similar to (8) and (9):

$$R{\left(\overline{SST}\right)}_{i.}\hspace{0.17em}=\hspace{0.17em}\sum _{j}\frac{\frac{{n}_{ij}}{n\mathrm{..}}\hspace{0.17em}+\hspace{0.17em}\frac{{N}_{ij}}{N\mathrm{..}}}{2}\hspace{0.17em}\frac{SS{T}_{ij}}{SST\mathrm{..}}\hspace{0.17em}\frac{1}{NV}$$

(22)

$$R{\left(\overline{SST}\right)}_{.j}\hspace{0.17em}=\hspace{0.17em}\sum _{i}\frac{\frac{{n}_{ij}}{n\mathrm{..}}\hspace{0.17em}+\hspace{0.17em}\frac{{N}_{ij}}{N\mathrm{..}}}{2}\hspace{0.17em}\frac{SS{T}_{ij}}{SST\mathrm{..}}\hspace{0.17em}\frac{1}{NV}$$

(23)

The final task is to use the standardized composition sum of squares and the standardized dispersion sum of squares to produce composition and dispersion effects. This is achieved by weighting the standard deviation for a group by the proportion each category effect is of *TE*, which is the total of all composition and dispersion effects for that group. *TE* equals

$$TE\hspace{0.17em}=\hspace{0.17em}\sum _{i.}I{\left(\overline{SST\left(A\right)}\right)}_{i.}\hspace{0.17em}+\hspace{0.17em}\sum _{.j}J{\left(\overline{SST\left(B\right)}\right)}_{.j}\hspace{0.17em}+\hspace{0.17em}\sum _{i.}R{\left(\overline{SST}\right)}_{i.}\hspace{0.17em}+\hspace{0.17em}\sum _{.j}R{\left(\overline{SST}\right)}_{.j}$$

(24)

Establish *te* for group 2 in a similar manner.

Composition effects of the standard deviation for a category of variable *I*, $I(\overline{SD})_{i.}$, equal

$$I{\left(\overline{SD}\right)}_{i.}\hspace{0.17em}=\hspace{0.17em}\sqrt{\frac{SST\mathrm{..}}{N\mathrm{..}}\hspace{0.17em}{\left(\frac{I{\left(\overline{SST\left(A\right)}\right)}_{i.}}{TE}\right)}^{2}}$$

(25)

This definition allows us to transform the standardized sums of squares into composition effects of the standard deviation. Similarly, substituting *B* for *A* in (25) defines the compositional contribution to the standard deviation of variable *J*.

Dispersion effects of the standard deviation for a category of variable *I*, $R(\overline{SD})_{i.}$, mimic (25). That is,

$$R{\left(\overline{SD}\right)}_{i.}\hspace{0.17em}=\hspace{0.17em}\sqrt{\frac{SST\mathrm{..}}{N\mathrm{..}}\hspace{0.17em}{\left(\frac{R{\left(\overline{SST}\right)}_{i.}}{TE}\right)}^{2}}$$

(26)

Again, there is a companion equation for variable *J*’s dispersion effects.
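The steps in (20)–(26) can be chained in a short NumPy sketch. It assumes that the two groups’ counts have already been equalized with *M* from (17), that *NV* = 2 (the hypothetical parameter `nv`), and that `coefficients` uses a hypothetical factorization with $a_{ij} b_{ij}$ equal to the cell proportion, an illustrative choice that is not necessarily Das Gupta’s. The function names and input tables are made up. Because every standardized piece is nonnegative, (25) and (26) reduce to the group standard deviation times the piece’s share of *TE*, so each group’s effects sum to its standard deviation and the between-group differences sum to the change in the standard deviation.

```python
import numpy as np


def coefficients(counts):
    """Hypothetical split of cell proportions N_ij/N.. into a_ij * b_ij.

    Illustrative only; assumes all row and column totals are positive.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    return (np.sqrt(row / total * counts / col),
            np.sqrt(col / total * counts / row))


def sd_effects(sst, n, SST, N, nv=2):
    """Category composition and dispersion effects on each group's SD.

    Returns {"t": [...], "T": [...]}, each a list of four arrays: composition
    effects for I and J (Eq. 25), then dispersion effects for I and J (Eq. 26).
    """
    sst, SST = np.asarray(sst, dtype=float), np.asarray(SST, dtype=float)
    n, N = np.asarray(n, dtype=float), np.asarray(N, dtype=float)
    a, b = coefficients(n)
    A, B = coefficients(N)
    share_avg = (sst / sst.sum() + SST / SST.sum()) / 2
    prop_avg = (n / n.sum() + N / N.sum()) / 2
    out = {}
    for label, ss, cnt, ca, cb in (("t", sst, n, a, b), ("T", SST, N, A, B)):
        pieces = [np.sum(share_avg * (b + B) / 2 * ca, axis=1),   # Eq. (20)
                  np.sum(share_avg * (a + A) / 2 * cb, axis=0),   # Eq. (21)
                  np.sum(prop_avg * ss / ss.sum(), axis=1) / nv,  # Eq. (22)
                  np.sum(prop_avg * ss / ss.sum(), axis=0) / nv]  # Eq. (23)
        te = sum(p.sum() for p in pieces)                         # Eq. (24)
        sd = np.sqrt(ss.sum() / cnt.sum())
        # Eqs. (25)-(26): sqrt(SD^2 * (piece / TE)^2) = SD * |piece| / TE
        out[label] = [np.sqrt(sd ** 2 * (p / te) ** 2) for p in pieces]
    return out


# Hypothetical cell SST and counts, already equalized to 100 cases per group.
sst = [[40.0, 90.0], [25.0, 160.0]]
n = [[30, 20], [25, 25]]
SST = [[35.0, 60.0], [30.0, 80.0]]
N = [[40, 15], [30, 15]]

eff = sd_effects(sst, n, SST, N)
comp_i_change = eff["t"][0] - eff["T"][0]  # composition effects, categories of I
```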

The upper panel of Table 6 contains the means and standard deviations of annual wage and salary income of persons employed 50 or more weeks in the periods 1969–1971 and 1999–2001. Weighted data are from the Current Population Survey and are presented for each period for four educational categories and by sex after the income for each year is adjusted to the cost of living in 1999. Percentage distributions for each variable are included in Table 6 and indicate that there were substantial changes in the educational and sex composition of wage and salary earners during the 30-year period. While mean income increased by less than 10%, the standard deviation of income increased by two-thirds. The bottom panel of Table 6 holds the data needed for a decomposition of the standard deviation.

Table 7 displays the decomposition of the change in the standard deviation over the 30-year period. Compositional changes account for almost three-quarters of the change in the standard deviation. Income dispersion would have been greater if not for the small contributions of those who did not attend college. A single category of persons, those with a college degree, is responsible for more than half of the increase in income dispersion. Males, independently of education, also made a substantial contribution.

Das Gupta produced two decompositions of the index of dissimilarity, one for cross-classified data and one for vector data. In a research note, Das Gupta (1987) critically appraised Bianchi and Rytina’s (1986) use of the index of dissimilarity with cross-classified data. He showed that the interaction term in Bianchi and Rytina’s decomposition could be eliminated by formulating the decomposition so that it produced standardized indexes for the two groups. His method, however, admitted only one variable: the distributional variable, such as occupation or census tract, over which the index is calculated. By building on Eqs. (3)–(10), we can convert Das Gupta’s univariate decomposition into a multivariate decomposition of the index of dissimilarity. In the definitions that follow, the first subscript represents the distributional variable and the second represents the groups (such as males and females, or whites and blacks) whose distributions are being compared. Additional compositional variables may be added as needed:

$$N\mathrm{..} = \sum_{ij} N_{ij} \qquad N_{i.} = \sum_{j} N_{ij} \qquad P_{.j} = \sum_{i} P_{ij}$$

The method uses several summations and proportions. The proportion $P_{i1}$ that the first category of the group variable is of the *i*th category of the distributional variable is

$${P}_{i1}\hspace{0.17em}=\hspace{0.17em}{N}_{i1}/{N}_{i.}$$

The proportion that the first category of the group variable is of the group size is

$${P}_{.1}\hspace{0.17em}=\hspace{0.17em}{N}_{.1}/N..$$

and the proportion that the *ij*th cell is of the group size is

$${P}_{ij}\hspace{0.17em}=\hspace{0.17em}{N}_{ij}/N..$$

*A* is the composition coefficient for the distributional variable and is defined as

$${A}_{ij}\hspace{0.17em}=\hspace{0.17em}{P}_{ij}/{P}_{.j}\hspace{0.17em}\times \hspace{0.17em}{P}_{i.}/P..$$

(27)

and *B* is the composition coefficient for the group variable and is defined as

$${B}_{ij}\hspace{0.17em}=\hspace{0.17em}{P}_{ij}/{P}_{i.}\hspace{0.17em}\times \hspace{0.17em}{P}_{.j}/P..$$

(28)

The contribution of the *ij* data cell to the index of dissimilarity, $D_{ij}$, is calculated from the proportions as

$${D}_{ij}\hspace{0.17em}=\hspace{0.17em}\left|{P}_{i1}/{P}_{.1}\hspace{0.17em}-\hspace{0.17em}\left(1-{P}_{i1}\right)/\left(1-{P}_{.1}\right)\right|$$

(29)

and the index equivalents of (3), (4), and (5) for the composition components are

$$I\left(\overline{A}\right)\hspace{0.17em}=\hspace{0.17em}\sum _{ij}\frac{{d}_{ij}\hspace{0.17em}+\hspace{0.17em}{D}_{ij}}{2}\hspace{0.17em}\frac{{b}_{ij}\hspace{0.17em}+\hspace{0.17em}{B}_{ij}}{2}{A}_{ij}$$

(30)

$$I{\left(\overline{A}\right)}_{i.}\hspace{0.17em}=\hspace{0.17em}\sum _{j}\frac{{d}_{ij}\hspace{0.17em}+\hspace{0.17em}{D}_{ij}}{2}\hspace{0.17em}\frac{{b}_{ij}\hspace{0.17em}+\hspace{0.17em}{B}_{ij}}{2}{A}_{ij}$$

(31)

$$J{\left(\overline{B}\right)}_{.j}\hspace{0.17em}=\hspace{0.17em}\sum _{i}\frac{{d}_{ij}\hspace{0.17em}+\hspace{0.17em}{D}_{ij}}{2}\hspace{0.17em}\frac{{a}_{ij}\hspace{0.17em}+\hspace{0.17em}{A}_{ij}}{2}{B}_{ij}$$

(32)
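Eq. (29) can be checked numerically. The minimal Python sketch below assumes the simplest case: two groups (say, females as group 1 and males as group 2) and a single distributional variable, so each cell term reduces to one term $D_i$ per category. The function names and counts are hypothetical. The sketch also relies on an algebraic identity that follows from the definitions above rather than on a formula stated in the text: because $(P_{i1}/P_{.1})(N_{i.}/N_{..}) = N_{i1}/N_{.1}$, weighting each $D_i$ by $N_{i.}/N_{..}$ and halving the sum reproduces the familiar index $\tfrac{1}{2}\sum_i |N_{i1}/N_{.1} - N_{i2}/N_{.2}|$.

```python
from typing import List, Sequence, Tuple

Counts = Sequence[Tuple[int, int]]  # (N_i1, N_i2) for each category i


def dissimilarity_terms(counts: Counts) -> Tuple[List[float], List[float]]:
    """Return the Eq. (29) term D_i and the weight N_i./N.. for each category."""
    n_dot1 = sum(n1 for n1, _ in counts)            # N.1
    n_total = sum(n1 + n2 for n1, n2 in counts)     # N..
    p_dot1 = n_dot1 / n_total                       # P.1 = N.1 / N.. (assumed strictly between 0 and 1)
    terms, weights = [], []
    for n1, n2 in counts:
        n_i = n1 + n2                               # N_i. (empty categories assumed dropped beforehand)
        p_i1 = n1 / n_i                             # P_i1 = N_i1 / N_i.
        # Eq. (29): |P_i1/P.1 - (1 - P_i1)/(1 - P.1)|
        terms.append(abs(p_i1 / p_dot1 - (1 - p_i1) / (1 - p_dot1)))
        weights.append(n_i / n_total)
    return terms, weights


def dissimilarity_index(counts: Counts) -> float:
    """Half the N_i./N..-weighted sum of the D_i terms."""
    terms, weights = dissimilarity_terms(counts)
    return 0.5 * sum(w * t for w, t in zip(weights, terms))


def classic_dissimilarity(counts: Counts) -> float:
    """Textbook form: 1/2 * sum_i |N_i1/N.1 - N_i2/N.2|."""
    n_dot1 = sum(n1 for n1, _ in counts)
    n_dot2 = sum(n2 for _, n2 in counts)
    return 0.5 * sum(abs(n1 / n_dot1 - n2 / n_dot2) for n1, n2 in counts)


# Hypothetical counts (group 1, group 2) for three categories.
table = [(30, 70), (50, 50), (20, 80)]
print(round(dissimilarity_index(table), 6), round(classic_dissimilarity(table), 6))
# prints: 0.25 0.25
```

The ratio $P_{i1}/P_{.1}$ inside each term is the ratio of a category's share of group 1 to group 1's overall share, so the two computations agree on any table whose categories are nonempty.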

Rate effects are similar to (8), (9), and (10), with *D* substituted for *T*.
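Equations (30)–(32) are weighted sums over cells, so the category-level values can be computed directly and checked for additivity. The Python sketch below evaluates them exactly as printed. It assumes that capital letters hold one population's coefficients and lowercase letters the comparison population's, and that a category's compositional effect is the difference between the two populations' standardized values, following Das Gupta’s convention. The function names and the 2 × 2 inputs are hypothetical, and the coefficients are taken as given rather than recomputed from (27) and (28).

```python
from typing import List

Matrix = List[List[float]]


def _mean(x: float, y: float) -> float:
    """Average of the two populations' values, as in the (x + X)/2 terms."""
    return (x + y) / 2


def row_values_A(D: Matrix, d: Matrix, B: Matrix, b: Matrix, A: Matrix) -> List[float]:
    """Eq. (31): I(A)_i. = sum_j mean(d, D) * mean(b, B) * A_ij for each row i."""
    return [
        sum(_mean(d[i][j], D[i][j]) * _mean(b[i][j], B[i][j]) * A[i][j]
            for j in range(len(A[i])))
        for i in range(len(A))
    ]


def total_A(D: Matrix, d: Matrix, B: Matrix, b: Matrix, A: Matrix) -> float:
    """Eq. (30): the same product summed over every cell ij."""
    return sum(
        _mean(d[i][j], D[i][j]) * _mean(b[i][j], B[i][j]) * A[i][j]
        for i in range(len(A))
        for j in range(len(A[i]))
    )


def col_values_B(D: Matrix, d: Matrix, A: Matrix, a: Matrix, B: Matrix) -> List[float]:
    """Eq. (32): J(B)_.j = sum_i mean(d, D) * mean(a, A) * B_ij for each column j."""
    n_rows, n_cols = len(B), len(B[0])
    return [
        sum(_mean(d[i][j], D[i][j]) * _mean(a[i][j], A[i][j]) * B[i][j]
            for i in range(n_rows))
        for j in range(n_cols)
    ]


# Hypothetical 2 x 2 inputs: capitals = one population, lowercase = the other.
D = [[0.4, 0.2], [0.6, 0.8]]
d = [[0.5, 0.3], [0.5, 0.7]]
A = [[1.0, 0.5], [0.8, 1.2]]
a = [[0.9, 0.6], [1.0, 1.0]]
B = [[1.1, 0.9], [1.0, 1.0]]
b = [[1.0, 1.0], [0.9, 1.1]]

rows_A = row_values_A(D, d, B, b, A)  # standardized values using A
rows_a = row_values_A(D, d, B, b, a)  # same weights, using a
# Assumed definition: category effect = difference of standardized values.
effects_I = [x - y for x, y in zip(rows_A, rows_a)]
```

Because the row values in (31) partition the sum in (30), the category effects for variable *I* add up exactly to the variable-level effect; the same holds for the column values of (32) and variable *J*.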

Bianchi and Rytina (1986) and Das Gupta (1987) used data for 480 occupational categories by sex from the 1970 and 1980 censuses (U.S. Bureau of the Census 1984) to measure the change in occupational sex segregation for the experienced labor force. Both studies used occupation as the compositional variable. Data on region are available, and region has been added as a second compositional variable in Table 8 to illustrate the potential benefits of a multivariate analysis. The results for the component sums closely match those reported by Das Gupta. Additional findings about the effects of regions and occupational categories are made possible by the refinement and extension of the index.

Because category results are additive, they may be aggregated into meaningful sums. The total, compositional, and dissimilarity effects of the 480 detailed occupational categories used in the decomposition were aggregated to the 13 major categories listed in Table 8. Bianchi and Rytina used these major categories in an attempt to locate the occupational source of changes in sex segregation. Das Gupta criticized their use of the change in the percentage of females in a category, favoring instead the change in the ratio of the percentage of females in an occupation to the percentage of females across all occupations. In Table 8, compositional and dissimilarity effects sum to total effects, which makes it possible to identify the occupational groupings and regions that contributed substantially to a decrease or an increase in sex segregation. The refinement offered here, decomposing the effects of both occupations and regions, conveys considerably more information than either Bianchi and Rytina’s approach or Das Gupta’s suggestion.
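Since category effects are additive, rolling detailed categories up into major groupings is a grouped sum, applied separately to the total, compositional, and dissimilarity effects. A minimal Python sketch follows; the function name, occupation codes, grouping, and numbers are hypothetical placeholders, not values from Table 8.

```python
from collections import defaultdict
from typing import Dict, Mapping


def aggregate_effects(effects: Mapping[str, float],
                      major_of: Mapping[str, str]) -> Dict[str, float]:
    """Sum additive category-level effects into their major groupings.

    effects  : detailed category -> effect (total, compositional, or dissimilarity)
    major_of : detailed category -> major grouping
    """
    totals: Dict[str, float] = defaultdict(float)
    for category, effect in effects.items():
        totals[major_of[category]] += effect
    return dict(totals)


# Hypothetical detailed occupations and compositional effects.
compositional = {"occ_101": -0.012, "occ_102": 0.004, "occ_201": -0.007}
major = {"occ_101": "managerial", "occ_102": "managerial", "occ_201": "service"}

by_major = aggregate_effects(compositional, major)
# Additivity: the grouped effects still sum to the overall effect.
assert abs(sum(by_major.values()) - sum(compositional.values())) < 1e-12
```

Because summation is linear, if compositional and dissimilarity effects sum to total effects for each detailed category, they also do so for each major grouping after aggregation.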

In a univariate decomposition, the change in the overall sex structure of the labor force would be absorbed by, and attributed to, changes in the occupational distribution. Adding region to the decomposition does little to alter this conclusion when the regional variable is considered as a whole: its effect is negligible, and the variable might well be dropped from further consideration during the investigation. However, treating region only as a whole would be an unfortunate analytical, and potentially theoretical, error. When regional categories are examined, regions appear to have been deeply involved in the change in sex segregation. After controlling for changes in the dissimilarity index and in the occupational distribution, the Northeast and North Central regions accounted for more than half of the decline in occupational sex segregation. Under similar controls, the South and West contributed to an increase in sex segregation. At the variable level, region is a balance of offsetting categorical effects, and that balance is highly misleading about the true regional effect. Given the very large compositional effect for each region, we suspect that region is an ecological-type variable that serves as a window on other differences in the experienced labor force, differences that are beyond the scope of this article to investigate. Studies of racial segregation would probably benefit in a like manner from the refined, multivariate approach to the index of dissimilarity.

Das Gupta’s method is widely used, but we know of no instance of its application at the categorical level. Our intention is to make users aware of that possibility. Decomposition at the categorical level and decomposition of polytomous response variables are undeveloped aspects of Das Gupta’s work. Despite their simplicity, or perhaps because of it, these refinements offer powerful analytical tools for questions about why various measures differ between two or more groups. Our refinements reveal substantially more about relationships among demographic measures than earlier decomposition techniques have. To paraphrase a well-worn aphorism, they offer new wine in old bottles. Nevertheless, they do not free the analyst from making vital choices about the control variables used in a decomposition. Any categorical variable entering a decomposition is bound to yield results, and those results are meaningful only insofar as the variables chosen have theoretical relevance. As in regression, uniqueness is a second condition for a categorical variable: it cannot fully overlap another compositional variable.

A Das Gupta-type decomposition does not use variances and therefore lacks predictive power in the usual sense of the term. A decomposition accounts for all of the difference in a measure. A refined Das Gupta decomposition retains this property but also offers potentially valuable insights into the composition- and measure-based sources of differences that reside in the categories of variables. Thus, the analyst can say whether one category has a greater or lesser impact than another on a difference between two groups, but can say little about how well either category predicts the difference.

Within the Kitagawa–Das Gupta decomposition framework, we propose refinements and extensions that expand the boundaries of what is achievable. Foremost, we demonstrate that a decomposition at the categorical level is built into the decomposition by variable. Observing the effect of categories occurs simultaneously with observing the effect of variables. By making explicit the equality of rates among variables in a cross-classification, we are able to decompose the rate effect among categories of variables. This step makes possible the complete attribution of a difference as the sum of composition effects and measure effects at both the category and the variable level.

Decomposition, as currently practiced, provides a circumscribed answer to why rates differ between groups: because of compositional differences and an overall rate component. For cross-classified data, the compositional effects are rooted in group differences in the distribution of variables. At best, a decomposition at the level of variables contains hints about the underlying sources of a difference in rates. Rate decomposition at the categorical level provides a behavioral explanation for the difference in rates, whereas compositional decomposition provides only a structural explanation. Drawing on the rate decompositions presented here, we can say that when the labor force participation of women rose between 1970 and 1985, the rise was driven by behavioral changes concentrated among young married women who had completed high school and had one or two children at home. When income dispersion increased between 1970 and 2000, the increase was concentrated among male college graduates. When occupational segregation between the sexes declined from 1970 to 1980, the decline was most prevalent among managerial and professional occupations and was most closely tied to the Northeast and North Central regions of the country. None of these statements could be made without decomposition at the categorical level. They can be tested for statistical significance with methods developed by Wang et al. (2000).

Kitagawa and Das Gupta developed their methods based on decomposing a dichotomous response variable. With little more effort, we show that it is feasible to decompose a polytomous response variable. Finally, we add the standard deviation and index of dissimilarity to those measures that are decomposable within the Kitagawa–Das Gupta framework. There are doubtless other additive measures that are algebraically decomposable. Establishing these additional measures would make the method even more general.

1. In addition to the total female population, Lichter and Costanzo (1987) decomposed the change in labor force participation for black and nonblack women. Only the total female population is needed to demonstrate decomposition with categories.

ALBERT CHEVAN, University of Massachusetts, Amherst, MA 01003; e-mail:ude.ssamu.cos@navehc.

MICHAEL SUTHERLAND, University of Massachusetts, Amherst.

- Bianchi SM, Rytina N. “The Decline in Occupational Sex Segregation During the 1970s: Census and CPS Comparisons.” Demography. 1986;23:79–86.
- Canudas-Romo V. Decomposition Methods in Demography. Amsterdam: Rozenberg Publishers; 2003.
- Cho L, Retherford RD. “Comparative Analysis of Recent Fertility Trends in East Asia.” In: Proceedings of the 17th General Conference of the IUSSP, August 1973. Liège, Belgium: International Union for the Scientific Study of Population; 1973.
- Das Gupta P. “Comments on Suzanne M. Bianchi and Nancy Rytina’s ‘The Decline in Occupational Sex Segregation During the 1970s: Census and CPS Comparisons.’” Demography. 1987;24:291–95.
- Das Gupta P. “Methods of Decomposing the Difference Between Two Rates With Applications to Race-Sex Inequality in Earnings.” Mathematical Population Studies. 1989;2:15–36.
- Das Gupta P. “Decomposition of the Difference Between Two Rates and Its Consistency When More Than Two Populations Are Involved.” Mathematical Population Studies. 1991;3:105–25.
- Das Gupta P. “Standardization and Decomposition of Rates: A User’s Manual.” Current Population Reports, Series P-23, No. 186. Washington, DC: U.S. Bureau of the Census; 1993.
- Das Gupta P. “Standardization and Decomposition of Rates From Cross-Classified Data.” Genus. 1994;3:171–96.
- Durand JD. The Labor Force in the United States. New York: Social Science Research Council; 1948.
- Jaffe AJ. Handbook of Statistical Methods for Demographers. Washington, DC: U.S. Government Printing Office; 1951.
- Kim YJ, Strobino DM. “Decomposition of the Difference Between Two Rates With Hierarchical Factors.” Demography. 1984;21:361–72.
- Kitagawa EM. “Components of a Difference Between Two Rates.” Journal of the American Statistical Association. 1955;50:1168–94.
- Lichter DT, Costanzo JA. “How Do Demographic Changes Affect Labor Force Participation of Women?” Monthly Labor Review. 1987;110:23–25.
- Shorrocks AF. “The Class of Additively Decomposable Inequality Measures.” Econometrica. 1980;48:613–26.
- U.S. Bureau of the Census. “Detailed Occupation of the Experienced Civilian Labor Force by Sex for the United States and Regions: 1980 and 1970.” 1980 Census of the Population, Supplementary Report PC80-S1-15. Washington, DC: U.S. Government Printing Office; 1984.
- Vaupel JW, Canudas-Romo V. “Decomposing Demographic Change Into Direct Versus Compositional Components.” Demographic Research. 2002;7(Article 1):1–14. Available online at http://www.demographic-research.org/Volumes/Vol7/1/7-1.pdf.
- Wang J, Rahman A, Siegal H, Fisher J. “Standardization and Decomposition of Rates: Useful Analytic Techniques for Behavior and Health Studies.” Behavior Research Methods, Instruments, and Computers. 2000;32:357–66.
- Wolfbein SL, Jaffe AJ. “Demographic Factors in Labor Force Growth.” American Sociological Review. 1946;11:392–96.
