As noted earlier in the paper, the most recent Census data show that 96.6% of US adolescents in the age range 13–17 are students. It would consequently have been expected that about 31 non-student respondents would be in the household sample (i.e., 3.4% of 904). The actual number was 25. This is too few to support extrapolation to the population of the roughly half a million non-student adolescents in the US. The non-student respondents were consequently excluded from the bulk of the analyses, which concentrated on the 10,123 respondents who were students. Weighting focused on the student population. As the sample design involved a dual-frame approach, a distinct weighting scheme was used to make each sample representative of adolescents in the US household population on the cross-classification of a wide range of socio-demographic and geographic variables. The two weighted samples were then merged for purposes of analysis.

The household sample

The household sample weighting was the simpler of the two in that weights had already been developed for the NCS-R household sample. The NCS-R weights are described elsewhere (Kessler et al., 2004) and will not be discussed here. The first step was to add these weights to the adolescent data and adjust them for differential probability of selection of adolescents as a function of number of other adolescents in the household. These doubly-weighted data were then compared with nationally representative Census data on basic socio-demographic characteristics for purposes of post-stratification. Two data files were used for this purpose. The first was the 2000 Census Public Use Microdata Sample (PUMS; www.census.gov/support/pumsdata.html) of a 5% sample of the entire US population. Data were extracted from the PUMS for adolescents who were students at the time of the Census. The second was a small area geo-code data file prepared by a commercial firm that aggregated 2000 Census data to the level of the Block Group (BG) for each of the 208,790 BGs (http://www.geolytics.com/resources/us-census-2000.html). These BG-level data were linked to the data record of each NCS-A respondent, while the national distributions for the population on these same BG-level variables were generated by weighting the BG-level data by the population of eligible adolescents in each BG.
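The two adjustments described in this paragraph can be illustrated with a minimal Python sketch. It rests on an assumption the text does not state: that one adolescent was drawn with equal probability from the eligible adolescents in each household, so that the adjustment factor equals the number of eligible adolescents. The function names and toy inputs are hypothetical.

```python
import numpy as np

def adjust_for_household_selection(base_weights, n_eligible_adolescents):
    """Scale household base weights by the inverse within-household selection probability.

    Assumption (not stated in the text): one adolescent is drawn with equal
    probability from the m eligible adolescents in a household, so the
    selection probability is 1/m and the adjustment factor is m.
    """
    base = np.asarray(base_weights, dtype=float)
    m = np.asarray(n_eligible_adolescents, dtype=float)
    return base * m

def population_weighted_mean(bg_values, bg_adolescent_population):
    """National mean of a BG-level variable, weighting each Block Group
    by its population of eligible adolescents."""
    return float(np.average(bg_values, weights=bg_adolescent_population))

# Hypothetical toy inputs, for illustration only.
adjusted = adjust_for_household_selection([1.0, 2.0], [1, 3])
national = population_weighted_mean([0.2, 0.6], [100, 300])
```

In the toy data, the second household has three eligible adolescents, so its base weight is tripled, and the larger Block Group dominates the national mean.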

A wide range of variables available in the NCS-A as well as in the PUMS or the BG-level data file was selected to post-stratify the NCS-A data. (Details available on request.) In addition, some information was available about variables not in the Census files available for the NCS-A household sample, as the NCS-R was completed in the households of all NCS-A respondents and non-respondents. In particular, comparisons and weighting were made for discrepancies between the DSM-IV/CIDI disorders reported by the adult NCS-R respondents in the households of NCS-A respondents and non-respondents.

The post-stratification weight was created by using an exponential weighting function to make the distributions of post-stratification variables in the adjusted weighted sample agree with the distributions in the external datasets. Specifically, the weight for case $k$ was of the form

$$W'_k = W_k \exp\left(X_k^{\top}\beta\right) \qquad (1)$$

where $W'_k$ is the adjusted weight, $W_k$ is the weight before adjustment, $X_k$ is the vector of characteristics associated with case $k$ (derived either from the survey data or from the BG-level data) including a 1 for the intercept, and $\beta$ is a vector of coefficients calculated to satisfy the condition

$$\sum_k W'_k X_k = \mathbf{X} \qquad (2)$$

where $\mathbf{X}$ is the vector of population distributions of the post-stratification variables selected from the PUMS and BG-level datasets. This procedure is a version of raking calibration, commonly used to adjust surveys to match census data (Deville et al., 1993), but generalized in this case to allow for adjustment using continuous as well as categorical variables. A program written in the R programming language was used to estimate $\beta$ and to create these calibrated weights. The weights resulted in the distributions of the post-stratification variables in the weighted sample being identical to those in the population datasets, while maintaining the associations among these variables found in the sample.
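The exponential calibration described above can be sketched in a few lines of Python. This is not the authors' R program; it is a minimal illustration that solves for the coefficient vector with a plain Newton iteration, assuming the targets are feasible and not far from the starting weighted totals. The function name, tolerance, and toy data are hypothetical.

```python
import numpy as np

def calibrate_exponential(w, X, targets, n_iter=100, tol=1e-9):
    """Find beta so that sum_k w_k * exp(X_k @ beta) * X_k matches `targets`.

    Plain Newton iteration without step control (an assumption of this
    sketch): adequate when targets are feasible and close to the
    starting weighted totals.
    """
    w = np.asarray(w, dtype=float)
    X = np.asarray(X, dtype=float)
    targets = np.asarray(targets, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        adjusted = w * np.exp(X @ beta)        # current adjusted weights
        gap = X.T @ adjusted - targets         # distance from the calibration condition
        if np.max(np.abs(gap)) < tol:
            break
        jacobian = X.T @ (adjusted[:, None] * X)
        beta = beta - np.linalg.solve(jacobian, gap)
    return w * np.exp(X @ beta), beta

# Hypothetical toy data: intercept, a binary indicator, and age in years.
w0 = np.full(6, 10.0)
X = np.array([[1, 1, 13], [1, 0, 14], [1, 1, 15],
              [1, 0, 16], [1, 1, 17], [1, 0, 15]], dtype=float)
targets = np.array([60.0, 33.0, 906.0])        # population totals to match
w1, beta = calibrate_exponential(w0, X, targets)
```

Because the age column is continuous, the example shows the generalization beyond categorical raking: the calibrated weights reproduce the target total for age as well as for the binary indicator.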

Some sense of the extent to which post-stratification affected variable distributions can be seen by comparing the distributions of selected post-stratification variables in the sample before versus after weighting (Table 2). For the most part, the ratios of proportions based on final (F) weights, which equal the actual population proportions found in the databases used for post-stratification, to the corresponding proportions without post-stratification weighting (U) were in the range 0.8–1.2. This means that proportions typically changed by less than 20% of their base. There were some exceptions, though, as illustrated by the fact that the proportion of the population who defined themselves as neither Non-Hispanic White, Non-Hispanic Black, nor Hispanic is only 61% as high in the population (5.0%) as in the un-weighted sample before post-stratification (8.2%).

**Table 2.** Un-weighted and weighted distributions of selected NCS-A post-stratification variables among adolescent student respondents in the NCS-A household sample (n = 879)

The school sample

Weighting for the school sample was based on weights that controlled for three sets of variables. The first set was extracted from the Quality Education Data (QED) database, a commercially-produced database of the characteristics of all primary and secondary schools in the US (http://www.qeddata.com), controlling to population totals of these variables (weighted by school enrollment) adjusted for discrepancies between the schools included in the sample and the population of all schools in the US. A wide range of school characteristics were examined that included such variables as size, grades covered, type of school (e.g., public versus private, special needs school, K-8 school, junior high school, high school), average size of classroom, average student:teacher ratio, and presence versus absence of various school programs. The other two sets of variables were the same PUMS and BG-level datasets used in the household sample. The same statistical approach to weighting was used as in the household sample. The within-household probability of selection weights used in the household sample, though, were not needed in the school sample, as schools and students within schools were selected with probabilities proportional to the size of the eligible student body.

As with the household sample, post-stratification did not have dramatic effects on distributions of the post-stratification variables in the school sample. (Detailed results available on request.) For the most part, relative proportions based on final (F) weights compared to un-weighted (U) data were in the range 0.75–1.25. This means that proportions typically changed by less than 25% of their base. For example, the proportion of adolescents who are Non-Hispanic White was estimated to be 55.5% before post-stratification compared to the actual population distribution of 65.6%, a relative increase of 18% (i.e., 65.6/55.5 = 1.18) on this proportion after post-stratification. This general pattern of relatively modest adjustments in proportions held for the vast majority of the post-stratification variables included in the analysis.
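The F:U ratios quoted for the two samples are simple quotients of the post-stratified and un-weighted proportions. The short Python sketch below reproduces the two worked figures given in the text; the function and variable names are illustrative only.

```python
def relative_proportion(final_pct, unweighted_pct):
    """Ratio of the proportion under final (F) weights to the un-weighted (U) proportion."""
    return final_pct / unweighted_pct

# Figures quoted in the text.
other_race_household = relative_proportion(5.0, 8.2)    # household sample, about 0.61
white_school = relative_proportion(65.6, 55.5)          # school sample, about 1.18
```

The first ratio falls outside the 0.8–1.2 band reported for the household sample, which is why it is singled out as an exception; the second lies within the 0.75–1.25 band reported for the school sample.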

Weight trimming

When weights vary greatly relative to the mean, estimates tend to have large standard errors. This, in turn, leads to inefficiency in estimation. It is possible to deal with this problem by trimming extreme weights. There is a trade-off in doing this, though, as weight trimming can lead to bias in estimates. If the reduction in variance due to added efficiency exceeds the increase in squared bias, the trimming is helpful overall. Trimming is unhelpful, in comparison, if the opposite occurs (i.e., the increase in squared bias is greater than the decrease in variance).

It is possible to study this trade-off between bias and efficiency empirically in order to select an optimal weight trimming scheme by calculating the mean squared error (MSE) of estimates of substantive importance. This was done by evaluating the effects of weight trimming on ten prevalence estimates: lifetime and 12-month prevalence estimates of any DSM-IV/CIDI mood, anxiety, externalizing, substance use, and any disorder. As described in detail elsewhere (Kessler et al., in press-b), the DSM-IV diagnoses generated in the NCS-A combine parent and adolescent reports and have good concordance with independent diagnoses based on semi-structured research diagnostic interviews with parents and adolescents by blinded clinical interviewers in an NCS-A clinical reappraisal study. In order to evaluate the effects of weight trimming on prevalence estimates based on the CIDI interviews, MSE for variable $Y$ at trimming point $p$ was defined as

$$\mathrm{MSE}_{Yp} = B_{Yp}^{2} + \mathrm{Var}(Y_p) \qquad (3)$$

where $B_{Yp}$ is the bias of the prevalence estimate at that trimming point and $\mathrm{Var}(Y_p)$ is the variance of $Y$ at trimming point $p$. An unbiased estimator of $B_{Yp}^{2}$ is

$$\hat{B}_{Yp}^{2} - \widehat{\mathrm{Var}}(\hat{B}_{Yp}) \qquad (4)$$

where $\hat{B}_{Yp}$ is an unbiased estimator of bias and $\widehat{\mathrm{Var}}(\hat{B}_{Yp})$ is the estimated variance of $\hat{B}_{Yp}$. This means that an unbiased estimator for Eq. (3) can be rewritten as

$$\widehat{\mathrm{MSE}}_{Yp} = \hat{B}_{Yp}^{2} - \widehat{\mathrm{Var}}(\hat{B}_{Yp}) + \widehat{\mathrm{Var}}(Y_p) \qquad (5)$$

Each of the three elements in Eq. (5) can be estimated empirically for any value of $p$ in comparison to an untrimmed estimate (which is assumed to be unbiased), making it possible to calculate MSE across a range of trimming points to determine the trimming point that minimizes MSE for any given variable $Y$. The first term, $(\hat{B}_{Yp})^{2}$, can be estimated directly as $(Y_p - Y_0)^{2}$, where $Y_0$ represents the weighted prevalence estimate of $Y$ based on the untrimmed data and $Y_p$ is the weighted prevalence estimate based on data trimmed at trimming point $p$. The other two elements in Eq. (5) can be estimated using pseudo-replication (Zaslavsky et al., 2001). In the present case, this was done by generating 84 separate estimates for $Y_p$ at each value of $p$ for each of the two samples. The number 84 is based on the fact that the NCS-R sample design has 42 geographic strata (made up of PSUs or, in the case of non-self-representing PSUs, pairs of PSUs) each with two sampling-error calculation units (SECUs; constituting subsamples within self-representing PSUs and individual PSUs within strata that are made up of multiple non-self-representing PSUs), for a total of 84 stratum-SECU combinations. The separate estimates were obtained by sequentially modifying the sample and then generating an estimate based on that modified sample. The modification consisted of removing all cases from one SECU and then weighting the cases in the remaining SECU in the same stratum to have a sum of weights equal to the original sum of weights in that stratum. If $Y_p$ is defined as the weighted estimate of $Y$ at trimming point $p$ in the total sample and $Y_{p(sn)}$ is defined as the weighted estimate at the same trimming point in the sample that deletes SECU $n$ ($n = 1, 2$) of stratum $s$ ($s = 1, \ldots, 42$), then $\mathrm{Var}(Y_p)$ can be estimated as

$$\widehat{\mathrm{Var}}(Y_p) = \sum_{s=1}^{42} \frac{1}{2} \sum_{n=1}^{2} \left(Y_{p(sn)} - Y_p\right)^{2} \qquad (6)$$

$\mathrm{Var}(\hat{B}_{Yp})$ was estimated in the same fashion by replacing $Y_{p(sn)}$ in Eq. (6) with $Y_{p(sn)} - Y_{0(sn)}$ and $Y_p$ with $Y_p - Y_0$.
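The pseudo-replication procedure and the three terms of Eq. (5) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: it assumes exactly two SECUs per stratum (hence the factor of one half in the variance sums) and treats the untrimmed estimate as unbiased, as the text does. All function names and the toy data are hypothetical.

```python
import numpy as np

def weighted_prevalence(y, w):
    """Weighted proportion of cases with y == 1."""
    return float(np.sum(w * y) / np.sum(w))

def replicate_weights(w, stratum, secu):
    """Yield one weight vector per stratum-SECU pair: the named SECU is
    dropped and the remaining SECU in that stratum is scaled so the
    stratum keeps its original sum of weights."""
    w = np.asarray(w, dtype=float)
    for s in np.unique(stratum):
        in_s = stratum == s
        for n in np.unique(secu[in_s]):
            drop = in_s & (secu == n)
            keep = in_s & ~drop
            rep = w.copy()
            rep[drop] = 0.0
            rep[keep] *= w[in_s].sum() / w[keep].sum()
            yield rep

def estimated_mse(y, w_trimmed, w_untrimmed, stratum, secu):
    """Squared bias minus Var(bias) plus Var(Y_p), assuming 2 SECUs per stratum."""
    y, stratum, secu = map(np.asarray, (y, stratum, secu))
    y_p = weighted_prevalence(y, np.asarray(w_trimmed, dtype=float))
    y_0 = weighted_prevalence(y, np.asarray(w_untrimmed, dtype=float))
    bias = y_p - y_0
    reps_p = np.array([weighted_prevalence(y, r)
                       for r in replicate_weights(w_trimmed, stratum, secu)])
    reps_0 = np.array([weighted_prevalence(y, r)
                       for r in replicate_weights(w_untrimmed, stratum, secu)])
    var_y = 0.5 * np.sum((reps_p - y_p) ** 2)
    var_bias = 0.5 * np.sum(((reps_p - reps_0) - bias) ** 2)
    return bias ** 2 - var_bias + var_y

# Hypothetical toy data: 2 strata x 2 SECUs x 2 cases, equal weights.
y = np.array([1, 0, 0, 1, 1, 1, 0, 0], dtype=float)
stratum = np.array([0, 0, 0, 0, 1, 1, 1, 1])
secu = np.array([0, 0, 1, 1, 0, 0, 1, 1])
w = np.ones(8)
mse_no_trim = estimated_mse(y, w, w, stratum, secu)
```

When the trimmed and untrimmed weights coincide, the bias terms vanish and the estimate reduces to the replication variance of the prevalence estimate. Note that the estimator can be negative in small samples because the bias-variance term is subtracted.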

This method was used to evaluate the effects of trimming between 1% and 10% of respondents at each tail of the weight distribution in each of the two samples. Trimming consisted of assigning the weight at the trimming point to all cases with more extreme weights on that tail of the weight distribution. The weighting analysis described in Eqs. (1)–(2) was replicated anew for each combination of trimming points on the two tails so as to obtain an accurate post-stratification of the weighted sample to the population. Prevalence estimates and their design-based standard errors, which were estimated using the Taylor series method (Wolter, 1985), were then calculated for each of the ten variables used in the analysis of bias-efficiency trade-off. Inspection of empirical variation in MSE with changes in trimming rules was used to select final trimming rules that were used to generate the results in .

In both samples, MSE was not strongly affected by trimming. Final trimming rules were consequently chosen that trimmed the minimum proportion of cases while approximating the minimum average MSE across all possibilities considered. In the household sample, no weight trimming was performed for low weights but the highest 2.5% of weights were trimmed. This reduced the coefficient of variation of weights (the ratio of the standard deviation of weights to the mean weight) by about 8%. This was achieved with a roughly 2% increase in MSE due to bias, for a total reduction in MSE of approximately 6%. In the school sample, the bottom 2.9% and upper 0.1% of weights were trimmed, reducing the coefficient of variation of weights by about 9%. This was achieved with a nearly 4% increase in MSE due to bias, for a total reduction in MSE of approximately 5%.
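The trimming rule and the coefficient of variation of weights can be illustrated with a short Python sketch. It assumes that the trimming point on each tail is the corresponding percentile of the weight distribution, computed with NumPy's default linear interpolation; the text does not specify how the trimming point was located. Function names and the toy weights are hypothetical.

```python
import numpy as np

def trim_weights(w, lower_pct=0.0, upper_pct=0.0):
    """Assign the weight at the trimming point to all more extreme weights.

    lower_pct and upper_pct are the percentages of cases trimmed on the
    low and high tails (e.g. upper_pct=2.5 for the household sample).
    """
    w = np.asarray(w, dtype=float)
    low_point = np.percentile(w, lower_pct)
    high_point = np.percentile(w, 100.0 - upper_pct)
    return np.clip(w, low_point, high_point)

def coefficient_of_variation(w):
    """Ratio of the standard deviation of weights to the mean weight."""
    w = np.asarray(w, dtype=float)
    return float(w.std() / w.mean())

# Hypothetical toy weights with one extreme value.
weights = np.array([1.0] * 9 + [10.0])
trimmed = trim_weights(weights, upper_pct=10.0)
cv_before = coefficient_of_variation(weights)
cv_after = coefficient_of_variation(trimmed)
```

In the procedure described in the text, the post-stratification calibration was re-run after each trimming step, so a function like `trim_weights` would be one stage in a loop over candidate trimming points rather than the final step.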

Weighting the parent sample

The weights described so far were developed for the full samples. Weights were similarly calculated for the subsamples of cases with parent data to make possible analyses requiring these responses. To make these samples nationally representative with respect to the weighting variables, the weighting analyses described above were replicated by treating the total sample as the “population” and the subsample of cases with parent SAQ data as the “sample.” The post-stratification control variables included all those used in the full-sample analyses in addition to the lifetime and 12-month prevalence estimates in the total sample of DSM-IV/CIDI mood, anxiety, impulse-control, and substance disorders. By controlling for the presence of diagnoses, adjustments were made for possible tendencies of parents to be either more or less likely to respond to the SAQ when their children had certain types of diagnoses. At the same time, the national representativeness of the full sample with respect to demographic and school characteristics was retained. This re-weighting was carried out separately in the household and school samples and, within each of these samples, in the subsamples with full SAQ data and either full or partial SAQ data. The final trimmed weights from the total sample were included as base weights in these analyses and no further trimming was done when the post-stratification weights were applied to the data.

Combining the weighted household and school samples

The research team plans to carry out substantive analyses of the NCS-A data largely in a consolidated sample that combines the household and school samples. Some decision about relative weighting is needed to do this combining. The obvious approach is to transform the weights in each sample to sum to the number of respondents in the sample and then combine these two weighted data files into a single file. However, this approach implicitly assumes that the two samples have the same efficiency. This assumption turns out to be incorrect: the ratio of design-based variance estimates of various descriptive measures in the household sample (H) relative to the school sample (S) is generally lower than the roughly 10.5:1 ratio of the two sample sizes (9,244 vs. 879), which means that the NCS-A household sample is more efficient for this set of estimates than the NCS-A school sample (Table 3). The reason for this is that the NCS-A household sample has less clustering than the school sample because the number of adolescent student respondents in the household sample (n = 879) is smaller than the number of area segments (n = 1,001). In the case of the school sample, in comparison, the number of adolescent respondents (n = 9,244) is nearly 30 times larger than the number of schools (n = 320), which means that there is considerable clustering at the school level.

**Table 3.** Ratios of design-based variance estimates of selected descriptive statistics in the household sample (H) relative to the school sample (S)

Based on these results, the approach taken to combine the household and school samples into a single larger consolidated sample gave higher weight to the household sample in recognition of the greater efficiency of the household sample than the school sample. This approach is based on the goal of combining the two samples into a consolidated dual-frame sample that minimizes the overall MSE of estimates, which is achieved when the two samples are weighted in inverse proportion to their MSEs (Lepkowski and Groves, 1986). Based on the results reported in Table 3, this was done by assuming that the variance of estimates averages 6 times higher in the household sample than in the school sample, which means that we constructed the consolidated sample so that the sum of weights in the school sample was six times the sum of weights in the household sample. Combined samples were created using this same weighting approach for the PSAQ student sample and the short-form PSAQ sample.
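The rescaling implied by this rule can be sketched in Python. The sketch assumes that the combined weights are normalized to sum to the total number of respondents, which the text does not state; only the 6:1 ratio of weight sums comes from the text. The function name and toy weights are hypothetical.

```python
import numpy as np

def combine_samples(w_household, w_school, variance_ratio=6.0):
    """Rescale two weight vectors so that the school weights sum to
    `variance_ratio` times the household weights.

    With Var_H = variance_ratio * Var_S, inverse-variance weighting gives
    the household sample a share of 1 / (1 + ratio) and the school sample
    a share of ratio / (1 + ratio). Normalizing the total to the number
    of respondents is an assumption of this sketch.
    """
    w_h = np.asarray(w_household, dtype=float)
    w_s = np.asarray(w_school, dtype=float)
    n_total = len(w_h) + len(w_s)
    share_h = 1.0 / (1.0 + variance_ratio)
    share_s = variance_ratio / (1.0 + variance_ratio)
    scaled_h = w_h * (share_h * n_total / w_h.sum())
    scaled_s = w_s * (share_s * n_total / w_s.sum())
    return scaled_h, scaled_s

# Hypothetical toy weights.
scaled_h, scaled_s = combine_samples([1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.0])
```

The relative weights within each sample are unchanged by the rescaling; only the balance between the two samples shifts.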