To investigate whether items assessing attention problems (AP) provide evidence of quantitative differences or of categorically distinct subtypes, and to investigate the relation of empirically derived latent classes to DSM-IV diagnoses of the subtypes of attention deficit hyperactivity disorder (ADHD), i.e., the Combined Type (CT), the Predominantly Inattentive Type (PI), and the Predominantly Hyperactive/Impulsive Type (H/I).
Data on attention problems were obtained from maternal ratings on the Child Behavior Checklist (CBCL). Latent class models, which assume categorically different subtypes, and factor mixture models, which permit severity differences within classes, were fitted to data obtained from Dutch boys at ages 7 (N=8079), 10 (N=5278), and 12 (N=3139). The fit of the different models was compared to decide which model, and hence which interpretation of AP, is most appropriate. Next, ADHD diagnoses were regressed on latent class membership in a subsample of children.
At all three ages, models that distinguish three mainly quantitatively different classes (i.e., mild, moderate, and severe attention problems) provided the best fit to the data. Within each class, the CBCL items measure three correlated continuous factors that can be interpreted as hyperactivity/impulsivity, inattentiveness/dreaminess, and nervous behavior. The severe AP class contained all subjects diagnosed with ADHD-CT, and some subjects diagnosed with ADHD-PI were in the moderate AP class.
Factor mixture analyses provide evidence that the CBCL AP syndrome varies along a severity continuum from mild through moderate to severe attention problems, with children affected by ADHD at the severe end of the continuum. These data are important for clinicians, researchers, and the framers of the DSM-V, as they provide evidence that ADHD exists on a continuum rather than as discrete categories.
Is it best to consider attention deficit/hyperactivity disorder (ADHD) as a categorical disorder or as the extreme of a continuous trait? This is one of the many questions that the DSM-V Child Work Group is considering. The question is an important one because the current diagnostic rules defined in the DSM-IV-TR identify three distinct subtypes (i.e., combined (CT), predominantly inattentive (PI), and predominantly hyperactive/impulsive (H/I)).1 These subtype diagnoses require at least 6 of 9 inattention symptoms (PI), at least 6 of 9 hyperactive/impulsive symptoms (H/I), or at least 6 of each (CT), with symptoms present before age 7 and causing impairment in at least two settings. The diagnostic cut-points do not differ by the age or gender of the individual. The DSM-IV items, diagnostic subtypes, and diagnostic rules have been the subject of intense research scrutiny. Critics of this categorical approach point out that the same criteria are applied to girls and boys, and to 6-year-olds and 18-year-olds, and that the rules create nonsensical scenarios: a child with academic problems who has 10 of the 18 symptoms (e.g., 5 inattention and 5 hyperactivity/impulsivity symptoms) does not meet DSM-IV criteria for ADHD, whereas a child with only 6 symptoms (e.g., 6 inattention symptoms) does.2 Indeed, the members of the DSM-V Disruptive Disorder Work Group are considering how best to recraft the ADHD criteria and are weighing both continuous and categorical approaches.2
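As a minimal sketch of the symptom-count rule described above, the hypothetical function below encodes only the count criterion (at least 6 of 9 symptoms per domain); onset, impairment, and exclusion criteria are deliberately left out. It reproduces the scenario the critics point to: 10 symptoms split 5/5 yield no diagnosis, while 6 inattention symptoms alone meet the count criterion.

```python
from typing import Optional

# Hypothetical sketch: only the DSM-IV symptom-count criterion is modeled.
# Age of onset, impairment in two settings, and exclusion rules are ignored.
THRESHOLD = 6  # at least 6 of 9 symptoms within a domain


def dsm_iv_subtype(n_inattentive: int, n_hyperactive: int) -> Optional[str]:
    """Return 'CT', 'PI', 'H/I', or None for symptom counts in 0..9."""
    for count in (n_inattentive, n_hyperactive):
        if not 0 <= count <= 9:
            raise ValueError("symptom counts must lie between 0 and 9")
    inattentive = n_inattentive >= THRESHOLD
    hyperactive = n_hyperactive >= THRESHOLD
    if inattentive and hyperactive:
        return "CT"
    if inattentive:
        return "PI"
    if hyperactive:
        return "H/I"
    return None


# 10 of 18 symptoms, split 5/5: the count criterion is not met.
print(dsm_iv_subtype(5, 5))  # None
# Only 6 symptoms, all inattentive: the count criterion for PI is met.
print(dsm_iv_subtype(6, 0))  # PI
```

The hard threshold is what produces the discontinuity: a single additional inattention symptom moves a child from no diagnosis to a diagnosis, regardless of how many hyperactive/impulsive symptoms are present.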
A variety of research teams have investigated the categorical-continuum debate. Using latent class approaches, some have argued that ADHD symptoms lie on a severity continuum that is divided across the attention problem items and the hyperactivity/impulsivity items.3 Some studies conclude that the liability to develop attention problems is continuous and that clustering subjects into subtypes neglects variation in severity.3–5 Others have argued that the latent classes are replicable across a wide variety of samples and that they represent a series of genetically discrete disorders.6–10 Whereas prior studies have used either latent class analysis (LCA) or factor analysis in the study of ADHD, in this study we use factor mixture modeling (FMM), which combines the latent class and factor approaches, in order to determine whether ADHD is best conceptualized as a continuous trait or as a categorical diagnosis.
To do this work we use the AP syndrome scale of the CBCL,11 a widely used instrument for screening children for problem behaviors. In practice, screening is carried out by summing the item scores and comparing the result with established cut-off points (i.e., the minimum score required for a positive screen). For instance, a T-score cut-off of 60 on the CBCL AP scale, which is derived from the summed item scores, discriminated well between children with and without DSM-defined ADHD.12 In the current analysis, item-level rather than sum-score data are used to permit a more fine-grained analysis of specific attention problems. Sum scores weight all items equally; if some items are more important than others, or if subsets of items measure subtype-specific symptoms, an item-level analysis is likely to capture these differences more adequately. To assess DSM-IV diagnoses and ADHD subtypes, we used the Diagnostic Interview Schedule for Children-IV because it had been used by prior groups to study the relation between categorical and quantitative conceptualizations of attention problems.
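To illustrate why sum scores can hide differences that an item-level analysis may detect, the sketch below scores two hypothetical children on the 11 AP items (each coded 0, 1, or 2, so raw sums range from 0 to 22). The grouping of items into "hyperactive" and "inattentive" subsets is an assumption made for illustration, not the CBCL scoring key.

```python
# Hypothetical item groupings (0-based indices into the 11 AP items);
# these are illustrative assumptions, not the official CBCL scoring key.
HYPERACTIVE_ITEMS = [0, 1, 2]
INATTENTIVE_ITEMS = [3, 4, 10]

N_ITEMS = 11
MAX_SCORE = 2 * N_ITEMS  # each item is scored 0, 1, or 2


def sum_score(responses):
    """Raw AP sum score; every item gets the same weight."""
    if len(responses) != N_ITEMS or any(r not in (0, 1, 2) for r in responses):
        raise ValueError("expected 11 responses coded 0, 1, or 2")
    return sum(responses)


def subset_score(responses, items):
    """Sum over a chosen subset of items."""
    return sum(responses[i] for i in items)


child_a = [2, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0]  # high on the "hyperactive" subset
child_b = [0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 2]  # high on the "inattentive" subset

for name, child in (("A", child_a), ("B", child_b)):
    print(name, sum_score(child),
          subset_score(child, HYPERACTIVE_ITEMS),
          subset_score(child, INATTENTIVE_ITEMS))
# A 6 6 0
# B 6 0 6
```

Both children receive the same sum score of 6 out of 22, even though their item profiles are mirror images of each other; only the item-level data reveal the difference.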
The analysis approach of the current study is similar to a recent analysis of data from the Strengths and Weaknesses of ADHD symptoms and Normal behavior (SWAN) scale obtained in the Northern Finnish Birth Cohort (NFBC), which showed that, in both sexes, factor mixture models with severity differences within class fitted the data clearly better than latent class models.4 The results were interpreted as evidence that severity differences are substantial and that the SWAN ADHD data are best described in terms of quantitatively ordered classes that differentiate between an unaffected majority and a potentially affected minority. The NFBC data were obtained from adolescents, whereas the current study focuses on children aged 7–12 years. It has been argued that ADHD subtypes might be more pronounced closer to latency age. The current study addresses this issue by comparing samples at different ages, starting at age 7 and bridging the interval to early adolescence. Our samples might therefore provide evidence of a subtype pattern that weakens with increasing age.
The central question of whether ADHD exists as three discrete diagnostic subtypes or as a continuous liability for an attention syndrome that may change across development remains largely unanswered. If ADHD exists as a continuous liability, further work will be needed to determine the mediators and moderators of that liability, including the age, gender, ethnicity, and culture of the child, as well as who the informant is. From a scientific point of view, it is difficult to imagine that the same categorical cut-point would apply to a 6-year-old female from China as to an 18-year-old male from the US. Resolving whether ADHD is best conceptualized as a diagnostic category or as a continuum will therefore affect future research, diagnosis, and treatment, and perhaps the DSM-V conceptualization of ADHD.
The subjects in this study are Dutch male twins whose parents voluntarily registered with the Netherlands Twin Registry (NTR).13, 14 The NTR families are largely representative of the general Dutch population. Based on available data, the average ages of the mother and father at the birth of the twins were 30.6 and 33.0 years, respectively. Between the twins’ ages of 7 and 12, the percentage of married parents decreased from 92.3% to 88.1%. Parents’ educational and occupational levels are presented in Table 1.
We use mothers’ ratings on the CBCL AP scale at ages 7, 10, and 12. Twins are treated as individuals in the current analysis; to account for the non-independence of observations, a sandwich-type estimator is used to obtain standard errors (see Analysis section). The samples comprise N=8079 twins at age 7, N=5278 at age 10, and N=3139 at age 12. The smaller N at later ages reflects the longitudinal design of the study (not all children had reached ages 10 and 12). Note that the samples at ages 10 and 12 are not exact subsamples of the sample at age 7 but may contain children who were not assessed at age 7 (because they entered the study at a later age or because mother reports were missing at earlier ages).
To investigate whether changes over time are confounded with selection effects in our three age samples, we also created a subsample of subjects who were observed at each time point. This longitudinal subsample consists of N=2531 twins. The pattern of results is the same as in the overlapping cross-sectional samples. Since the smaller sample size provides less power to detect subtypes if present, we focus on the results from the cross-sectional samples.15 DSM symptoms (see below) were obtained in a selected subsample of N=489. Subtype prevalence in this subset is provided below.
At age 12, we asked whether methylphenidate had been prescribed (this question was available for 86% of the 12-year-olds). This was the case for 96 children (1.4%); a further 50 children (0.7%) had used methylphenidate before but not at present. Data from these children were included in the analyses.
Parents were sent the CBCL and asked to return it by mail. Individuals within the NTR are invited to participate in each wave of data collection, regardless of their previous participation. Previous work has demonstrated that responders and non-responders differ only in socioeconomic status, and that the effect size of this difference is negligible.14 Data were entered under anonymous IDs into a phenotype database.13 The Diagnostic Interview Schedule for Children (DISC) was administered at age 12 to a subset of the current sample (N=985 in total, of whom N=489 were boys). Selection was based on CBCL scores and aimed at obtaining subjects with high and with low CBCL scores. Since the probability of being selected for the DISC depends only on the CBCL, and since the CBCL data are included in the current analysis, the selection does not induce non-random missingness.16 Details of the selection procedure are described in Derks et al.17
The CBCL is a standardized questionnaire on which parents rate 118 problem behaviors exhibited by their child over the previous 6 months. Each behavior is rated on a 3-point scale, where 0, 1, and 2 indicate that the behavior is not true, sometimes true, or often true for the child. The psychometric stability of the CBCL has been well established in American and Dutch samples.11, 18 The analyses performed here use maternal reports on the 1989 version of the Dutch CBCL. The Attention Problems (AP) scale consists of 11 items, which are shown in abbreviated form in Figure 1.
The DISC is a structured interview that assesses DSM-IV symptoms, including those of ADHD. Mothers were asked to indicate whether each symptom had been displayed by the child during the last year. The symptoms were aggregated according to DSM-IV criterion A to obtain two binary variables indicating the presence or absence of an H/I diagnosis and of a PI diagnosis, respectively (i.e., six or more symptoms in the relevant domain). Subjects with both diagnoses belong to the combined subtype (CT). In the cross-sectional samples, the sample sizes with DSM data at ages 7, 10, and 12 are N7=449, N10=336, and N12=331. In the longitudinal sample, N=284 subjects have DSM data.
Because of subject overlap, the prevalence of the three subtypes in the larger cross-sectional samples is very similar across ages and sample types, namely CT=.065, PI=.07, and H/I=.031. Note that because of the selection structure of the DSM subsample, these rates reflect only the distribution of subtypes in the subsample with DSM data and do not necessarily characterize the full samples.
The majority of previous studies using item-level data to investigate the ADHD phenotype relied either on factor analysis (FA) or on latent class analysis (LCA). FA and LCA are latent variable models, based on the general concept that observed responses on the items of a scale covary because of a small number of underlying latent variables that correspond to the construct of interest (e.g., attention problems). FA uses continuous latent variables (i.e., factors), which represent gradual (severity) differences. LCA, on the other hand, uses a categorical latent variable with two or more categories, called latent classes, which represent different types within a population. Within a type or class, observed items are specified to be uncorrelated, so that the overall covariances between observed items are explained entirely by mean differences between classes. Importantly, fitting either an FA or an LCA model does not test whether the latent variables are continuous or categorical. Only a comparison of FA and LCA within a general statistical framework can assess which model type, and therefore which type of latent variable, provides a better fit to the data. A better fit of FA models would suggest severity differences, whereas a better fit of LCA models would suggest subtypes.
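The claim that, under LCA, mean differences between classes alone produce the covariances between items can be checked directly. The minimal sketch below assumes binary items (the CBCL items actually have three response categories) and hypothetical class proportions and endorsement probabilities; it computes the marginal item covariance implied by an LCA with local independence.

```python
import numpy as np


def lca_implied_cov(class_probs, item_probs):
    """Marginal covariance matrix of binary items implied by an LCA.

    class_probs: shape (K,), class proportions summing to 1.
    item_probs:  shape (K, J), P(item j endorsed | class k).
    Within each class the items are independent (local independence).
    """
    pi = np.asarray(class_probs, dtype=float)
    p = np.asarray(item_probs, dtype=float)
    mean = pi @ p  # marginal endorsement rate of each item
    # E[X_i X_j] = sum_k pi_k p_ki p_kj for i != j, by local independence
    cross = (p.T * pi) @ p
    cov = cross - np.outer(mean, mean)
    # Diagonal: Bernoulli variance of each item
    np.fill_diagonal(cov, mean * (1 - mean))
    return cov


# Hypothetical two-class example: a large low class and a small high class.
two_class = lca_implied_cov([0.8, 0.2], [[0.1, 0.1], [0.9, 0.9]])
print(round(two_class[0, 1], 4))  # 0.1024

# With a single class there are no mean differences, hence no covariance.
one_class = lca_implied_cov([1.0], [[0.3, 0.6]])
print(abs(one_class[0, 1]) < 1e-12)  # True
```

Although the items are independent within each class, they covary positively in the mixed population because both items are endorsed more often in the high class; with a single class that covariance vanishes.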
Factor mixture modeling (FMM) provides such a general framework. FMM extends LCA and FA by combining the two in a single model: within each latent class, instead of specifying that the items are uncorrelated as in LCA, FMM allows a factor model to be specified. The factors within class can capture potential severity differences within that class. An FMM with zero factor variance reduces to LCA, and an FMM with a single class reduces to FA. The model has been proposed by several authors19–21 and is used in a wide variety of fields (e.g., developmental psychology,22 addiction,23 criminology,24 and psychiatry4).
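The two special cases named above can be seen in the mean and covariance that an FMM implies. The sketch below is a simplification under stated assumptions: it uses normally distributed continuous indicators (the CBCL items are ordinal), holds loadings and residual variances equal across classes, and uses hypothetical parameter values. The overall covariance follows from the law of total covariance: the average within-class covariance plus the covariance of the class means.

```python
import numpy as np


def fmm_implied_moments(class_probs, class_means, loadings, factor_covs, residual_vars):
    """Mean vector and covariance matrix implied by a factor mixture model.

    Simplifying assumptions for this sketch: normally distributed continuous
    indicators, and loadings and residual variances equal across classes.
    Within class k: y = mu_k + Lambda @ eta + e, Cov(eta) = Psi_k, Cov(e) = diag(theta).
    """
    pi = np.asarray(class_probs, dtype=float)
    mu = np.asarray(class_means, dtype=float)  # shape (K, J)
    lam = np.asarray(loadings, dtype=float)    # shape (J, F)
    theta = np.diag(np.asarray(residual_vars, dtype=float))
    grand_mean = pi @ mu
    cov = np.zeros((mu.shape[1], mu.shape[1]))
    for k, weight in enumerate(pi):
        within = lam @ np.asarray(factor_covs[k], dtype=float) @ lam.T + theta
        dev = mu[k] - grand_mean
        # Law of total covariance: within-class part + between-class part
        cov += weight * (within + np.outer(dev, dev))
    return grand_mean, cov


lam = [[0.8], [0.6]]
theta = [0.36, 0.64]

# One class: the FMM reduces to a factor model, Lambda Psi Lambda' + Theta.
_, cov_fa = fmm_implied_moments([1.0], [[0.0, 0.0]], lam, [[[1.0]]], theta)

# Two classes with zero factor variance: LCA-like, items are uncorrelated
# within class, and their covariance comes only from the class mean difference.
_, cov_lca = fmm_implied_moments([0.5, 0.5], [[0.0, 0.0], [2.0, 2.0]], lam,
                                 [[[0.0]], [[0.0]]], theta)
```

Here cov_fa equals [[1.0, 0.48], [0.48, 1.0]], a pure factor structure, while in cov_lca the off-diagonal value of 1.0 is produced entirely by the separation of the two class means.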
In the current study, we fit factor models, latent class models, and factor mixture models to the CBCL attention items in large samples of boys at ages 7, 10, and 12. Simulation studies have shown that comparing the fit of FA, LCA, and FMM leads to correct model choice in a wide variety of settings,15, 25 and this is the approach followed here. Since all models are specified within the same general framework, indices of model fit can be used to decide which model fits best.15, 21, 26 The Bayesian Information Criterion (BIC) has been shown to perform as well as or better than the Akaike Information Criterion (AIC), the consistent AIC (CAIC), and the sample-size adjusted BIC.27 The same study also showed that the BIC clearly outperforms the adjusted likelihood ratio test (aLRT) when comparing FMMs.28 The bootstrapped LRT also performs clearly better than the adjusted LRT, but was not feasible in this study because of computation time. We base our decisions on the BIC and present the AIC, the CAIC, and the adjusted BIC for completeness.
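The information criteria named above are computed from a model's maximized log-likelihood, its number of free parameters, and the sample size. The sketch below uses the standard formulas (the adjusted BIC is the sample-size adjusted version, which replaces N by (N + 2)/24); the model names and log-likelihoods in the usage example are hypothetical, not values from Table 2.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """AIC, BIC, CAIC, and sample-size adjusted BIC; lower values are better."""
    deviance = -2.0 * loglik
    return {
        "AIC": deviance + 2 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1),
        "aBIC": deviance + n_params * math.log((n_obs + 2) / 24),
    }


def best_model(fits, n_obs, criterion="BIC"):
    """fits maps a model name to (log-likelihood, number of parameters)."""
    return min(fits, key=lambda name: information_criteria(*fits[name], n_obs)[criterion])


# Hypothetical fits: the larger model gains 25 log-likelihood units for 20 extra parameters.
fits = {"small": (-5000.0, 20), "large": (-4975.0, 40)}
print(best_model(fits, n_obs=1000, criterion="AIC"))  # large
print(best_model(fits, n_obs=1000, criterion="BIC"))  # small
```

Because ln(N) exceeds 2 once N is 8 or more, the BIC penalizes each extra parameter more heavily than the AIC in samples of any realistic size, which is why the two criteria can point to different models.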
DSM diagnoses of CT, H/I, and PI observed at age 12 are regressed on latent class membership, so that the relation between the latent classes and the DSM subtype diagnoses is estimated in a single analysis. An alternative would be to first assign subjects to latent classes and then compute the prevalence of the different diagnoses in each class. However, classification error in assigning subjects to classes can be substantial (e.g., >80% in smaller classes of affected subjects), leading to severely biased prevalence rates.29 The one-step approach avoids this classification error: regressing subtype diagnoses on the latent classes yields the estimated proportions of the different diagnoses within each class in a single analysis.
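Why classify-then-analyze can go wrong is easy to see in a small example. The sketch below assumes binary items, local independence, and hypothetical parameters (a 90% majority class and a 10% minority class that is only moderately separated from it). It enumerates all response patterns and computes how often members of each class would be assigned to the wrong class if every subject were placed in their most likely class.

```python
import itertools

import numpy as np


def modal_assignment_error(class_probs, item_probs):
    """Expected misclassification rate, per true class, under modal assignment.

    Assumes binary items and local independence within class. All 2**J
    response patterns are enumerated, so J should be small.
    """
    pi = np.asarray(class_probs, dtype=float)
    p = np.asarray(item_probs, dtype=float)  # shape (K, J)
    n_classes, n_items = p.shape
    correct = np.zeros(n_classes)
    for pattern in itertools.product((0, 1), repeat=n_items):
        x = np.array(pattern)
        # P(pattern | class k) for every class k
        lik = np.prod(np.where(x == 1, p, 1 - p), axis=1)
        # Modal class = largest posterior; the normalizing constant is irrelevant
        assigned = int(np.argmax(pi * lik))
        # Members of the assigned class who show this pattern are classified correctly
        correct[assigned] += lik[assigned]
    return 1 - correct


err = modal_assignment_error([0.9, 0.1], [[0.2] * 5, [0.6] * 5])
print(np.round(err, 3))
```

With these values, under 1% of the majority class is misassigned, but about 66% of the minority class is, because only minority members who endorse at least four of the five items are assigned to their own class. Prevalences computed within assigned classes would inherit this error.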
All analyses are carried out with Mplus using data from all twins.30 To obtain correct standard errors in the presence of dependent observations, we use a robust “sandwich-type” estimator (MLR estimator).31, 32
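The idea behind a sandwich estimator for twin data can be illustrated with the simplest possible estimator, a sample mean. The sketch below is not the Mplus MLR implementation; it compares a naive standard error, which treats all twins as independent, with a cluster-robust (sandwich) standard error that sums residuals within each twin pair before squaring. No small-sample correction is applied, and the scores are hypothetical.

```python
import numpy as np


def mean_with_standard_errors(y, cluster_ids):
    """Sample mean with naive and cluster-robust (sandwich) standard errors.

    Both standard errors use the plain 1/n**2 scaling (no small-sample correction).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    mean = y.mean()
    resid = y - mean
    naive_se = np.sqrt(np.sum(resid ** 2)) / n
    # Sum residuals within each cluster (twin pair), then square the sums
    _, cluster_index = np.unique(np.asarray(cluster_ids), return_inverse=True)
    cluster_sums = np.bincount(cluster_index, weights=resid)
    robust_se = np.sqrt(np.sum(cluster_sums ** 2)) / n
    return mean, naive_se, robust_se


# Hypothetical AP sum scores for two twin pairs whose members score alike.
scores = [1, 1, 3, 3]
pairs = ["pair1", "pair1", "pair2", "pair2"]
mean, naive_se, robust_se = mean_with_standard_errors(scores, pairs)
print(mean, naive_se, round(robust_se, 4))  # 2.0 0.5 0.7071
```

When each twin forms its own cluster, the two standard errors coincide; when co-twins resemble each other, the robust standard error is larger, so ignoring the twin structure would understate uncertainty.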
An initial exploratory factor analysis yielded three eigenvalues larger than 1, and the corresponding 3-factor structure has a clear and interpretable loading pattern. The first factor is largely defined by items representing symptoms of hyperactivity/impulsivity (items 1, 2, 3, 6, 9, and 10 in Figure 1). The second factor explains the common variance of indicators of inattentiveness/dreaminess (items 1, 4, 5, and 11). The third factor is defined by the two items related to nervous behavior (items 7 and 8); the high covariance between these two items, captured by the third factor, may be due to similar item wording. Figure 1 shows the path diagram of the 3-factor model.
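The "eigenvalues larger than 1" rule (the Kaiser criterion) can be sketched as follows. The correlation matrix is hypothetical: 11 items in three uncorrelated blocks of 6, 3, and 2 items, loosely mirroring the three factors described above, with a within-block correlation of .5. Real CBCL item correlations would be estimated from ordinal data and the factors are correlated, so this only illustrates the counting rule.

```python
import numpy as np


def block_correlation(block_sizes, r_within):
    """Correlation matrix with equal correlations inside blocks and zero between blocks."""
    n = sum(block_sizes)
    corr = np.zeros((n, n))
    start = 0
    for size in block_sizes:
        block = slice(start, start + size)
        corr[block, block] = r_within
        start += size
    np.fill_diagonal(corr, 1.0)
    return corr


def kaiser_count(corr):
    """Number of eigenvalues greater than 1, plus the eigenvalues in descending order."""
    eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
    return int(np.sum(eigenvalues > 1.0)), eigenvalues


corr = block_correlation([6, 3, 2], r_within=0.5)
n_factors, eigenvalues = kaiser_count(corr)
print(n_factors)  # 3
print(np.round(eigenvalues[:3], 2))
```

In a block of m items with common correlation r, the largest eigenvalue is 1 + (m - 1)r, giving 3.5, 2.0, and 1.5 here, while all remaining eigenvalues equal 1 - r = 0.5; hence exactly three exceed 1.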
For the main analysis, we fitted seven different models to each of the three age groups. Models 1–3 are factor mixture models with 2, 3, and 4 latent classes; Models 4–7 are latent class models with 3, 4, 5, and 6 classes. Based on the initial exploratory factor analysis, the factor mixture models are specified with three correlated factors within class. In the first step, we focus on the structure of AP and fit models without the DSM diagnoses. The best fitting model is then fitted again while incorporating logistic regressions of the DSM diagnoses (CT, H/I, and PI) on the latent class variable. This step, which is possible only in the selected subsample with DSM diagnoses, shows whether diagnosed subjects are more likely to belong to a particular latent class. For example, if a latent class model fitted best, PI subjects might have a very high probability of belonging to an “inattentive” class; alternatively, all diagnosed subjects might belong to a “severe” class.
Results are compared across the age groups focusing on (i) the best fitting model, (ii) the relative sizes of the latent classes (i.e., class proportions), and (iii) the differences across classes with respect to the response patterns on the 11 CBCL attention items, and the relation of the classes to the DSM diagnoses.
The general pattern of results is the same for the larger cross-sectional samples and for the smaller longitudinal sample of individuals with complete data at all three time points. Given this similarity, we focus on the results of the larger cross-sectional samples because of their higher power to detect subtypes. In all samples, FMMs fit clearly better than the FA or LCA models according to the BIC. Table 2 shows the results for the cross-sectional samples. All information criteria are clearly lower for each of the factor mixture models than for LCA models of similar parsimony. To reach its minimum BIC, LCA requires 8 classes (208 estimated parameters) for the 7-year-olds and 7 classes (183 parameters) for the 10- and 12-year-olds. The much higher BIC of the LCA models compared with the FMMs indicates that the LCA models lack parsimony.
The model fitting results show that there is substantial variation in attention problems within the classes. When comparing the factor mixture models with 2, 3, and 4 classes, it is evident that the power to detect smaller classes decreases as sample size decreases. At age 7, the BIC favors the 3-factor 3-class factor mixture model; at age 10, the BIC differs little between the 2- and 3-class models; and at age 12, the BIC favors the 2-class model. The lack of power to detect the third class is also evident in the smaller sample of subjects with data at all three ages, in which the 2-class FMM is the best fitting model according to all information criteria at all three ages.
Since lack of power is the main reason why the 2-class model is preferred at age 12 but not in the larger samples at ages 7 and 10, we base the more detailed comparison below on the 3-class factor mixture model in all age groups.
In the larger, overlapping cross-sectional samples, the 7-year-olds have a larger high scoring class (20.3%) than the 10- and 12-year-olds, whose class proportions are very similar (14.5% and 15.6%). The unaffected, low scoring majority class has approximately the same size at all ages (62.3%, 63.6%, and 61.5%).
The response patterns on the 11 items are very similar across the larger cross-sectional samples. Figure 2 shows the results for the larger samples, since power to detect three classes was sufficient only there. As can be seen in the figure, the three classes are mainly quantitatively ordered in all three age groups: class 1 has the highest probability of scoring “very true” on most items, and class 2 has higher probabilities than class 3. Strictly quantitative differences would be reflected in parallel response profiles on the 11 items, whereas qualitative differences would be reflected in cross-overs, with one class scoring high on hyperactivity items but low on inattentiveness items and another class showing the reverse profile. Cross-overs in response profiles are largely absent in Figure 2. Three items at age 7 form an exception. Class 2 has a slightly higher response probability than class 1 on item 5 (“daydreams”) and item 9 (“poor schoolwork”). In addition, class 3 scores slightly higher than class 2 on item 3 (“can’t sit still”), although class 1 clearly has the highest probability of scoring in the highest response category on that item. At age 10, items 5 and 9 show the same tendencies as at age 7. However, none of these differences reaches statistical significance in these large samples.
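The distinction between parallel profiles and cross-overs can be made operational with a simple check: for each pair of classes, determine which class scores higher on most items, and flag the items where that ordering reverses. The sketch below uses hypothetical response probabilities for three classes on four items, not the estimates in Figure 2, and performs no significance test, so every reversal is flagged however small.

```python
import numpy as np


def find_crossovers(profiles):
    """Return (item, class_a, class_b) triples where the usual ordering reverses.

    profiles: shape (K, J), response probability of class k on item j.
    For each pair of classes, the 'usual' ordering is the sign of the
    difference on the majority of items; items with the opposite sign are
    reported. No significance test is performed.
    """
    prof = np.asarray(profiles, dtype=float)
    n_classes = prof.shape[0]
    crossovers = []
    for a in range(n_classes):
        for b in range(a + 1, n_classes):
            signs = np.sign(prof[a] - prof[b])
            majority = np.sign(signs.sum())
            if majority == 0:
                continue  # no dominant ordering for this pair
            for item in np.flatnonzero(signs == -majority):
                crossovers.append((int(item), a, b))
    return crossovers


# Hypothetical profiles for classes 0 (high), 1 (moderate), and 2 (low).
profiles = [
    [0.60, 0.50, 0.40, 0.30],
    [0.30, 0.55, 0.20, 0.10],
    [0.10, 0.10, 0.25, 0.05],
]
print(find_crossovers(profiles))  # [(1, 0, 1), (2, 1, 2)]
```

An empty result would correspond to strictly parallel (purely quantitative) profiles; isolated, small reversals such as these are the kind of exception that still leaves the classes mainly quantitatively ordered.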
The relation between the CBCL classes and the DSM-based diagnoses of H/I, PI, and CT is summarized in Table 3, which provides the proportions of boys with a given subtype diagnosis in the high, moderate, and low scoring classes at ages 7, 10, and 12. The proportions are derived from the logistic regression of the CT, H/I, and PI diagnoses on the latent class variable in the 3-class 3-factor mixture model. Note that the proportions contain the prediction error of the logistic regression. A different approach would be to assign all subjects to their most likely latent class and then compute the proportions of subtype diagnoses in each class. However, this approach would compound the prediction error with the error in class assignment, which can be extremely high, especially in smaller classes (e.g., >80% incorrect assignment).29
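The regression of a diagnosis on latent class gives, for each class, a logit that converts to P(diagnosis | class). Statements such as "the high scoring class contains all or almost all subjects with CT" concern the reverse quantity, P(class | diagnosis), which follows from Bayes' rule once the class proportions are taken into account. The sketch below shows this conversion with hypothetical class proportions and logits; it takes the estimates as given and does not reproduce the one-step estimation itself.

```python
import numpy as np


def diagnosis_by_class(class_probs, logits):
    """Convert per-class logits of a binary diagnosis into two sets of proportions.

    Returns P(diagnosis | class) and, via Bayes' rule, P(class | diagnosis).
    """
    pi = np.asarray(class_probs, dtype=float)
    p_dx_given_class = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    joint = pi * p_dx_given_class           # P(class and diagnosis)
    p_class_given_dx = joint / joint.sum()  # normalize over classes
    return p_dx_given_class, p_class_given_dx


# Hypothetical values for the high, moderate, and low scoring classes.
class_probs = [0.15, 0.20, 0.65]
logits = [0.0, -2.0, -15.0]
p_dx, p_class = diagnosis_by_class(class_probs, logits)
print(np.round(p_dx, 3))
print(np.round(p_class, 3))
```

With these values only half of the high class is diagnosed, yet about 76% of all diagnosed subjects fall in that class, and a very negative logit for the low class leaves it with essentially no diagnosed subjects.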
Based on our analysis, we can reliably conclude that the probability of any of the diagnoses in the low scoring majority class is zero in all three age groups. Furthermore, in all three age groups, the highest scoring class contains all or almost all subjects with a diagnosis of CT or H/I. Depending on age, 30% to 45% of the subjects with a diagnosis of PI belong to the high scoring class, and the moderate scoring class contains the remaining subjects with a diagnosis of PI.
The results of the current analysis show quantitative differences in the AP syndrome of the CBCL in children aged 7–12. The FMM analyses of the CBCL data reveal results similar to our earlier findings obtained with the same approach in SWAN data from Finnish adolescents.4 The analysis of the CBCL AP items shows that the samples consist of three latent classes located along correlated continua (“severe”, “moderate”, and “low” scoring AP classes). The severe and moderate classes are small (6–15%, depending on the age group), whereas the low scoring class is the largest (consistent with over 50% of children having low or no attention problems). These FMM findings advance the argument that the AP syndrome exists on a severity continuum, with a similar class structure across the developmental period from age 7 to 12. The sample of 7-year-old boys (N=8079) in particular has sufficient power to detect subtypes if present: with a general prevalence of ADHD of 8–12%, approximately 800 of the 7-year-olds would be diagnosed, and subtypes within such a large group would be detectable using FMMs.15, 25 The current analysis, however, shows that even in younger children AP is best described in terms of severity differences, which matches our conclusion from the analysis of adolescents.4
When comparing the age groups, it is interesting that the two items that map most closely onto the DSM H/I subtype, “can’t sit still” and “impulsive”, both diminish in intensity with age (i.e., from age 7 to 10 to 12). This finding is consistent with reports in the literature that hyperactivity symptoms diminish with age while attention problems persist. We observed this pattern both in the larger, overlapping age samples and in the sample with identical subjects at the three time points; hence the diminished intensity is not due to changes in the composition of the samples.
The fact that the three CBCL classes are ordered quantitatively is also reflected in the relation between AP and DSM-IV subtype diagnoses in the subsample observed at age 12. All or almost all children with DSM-IV ADHD CT or H/I are in the “severe” AP class, and children with DSM-IV PI are divided over the severe and moderate AP classes. Perhaps most importantly, none of the children with a DSM-IV ADHD subtype are in the low scoring majority class.
As the DSM-V process moves forward, it will be important to consider these findings in light of proposals to include a quantitative axis of diagnostic description. We have argued that a quantitative approach that allows for differences across ages and genders makes sense for both research and clinical work with children who suffer from psychopathological conditions such as ADHD.3
From this work, clinicians will benefit from knowing that attention problems exist on a severity continuum, which presents a clear invitation to develop evidence-based interventions that aim to reduce symptom severity along the continuum. In this respect, the treatment of ADHD is no different from the treatment of hypertension, for which a reasonable evidence-based method can be developed to evaluate and measure movement from a pathological level (e.g., the severe AP class or a diastolic blood pressure of 100 mm Hg) to a non-pathological level (e.g., the low AP class or a diastolic blood pressure of 80 mm Hg), rather than to the absence of attention or of blood pressure. Further, the contention that the subtypes of DSM-IV ADHD do not differ in their FMM class membership may support a more general treatment approach toward children with ADHD, which is, in any event, closer to what happens in most clinics today. For example, most clinicians do not vary their pharmacologic or behavioral treatments according to whether the child has DSM-IV ADHD CT, H/I, or PI.
Taken together, these data argue for considering DSM-IV ADHD as existing on a severity continuum rather than as discrete diagnostic categories. Implicit in the continuum argument is the need to identify common mediators of risk. In the area of ADHD, the current approach of applying the same criteria to individuals of both genders and all ages is clearly unrealistic. The continuum argument allows for the creation of normative distributions by age, gender, informant, and ethnicity. Such advances, which seem so simple to accept, should be considered as key modifications in the DSM-V or subsequent editions of our diagnostic manuals.
Our study has a number of limitations. First, we focus exclusively on AP in boys. Our rationale is that the prevalence of attention problems is higher in boys and that the statistical power to detect subtypes increases with prevalence rates. The increasing sample size of the NTR will permit an analysis of attention problems in girls in the future. A second limitation is that we relied on a specific statistical approach to detect subtypes, namely FMM. Other approaches, such as the taxometric procedures developed by Meehl and colleagues, have been used for this purpose.33 However, taxometric procedures have been shown to have less power to detect classes than FMM.34 Third, we treated twins as individuals, thereby neglecting the genetically informative structure of the sample.35 A twin mixture model has recently been proposed; however, this model decomposes the variance within class into genetic and environmental components, rather than the more interesting decomposition of differences between classes.36 In addition, twin mixture models assume that within-class variance can be estimated correctly, which may not be the case, especially when class proportions differ substantially (e.g., small minority classes and large majority classes).29 Fourth, regarding the utility of factor mixture modeling for selecting subjects for prevention or treatment, simulation studies have demonstrated high error rates in assigning subjects to classes,29 which clearly limits the potential of mixture analyses for selection purposes. The current study shows, however, that factor mixture analyses may be used to exclude subjects who are very unlikely to be affected (i.e., the low scoring majority class). Finally, the current study could be enhanced by including relevant candidate genes to predict class membership. Recent work demonstrates that substantial sample sizes are needed to reliably detect small gene effects using FMMs.37
This research was supported by Spinozapremie (NWO/SPI 56-464-14192); Twin-family database for behavior genetics and genomics studies (NWO 480-04-004); the VU-CNCR (Centre Neurogenetics/Cognition Research); and Developmental Study of Attention Problems in Young Twins (NIMH, RO1 MH58799-03).
This article is the subject of an editorial by Dr. Anita Thapar in this issue.
The authors report no conflicts of interest.