About half of all children who screened positive for an NDP on the A-TAC had a clinical NDP diagnosis at the clinical examinations, and in about 40% of these cases the diagnosis was the same NDP for which they had screened positive three years earlier.
The coexistence of diagnoses (often referred to as “co-morbidity”) was more common than “pure” diagnoses. This finding is consistent with a growing body of evidence suggesting that children with NDPs have very high rates of co-occurring problems. There is also increasing evidence of drift between the NDP categories. Several researchers have suggested that pure NDPs are relatively unusual and that all children with an NDP should be assessed for all types of possible overlapping conditions.
The diagnostic outcome in this systematic clinical follow-up of a total population screened for NDPs showed that screen-negative children, even high-risk cases such as co-twins of affected children, generally did not have ASDs or LDs, although around 10% had sufficient AD/HD or TD problems to meet the criteria for one of these diagnoses. The A-TAC thus appears reasonably good at ruling out NDPs in population studies, even though its overall sensitivity for any NDP was just below 70%; a low score is therefore far from a guarantee that no NDP-related mental health problems or diagnosable conditions are present.
In this prospective population-based study, the A-TAC once again showed excellent screening properties for ASDs (AUC = 0.91, sensitivity 70%, specificity 90%). These values are comparable to figures previously reported from clinical samples. However, the instrument was less accurate when screening for diagnoses other than ASDs (AD/HD, LDs, and TDs). Still, at the low cut-off values sensitivity was around 70%, with the TDs module as the low point at 45%. Given the three-year interval, these figures are still fairly consistent and conform to suggestions that about half of children diagnosed with AD/HD seem to grow out of it, at least in the sense that they no longer meet the diagnostic criteria for this particular condition, or that key AD/HD features may transform into other mental health problems.
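As an aid to reading the AUC reported above: it equals the probability that a randomly chosen child with the diagnosis scores higher on the screen than a randomly chosen child without it (the Mann–Whitney formulation, with ties counted as one half). The Python sketch below computes it that way. The scores are invented for illustration and are not A-TAC data, and higher scores are assumed to indicate more problems.

```python
from itertools import product


def auc_mann_whitney(case_scores, noncase_scores):
    """Estimate the ROC AUC as P(case score > non-case score), ties counting 0.5."""
    if not case_scores or not noncase_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    # Compare every case with every non-case.
    for case, noncase in product(case_scores, noncase_scores):
        if case > noncase:
            wins += 1.0
        elif case == noncase:
            wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


# Hypothetical screening scores, invented for illustration only.
cases = [7.5, 9.0, 4.0, 11.5, 6.0]
noncases = [0.5, 1.0, 4.0, 2.5, 0.0, 6.5]
print(round(auc_mann_whitney(cases, noncases), 3))  # prints 0.917
```

An AUC of 0.5 corresponds to a screen that ranks cases no better than chance, and 1.0 to perfect separation.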
In clinical practice, instruments with a high specificity (usually obtained at the expense of sensitivity) result in under-referrals, but in epidemiological studies the aim is generally to screen out all the “normals.” This study aimed to establish the psychometric properties of the A-TAC in a study group of children drawn from the general population. It was considered necessary to enrich the study group for NDPs to obtain sufficient power for these relatively rare conditions. Which cut-off points to use depends on the intended use of the scale. In two-stage investigations, where all screen-positive cases subsequently undergo clinical assessment and it is therefore important that cases are recognized during the screening phase, false negatives must be reduced to a bare minimum. Scales and cut-offs with a very high sensitivity are therefore preferable, even though this usually compromises specificity. Nevertheless, the low cut-off values of the A-TAC did not compromise specificity overall, except in the case of LDs (sensitivity 78%, specificity 64%). Thus, the previously established low cut-offs worked well in this general population group for identifying, with a good cost-effectiveness balance, the children who should proceed to clinical assessment.
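The trade-off between low and high cut-offs in a two-stage design can be sketched in a few lines of Python. The scores and both cut-off values are hypothetical (they are not the A-TAC cut-offs), and a child is assumed to screen positive when the score is at or above the cut-off.

```python
def sens_spec(case_scores, noncase_scores, cutoff):
    """Screen positive when score >= cutoff; return (sensitivity, specificity)."""
    true_positives = sum(score >= cutoff for score in case_scores)
    true_negatives = sum(score < cutoff for score in noncase_scores)
    return true_positives / len(case_scores), true_negatives / len(noncase_scores)


# Hypothetical scores and cut-offs, for illustration only.
cases = [7.5, 9.0, 4.0, 11.5, 6.0]
noncases = [0.5, 1.0, 4.0, 2.5, 0.0, 6.5]

for label, cutoff in [("low", 3.0), ("high", 7.0)]:
    sensitivity, specificity = sens_spec(cases, noncases, cutoff)
    print(f"{label} cut-off {cutoff}: sensitivity {sensitivity:.2f}, "
          f"specificity {specificity:.2f}")
# low cut-off 3.0: sensitivity 1.00, specificity 0.67
# high cut-off 7.0: sensitivity 0.60, specificity 1.00
```

With these invented scores, the low cut-off catches every case at the cost of some false positives, which is the property sought in the first (screening) stage; the high cut-off does the reverse.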
In this population, negative predictive values were consistently high (≥ 89%), indicating that the great majority of children who screened negative did not meet diagnostic criteria for an NDP. A high rate of false positives is not uncommon in behavioural screening, which often yields low positive predictive values. For this reason, the high cut-offs have been identified to serve as proxies for clinical diagnoses in epidemiological studies.
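The behaviour of predictive values follows from Bayes' theorem: for fixed sensitivity and specificity, the PPV falls and the NPV rises as the prevalence of the condition decreases. The sketch below assumes a sensitivity of 70% and a specificity of 90% (the ASD figures reported earlier) purely for illustration; the prevalences are hypothetical, and the calculation ignores the enrichment of the study group.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) via Bayes' theorem for a given prevalence."""
    tp = sensitivity * prevalence              # true positives (as proportions)
    fn = (1 - sensitivity) * prevalence        # false negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Illustrative inputs only; prevalences are hypothetical.
for prevalence in (0.01, 0.10, 0.30):
    ppv, npv = predictive_values(0.70, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.3f}")
```

For rare conditions, even a screen with good specificity produces more false than true positives, which is why low PPVs are common in behavioural screening while NPVs stay high.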
Using sensitivity and specificity alone as measures of an instrument’s efficiency can be misleading. Sensitivity is only part of the discriminatory evidence, as high sensitivity may be accompanied by low specificity. Moreover, there is no simple rule for aggregating sensitivity and specificity into one measure of performance, so a single indicator of an instrument’s performance, such as the diagnostic odds ratio (DOR), is needed. The DOR is reasonably constant across a large range of cut-off scores on the ROC curve (see Table …), but it rises steeply at the extremes of sensitivity and specificity. If the test scores of both the NDP and non-NDP groups had followed logistic distributions with equal standard deviations, the DOR would have been constant for all possible cut-off values. The DOR is thus a useful measure in meta-analyses of diagnostic studies that aim to combine results from different studies into summary estimates with increased precision.
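The DOR is defined as (sensitivity / (1 − sensitivity)) / ((1 − specificity) / specificity), that is, the odds of a positive screen among cases divided by the odds of a positive screen among non-cases. The Python sketch below checks the statement about logistic distributions using hypothetical score distributions, not study data. With equal-scale logistic distributions (locations μ0 for non-cases and μ1 for cases, scale s), the DOR equals exp((μ1 − μ0)/s) at every cut-off. With equal-variance normal distributions, by contrast, it is lowest near the midpoint and grows toward extreme cut-offs.

```python
import math
from statistics import NormalDist


def diagnostic_odds_ratio(sensitivity, specificity):
    """Odds of a positive screen among cases divided by the odds among non-cases."""
    return (sensitivity / (1 - sensitivity)) / ((1 - specificity) / specificity)


def logistic_sf(x, mu, s):
    """P(score > x) for a logistic distribution with location mu and scale s."""
    return 1.0 / (1.0 + math.exp((x - mu) / s))


def dor_at_cutoff(cutoff, dist):
    """DOR when screening positive for score > cutoff.

    Hypothetical distributions: non-cases centred at 0, cases at 2, both with
    the same spread (scale 1 for logistic, SD 1 for normal).
    """
    if dist == "logistic":
        sens = logistic_sf(cutoff, 2.0, 1.0)
        spec = 1.0 - logistic_sf(cutoff, 0.0, 1.0)
    elif dist == "normal":
        sens = 1.0 - NormalDist(2.0, 1.0).cdf(cutoff)
        spec = NormalDist(0.0, 1.0).cdf(cutoff)
    else:
        raise ValueError(f"unknown distribution: {dist!r}")
    return diagnostic_odds_ratio(sens, spec)


for cutoff in (-1.0, 1.0, 3.0):
    print(f"cut-off {cutoff:+.1f}: "
          f"logistic DOR {dor_at_cutoff(cutoff, 'logistic'):.2f}, "
          f"normal DOR {dor_at_cutoff(cutoff, 'normal'):.2f}")
```

In the logistic case the sketch prints a DOR of 7.39 (e², given the chosen means and scale) at every cut-off. In the normal case the DOR is considerably larger at cut-offs of −1 and 3 than at the midpoint of 1, mirroring the steep rise at extreme sensitivity and specificity.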
The sensitivity and specificity of screening tools are affected not only by the prevalence of the disorders but also by differences in the characteristics of the disorders, such as clinical severity, and in the characteristics of the subjects, such as age and sex. For example, lower rates of disruptive behaviour problems among girls, along with a preponderance of inattentive over impulsive symptoms, may partly explain why NDPs often go unrecognized in girls.
Screening and diagnostic/identification tools that detect neurodevelopmental behaviour problems are useful aids for clinicians, since they provide a structured and systematic assessment procedure that increases diagnostic reliability. There is, however, always a risk that a specific instrument is chosen because of its dominant standing in the field or in the literature, rather than because it has the most accurate validation or is otherwise best suited to the purpose.
All screening instruments should be interpreted with caution. Omnibus assessment tools warrant particular critical attention because of their importance in research, not least in epidemiological studies, where they provide prevalence figures for a population, make it possible to discern trends, and serve as proxies for clinical diagnoses.
Clinicians and researchers often turn to a “broadband” assessment scale to ensure a comprehensive assessment of the presenting problem and to assist in the identification of co-morbid difficulties. Extensive use of a particular scale may have certain advantages, such as comparability across studies and widespread familiarity with the scale among researchers and clinicians, but every scale has limitations that may go unchallenged while worthy alternatives are overlooked. Because most diagnostic measures for NDPs are designed specifically for categorical features rather than broader phenotypes, there is also a need for instruments such as the A-TAC that can provide more continuous measures of various aspects of NDPs.
The coexistence of diagnoses encountered in the field of NDPs indicates that developmental problem areas are not pure. In the general population, a dimensional, continuous distribution across diagnostic NDP categories has been reported.
The concept of ESSENCE was coined to account for this interrelatedness and coexistence of NDPs across diagnostic boundaries. The disorders within the ESSENCE model are currently diagnosed as separate categories, but they almost always overlap with each other and can all be considered “neurodevelopmental” or “neuropsychiatric.” It is therefore vital that all early symptomatic syndromes eliciting neurodevelopmental clinical examinations are taken into account when looking for etiological/pathogenetic links, developmental trajectories, risk factors for negative outcomes, or interventions and treatments.
There is a clear need for broader NDP screening instruments, since many screening tools target strictly defined cases of childhood-onset disorders and are therefore likely to miss overlapping and associated disorders. The present study shows that the A-TAC is useful as a broadband, first-level screening instrument in a population-based study group. Broadband screening tools for NDPs should generally be administered before narrowband screening instruments to ensure that common conditions, such as language impairment or learning disabilities, are detected.
Because study questions on diagnostic accuracy generally evaluate the association between inventory scores and health status, a cross-sectional design is a natural basic design option. However, this basic design has various modifications, each with specific pros and cons in terms of scientific requirements, burden on the study subjects, and efficient use of resources. In this case, a major factor affecting the instrument’s performance in relation to clinical diagnoses of NDPs is the time between the behaviour sampled and the clinical examination. There was a three-year delay between the parental assessment on the A-TAC and the clinical follow-up. Asking parents to rate current behaviour, when symptoms of NDPs may be at their most prototypical, and then examining the children clinically three years later could have contributed to the difficulties in differentiating between NDPs (apart from ASD, which is one of the most stable NDPs), at least between the ages of 9 and 12 or 12 and 15. Even if some of the major child psychiatric problem constellations are established by age 12, the complex psychosocial problems associated with puberty that emerge around the time of the clinical examination may interfere with the interpretation of the results.

Moreover, the A-TAC showed comparatively low DOR and AUC values for disorders other than ASDs (especially AD/HD). This may be attributed principally to the time lag between the screening and the clinical assessment, and perhaps also to a “twin sample bias,” suggested to be inherent in using a screen-negative group that largely consists of genetically at-risk siblings. Given that NDPs are under complex, multivariate genetic influences and tend to follow a waxing and waning course, a longitudinal twin sample may compromise probabilistic measures, including NPVs and PPVs, since discordant co-twins are more likely than other pairs to oscillate above or below a cut-off.
However, the notion of a twin sample bias is doubtful, since numerous studies have reported that twins differ only marginally from singletons; even if same-sex twins may not be representative of the general population, it is unlikely that this circumstance had any substantial effect on the results presented here.
Strengths of the study
The strengths of this study lie in its investigation of the efficacy of the A-TAC in a population-based cohort of screen-positive children, their screen-negative siblings, and controls and in its rigorous assessments of neuropsychological outcomes. Psychiatric interviews were carried out for all children in the study using the K-SADS-PL schedule, and consensus diagnoses were made by specially trained psychologists and an experienced child psychiatrist.