Objectives: Most children with sleep-disordered breathing (SDB) have mild to moderate forms, for which neurobehavioral complications are believed to be the most important adverse outcomes. To improve understanding of this morbidity, its long-term response to adenotonsillectomy, and its relationship to polysomnographic measures, we studied a series of children before and after clinically-indicated adenotonsillectomy or unrelated surgical care.
Methods: We recorded sleep and assessed behavioral, cognitive, and psychiatric morbidity in 105 children 5.0 to 12.9 years old: 78 were scheduled for clinically-indicated adenotonsillectomy, usually for suspected SDB, and 27 for unrelated surgical care. One year later, we repeated all assessments in 100 of these children.
Results: Adenotonsillectomy subjects, in comparison to controls, were more hyperactive on well-validated parent rating scales (p<0.001), inattentive on cognitive testing (p=0.003), sleepy on Multiple Sleep Latency Tests (p=0.002), and likely to have DSM-IV-defined Attention-Deficit/Hyperactivity Disorder as judged by a child psychiatrist (p=0.03). In contrast, one year later, the two groups showed no significant differences in the same measures. Adenotonsillectomy subjects had improved substantially (p≤0.01) in all measures and control subjects in none. However, polysomnographic assessment of baseline SDB and its subsequent amelioration did not clearly predict either baseline neurobehavioral morbidity or improvement in any area other than sleepiness.
Conclusions: Children scheduled for adenotonsillectomy often have mild to moderate SDB and significant neurobehavioral morbidity -- including hyperactivity, inattention, Attention-Deficit/Hyperactivity Disorder, and excessive daytime sleepiness -- all of which tend to improve by one year after surgery. However, the lack of better correspondence between SDB measures and neurobehavioral outcomes suggests the need for better measures or improved understanding of underlying causal mechanisms.
Children with severe obstructive sleep apnea are at risk for heart failure, hypertension, and failure to thrive, and few clinicians doubt that the sleep disorder should be treated.1 However, most children with sleep-disordered breathing (SDB) have more mild forms for which the main morbidities are believed to be behavioral disturbances and cognitive impairment.2-6 Unfortunately, the best way to identify children with a level of SDB that raises the risk for these outcomes has not been well studied. The benefit of treatment for mild SDB is also not well defined.
In practice, when SDB is suspected on clinical grounds and treatment by adenotonsillectomy (AT) is planned, less than 10% of children in North America undergo polysomnography to confirm the diagnosis or need for surgery.7 In contrast, the American Academy of Pediatrics recommends objective sleep testing before AT for SDB.8 This is because several studies have shown poor correlation between office-based impressions and sleep laboratory results.9 However, standard polysomnography may miss mild forms of SDB that in cross-sectional studies still show associations with neurobehavioral morbidity.2,5,6,10 Few prospective studies have investigated the extent to which morbidity in these cases may respond to treatment,11 none have examined long-term outcomes, and none have incorporated a full range of gold-standard assessments for mental health, behavior, cognition, and daytime sleepiness.
The most definitive demonstration of consequences in mild SDB would involve a double-blind, placebo-controlled, randomized treatment trial. This study design is not feasible at this time for several reasons. Families and clinicians cannot be blinded to AT and sham surgery is not an option. No published data have defined the time-course of neurobehavioral improvement after AT, and long-term randomization to a placebo arm would raise ethical concerns for children known to have SDB.
We therefore chose a prospective, non-randomized follow-up study design to examine long-term neurobehavioral outcomes and polysomnographic findings in a cohort of children already scheduled, for any clinical indication, to have AT. This sample offered several unique advantages. It allowed characterization of children undergoing one of the most common surgical procedures in childhood. Subjects were not recruited at a sleep disorders center, and for this reason were more likely to resemble the large majority of children who undergo AT for SDB. The sample permitted comparison of children with generally mild or moderate sleep apnea, likely to represent forms most often treated by otolaryngologists, to a separate group of children scheduled for unrelated surgical care. Finally, this sample provided a rare opportunity to study some subjects scheduled for AT despite having no polysomnographic evidence of SDB.
We first tested the hypothesis that children who undergo AT, in comparison to other surgical care, experience more neurobehavioral improvement one year after surgery. We then tested the hypothesis that the AT children who had polysomnographic evidence of sleep apnea would show such improvement while the two comparison groups would not. Comprehensive, well-validated assessments at enrollment and follow-up included nocturnal polysomnography, parental behavioral ratings, cognitive testing, Multiple Sleep Latency Tests, and evaluations by a child psychiatrist. Some interim results, mainly on polysomnographic methodology within this 4-year study, have been reported previously.12-16
Children between 5.0 and 12.9 years old who were scheduled for AT (n=78), for any indication, were identified at 8 otolaryngology practices believed to perform the large majority of ATs in Washtenaw County, Michigan (Figure 1). Children were excluded from this IRB-approved study if they 1) required a polysomnogram for clinical purposes according to their surgeons, 2) had a history of SDB treatment, or 3) had severe medical or neurological conditions that would preclude behavioral, psychiatric, or polysomnographic assessments. In children younger than 5 years, hyperactivity (a key symptom of SDB 8) would have been difficult to distinguish from developmentally-appropriate behavior,17 and cognitive testing would have required substantial modification. After age 12, AT becomes increasingly rare and SDB may be more closely associated with adult symptoms. Control subjects (n=27) within the same age range were recruited from other surgical clinics where they were seen for concerns unrelated to risk for SDB. Twelve control participants were scheduled for hernia repair, and the remainder for no surgery. Controls were excluded for the above criteria, or for a history of large tonsils, frequent throat infections, adenoidectomy, or tonsillectomy. For all subjects, one parent signed an informed consent and the child signed an informed assent.
Participants resembled non-participants closely in most respects (Figure 1). Among the AT participants, 71 (91%) were thought by their otolaryngologist to have nocturnal upper airway obstruction. Of the 105 subjects studied at baseline, 100 (95%) participated in the follow-up evaluations that took place at a mean of 13.0±1.4 months after the baseline assessments. The mean age at baseline was 8.4 ± 1.9 years, and 60 subjects (57%) were male. Additional baseline demographic data, for AT and control subjects, are provided in Table 1. At follow-up the mean age was 9.5±1.9 years, and 57 subjects were male.
Baseline assessments were performed between December, 1999 and December, 2002. On the evening of admission to the sleep laboratory each subject and parent was interviewed by a child psychiatrist. The child then underwent polysomnography, and efforts were made to approximate usual bedtimes and rise times. On the next day, neurobehavioral assessments included a Multiple Sleep Latency Test of daytime sleepiness, neuropsychological testing, and parental behavioral ratings. Children were given a $25 gift certificate to a toy store, and parents a check for $100, to compensate them for their efforts. The entire assessment was repeated one year later.
Baseline digital polysomnography, generally within one month prior to scheduled surgery, included four EEG channels (C3-A2, C4-A1, O1-A2, O2-A1 of the 10-20 international electrode placement system), 2 EOG channels (right and left outer canthi), chin and bilateral anterior tibialis EMG, and 2 EKG channels. Nasal and oral airflow were monitored with thermocouples, thoracic and abdominal excursion with piezoelectric strain gauges, oxygen saturation by finger oximetry (with viewable pulse waveform), end-tidal CO2 by pediatric nasal cannula, and esophageal pressure through use of a thin water-filled catheter that has negligible effects on sleep in children.18-20 When a volunteer research subject did not tolerate catheter insertion, or esophageal pressure monitoring was maintained for < 2 hours, the data were considered missing. This occurred in about 1/3 of the children.12 Nasal pressure was not used because of high failure rates in children,21,22 insufficient space at the nares after CO2 cannula and thermocouples were applied, and availability of esophageal pressure monitoring considered to be the gold-standard in assessment of respiratory effort.23
All records were scored by one experienced, registered sleep technologist, masked to all information on clinical, demographic, and surgical status. To avoid any effect of scoring drift over time, scoring took place in batches of 10 records that included pre- and post-operative studies from 5 subjects. Sleep stage scoring followed standard protocols.24 Obstructive apneas were scored when airflow was absent for at least 2 breath cycles. Hypopneas were scored when at least 2 breath cycles of diminished airflow, chest movement, or abdominal movement were followed by an arousal, an awakening, or a 4% or greater oxygen desaturation. Central apneas not associated with a sigh or movement were scored when they were ≥ 20 seconds long, or else ≥ 10 seconds long but associated with bradycardia or ≥ 4% oxygen desaturation.25 Respiratory effort-related arousals 23 were scored when esophageal pressures gradually became more negative by at least 5 cm of water over a period of at least 5 respiratory cycles; when the sequence terminated in an arousal;26 and when no simultaneous apnea or hypopnea could be scored. The respiratory disturbance index was defined as the number of obstructive, mixed, or central apneas; hypopneas; and respiratory effort-related arousals per hour of sleep. Obstructive sleep apnea was considered present when the obstructive apnea index (number per hour of sleep) was 1 or more 27 (operationalized as ≥ 0.50 to ensure that subjects not so identified had no significant apnea). Although specific cut-points have not been widely accepted, pediatric obstructive sleep apnea can be considered mild when the apnea index is 1 – 4, moderate when this index is 5 – 10, and severe when this index is > 10.28
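The index definitions and cut-points in this paragraph reduce to simple arithmetic. The following Python sketch is illustrative only and is not the scoring software used in the study. The function names are hypothetical, and the handling of values that fall between the quoted bands (for example, an index of 4.5) is an assumption, since the text gives only the ranges 1 – 4, 5 – 10, and > 10.

```python
def respiratory_disturbance_index(apneas, hypopneas, effort_arousals, sleep_hours):
    """Events per hour of sleep: apneas (obstructive, mixed, or central),
    hypopneas, and respiratory effort-related arousals."""
    if sleep_hours <= 0:
        raise ValueError("sleep_hours must be positive")
    return (apneas + hypopneas + effort_arousals) / sleep_hours


def has_obstructive_sleep_apnea(obstructive_apnea_index):
    """Study criterion: index of 1 or more, operationalized as >= 0.50."""
    return obstructive_apnea_index >= 0.50


def apnea_severity(apnea_index):
    """Severity bands quoted in the text.

    Assumption of this sketch: values between the quoted bands are
    assigned to the lower band, and values below 0.50 are labeled "none".
    """
    if apnea_index < 0.50:
        return "none"
    if apnea_index < 5:
        return "mild"
    if apnea_index <= 10:
        return "moderate"
    return "severe"
```

For example, 10 apneas, 20 hypopneas, and 10 effort-related arousals over 8 hours of sleep give a respiratory disturbance index of 5.0 events per hour.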
Behavior: Parents completed two well-validated, similar but complementary behavioral rating instruments, the Conners' Parent Rating Scales-Revised (L) 29 and the Child Symptom Inventory-4: Parent Checklist.30 A behavioral hyperactivity index was constructed from the average of T-scores (mean of 50, standard deviation of 10) for inattention and hyperactivity generated by the two instruments. For each component instrument, significant problematic behavior is often identified by T-scores that are 1 to 2 standard deviations or more above the mean.
The Conners' instrument contains 80 items and is often used when comprehensive, DSM-IV-consistent data are required. Norms are based on a sample of more than 8000 male and female children and adolescents aged 3 to 17 years, with good geographic and ethnic diversity. The Attention-Deficit/Hyperactivity Disorder Index T-score was used to construct the behavioral hyperactivity index. The Child Symptom Inventory-4 is a 108-item behavior rating instrument that screens for a variety of DSM-IV-based childhood emotional and behavioral disorders in male and female children aged 5-12 years in kindergarten through 6th grade. The T-score for the Attention-Deficit/Hyperactivity Disorder subtest of the CSI-4 was used to construct the behavioral hyperactivity index.
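As a concrete illustration of how the behavioral hyperactivity index is built, the short Python sketch below averages the two T-scores and expresses the result in standard deviations above the normative mean. The function and parameter names are hypothetical, and the example scores in the usage note are invented for illustration rather than taken from study data.

```python
def behavioral_hyperactivity_index(conners_adhd_index_t, csi4_adhd_t):
    """Average of the two parent-rated T-scores (mean 50, SD 10)."""
    return (conners_adhd_index_t + csi4_adhd_t) / 2.0


def sd_above_mean(t_score, mean=50.0, sd=10.0):
    """Distance of a T-score from the normative mean, in SD units."""
    return (t_score - mean) / sd
```

With hypothetical T-scores of 62 and 58, the index is 60.0, which lies 1.0 standard deviation above the normative mean.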
Cognition: A cognitive attention index was derived from the average of standard scores (mean = 100, s.d. = 15) from each of two well-validated measures of attention. The first, the Integrated Visual and Auditory Continuous Performance Test, assesses sustained attention, or vigilance, and is administered on a personal computer.31,32 The child sees “1” or “2” on the screen and clicks a mouse button only when “1” appears. The main testing period consists of 500 trials, 1.5 seconds each, in which the visual or auditory stimuli are presented briefly in a pseudo-random pattern. The number of omissions, as reflected by the Full Scale Attention Quotient, was the first component of the cognitive attention index.
The Attention/Concentration Subscale from the Children's Memory Scale 33 generated the second component of the cognitive attention index. This subscale is made up of two subtests—Numbers and Sequences. Numbers measures the ability to repeat, in a forward or backward manner, random digit sequences of graduated length. Sequences measures the ability to mentally manipulate and sequence verbal information as quickly as possible: the examinee is asked to perform such tasks as saying the days of the week backward and counting by 4's. Standard scores from the CMS Numbers and Sequences subtests were converted into a standard score for the CMS Attention/Concentration Scale. This score was then averaged with the Full Scale Attention Quotient standard score to obtain an overall cognitive attention index.
Psychiatric Diagnosis: The diagnosis of Attention-Deficit/Hyperactivity Disorder based on DSM-IV criteria was determined by a board-certified child psychiatrist, masked to any sleep study results but not to surgical status. The psychiatrist administered the well-validated, computerized Diagnostic Interview Schedule for Children – Parent Interview, present-state version,34 and also interviewed the parent and child independently to verify results.
Sleepiness: The Multiple Sleep Latency Test 35 included 4 or 5 nap attempts at two-hour intervals. The fifth nap was performed when one preceding nap showed rapid eye movement (REM) sleep. For each nap, sleep onset was scored at the first epoch of stage 1 sleep. If no sleep occurred, the nap opportunity terminated at 20 minutes, and this value was used in calculation of a mean sleep latency. The Multiple Sleep Latency Test is the best-validated and most widely used objective assessment of daytime sleepiness in adults and children. The Multiple Sleep Latency Test is not often administered to children younger than 7 years,36 but is sensitive to SDB-related sleepiness in children as young as 3 years.37
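The mean sleep latency calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the laboratory's software: the function name is hypothetical, and representing a nap without sleep as `None` is a convention chosen for this sketch.

```python
NAP_LIMIT_MIN = 20.0  # a nap opportunity ends at 20 minutes if no sleep occurs


def mean_sleep_latency(nap_latencies_min):
    """Mean latency (minutes) over 4 or 5 naps.

    Each entry is the latency to the first epoch of stage 1 sleep,
    or None when no sleep occurred (counted as 20 minutes).
    """
    if not 4 <= len(nap_latencies_min) <= 5:
        raise ValueError("expected 4 or 5 nap attempts")
    values = [NAP_LIMIT_MIN if x is None else x for x in nap_latencies_min]
    return sum(values) / len(values)
```

For instance, latencies of 12, no sleep, 8, and 20 minutes yield a mean sleep latency of 15.0 minutes.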
Data were entered into a database by a professional company that used double entry for verification. The primary outcome was the behavioral hyperactivity index and the primary explanatory variable was subject group (adenotonsillectomy vs. control). Secondary analyses divided the adenotonsillectomy group into those with and without obstructive sleep apnea on polysomnography, and examined group differences in additional outcome variables: apnea/hypopnea index, cognitive attention index, mean sleep latency, and diagnosis of Attention-Deficit/Hyperactivity Disorder.
A chi-square test or Fisher's Exact Test was used to test for group differences in frequency of Attention-Deficit/Hyperactivity Disorder; McNemar's Test was used to test for changes over time. For each of the 4 continuous outcome variables, a single repeated measures model was implemented using PROC MIXED in SAS®, version 8.02 (SAS Institute Inc., Cary, NC). Each analysis produced main effect tests for subject group and time (baseline vs. one-year follow-up) and a test for the group by time interaction. The models also allowed assessment of differences between groups at both time points and the change over time in each group. Group differences failed to show statistical significance for several potential covariates (such as gender, race, body mass index, stimulant use, or socioeconomic group, as indicated in Table 1). The groups differed in mean age by one year, but age did not predict changes in outcomes at follow-up. Histograms of residuals suggested that assumptions of normality were valid, except for the apnea/hypopnea index, which required a natural logarithmic transformation (log [x+1]). Finally, to explore whether additional, continuous sleep measures might predict neurobehavioral measures more effectively, with or without adjustment for age, each neurobehavioral measure was regressed in linear or logistic models on each polysomnographic measure individually.
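The log [x+1] transformation applied to the apnea/hypopnea index can be written as follows. This Python sketch only illustrates the transformation itself; the mixed models were fitted in SAS and are not reproduced here, and the function names are hypothetical.

```python
import math


def log_transform(ahi):
    """Natural log of (x + 1); the shift keeps an index of 0 defined."""
    if ahi < 0:
        raise ValueError("apnea/hypopnea index cannot be negative")
    return math.log1p(ahi)


def back_transform(value):
    """Inverse of log_transform, returning events per hour."""
    return math.expm1(value)
```

An index of 0 maps to 0 on the transformed scale, and model estimates on that scale can be returned to events per hour with `back_transform`.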
Polysomnographic SDB measures are summarized in Table 2. The baseline obstructive apnea index ranged from 0.0 to 38.2, and the apnea/hypopnea index from 0.0 to 74.4. Forty (51%) of the 78 AT subjects, in comparison to only 1 (4%) of the 27 control subjects, had obstructive sleep apnea (χ²=19.1, p<0.001). In most cases sleep apnea was in the mild to moderate range of severity.
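The group comparison above can be checked by hand with a Pearson chi-square test on the 2×2 table of counts. The Python sketch below is a minimal illustration with a hypothetical function name. It assumes no continuity correction; the text does not state which form was used, although the uncorrected form appears to reproduce the reported statistic.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]], with the p-value for 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column total must be positive")
    statistic = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Rows: AT subjects (40 with apnea, 38 without), controls (1 with, 26 without).
stat, p = chi_square_2x2(40, 38, 1, 26)
```

With these counts the statistic rounds to 19.1 and the p-value is below 0.001.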
At follow-up, obstructive sleep apnea was found in only 9 (12%) of 77 AT children and in 3 (13%) of 23 controls (Fisher's exact test, p=1.00). Average SDB measures also improved considerably, eliminating group differences at one year (Table 2). However, among 39 AT subjects with obstructive sleep apnea at baseline, 8 (21%) still had it at follow-up; in comparison, among 38 AT subjects without obstructive sleep apnea at baseline, only 1 (3%) had it at follow-up (χ²=6.0, p=0.01).
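Fisher's exact test for the follow-up comparison of AT and control subjects can be illustrated with the hypergeometric distribution. The Python sketch below uses a hypothetical function name and assumes the common two-sided convention of summing the probabilities of all tables no more likely than the observed one; the text does not say which two-sided convention was applied.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)

    def weight(x):
        # Unnormalized hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, row1)


# Rows: AT subjects (9 with apnea, 68 without), controls (3 with, 20 without).
p = fisher_exact_two_sided(9, 68, 3, 20)
```

Because the observed table is the single most probable one given its margins, every possible table is counted and the p-value equals 1.0.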
Least-squares mean values from the four generalized linear mixed models of apnea/hypopnea index, behavioral hyperactivity index, cognitive attention index, and mean sleep latency are shown in Figure 2. Table 3 shows the significance levels for the relevant hypothesis tests derived from these models. The AT and control groups showed robust differences in the apnea/hypopnea index, behavioral hyperactivity index, cognitive attention index, and mean sleep latency at baseline. In contrast, none of these differences reached significance at one year (though the behavioral hyperactivity index showed a trend). Each outcome measure changed significantly with time for the AT subjects, whereas none changed significantly for the control subjects. The time by group interaction for mean sleep latency and apnea/hypopnea index shows that their changes over time differed significantly between the AT and control groups, as illustrated by the non-parallel slopes in Figure 2.
The results for the three-group analyses are illustrated in Figure 3 and significance levels are detailed in Table 4. Again, the groups differed significantly in each outcome at baseline but in no outcome at follow-up. The apnea/hypopnea index and mean sleep latency improved significantly with time in the AT subjects with obstructive sleep apnea but not in AT subjects without obstructive sleep apnea. In contrast, both the behavioral hyperactivity index and the cognitive attention index improved with time as much or more among AT subjects without sleep apnea as they did among AT subjects with sleep apnea. Time by group interactions showed significant group differences in changes over time only for the apnea/hypopnea index and mean sleep latency.
Twenty-two (28%) of the AT subjects had Attention-Deficit/Hyperactivity Disorder at baseline, in comparison to only 2 (7%) of the controls (χ²=4.9, p=0.03). Among the 22 AT subjects, 9 met criteria for the inattentive subtype of Attention-Deficit/Hyperactivity Disorder, 2 for the hyperactive subtype, and 11 for the combined subtype. Eleven (50%) of the 22 no longer qualified for the diagnosis one year later. To demonstrate the extent to which this change exceeded instability in the diagnosis over time in the opposite direction and without AT, the 22 subjects were combined with 21 controls who had no Attention-Deficit/Hyperactivity Disorder at baseline, among whom only 2 (10%) newly qualified for the diagnosis one year later (McNemar's test, p=0.01). The frequency of Attention-Deficit/Hyperactivity Disorder was not significantly different between all AT and all control subjects after surgery (21% vs. 9%, Fisher's Exact Test, p=0.23). Both at baseline and follow-up, the frequency of this diagnosis was nearly identical among those AT subjects with and without baseline sleep apnea (28% vs. 29% at baseline, and 23% vs. 18% at follow-up, p>0.6 for each).
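The McNemar comparison above depends only on the two counts of changed diagnoses: 11 subjects who lost the diagnosis and 2 who newly acquired it. The Python sketch below is one way to compute it, with a hypothetical function name. It assumes the chi-square form of the test; the text does not state which variant was used, but the uncorrected form appears consistent with the reported p=0.01, whereas a continuity-corrected or exact binomial version would give a larger p-value.

```python
import math


def mcnemar_chi_square(b, c, continuity=False):
    """McNemar's chi-square for discordant counts b and c (1 degree of freedom)."""
    if b + c == 0:
        raise ValueError("at least one discordant pair is required")
    diff = abs(b - c) - (1.0 if continuity else 0.0)
    statistic = max(diff, 0.0) ** 2 / (b + c)
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# 11 diagnoses resolved (AT subjects), 2 new diagnoses (controls).
stat, p = mcnemar_chi_square(11, 2)
```

Here the statistic is 81/13, or about 6.2, and the p-value rounds to 0.01.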
At baseline, no behavioral, cognitive, or psychiatric outcome measure was associated with any Table 2 polysomnographic variable in regression models (all p>0.05). In contrast, increased sleepiness was associated with higher levels of each SDB measure except for the arousal index (Table 5). Model results for the 4 neurobehavioral variables were essentially no different after adjustment for age. No neurobehavioral morbidity showed a newly-significant association with a polysomnographic variable when the analyses were confined to the 78 AT subjects, or to the 40 AT subjects with obstructive sleep apnea (mean obstructive apnea index = 5.6±8.0, mean apnea/hypopnea index = 13.1±15.3).
Similarly, one-year change scores for behavioral, cognitive, and psychiatric outcomes showed no significant associations with changes in SDB measures. In contrast, improvement in the mean sleep latency was predicted by improvement in every SDB measure except for end-tidal CO2 (Table 6). Adjustment of each model for age did not change the findings, except that diminished apnea/hypopnea index was associated with improved attention (p=0.04). Restriction of the analyses to AT subjects did not make any association between neurobehavioral and sleep changes newly significant; restriction to AT subjects with baseline obstructive sleep apnea again revealed only a marginally-significant association between diminished apnea/hypopnea index and improved attention (p=0.04).
This prospective study of 105 children who had AT or unrelated surgical care shows that prominent baseline group differences in hyperactive behavior, attention deficit, sleepiness, and frequency of Attention-Deficit/Hyperactivity Disorder became difficult to identify one year after surgery. These improvements are remarkable because hyperactivity and inattention generally are expected to be chronic features in affected school-aged children.17 After AT, parental ratings for hyperactivity and cognitive scores for inattention declined by nearly 0.5 standard deviations; half of the children with Attention-Deficit/Hyperactivity Disorder no longer qualified for the diagnosis; and an objective measure of sleepiness improved. Surprisingly, common laboratory measures of SDB severity did not show associations with baseline neurobehavioral morbidity other than sleepiness, and one-year changes in laboratory measures generally did not predict neurobehavioral outcomes except for reduced sleepiness. Findings from this relatively comprehensive, long-term study of neurobehavioral outcomes after treatment of mild to moderate childhood SDB have several important implications for our understanding of pediatric SDB and for clinical practice.
Clear improvement in our subjects after AT provides new suggestive evidence for a cause-and-effect relationship between SDB – at least as identified in the office by otolaryngologists – and several adverse behavioral, cognitive, and mental health outcomes. However, our non-randomized study design cannot prove cause and effect. Moreover, the poor correspondence between SDB measures and neurobehavioral outcomes, at baseline and follow-up, seems to run directly counter to expectations if SDB causes these morbidities. The one exception, for daytime sleepiness, is surprising because most pediatric sleep specialists have considered hyperactivity and inattention to be more prominent than overt sleepiness in childhood SDB.38 However, the extent of improvement in sleepiness, by only one minute on average in the Multiple Sleep Latency Test, may have limited clinical significance.
The lack of significant associations between SDB measures and either neurobehavioral morbidity or treatment outcomes could simply reflect inadequate sample size. However, to our knowledge this prospective series of children studied with detailed sleep and behavioral measures represents the largest to date. The sample size proved more than sufficient to identify statistically-robust post-operative changes in both explanatory and outcome variables. Therefore, we believe that lack of better correspondence between these variables may reflect limitations of standard SDB measures in assessment of the highly-prevalent, mild SDB that is commonly treated by otolaryngologists.
Support for this suspicion also derives from a growing number of other investigations. At least 3 cross-sectional studies found that hyperactive behavior or cognitive deficits correlated well with SDB symptoms such as snoring, but not with polysomnographic measures of SDB severity.5,39,40 Esophageal pressure monitoring, to assess the excessive respiratory effort believed to disturb sleep in SDB,23,41 was not performed in these studies and might have provided unique information on subtle SDB in children.42 However, successful use of this method in most of our subjects to refine a respiratory disturbance index did not improve the outcome-based effectiveness of diagnostic polysomnography. This was probably because the esophageal pressure monitoring did not identify many discrete events beyond those already captured by sensitive, two-breath criteria now commonly used for pediatric hypopneas.
Several limitations to our study and its conclusions merit discussion. This study did not test the overall utility of polysomnography, as non-neurobehavioral outcomes were not studied, and neither were several other common reasons for pre-operative testing, such as assessment of operative risk.43 Families who refused to participate clearly outnumbered participants, as in most clinical research. Although data available to compare the two groups were largely reassuring, an influence of referral bias on baseline findings in particular cannot be excluded. The sleep and cognitive testing we used are considered objective, but parents and psychiatrists who assessed the children could not be masked to surgical status. Recruitment of control subjects for this study from non-otolaryngology clinics provided a group comparable in terms of exposure to the medical system, but not levels of baseline hyperactivity. Regression to the mean potentially could explain some of the neurobehavioral improvement in the AT group. However, recruitment and observation of children with neurobehavioral problems for one year without treatment was not a realistic option. Moreover, the excess neurobehavioral problems identified in our AT subjects did not arise from specific efforts to recruit for these traits. Data that compared participants to non-participants at baseline suggested only a limited difference in the frequency of parental concern for behavioral problems.
In conclusion, our data on subjects identified within otolaryngologists' practices help to characterize the cognitive and behavioral burden carried by many of their patients and relieved one year after AT. The findings suggest that SDB, though usually in the mild-to-moderate range, nonetheless carries risk for substantial, reversible neurobehavioral morbidity. The polysomnographic data, along with previous reports, increasingly suggest that children with “primary snoring” – in the absence of frequent apneic events, arousals, or gas exchange abnormalities – may still be at risk for significant neurobehavioral consequences. Published guidelines that recommend objective testing before AT, to distinguish SDB from primary snoring,8 may deserve reevaluation as new outcome data emerge on children with negative polysomnograms.10
Finally, the lack of better outcome-based performance of standard polysomnographic measures in mild pediatric SDB is a particular clinical concern: these are the children, rather than those with severe SDB, for whom effective objective measures could have the most impact. Our data must raise the possibility that some correlate of SDB, rather than SDB itself, causes the morbidity we studied.44 However, we also speculate that new SDB measures could be developed with better ability to predict neurobehavioral outcomes. Approaches with potential promise, for example, could involve characterization of the cyclic alternating pattern in children,45 respiratory and non-respiratory arousals,46 or subtle electroencephalographic changes that occur on a breath-to-breath basis during non-apneic sleep.16
The authors thank the children and parents who participated in this research for their time, interest, and altruism; Judith L. Wiebelhaus, RPSGT, REEGT, for expert technical assistance; Morton B. Brown, Ph.D., and Deanna J. Marriott, Ph.D., for assistance and insight with regard to study design and biostatistics; and the following physicians and surgeons for assistance with identification of subjects or execution of the protocol: Ronald S. Bogdasarian, M.D., Donna Champine, M.D., Susan L. Garetz, M.D., Laurence Ho, M.D., Paul T. Hoff, M.D., Charles Koopman, M.D., Marci M. Lesperance, M.D., M.D., and Thomas A. Weimert, M.D.
Support: NIH grants HD38461, HL80941, NS02009, and RR00042.