Asthma in children and adolescents is a heterogeneous syndrome composed of multiple subgroups with variable disease expression and response to environmental exposures. The goal of this study was to define homogeneous phenotypic clusters within a cohort of children and adolescents with asthma and to determine overall and within-cluster associations between environmental tobacco smoke (ETS) exposure and asthma characteristics.
A combined hierarchical/k-means cluster analysis of principal component variables was used to define phenotypic clusters within a cohort of 6 to 20 year-old urban and largely minority subjects.
Among the 154 subjects, phenotypic cluster analysis defined three independent clusters (Cluster 1 [n=57]; Cluster 2 [n=33]; Cluster 3 [n=58]). A small fourth cluster (n=6) was excluded. Patients in Cluster 1 were predominantly males with a relative abundance of neutrophils in their nasal washes. Patients in Cluster 2 were predominantly females with high body mass index percentiles and later-onset asthma. Patients in Cluster 3 had higher eosinophil counts in their nasal washes and lower Asthma Control Test™ (ACT) scores. Within-cluster regression analysis revealed several significant associations between ETS exposure and phenotypic characteristics that were not present in the overall cohort. ETS exposure was associated with a significant increase in nasal wash neutrophils (Beta Coefficient = 0.73 [95%CI: 0.11 to 1.35]; P=0.023) and a significant decrease in ACT score (−5.17 [−8.42 to −1.93]; P=0.003) within Cluster 1 and a significant reduction in the bronchodilator-induced % change in FEV1 (−36.32 [−62.18 to −10.46]; P=0.009) within Cluster 3.
Clustering techniques defined more homogeneous subgroups allowing for the detection of otherwise undetectable associations between environmental tobacco smoke exposure and asthma characteristics.
Asthma in children and adolescents is a heterogeneous syndrome composed of multiple phenotypic subgroups. These subgroups show variable disease expression and variable responses to environmental exposures (1, 2). Despite the potential benefits of more specific phenotypic classification and treatment within such subgroups, they remain poorly characterized (3).
Cluster analysis is a bioinformatic technique that utilizes mathematical algorithms to group individuals within a population according to similarity of specified variables (4). It is an established method frequently used in gene expression analysis (5, 6). However, investigators only recently have begun to utilize this approach to delineate clinically relevant subgroups based on phenotypic data.
While initial reports have demonstrated the utility of cluster analysis to identify subgroups within disease populations with sepsis, asthma, and autism (7–10), we believe that the potential uses of this tool remain largely unexplored. Specifically, we hypothesized that delineation of phenotypic clusters within a pediatric asthma cohort would enable determination of otherwise undetectable associations between environmental tobacco smoke (ETS) exposure and asthma characteristics.
The first 154 children in the Asthma Severity Modifying Polymorphisms (AsthMaP) Project were studied. AsthMaP is an ongoing, cross-sectional study of urban children and adolescents designed to find associations among environmental exposures, allergic sensitivities, genetics, and asthma. It consists of a convenience sample of otherwise healthy children aged 6 to 20 years, inclusive, recruited from the metropolitan Washington, DC area with physician-diagnosed asthma present for at least one year. Participants were recruited in the emergency department at Children’s National Medical Center. Each participant returned to the Clinical Research Center for one study visit at least four weeks after completion of their most recent oral steroid dose. Informed consent and assent were obtained from participants and/or their guardians as appropriate. The study was approved by our Institutional Review Board.
Multiple clinical characteristics of asthma were measured in every participant. These included, but were not limited to, the following: 1) Pre- and post- short-acting beta-agonist spirometry measurements performed with a MedGraphics CPSF/D™ USB PC-based system (Medical Graphics Corporation, St. Paul, MN) using techniques validated in children (11); 2) Serum IgE measured using chemiluminescence with an Immulite 2000 system (Siemens Healthcare Diagnostics, Deerfield, IL); 3) Nasal wash samples procured by instilling 3 mL of isotonic sterile saline into each naris, holding it for 10 seconds, and then blowing into a specimen collection container; 4) Eosinophil and neutrophil fractions counted manually in slides prepared from nasal washes stained with Wright’s stain; 5) Aeroallergen skin prick testing using the MultiTest II device (Lincoln Diagnostics, Decatur, IL); 6) Parental interviews incorporating the Integrated Therapeutics Group’s (ITG) Child Asthma Short Form (12, 13), National Institutes of Health, National Asthma Education and Prevention Program (NAEPP) 2007 criteria (14), Asthma Control Test™ (ACT) (QualityMetric Incorporated, Lincoln, RI), and additional asthma severity assessment questions; and 7) Quantitative urine cotinine measured in a fresh urine sample collected within an approximately 4-hour period at midday (Quest Diagnostics, Chantilly, VA).
Nasal washes were used in lieu of bronchoalveolar lavage (BAL) as a minimally-invasive means to measure mucosal eosinophil and neutrophil expression in the cohort. Previous studies in cystic fibrosis and respiratory syncytial virus (RSV) have shown that inflammatory cell proportions in nasal samples accurately reflect those of lower airway collections (15, 16).
All collected variables were standardized: categorical variables were coded in binary fashion and continuous variables were converted to z scores. When appropriate, continuous variables were log10 transformed to approximate a normal distribution. Variables were then selected based on a published factor structure for asthma characterization (8, 17, 18). As in Haldar, et al. (8), variables were chosen if they were measured in the clinical evaluation of asthma and described asthma phenotypes. Additionally, selection of multiple variables representing the same aspect of asthma was avoided. Principal components analysis was then applied to the eleven selected variables to identify key clinical components relevant to asthma diagnosis and assessment.
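The standardization step can be illustrated with a minimal sketch. It assumes the log10 transform is applied before the z score and that the z score uses the sample standard deviation; neither detail is stated in the text. The function names and all values are hypothetical and are not AsthMaP data.

```python
import math
import statistics


def zscore(values):
    """Center on the mean and scale by the sample standard deviation (assumed)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def log10_then_zscore(values):
    """log10-transform a strictly positive, right-skewed variable, then z-score it."""
    return zscore([math.log10(v) for v in values])


# Hypothetical values for illustration only (not study data).
act_scores = [12, 18, 21, 23, 25]                # roughly symmetric: z-score directly
serum_ige = [10.0, 100.0, 1000.0, 100.0, 10.0]   # right-skewed: log10 first
is_male = [1, 0, 1, 1, 0]                        # categorical: kept as a 0/1 indicator

act_z = zscore(act_scores)
ige_z = log10_then_zscore(serum_ige)
```

After this step every continuous variable has mean 0 and unit standard deviation, so no single variable dominates a distance-based method purely because of its measurement scale.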
Principal component factors were identified using varimax rotation of the variables selected according to the above criteria. Cluster analysis was performed in two stages using variables representative of each principal component. In the first stage, hierarchical clustering of the variables using between-groups linkage yielded the probable number of clusters present in AsthMaP. In the second stage, a k-means cluster analysis was performed using this estimated number of clusters. The second stage was repeated while specifying one more or one fewer cluster than the estimate to ensure that the most representative model was obtained. Additionally, the k-means cluster analysis was repeated several times within random AsthMaP subpopulations to ensure reproducibility.
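The second stage, k-means repeated with one fewer and one more cluster than the estimate, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the SPSS procedure used in the study: the hierarchical first stage is not reproduced (its estimate is simply supplied as `estimated_k`), the deterministic farthest-point initialization is a choice made here for reproducibility, and the two-dimensional points are synthetic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic start (an assumption): first point, then repeatedly the
    point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centers, within-cluster sum of squares)."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, inertia


# Synthetic data: three well-separated groups of five points each.
offsets = [(-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5), (0.5, 0.5), (0.0, 0.0)]
group_centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
points = [(cx + dx, cy + dy) for cx, cy in group_centers for dx, dy in offsets]

estimated_k = 3  # stands in for the estimate from the hierarchical stage
results = {k: kmeans(points, k) for k in (estimated_k - 1, estimated_k, estimated_k + 1)}
for k, (_, _, inertia) in sorted(results.items()):
    print(k, round(inertia, 3))
```

Because the within-cluster sum of squares tends to fall as k grows, one possible criterion is the size of the drop: a large decrease from k - 1 to k followed by a small decrease from k to k + 1 favors k. The text does not state which criterion was used to judge the most representative model.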
Differences among the three clusters were derived using one-way analysis of variance for normally distributed continuous variables, Kruskal-Wallis tests for non-normally distributed continuous variables, and chi-square tests for categorical variables. Depending on the type of variable, linear, logistic, or ordinal regression analysis was then used to identify associations between ETS exposure (as represented by urine cotinine) and asthma phenotypes overall and within each cluster. All beta and B coefficients and P values were adjusted for age, gender, and body mass index (BMI) percentile. All statistical tests were performed with SPSS Statistics 17.0 (SPSS Inc., Chicago, IL).
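As an illustration of the chi-square comparison for a categorical variable, the sketch below computes the Pearson statistic for gender by cluster. The counts are an assumption: they were back-calculated from the rounded cluster sizes and male percentages reported in this paper and are not the authors' raw data. The P value uses the closed form exp(-x/2), which holds only for 2 degrees of freedom.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


def p_value_df2(stat):
    """Upper-tail chi-square probability; this closed form is valid only for df = 2."""
    return math.exp(-stat / 2.0)


# Approximate counts back-calculated from rounded percentages (an assumption).
# Columns: Cluster 1 (n=57), Cluster 2 (n=33), Cluster 3 (n=58).
gender_by_cluster = [
    [46, 10, 31],  # male
    [11, 23, 27],  # female
]

stat, df = chi_square_independence(gender_by_cluster)
if df != 2:
    raise ValueError("p_value_df2 applies only to tables with 2 degrees of freedom")
p_value = p_value_df2(stat)
print(round(stat, 2), df, p_value < 0.001)
```

For tables with other degrees of freedom, a statistics package would be needed to evaluate the chi-square distribution.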
The first 154 children and adolescents in AsthMaP were included in these analyses. Of those, 59% were male and the mean (SE) age was 11.5 (0.3) years. The mean (SE) BMI percentile for age was 72 (2) %. Of the 154 cases, 138 (90%) were self-identified African Americans (AAs), and 137 (89%) had persistent asthma as defined by NAEPP 2007 criteria (14).
Eleven key clinical variables relevant to asthma diagnosis and assessment in AsthMaP were selected for principal components analysis using the criteria described by Haldar, et al (8). Varimax rotation identified four principal components representing: symptoms/impairment, airway reactivity, mucosal evidence of allergy, and systemic evidence of allergy. (Table 1) One variable representative of each of these principal components was selected for cluster analysis: ACT, post-bronchodilator FEV1 (% predicted), eosinophils from nasal washes (%), and total serum IgE, respectively. These selected variables did not necessarily have the highest correlation value among the variables in their respective factor. Rather, they were selected because they are highly informative of the factor they represent. Three additional variables (i.e. gender, age of asthma onset, BMI percentile) known to be important factors in asthma phenotype were included.
Analysis of the first 154 AsthMaP cases resulted in a four-cluster best fit model with distinct asthma phenotypes. One cluster was comprised of only six individuals, all displaying an extremely mild asthma phenotype, and was therefore excluded from further analyses. Several characteristics differed among the three remaining clusters. Cluster 1 (n = 57) was predominantly male (81% versus 30% and 53% in Clusters 2 and 3, respectively; P < 0.001) with an abundance of neutrophils in their nasal washes (49% [interquartile range (IQR): 8, 94]) relative to Clusters 2 and 3 (37% [1, 49] and 13% [2, 53], respectively; P = 0.031). Cluster 2 (n = 33) was predominantly female (70% versus 19% and 47% in Clusters 1 and 3, respectively; P < 0.001) with high mean BMI percentile (87% versus 51% and 86%, respectively; P < 0.001) and later-onset asthma (7.5 years versus 2.3 years and 1.0 years, respectively; P < 0.001). Cluster 3 (n = 58) exhibited an allergic asthma phenotype with an increase of eosinophils in their nasal washes (82% [IQR: 48, 95] vs. 9% [0, 59] and 56% [4, 92] in Clusters 1 and 2, respectively; P < 0.001), worse asthma control (mean ACT score = 18.2 versus 21.4 and 20.9, respectively; P < 0.001), and high mean BMI percentile (86% versus 51% and 87%, respectively; P < 0.001). Of note, we detected residual heterogeneity in this third cluster, as evidenced by the presence of several smaller sub-clusters.
Co-morbidities were identified for all participants and no differences were detected among the clusters. Additionally, the 16 non-AA children were evenly distributed among the clusters. A sensitivity analysis was performed and showed that race was not an important contributor to the regression analysis. Finally, current medication use (i.e. bronchodilator, inhaled steroid, long-acting leukotriene inhibitor) was not found to be different among the clusters.
Presently, NAEPP severity classifications are used routinely as a “clustering tool” in clinical care to classify children with asthma according to disease severity for the purposes of diagnosis and management (14). NAEPP severity classification was not significantly different among the three clusters, and all three clusters were represented within each NAEPP severity classification. (Figure 1)
Regression analysis was used to explore associations, in the overall AsthMaP population and within each cluster, between ETS exposure measured quantitatively by urine cotinine and asthma characteristics.
Within-cluster analysis revealed several significant associations in Clusters 1 and 3 that were not present in the overall cohort. No significant associations were found within Cluster 2. Cluster 1, despite having the lowest ETS exposure of the three clusters, showed the largest number of significant associations between ETS exposure and asthma characteristics. A log10 increase in quantitative urine cotinine level was significantly associated with a log10 increase in neutrophils in nasal washes (Beta Coefficient = 0.73 [95% confidence interval: 0.11 to 1.35]; age, gender, and BMI percentile adjusted P = 0.023) and a significant decrease in ACT score (−5.17 [−8.42 to −1.93]; adjusted P = 0.003). In Cluster 3, a log10 increase in urine cotinine level was only associated with a significant reduction in the bronchodilator-induced % change in FEV1 (−36.32 [−62.18 to −10.46]; adjusted P = 0.009). (Table 3)
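Because the exposure enters the models as log10 of urine cotinine, each coefficient describes the change in the outcome per ten-fold increase in cotinine. The sketch below illustrates one way in which pooling heterogeneous subgroups can mask within-subgroup associations. It is a toy example under stated assumptions: the values are invented, the opposite-direction slopes are chosen only to make the cancellation obvious and are not a pattern reported here, and the adjustment for age, gender, and BMI percentile is omitted.

```python
import math


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical cotinine levels, one ten-fold step apart (not study data).
cotinine = [1.0, 10.0, 100.0, 1000.0]
log_cotinine = [math.log10(c) for c in cotinine]

# Hypothetical outcome scores in two subgroups with opposite trends.
outcome_a = [25.0, 22.0, 19.0, 16.0]  # falls 3 points per ten-fold rise
outcome_b = [16.0, 19.0, 22.0, 25.0]  # rises 3 points per ten-fold rise

slope_a = ols_slope(log_cotinine, outcome_a)
slope_b = ols_slope(log_cotinine, outcome_b)
slope_pooled = ols_slope(log_cotinine + log_cotinine, outcome_a + outcome_b)
print(slope_a, slope_b, slope_pooled)
```

In this toy case each subgroup shows a clear slope while the pooled slope is zero, which is the general reason a within-cluster analysis can detect associations that an overall analysis does not.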
In this study of 154 urban, largely minority children and adolescents from Washington, DC with established asthma, we successfully employed cluster analysis to reduce the cohort’s heterogeneity. This permitted detection of associations between ETS exposure and asthma characteristics within specific clusters. Historically, asthma has proven difficult to characterize because of the complex nature of the disease (1, 2). Thus, developing tools capable of identifying more homogeneous subgroups is crucial to understanding the variable expression patterns of the disease and to tailoring therapies to subgroups of asthma (3, 19).
Recently, Haldar, et al. (8) and Moore, et al. (9) used cluster analysis to identify subgroups within several independent asthma cohorts. However, our study is the first to extend this observation by demonstrating that associations not found in the overall population exist within these clusters. To accomplish this, we first used hierarchical and k-means clustering of principal component variables to identify more phenotypically homogeneous subgroups within the AsthMaP cohort. Of the resulting four clusters, one consisted of only six individuals and therefore lacked power to be useful in association analyses. Interestingly, these six individuals displayed very mild asthma, suggesting that they did not cluster with the other subjects due to lack of disease expression. The remaining three clusters showed distinct and familiar asthma phenotypes commonly seen in clinical practice.
Cluster 1 was predominantly male (81%) with a relative abundance of neutrophils in their nasal washes. Mucosal neutrophilia is a frequent finding in asthma that may or may not be observed in conjunction with eosinophilic inflammation (20). Wenzel, et al. (21) showed that neutrophil-predominant asthma is a distinct inflammatory subgroup of severe asthma, with increased neutrophil levels found in refractory patients. The exact cause of this phenotype is unknown but it has been thought to be exacerbated by environmental exposures such as bacterial endotoxins, air pollution, cigarette smoke, or viral infections (22).
Cluster 2 was notable for its female predominance (70%). In addition, subjects in this cluster had high mean BMI percentile and a mean age of asthma onset of 7.5 years, much older than in the other two clusters. These characteristics are often seen in clinical practice, as there appears to be a biological link between asthma and obesity (23). Several studies have observed that this association is stronger in women (24–26). Specifically, a high BMI coupled with early onset of puberty has been reported as a risk factor for first developing asthma during adolescence in females (24, 27).
Cluster 3 had a nearly balanced gender distribution and exhibited the more classical atopic/allergic asthma phenotype with an increase in eosinophils in their nasal washes. Eosinophil activation and subsequent inflammation in the lungs are established hallmarks of asthma pathology and are highly associated with increased symptoms and frequency of exacerbations, along with worse disease control (1, 3). This poor asthma control is evident in Cluster 3 as shown by relatively lower ACT scores and FEV1 measurements. The individuals in this cluster also had a high mean BMI percentile similar to what was observed in Cluster 2. The characteristics that make up Cluster 3 have been shown to be associated with obesity. In particular, high BMI is associated with decreased symptom control and higher prevalence of atopy (25, 28). Additionally, hierarchical clustering revealed several smaller sub-clusters, suggesting that there is remaining heterogeneity in this cluster.
It is important to note that NAEPP severity levels among the clusters were not significantly different, given that this classification system is frequently used in the diagnosis and management of childhood asthma (14). Although improved in 2007 with regard to heterogeneity within classification levels due to age (14, 29, 30), our data show that it remains limited with regard to other sources of heterogeneity. Thus, we propose that the alternative clustering method described be evaluated as a complementary means of effectively grouping individuals based on phenotype.
Intrinsic to the concept of heterogeneity in asthma is the idea that response to environmental stimuli will differ among asthma subgroups (3). Therefore, we used regression analysis in the overall AsthMaP population and within the identified clusters to explore associations between ETS exposure (i.e. urine cotinine) and asthma characteristics. Urine cotinine levels were not significantly different among the three clusters. However, within-cluster analysis revealed several significant associations that were not found in the overall cohort, supporting our hypothesis that analyzing more homogeneous subgroups would prove useful in identifying new associations.
In particular, we found that within Cluster 1 an increase in ETS exposure was associated with a significant increase in neutrophils in nasal washes and a significant decrease in ACT score. Because it has been shown that asthma symptoms are exacerbated by ETS exposure (31, 32), it is reasonable that there is a significant association between ETS exposure and decreased asthma control in this cluster.
Within Cluster 3, we found that an increase in ETS exposure was associated with a significant reduction in the bronchodilator-induced % change in FEV1. This is a potentially novel finding for this asthma phenotype given that ETS exposure typically increases bronchodilator response in children with asthma (33). It is possible that the eosinophilic inflammation displayed in this cluster, and the subsequent chronic inflammation, makes these individuals less responsive to bronchodilators.
It is notable that no significant associations with ETS exposure were detected within Cluster 2. We suspect that this is due in part to the relatively small sample size. Alternatively, it is possible that the phenotypes in this cluster are not as responsive to ETS exposure. While it has been previously shown that ETS exposure increases the incidence of asthma in overweight individuals (34), there is no evidence that it leads to greater disease severity.
This study has important limitations. First, our sample size of only 154 participants restricted the number of clusters we were able to detect in our cohort. As mentioned regarding Cluster 3, there are likely more subgroups of asthma that could be identified in each cluster given a larger sample size. However, the goal of this study was not delineation of clusters but rather defining associations between ETS exposure and asthma characteristics within clusters. Second, the selection of variables is subjective. We aimed to select a wide range of variables representative of disease expression. However, we recognize the likelihood that other variables not included could also have an impact on this analysis. Third, because of the cross-sectional nature of the AsthMaP study, this analysis does not address cluster stability over time. Given the dynamic nature of asthma disease expression, it is possible that individuals move among clusters with time. Fourth, the AsthMaP cohort is composed largely of urban AA youth, making it more difficult to extend our findings to other childhood asthma populations. However, our study provides insight into AA children and adolescents with asthma as one of the highest-risk asthma populations. Finally, using k-means cluster analysis as the principal clustering tool required us to pre-specify the number of expected clusters. We took steps to reduce bias, including (1) using hierarchical clustering as a first step to estimate the number of probable clusters and (2) repeating the k-means cluster analysis while specifying one more or one fewer cluster than the estimate to ensure the selection of the most representative model.
This study extends the usefulness of cluster analysis to classify asthma subgroups. Identifying established asthma phenotypes within the AsthMaP cohort lends credibility to these cluster analysis techniques. Furthermore, exploring within-cluster associations proved useful in identifying otherwise undetectable relationships between ETS exposure and asthma characteristics. Together, these techniques provide a framework in which to better understand complex disease expression patterns.
Angela Benton carried out these studies while a Master’s student in the Genomics and Bioinformatics Program of the Columbian College of Arts and Sciences at the George Washington University.
Funding/Support: Funding support provided to RJF by grants K23RR020069, P20MD000198, and M01RR020359 from the National Institutes of Health, Bethesda, Maryland, a Sheldon C. Siegel Investigator Award Grant from the Asthma and Allergy Foundation of America, and by institutional grants from Children’s National Medical Center, Washington, DC.
Declaration of Interest: The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.