Heterogeneity in asthma expression is multidimensional, including variability in clinical, physiologic, and pathologic parameters. Classification requires consideration of these disparate domains in a unified model.
We set out to explore the application of a multivariate mathematical technique, k-means cluster analysis, for identifying distinct phenotypic groups.
We performed k-means cluster analysis in three independent asthma populations. Clusters of a population managed in primary care (n = 184) with predominantly mild to moderate disease were compared with those of a refractory asthma population managed in secondary care (n = 187). We then compared differences in asthma outcomes (exacerbation frequency and change in corticosteroid dose at 12 mo) between clusters in a third population of 68 subjects with predominantly refractory asthma, clustered at entry into a randomized trial comparing a strategy of minimizing eosinophilic inflammation (inflammation-guided strategy) with standard care.
Two clusters (early-onset atopic and obese, noneosinophilic) were common to both asthma populations. Two clusters characterized by marked discordance between symptom expression and eosinophilic airway inflammation (early-onset symptom predominant and late-onset inflammation predominant) were specific to refractory asthma. Inflammation-guided management was superior for both discordant subgroups, leading to a reduction in exacerbation frequency in the inflammation-predominant cluster (3.53 [SD, 1.18] vs. 0.38 [SD, 0.13] exacerbation/patient/yr, P = 0.002) and a dose reduction of inhaled corticosteroid in the symptom-predominant cluster (mean difference, 1,829 μg beclomethasone equivalent/d [95% confidence interval, 307–3,349 μg]; P = 0.02).
Cluster analysis offers a novel multidimensional approach for identifying asthma phenotypes that exhibit differences in clinical response to treatment algorithms.
Asthma impacts significantly on the rising burden of chronic disease in developed countries. Approximately 5 to 10% of sufferers have refractory asthma that remains poorly controlled despite maximal inhaled therapy (1). Effective clinical care is complicated by heterogeneity in the physiologic, pathologic, and molecular abnormalities associated with refractory asthma (2). Current descriptions of asthma phenotypes are limited by subjectivity and poor coherence. A robust system of classification that incorporates the multidimensionality of asthma is needed to identify subgroups with consistent patterns of disease (3, 4). This may provide a framework for identifying distinct phenotypes, with specific pathophysiologic abnormalities that predict response to particular therapies (5) and help to focus current genetic and molecular studies.
The taxonomy of organisms remains the paradigm for biological models of classification. It is based empirically on the principle that similarity measured across a number of different characteristics predicts relationships of biological significance with greater probability. Cluster analysis refers to a group of multivariate mathematical algorithms that broadly perform two distinct functions: (1) quantification of similarity between individuals within a population on the basis of the (multiple) specified variables; (2) grouping of individuals into clusters such that similarity between members of the same cluster is strong and similarity between members of different clusters is weak (6, 7). The principal advantage of performing classification numerically is objectivity, and using a methodology in which multiple variables assume equal weighting helps minimize a priori bias. Numerical taxonomy or taximetrics is the branch of taxonomy that has developed to use mathematical algorithms such as cluster analysis for this purpose (8), and the principle has been extended for use in other areas of biomedical science, notably bioinformatics and psychiatry (9). In the latter, cluster analysis techniques have been used to identify patterns of symptom expression that define diagnostic categories (9).
We postulated that cluster analysis could be applied for classifying clinical phenotypes of asthma. We examined this hypothesis using the k-means clustering algorithm to classify two distinct asthma populations: a group recruited from primary care with asthma of predominantly mild to moderate severity and a group from secondary care who met prespecified criteria for refractory asthma (10). The clinical relevance of these clusters was evaluated further by investigating differences in asthma outcomes between clusters identified in a separate cohort of patients with predominantly refractory asthma, who participated in a recently completed randomized study at our center comparing a management strategy aimed at titrating steroid therapy to maintain a normal sputum eosinophil count, with a conventional clinical protocol (11). Some of the results of this study have been previously reported in the form of an abstract (12).
We studied three discrete populations with asthma. All patients had a physician diagnosis of asthma and sufficient symptoms to warrant at least one prescription for asthma therapy in the previous 12 months. All patients were current nonsmokers, and ex-smokers had a smoking history of less than 10 pack-years. The two larger datasets comprised cross-sectional data for performing cluster analysis to identify the major disease patterns existing, respectively, within primary-care and refractory asthma populations. Our first dataset comprised baseline data from patients with asthma (n = 184) recruited from primary-care practices for two prospective clinical studies at our center: the GLAD (GPIAG [General Practitioners in Asthma Group] and Leicester Asthma and Dysfunctional Breathing) study (n = 70) (trial number ISRCTN 47153522) and the recently completed Intensive Asthma Study (n = 114) (13). The studies shared common subject selection criteria and recruitment techniques.
Our second dataset (n = 187) comprised data from patients with a diagnosis of refractory asthma, made in accordance with American Thoracic Society (ATS) criteria (10) by a respiratory physician with a specialist interest in this field. All the patients attended our specialist Glenfield Hospital refractory asthma clinic (Leicester, UK) for assessment and management of their asthma. The analysis was performed on consecutive patients attending the clinic between 2004 and 2006, with a full complement of data collected as part of their routine baseline assessment during their first visit to our center. The systematic recording and validation of data for some etiologic factors such as nasal polyps, aspirin sensitivity, and ethnicity are not routinely performed at our center. These data were therefore not available as part of the analysis. However, to be representative of the secondary-care asthma population, we chose to include all patients meeting ATS criteria for refractory asthma. Thus, patients in whom nonadherence with therapy was likely to have been a major determinant were not excluded. This is in contrast to our third population (described below) who were recruited to a clinical trial in which suspected or documented therapy nonadherence was an exclusion criterion of the study.
The third dataset comprised baseline and longitudinal data collected from a prospective clinical study (11). The study compared severe exacerbation frequency over 12 months in 74 patients with predominantly refractory asthma managed according to regular monitoring of airway inflammation using induced sputum (sputum arm), with the aim of titrating steroid therapy to maintain normal eosinophil counts, or standard clinical care (clinical arm). Sufficient baseline data were available in 68 of the 74 study participants to perform cluster analysis. Fifty-nine of the 68 patients (86.7%) met ATS criteria for refractory asthma.
Uniform cluster analysis methodology was applied to each population using a two-step approach. In the first step, hierarchical cluster analysis using Ward’s method generated a dendrogram for estimation of the number of likely clusters within the studied population. This estimate was prespecified in a k-means cluster analysis that was used as the principal clustering technique (14). Variables chosen for cluster modeling were selected on the basis of their considered contribution to characterizing the asthma phenotype. Variable selection and cluster analysis methodology are discussed further in the online supplement.
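The two-step approach described above can be illustrated with a minimal sketch in plain Python. This is not the SPSS procedure used in the study. The data points are hypothetical, the function names (`ward_merge_costs`, `suggest_k`, `spread_out_centers`, `k_means`) are introduced here for illustration, and the rule of cutting the tree before the largest jump in merge cost is an assumed stand-in for visual inspection of a dendrogram. The k-means step uses a deterministic, spread-out choice of starting centers so that the example is repeatable.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def ward_merge_costs(points):
    """Agglomerate with Ward's criterion; return the cost of each merge in order."""
    clusters = [[p] for p in points]
    costs = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares if a and b are merged.
                cost = len(a) * len(b) / (len(a) + len(b)) * sq_dist(centroid(a), centroid(b))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        costs.append(cost)
    return costs

def suggest_k(costs):
    """Assumed heuristic: stop merging just before the largest jump in cost."""
    n = len(costs) + 1
    jumps = [costs[i + 1] - costs[i] for i in range(len(costs) - 1)]
    i = max(range(len(jumps)), key=jumps.__getitem__)
    return n - i - 1

def spread_out_centers(points, k):
    """Deterministic start: first point, then repeatedly the farthest remaining point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def k_means(points, k, max_iter=100):
    """Lloyd's algorithm with k prespecified; returns (labels, centers)."""
    centers = spread_out_centers(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = centroid(members)
    return labels, centers

# Hypothetical standardized data: three tight groups of four "patients".
points = [
    (0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2),
    (5.0, 5.0), (5.2, 4.9), (4.9, 5.1), (5.1, 5.2),
    (-5.0, 5.0), (-5.1, 5.2), (-4.9, 4.8), (-5.2, 5.1),
]
k = suggest_k(ward_merge_costs(points))   # step 1: estimate the number of clusters
labels, centers = k_means(points, k)      # step 2: k-means with k fixed in advance
```

The hierarchical step here only supplies the value of k. The final grouping comes from k-means, which mirrors the division of labor described in the text.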
All measurements were standardized using z scores for continuous variables and 0 or 1 for categorical variables. Continuous variables were log transformed to approximate a normal distribution where this was indicated. Discriminant function analysis was performed using both forward and backward stepwise algorithms on each cluster model to evaluate the input variables that were significant determinants of model structure. This is discussed in greater detail in the online supplement.
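As a rough illustration of the standardization step, the sketch below converts continuous variables to z scores (after a log transform where the distribution is skewed) and codes a categorical variable as 0 or 1. The variable names and values are hypothetical and are not study data. The use of the sample standard deviation is an assumption, and the log transform as written requires strictly positive values.

```python
import math
import statistics

def z_scores(values):
    """Standardize a continuous variable to mean 0 and unit sample SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator); an assumption
    return [(v - mean) / sd for v in values]

def log_then_z(values):
    """Log-transform a right-skewed, strictly positive variable, then standardize."""
    return z_scores([math.log(v) for v in values])

def binary(flags):
    """Code a two-level categorical variable as 0 or 1."""
    return [1 if f else 0 for f in flags]

# Hypothetical values for five patients, for illustration only.
bmi = [22.0, 27.5, 31.0, 36.5, 24.0]
sputum_eos_pct = [0.5, 1.0, 3.0, 12.0, 40.0]   # skewed, so log-transformed first
atopic = [True, False, True, True, False]

# One row per patient, ready to pass to a clustering routine.
features = list(zip(z_scores(bmi), log_then_z(sputum_eos_pct), binary(atopic)))
```

Putting every continuous input on the same scale keeps variables with large numerical ranges from dominating the distance calculation. This is one way of giving the inputs comparable weight.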
The between-cluster comparison of baseline parameters that were not input parameters was performed using one-way analysis of variance (ANOVA) for parametric variables, the χ2 test for proportions, and Kruskal-Wallis for nonparametric variables. For the analysis of outcome data in the prospective study, our clustering algorithm was applied to the baseline study data, and outcomes were compared between study arms for each cluster using the independent t test. Univariate ANOVA with the cluster model as a covariate was performed to verify the significance of this as an independent factor for any observed differences in outcome (see the online supplement). The measured outcomes were prespecified and included the frequency of severe exacerbations, measured as the number of rescue courses of oral corticosteroid, and the change in corticosteroid dose at 12 months. All statistical analyses were performed using SPSS version 14 (SPSS, Inc., Chicago, IL). In addition, STATA (Version 7.0; Stata Corp., College Station, TX) was used to perform repetitions of cluster models with the k-means algorithm for demonstrating repeatability.
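For readers unfamiliar with the between-cluster comparison, the following sketch shows how the F statistic of a one-way ANOVA is built from between-cluster and within-cluster variation. It is a hand calculation on hypothetical numbers, not a reproduction of the SPSS analysis. The function name `one_way_anova_f` is introduced here, and converting F to a P value would additionally require the F distribution from a statistics package.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    # Variation of the cluster means around the grand mean, weighted by cluster size.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own cluster mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical measurements of one baseline variable in three clusters.
clusters = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(clusters)
```

A large F indicates that the cluster means differ by more than would be expected from the scatter within clusters.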
Approval from the local research ethics committee was obtained for data analysis and publication following informed consent for the respective clinical studies and as part of a clinical database for patients attending the Glenfield Hospital Difficult Asthma Clinic.
Compared with our secondary-care, refractory asthma population, the primary-care population had milder disease with significantly fewer symptoms, less airway dysfunction, and lower levels of eosinophilic airway inflammation, while taking a significantly lower mean dose of inhaled corticosteroids (Table 1).
The cluster structure described for each population was reproducible when repeating the algorithm using STATA and within randomly selected subsets of each population (data not shown). Statistical validity for the results was supported by identifying similar clusters of refractory asthma within the independent study cohort of Green and colleagues (11).
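The text does not state how agreement between repeated cluster solutions was quantified. One common option, shown here purely as an assumed illustration, is the Rand index: the fraction of patient pairs that two solutions treat the same way (both place the pair together, or both place it apart). The label lists are hypothetical.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of pairs on which two partitions agree (same vs. different cluster)."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total

# Hypothetical cluster labels for six patients from three runs.
run_1 = [0, 0, 1, 1, 2, 2]
run_2 = [2, 2, 0, 0, 1, 1]   # same grouping as run_1, cluster numbers permuted
run_3 = [0, 0, 1, 1, 2, 1]   # one patient assigned differently

identical = rand_index(run_1, run_2)
partial = rand_index(run_1, run_3)
```

Because the index compares pairs rather than label values, it is unaffected by the arbitrary numbering of clusters in different runs or software packages.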
A three-cluster model best fit the primary-care population dataset (Table 2; Figure 1). Cluster 1 described a subgroup with early-onset atopic asthma. This cluster had evidence of airway dysfunction, symptoms, and eosinophilic airway inflammation. Clinically, this cohort was associated with a significantly greater number of previous hospital attendances and asthma exacerbations requiring oral corticosteroids when compared with the other primary-care subgroups. Cluster 2 described an obese subgroup with a female preponderance, evidence of asthma symptoms, and an absence of eosinophilic airway inflammation. The third cluster was labeled benign asthma because cases within this subgroup had little evidence of active disease. Asthma symptoms, airway inflammation, and measures of airway dysfunction were frequently within normal limits, and 58% of this cohort did not have evidence of significant airway hyperresponsiveness at the time of assessment. Consistent with a milder disease profile, patients from this cluster had very low rates of hospital attendance for asthma and severe exacerbation frequency in the previous 12 months (Table 2).
We identified four clusters in the secondary-care, refractory asthma population (Table 3; Figure 1). Clusters 1 and 2 had a profile that closely resembled the respective clusters in primary care. Thus, early-onset atopic asthma and obese, noneosinophilic asthma were common to asthma populations across the spectrum of severity. The principal distinction between the clusters in each population was the difference in absolute values of different objective measures of disease severity. In comparison with primary care, early-onset atopic asthma in secondary care exhibited greater airway dysfunction, symptoms, and eosinophilic airway inflammation on a higher dose of corticosteroid therapy. However, the pattern of expression of these variables, demographic data, and measures of asthma control were consistent between clusters of the two populations. The sub-population of this phenotype with refractory asthma had a significantly higher rate of failed attendance of appointments in the 12 months after referral to the clinic compared with the other phenotypes of refractory asthma (Table 3).
Clusters 3 and 4 were specific to the refractory asthma population and both exhibited marked dissociation between eosinophilic inflammation and asthma symptoms. Cluster 3 described an early-onset, symptom-predominant group with minimal eosinophilic disease. Cluster 4 described an eosinophilic inflammation–predominant group with few symptoms, late-onset disease, and a greater proportion of males.
Discriminant function modeling identified the majority of input parameters used in the cluster analysis of both populations to be significant determinants of cluster membership (Table E1 of the online supplement). The discriminant function model of primary-care and refractory asthma clusters required seven of eight input parameters (excluding atopic status) and five of seven parameters (excluding atopic status and sex), respectively. The accuracy of the discriminant function models for predicting cluster membership was 94.6% (primary care) and 96.8% (refractory asthma).
Cluster analysis was performed from baseline data in 68 patients of the prospective study dataset. Three clusters were identified (Table E2); all were comparable with clusters observed in the larger refractory asthma population. The original study demonstrated a significant reduction in severe exacerbation frequency in the sputum arm, with no significant difference in corticosteroid usage between the groups. The present cluster-specific analysis revealed that all of the benefit for preventing exacerbations occurred in the inflammation-predominant cohort (3.53 [SD, 1.18] vs. 0.38 [SD, 0.13] exacerbation/patient/yr, P = 0.002) (Table 4). In addition, sputum-guided therapy allowed successful downtitration of corticosteroid therapy in early symptom-predominant asthma (Table 4; mean difference, 1,829 μg beclomethasone equivalent/d [95% confidence interval, 307–3,349 μg]; P = 0.02), without compromising asthma control. A univariate ANOVA with the cluster model as a covariate identified both treatment grouping and the cluster model as significant determinants for observed differences in exacerbation frequency (P = 0.002, study groups; P = 0.03, cluster model), but only the cluster model was a significant determinant for differences in inhaled corticosteroid dose (P = 0.07 for treatment groups and P = 0.005 for cluster model).
The need for classifying asthma heterogeneity has gained urgency with the parallel development of better tools for measuring disease characteristics that highlight disparity in clinical, physiologic, and pathologic markers, together with novel and specific molecular therapies that are only likely to be efficacious in particular subgroups of asthma. This study is the first to apply principles of cluster analysis for the identification of clinical asthma phenotypes. We have further shown that phenotypes constructed in this way exhibit clinically relevant differences in outcome, with management strategies that use a measure of eosinophilic inflammation for titrating corticosteroid therapy.
Asthma classification is complicated by the multidimensional nature of the disease. This prompted our consideration of cluster analysis techniques for this purpose. We selected the k-means clustering algorithm as it maximizes separation between clusters, thereby offering the greatest scope for identifying distinct groups within the population. Both familiar and previously uncharacterized asthma subgroups were identified that are more representative of multidimensionality. The identification of early-onset atopic asthma, an established asthma phenotype, validates the method for identifying the other subgroups against an accepted reference (15). Discriminant function analysis demonstrated the majority of the clustering parameters to be significant for cluster modeling, supporting multidimensionality. Atopic status was not identified as a significant discriminator influencing cluster membership in either primary care or secondary care. However, the prevalence of atopy did differ significantly between clusters and its inclusion to describe the phenotypes is therefore appropriate.
We chose to consider the two asthma population datasets independently when performing cluster analysis. This enabled clearer identification of factors that are specifically associated with refractory asthma, a condition that is sufficiently disparate to be considered a distinct disease entity by several authors (16).
The early-onset atopic asthma phenotype was common to both asthma populations, differing only in the severity of disease expression. We identified significantly higher rates of nonattendance for clinic appointments in the refractory subgroup, which has been associated with poorer therapeutic compliance (17). Our finding of uncontrolled eosinophilic airway inflammation was in keeping with this. Our failure to identify the same phenotype in the recruited prospective study cohort may be because poor compliance was an exclusion criterion for the study. Although equivalent measures of compliance were not obtained in our primary-care population, it may be an important factor distinguishing this phenotype between the two populations. Strategies for improving compliance may therefore have a greater role in the management of this subgroup of refractory asthma. The obese, noneosinophilic phenotype common to both populations was characterized by symptoms that were not associated with eosinophilic airway inflammation. Given the recognized association between eosinophilic airway inflammation and steroid responsiveness in airway disease (18), the reported steroid resistance of asthma in obese patients (19) may in part be explained by the general pattern of airway inflammation seen with this phenotype.
The traditional paradigm of a direct relationship between eosinophilic inflammation and symptoms underpins present therapeutic guidelines that recommend symptom-led titration of corticosteroid therapy (20). Our analysis suggests that a symptom-led approach would be effective for mild to moderate asthma in primary care for patients with early-onset atopic asthma and benign asthma, where concordance was observed between inflammation and symptoms. However, discordance between these domains is a prevalent characteristic of refractory asthma and is also a feature of the obese, symptom-predominant, noneosinophilic phenotype seen in primary care (Figure 1). This may be a significant factor predisposing to failure with a conventional protocol and supports a role for measuring eosinophilic airway inflammation in these subgroups. For symptom-predominant phenotypes, the etiology of symptoms is multifactorial and not closely related to underlying eosinophilic airway inflammation. Overtreatment with corticosteroids may therefore occur. In keeping with this, a recent study using exhaled nitric oxide (FeNO) as a measure of eosinophilic airway inflammation in asthma showed that FeNO-guided management resulted in lower inhaled corticosteroid use without compromising asthma control (21). In contrast, the inflammation-predominant phenotype will be undertreated, leading to uncontrolled eosinophilic inflammation that is associated with a greater risk of future severe asthma exacerbations (22). Our hypothesis is supported by the results of the longitudinal cluster-specific analysis that demonstrated a 10-fold reduction in exacerbation frequency for this phenotype with a management strategy that measures eosinophilic airway inflammation to titrate therapy.
This study has several limitations. Principal among these is the cluster analysis methodology. Although we have used the k-means clustering algorithm, it is well recognized that populations of both disease and health have a continuous spectrum of expression. The use of an algorithm that separates the population into discrete clusters may not be realistic. Alternative clustering techniques that use a probabilistic approach for cluster structure and membership within a dataset may provide additional information and should be explored (23). Nevertheless, our analysis supports the hypothesis that subgroups of clinical relevance exist within asthma populations and can be revealed using cluster analysis. Despite our efforts to be objective, there were several areas of subjectivity, including our selection of variables for clustering and our decisions on the number of clusters for each population. Although our choice of clustering parameters was broad, we cannot exclude the possibility that other variables may be of greater significance in developing meaningful phenotypes. In addition, the possible association between specific cluster profiles and well-recognized etiologic factors such as nasal polyps and aspirin sensitivity could not be explored. An advantage of multivariate techniques is that no single variable should be critical for determining the model. One of the drawbacks of using a nonhierarchical clustering technique is the need to prespecify the number of expected clusters. There are no well-validated techniques for predicting the number of clusters within a given population. We estimated this from dendrogram plots obtained using the hierarchical Ward’s method. The study also does not address the question of stability in cluster membership over time and with changes in treatment. Within each population, there was no significant difference in treatment regimens and doses between clusters. 
Thus, differences observed between clusters may be considered a product of differences in the underlying disease profile together with differences in the response to therapy. These two factors are likely to be closely related. Although longitudinal change in cluster membership has not been explored, our analysis indicates cluster profiling at baseline is predictive of response to a management strategy prospectively for at least 12 months. It is also notable that four of the parameters we used for clustering (age of onset, sex, atopic status, and body mass index) are relatively invariant and not generally affected by time and therapy.
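The limitation noted above, that k-means forces each patient into exactly one cluster, can be made concrete with a small sketch of probabilistic membership. This is not a method used in the study. It assumes the simplest possible mixture model (equally weighted spherical Gaussians with a shared variance, centered on hypothetical cluster centers) and computes the probability that a patient belongs to each cluster.

```python
import math

def soft_memberships(point, centers, variance=1.0):
    """Membership probabilities under equal-weight spherical Gaussians (shared variance)."""
    log_w = [
        -sum((x - c) ** 2 for x, c in zip(point, center)) / (2.0 * variance)
        for center in centers
    ]
    top = max(log_w)                      # subtract the maximum for numerical stability
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]

# Hypothetical cluster centers in a standardized two-variable space.
centers = [(1.0, 0.0), (-1.0, 0.0)]

typical = soft_memberships((1.0, 0.0), centers)      # sits on the first center
borderline = soft_memberships((0.0, 0.0), centers)   # midway between the two centers
```

A patient lying midway between two centers receives equal membership in both, information that a hard assignment discards.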
In summary, this study supports a role for the use of multivariate techniques in the classification of asthma populations. Clinically important prognostic differences identified between the phenotypes within this model may provide a reliable framework for exploratory molecular and genetic studies, presently undermined by population heterogeneity.
Although several models of asthma classification have been proposed, a system defining the phenotypes of clinical asthma that incorporates the different aspects of the disease has not been developed.
Cluster analysis may be used to classify patients with asthma into phenotypic groups that exhibit clinically relevant differences in outcome with a management strategy using a measure of eosinophilic inflammation for titrating corticosteroid therapy.
The authors thank Prof. M. Silverman for his comments and Prof. J. Thompson for his review of, and advice on, the statistical methodology.
Conflict of Interest Statement: P.H. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript. I.D.P. received $2,000 for speaking at conferences organized by GlaxoSmithKline and $5,000 for speaking at conferences organized by AstraZeneca; he is in receipt of a $500,000 grant for a study of severe asthma from GlaxoSmithKline. D.E.S. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript. M.A.B. has received lecture fees and conference support from AstraZeneca and GlaxoSmithKline. M.T. has received speaker’s honoraria in the last 3 years for speaking at meetings sponsored by the following companies marketing respiratory products: AstraZeneca, Boehringer Ingelheim, GlaxoSmithKline, MSD, Schering-Plough, Teva; he has received honoraria for attending advisory panels with Altana, AstraZeneca, Boehringer Ingelheim, GlaxoSmithKline, MSD, Merck Respiratory, Schering-Plough, Teva; he has received sponsorships to attend international scientific meetings from GlaxoSmithKline, MSD, AstraZeneca; he has received funding for research projects from GlaxoSmithKline, MSD, AstraZeneca; he holds a research fellowship from Asthma UK. C.E.B. has received a total of $2.2 million in research funding over the last 3 years (or is pending) from AstraZeneca, Cambridge Antibody Technology, GlaxoSmithKline; he has received less than $10,000 per annum from consultancy fees from Cambridge Antibody Technology, AstraZeneca, GlaxoSmithKline, Roche, and Pfizer; he has participated as a speaker in scientific meetings or courses organized and financed by AstraZeneca, GlaxoSmithKline, Boehringer Ingelheim, MSD, and Pfizer. A.J.W. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript. R.H.G. does not have a financial relationship with a commercial entity that has an interest in the subject of this manuscript.