Rationale: The Third National Health and Nutrition Examination Survey (NHANES III) reference is currently recommended for interpreting spirometry results, but it is limited by the lack of subjects younger than 8 years and does not continuously model spirometry across all ages.
Objectives: By collating pediatric data from other large-population surveys, we have investigated ways of developing reference ranges that more accurately describe the relationship between spirometric lung function and height and age within the pediatric age range, and allow a seamless transition to adulthood.
Methods: Data were obtained from four surveys and included 3,598 subjects aged 4–80 years. The original analyses were sex specific and limited to non-Hispanic white subjects. An extension of the LMS (lambda, mu, sigma) method, widely used to construct growth reference charts, was applied.
Measurements and Main Results: The extended models have four important advantages over the original NHANES III analysis as follows: (1) they extend the reference data down to 4 years of age, (2) they incorporate the relationship between height and age in a way that is biologically plausible, (3) they provide smoothly changing curves to describe the transition between childhood and adulthood, and (4) they highlight the fact that the range of normal values is highly dependent on age.
Conclusions: The modeling technique provides an elegant solution to a complex and longstanding problem. Furthermore, it provides a biologically plausible and statistically robust means of developing continuous reference ranges from early childhood to old age. These dynamic models provide a platform from which future studies can be developed to continue to improve the accuracy of reference data for pulmonary function tests.
Accurate interpretation of lung function tests relies on reference ranges which distinguish the effects of disease from growth and development. Limited data from young children limit the accuracy with which early lung disease may be identified.
These extended models provide more accurate reference ranges for spirometry with transition into adulthood and also incorporate age-related differences in between-subject variability, improving the definition of lower limits of normal.
Reference data are important for interpreting pulmonary function test results and can aid in the management of respiratory diseases (1). As measurement techniques, equipment, and population characteristics evolve, there is a concurrent need to keep reference data up-to-date to reflect these changes. Currently in the United States, the American Thoracic Society (ATS) guidelines recommend the use of the Third National Health and Nutrition Examination Survey (NHANES III) reference as the standard for interpreting spirometry results in the U.S. population (2, 3). The NHANES III dataset is one of the few references spanning childhood and adulthood that is also nationally representative and generalizable. However, the NHANES III reference is limited by a lack of subjects younger than 8 years, which often results in reference data being extrapolated to younger ages. Subbarao and coworkers (4) have demonstrated the inaccuracies in interpreting results at younger ages using NHANES III, and the ATS strongly discourages extrapolation of reference data beyond the intended age/height range (3, 4). The alternative, to use pediatric reference equations before switching to NHANES III, introduces discontinuities between equations at the transition point and can lead to further misinterpretation.
In recognizing these limitations, it may be reasonable to collate other available pediatric data to extend the NHANES III reference to younger ages. Collation of previously collected reference data has been shown to be a reasonable alternative to collecting new data provided the data are available for reanalysis and that the datasets are relatively homogeneous (5). Such an extended dataset could also be reanalyzed in an attempt to model the transition between childhood and adulthood continuously, taking into consideration the combined effects of height and age.
Additional inconsistencies in pediatric reference data include the selection of explanatory factors used to predict lung function. There is no doubt that spirometric lung function is related to height and, in adults, both FEV1 and FVC are known to decrease with age. However, in children, as a result of the growth process, age and height are highly correlated, thus some references have chosen to omit age from prediction models. The studies that adjusted for both height and age in children tend to describe either an absolute association (additive) (2, 6, 7) or a proportional one (multiplicative) (8) and, in some cases, develop age-specific equations (6, 9). Adjustment for age using a proportional model may be especially important during periods of rapid growth, such as during puberty when lung and somatic growth may not be synchronized (8, 10).
Spirometric lung function is further complicated because the variability of individual measurements around the median is not uniform across the age/height range and there is skewness in the distributions. This means that conventional multiple regression analysis is not adequate to model the complex relationship between body size and lung function.
This study investigated ways to develop more appropriate reference ranges that could describe the relationship between lung function and height and age more accurately within the pediatric age range, while also including the adult age range and the transition between childhood and adulthood. Preliminary results were presented in abstract form at the 2007 ATS conference (11).
We have chosen to focus on three spirometry outcomes: FEV1, FVC, and forced expiratory flow, midexpiratory phase (FEF25–75), plus the FEV1/FVC ratio. FEF25–75 is referred to as MMEF in some centers, but will be referred to here as the FEF25–75. The outcomes were modeled in terms of sex, age, and height.
To supplement the data from the NHANES III survey, the lead authors of several large-population surveys that had measured children across a wide age range, including those younger than 8 years, were contacted to obtain original data. Positive responses were limited to those from individuals personally known to the authors.
Data from four surveys were obtained, as summarized in Table 1. Briefly, the NHANES III is a large, representative, stratified, random survey of the U.S. population aged 8 to 80 years (2). For these analyses, the original data were reanalyzed to be consistent with the 1994 ATS criteria. These data were supplemented with pediatric reference data published by Rosenthal and colleagues, who sampled children aged 4 to 19 from 12 London schools in the early 1990s (7). This reference is currently recommended by the British Thoracic Society for children in the United Kingdom. Additional pediatric data were available from a Belgian study (12), which used spirometry to validate the forced oscillation technique in children aged 5 to 18 years. Finally, data from an older Canadian study (13), which measured healthy individuals aged 4 to 40 years to develop prediction equations, were also included; only the pediatric subset younger than 20 years was used, to increase the proportion of children within the collated dataset.
Two-thirds of the original NHANES III population was of African-American or Mexican-American ethnic origin and approximately one-third of the Rosenthal data were nonwhite. Rosenthal and colleagues (7) classified nonwhite subjects as Afro-Caribbean, Oriental, Middle Eastern, Pakistani/Bangladeshi, and other; of these, only 39 subjects were Afro-Caribbean. The other two surveys were limited to non-Hispanic white subjects. Due to known ethnic and racial differences in lung function, models developed for this study were limited to data from non-Hispanic white subjects, but were subsequently compared with data from African-American and Mexican-American subjects. The ethnic and racial data currently available for these analyses were not sufficient to develop ethnically and racially specific “all age” reference ranges that would be robust enough to apply in multiethnic populations.
To identify possible transcription errors, each dataset was examined individually for obvious outliers and impossible values. Because these data have been published previously, the datasets were relatively clean and very few data points (<1%) needed to be excluded.
Conventional multiple regression analysis relies on four assumptions: (1) a linear relationship, (2) constant variability of values around the mean across the range of height and age, (3) a normally distributed outcome variable, and (4) that the combined effect of the covariates is additive. In the case of spirometric measures of lung function, these assumptions are rarely met.
The LMS (lambda, mu, sigma) method (14), widely used to construct growth reference charts, is an extension of regression analysis that includes three components: (1) the median (mu), which represents how the outcome variable changes with an explanatory variable (e.g., height or age); (2) the coefficient of variation (sigma), which models the spread of values around the mean and adjusts for any nonuniform dispersion; and (3) the skewness (lambda), which models the departure of the variables from normality using a Box-Cox transformation.
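The three components can be illustrated with the standard LMS transformation, which converts a measurement to a z score. The following is a minimal sketch rather than the fitted models themselves: the function name and the example values are hypothetical, and S is assumed to be expressed as a fraction (0.10 for a CV of 10%).

```python
import math

def lms_zscore(y: float, L: float, M: float, S: float) -> float:
    """Convert a measurement y to a z score from LMS parameters.

    L: Box-Cox power (skewness), M: median, S: coefficient of variation
    as a fraction (assumption: 0.10 means 10%).
    """
    if y <= 0 or M <= 0 or S <= 0:
        raise ValueError("y, M and S must be positive")
    if abs(L) < 1e-12:
        # Limiting case L -> 0: the Box-Cox transform becomes a logarithm.
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)

# Hypothetical illustration values, not taken from the study.
z_at_median = lms_zscore(2.0, L=1.0, M=2.0, S=0.10)  # measurement equal to the median
z_below = lms_zscore(1.6, L=1.0, M=2.0, S=0.10)      # measurement at 80% of the median
print(z_at_median, round(z_below, 2))
```

With L equal to 1 there is no skewness adjustment and the expression reduces to the familiar (observed − median)/SD.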
Because height is not the only explanatory variable that needs to be considered, we applied the LMS method using the GAMLSS package (15) in the statistical program R (Version 2.4.1; R Foundation, http://www.r-project.org) to allow for modeling of more than one explanatory variable (in this case, height, age, and potential between-center differences). Separate models were developed for males and females. Smoothly changing curves were applied to remove the effects of sampling and measurement variability across the height and age range without distorting the underlying relationship, which allowed the effects of height and age to be modeled as a smooth transition from adolescence into adulthood as a function of age. The goodness of fit was assessed using the Schwarz Bayesian criterion (SBC), which compares consecutive models directly while adjusting for the increased complexity to determine the simplest model with best fit. Any reduction of SBC is considered technically important, but in practice we chose to balance reductions with clinical relevance and biological plausibility. A detailed description of the statistical methodology is planned.
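The way the SBC trades fit against complexity can be sketched as follows, using its usual definition as the model deviance plus a penalty of log(n) per parameter. The candidate models and their deviances are invented for illustration only; they are not results from this study, and the function name is hypothetical.

```python
import math

def sbc(deviance: float, n_params: float, n_obs: int) -> float:
    """Schwarz Bayesian criterion: deviance plus a log(n) penalty per parameter."""
    return deviance + math.log(n_obs) * n_params

# Hypothetical candidate models: (label, deviance, effective parameters).
# These numbers are invented and are not taken from the fitted models.
candidates = [
    ("height only", 5210.0, 4.0),
    ("height + age", 5105.0, 7.0),
    ("height + age, extra smoothing", 5100.0, 12.0),
]
N_OBS = 3598  # size of the collated dataset reported in the text

scores = {label: sbc(dev, k, N_OBS) for label, dev, k in candidates}
best = min(scores, key=scores.get)  # lowest SBC is preferred
print(best)
```

In this invented example, the extra smoothing lowers the deviance slightly but not enough to offset the penalty, so the simpler height and age model is preferred.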
The fitted model provides height/age/sex-specific values of the three elements of the distribution (median, coefficient of variation (CV), and skewness). The spirometry standard deviation (SD) is in the same units as the outcome (i.e., L or L/s), whereas the CV is defined as 100 × (SD/median). The median is the predicted value for the individual, which, together with the CV and skewness, allows the individual's measurement to be converted to a z score; z scores are normally distributed with a mean of 0 and an SD of 1. The lower limit of normal is officially defined as the fifth percentile of the distribution that corresponds to a z score of −1.64 (1, 3). We also discuss the clinical definition that assumes a CV of 10% and defines the normal range as two CVs on either side of 100% predicted (i.e., a normal range of 80 to 120%).
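The lower limit of normal can be recovered from the three fitted elements by inverting the standard LMS transformation at a z score of −1.64. This is a sketch under stated assumptions: the function names and example values are hypothetical, and S is expressed as a fraction (CV/100) rather than as a percentage.

```python
import math

Z_FIFTH_PERCENTILE = -1.64  # z score the text gives for the fifth percentile

def lms_value_at_z(z: float, L: float, M: float, S: float) -> float:
    """Measurement corresponding to a given z score under the LMS model."""
    if abs(L) < 1e-12:
        # Limiting case L -> 0: log-normal distribution around the median.
        return M * math.exp(S * z)
    base = 1.0 + L * S * z
    if base <= 0:
        raise ValueError("z score is outside the range supported by these parameters")
    return M * base ** (1.0 / L)

def lower_limit_of_normal(L: float, M: float, S: float) -> float:
    """Fifth percentile of the reference distribution, in the units of M."""
    return lms_value_at_z(Z_FIFTH_PERCENTILE, L, M, S)

# Hypothetical example: median 2.0 L, CV 10%, no skewness adjustment (L = 1).
lln = lower_limit_of_normal(L=1.0, M=2.0, S=0.10)
print(round(lln, 3))
```

Because the median and CV change with height, age, and sex, the lower limit of normal has to be evaluated separately for each individual.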
The models were based on 3,598 non-Hispanic white subjects aged 4 to 80 years, 2,182 (60.5%) of whom were younger than 20 years and 271 (7.5%) of whom were younger than 8 years. A description of the demographic characteristics of the study population can be found in Table 2. Children from the NHANES III reference were taller (height-for-age z score) (16) and heavier (weight-for-age z score) (data not shown) than children in the other three datasets. On the basis of the original distributions of each of the three outcomes (FEV1, FVC, and FEF25–75) studied, height and age were nonlinear, the spread of values around the mean was nonuniform for both height and age, and there was also evidence that the distributions of all three outcomes were skewed. FEF25–75 results were not available for the British data, so models for these data are based on 700 fewer subjects.
The models for all three outcomes were dependent on height and age, and logarithmic transformation of both the outcome and explanatory variables was necessary. The resulting models describe a multiplicative and allometric height relationship, where all three spirometric outcomes are proportional to height raised to the power 2.5. For example, a 1% increase in height corresponds to a 2.5% increase in spirometry. The median volumes for each of the outcomes, smoothed by age, are presented in Figure 1. Despite age and height being highly correlated, there was a significant and independent effect of age after adjusting for height (Figure 2).
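The proportional height relationship can be checked with a short calculation. This sketch assumes only the exponent of 2.5 stated above and holds age constant; the function name is hypothetical.

```python
def relative_change(height_ratio: float, exponent: float = 2.5) -> float:
    """Fractional change in a spirometric outcome when height is multiplied
    by height_ratio, assuming outcome is proportional to height ** exponent."""
    if height_ratio <= 0:
        raise ValueError("height_ratio must be positive")
    return height_ratio ** exponent - 1.0

# A 1% increase in height gives roughly a 2.5% increase in the outcome.
print(round(100 * relative_change(1.01), 2))
# A 10% increase in height gives a proportionally larger change.
print(round(100 * relative_change(1.10), 1))
```

The 2.5% figure is the small-change approximation; for larger differences in height the multiplicative form gives more than 2.5 times the percentage change in height.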
The LMS method also quantifies the spread of values around the median, which is essential information when determining the range of expected lung function values in a normal population. After adjustment for the effects of height and age, the between-subject variability, characterized by the CV, demonstrated important age-related trends (Figure 3). The between-subject variability was highly age dependent, being greatest in children younger than 11 years and increasing steadily with increasing age in adults after the age of 30. The variability of FEF25–75 was noticeably larger than for FEV1 and FVC. The commonly quoted “normal range” of 80 to 120% predicted assumes a CV of 10%; however, as can be seen from Figure 3, even for FVC, this only occurs over a limited age range of 15 to 35 years. By contrast, at 5 to 6 years of age, the CV for FEV1 and FVC is 15%, corresponding to a normal range of 70 to 130% predicted. The CV for FEF25–75 at age 5 to 6 years is 20%, corresponding to 60 to 140% predicted, and by age 50, the CV for FEF25–75 has widened to 30%, a normal range of 40 to 160%.
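The link between the CV and the "normal range" quoted above is simple arithmetic: two CVs on either side of 100% predicted. A minimal sketch, with a hypothetical function name, reproduces the ranges given in the text.

```python
from typing import Tuple

def percent_predicted_range(cv_percent: float, n_cv: float = 2.0) -> Tuple[float, float]:
    """Normal range in % predicted, taken as n_cv CVs either side of 100%."""
    half_width = n_cv * cv_percent
    return 100.0 - half_width, 100.0 + half_width

# CV values quoted in the text for different outcomes and ages.
for cv in (10.0, 15.0, 20.0, 30.0):
    low, high = percent_predicted_range(cv)
    print(f"CV {cv:.0f}%: {low:.0f} to {high:.0f}% predicted")
```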
After adjustment for height and age, there was little evidence of skewness for FEV1 and FVC. By contrast, there was significant skewness in FEF25–75 and the FEV1/FVC ratio for both sexes, which was incorporated into the prediction models.
The age-related changes in FEV1 and FVC were accompanied by age-related changes in the ratio (FEV1/FVC) (Figure 4). As can be seen, the frequently quoted predicted FEV1/FVC of 0.7 is not in fact attained until around 50 years of age in males and considerably later in females, being noticeably higher during childhood and lower in the elderly. The range of “normal values” for this ratio is age dependent, being wider in both the young and the elderly. Sex differences are also apparent: females have greater predicted values of FEV1/FVC than males at all ages, and the difference is most marked in late puberty (Figure 4).
The models were further explored by evaluating the extent to which between-center differences affected the expected reference range. After adjustment for height and age, there were small but significant between-center differences in FEV1 for both males and females and in FVC for females. Compared with NHANES III, median values from Lebecque (12) and Corey (13) were 2 to 3% greater after adjustment, whereas those from Rosenthal and colleagues (7) were approximately 4% smaller. Interestingly, no between-center differences were observed for FVC in males or for FEF25–75 in either sex.
The NHANES III African-American subjects had considerably lower FEV1 and FVC, but similar flows and FEV1/FVC compared with non-Hispanic white subjects (Figure 5). With the exception of FVC in females, Mexican Americans had similar values to non-Hispanic whites. Ethnic and racial differences varied according to sex, generally being more marked in females. Of significance is that the standard deviations for each of the sex-specific ethnic z scores were approximately 1, which could facilitate development of race- and sex-specific adjustment factors to account for the shift in values.
Figure 6 compares the current model with the original NHANES III equations in terms of the median and the lower limit of normal. Although the new model is not dramatically different from the original, three major advantages of the current approach can be seen. First, the current models extend the reference down to 4 years of age, thereby improving the accuracy with which normal values can be predicted in very young children; it can be seen that the original NHANES III equations underpredict lung function in healthy children younger than 10 years and may therefore fail to identify early lung disease. Second, smoothly changing curves describe the transition between childhood and early adulthood. Third, the age-dependent between-subject variability is quantified, thereby allowing improved precision with which to define the lower limits of normal at all ages.
The methods used do not produce equations per se but comprehensive look-up tables that can be applied in a Microsoft Excel add-in module. The module can be found at www.growinglungs.org.uk (Pediatric Reference Ranges for Spirometry). The module can also be easily implemented into current commercial spirometers, upon request by manufacturers. The program facilitates prospective interpretation of a single observation or retrospective analysis of an entire dataset to calculate z scores, % predicted, or centiles.
This study presents a new approach to modeling spirometry data, which produces “all age” reference curves using a single, smoothly age-changing model to explain the complex relationship between lung function and height and age during puberty and early adulthood. In addition, the dataset extends the NHANES III reference to include children as young as 4 years and uses age-dependent between-subject variability to establish the lower limits of normal. This study also confirms previous observations regarding the rapid decrease in the FEV1/FVC ratio with age (17, 18). In Figure 4, we demonstrate that the ratio is not fixed at 0.70, as recommended by the Global Initiative for Chronic Obstructive Lung Disease (19, 20), but is markedly age dependent. The initial high values reflect the relatively large airways in relation to lung volumes in early life, which are associated with a short expiratory time constant and rapid lung emptying, whereas during adolescence, the rapid decline in FEV1/FVC probably reflects the different rates of lung and airway growth (dysanaptic growth) during this period, which may be particularly marked in males, in whom lung growth continues for several years after somatic growth has ceased (8, 17, 18).
A key feature of this study is the proportional model that adjusts for measures of body size and age in a way that is biologically plausible, where the nonlinear height relationship approximates the three-dimensional shape of the chest. Inclusion of an age adjustment in addition to height allows the complex changes during puberty to be accounted for without the need to undertake pubertal staging, which may be impractical in many clinical and research settings.
The ethnic/race trends in Figure 5 complement those described in the original analysis of the NHANES III dataset and highlight the fact that ethnic/race adjustments are complex and inconsistent, such that using the same adjustment factor for both sexes and for all outcomes, as commonly reported (i.e., 12%), is unlikely to be appropriate (3). For these analyses, we did not have access to sufficient ethnicity- and race-specific data in children younger than 8 years or for other groups not included in the NHANES III dataset. Having now established suitable modeling techniques to allow development of all-age reference ranges, a further initiative will be required to collate more ethnicity- and race-specific spirometric data from healthy children (especially those younger than 8 yr) and adults, so that the exercise can be extended for multiethnic application.
In addition to allowing more accurate predictions of expected values in younger children and a smooth transition between pediatric and adult reference data, the ability to quantify the age/height-adjusted between-subject variability has major implications for defining clinical thresholds of normal. Respiratory clinicians are familiar with expressing lung function results as % predicted (i.e., 100 × [observed/predicted]), where the predicted value comes from a reference equation incorporating sex, age, and height. A value of 100% predicted represents the median reference value, with a range of values around the median indicating between-subject variability. For FEV1 and FVC, this variability has conventionally been taken to be a CV of 10% and the normal range is ±2 CVs of the median (i.e., 80–120%). This is valid as long as the CV is genuinely 10%. However, when actual variability of the three spirometric outcomes is plotted as a function of age (Figure 3), it can be seen that a CV of 10% is only observed over a narrow age range of between 15 and 35 years. In younger children and older adults, the CV approaches 15% for FEV1, which extends the normal range to 70 to 130%. Given this wider range of normal values in younger and older subjects, age-specific cutoffs for the lower limit of normal are essential because failure to account for this increased variability will incorrectly flag individuals as “abnormal.” This problem is exacerbated by the differences in between-subject variability between different spirometric outcomes.
An alternative approach is to express results in terms of a z score rather than % predicted, because z scores combine the % predicted and CV into a single number: z score = (% predicted − 100)/CV. Regardless of the CV, the range of normal values is consistent as the z score changes in relation to the CV. Thus, although 80% predicted represents the lower limit of normal when the CV is 10%, it is well within the normal range if the CV is greater than this. This further emphasizes the need to know the between-subject CV for each outcome if results that have been expressed as % predicted are to be interpreted correctly.
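The effect of the CV on interpretation can be illustrated directly from the formula above. In this sketch the function name is hypothetical; the CV values are those discussed in the text.

```python
def z_from_percent_predicted(percent_predicted: float, cv_percent: float) -> float:
    """z score = (% predicted - 100) / CV, with CV given as a percentage."""
    if cv_percent <= 0:
        raise ValueError("cv_percent must be positive")
    return (percent_predicted - 100.0) / cv_percent

# The same result of 80% predicted under two different between-subject CVs.
z_cv10 = z_from_percent_predicted(80.0, 10.0)  # two CVs below the median
z_cv15 = z_from_percent_predicted(80.0, 15.0)  # closer to the median in z score terms
print(z_cv10, round(z_cv15, 2))
```

With a CV of 15%, a result of 80% predicted lies about 1.33 CVs below the median, which is above the fifth-percentile threshold of −1.64.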
Although the ATS statement on interpretation of lung function tests does not explicitly recommend the use of z scores, it does state that the lower limit of normal corresponds to the fifth percentile of the frequency distribution (3). When data are normally distributed, z scores correspond directly to percentiles such that a z score of −1.64 is equivalent to the fifth percentile (1). Thus, in terms of % predicted, the lower limit of normal can be defined as 100 − 1.64 × CV. Regardless of whether z scores, percentiles, or % predicted are used to interpret results, it is imperative to consider the between-subject CV when determining the lower limit of normal.
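The lower limit of normal in % predicted terms follows from setting the z score to −1.64 in the relationship z score = (% predicted − 100)/CV. The derivation below is a sketch that assumes normally distributed data with no skewness; the symbol LLN% is introduced here for illustration only.

```latex
\begin{align}
z &= \frac{\%\,\text{predicted} - 100}{\mathrm{CV}} \\
-1.64 &= \frac{\mathrm{LLN}_{\%} - 100}{\mathrm{CV}}
\quad\Longrightarrow\quad
\mathrm{LLN}_{\%} = 100 - 1.64 \times \mathrm{CV}
\end{align}
```

For a CV of 10% this gives a lower limit of 83.6% predicted, and for a CV of 15% it gives 75.4% predicted, so a single fixed cutoff cannot correspond to the fifth percentile at all ages.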
We have demonstrated that it is possible to collate data from more than one center, and have established a foundation on which larger international and more comprehensive datasets can be built. The small differences observed between centers could be due to equipment or software differences, measurement technique, and/or true population differences. For instance, the Canadian data are more than 30 years old and there may be differences in population characteristics, such as timing of puberty. It is remarkable that, despite the cohort effects, the differences between centers were minimal and not likely to be clinically important. Ideally, each center should develop its own reference ranges; however, in practice, this is rarely feasible (22, 23). In effect, this combined dataset describes a typical center, trading off a slight reduction in precision, due to the increased between-center variability, against a reduction in bias. Combining data from more than one center provides a possible way forward to address the ongoing practical problem of applying reference data in centers that lack their own reference. Nevertheless, centers should continue to validate reference equations with a sample of healthy control subjects from their own population to test for any systematic biases (3). Although the higher variability in younger subjects might be, at least partially, attributed to learning effects, the fact that the majority of children contributing to these cross-sectional datasets would have been naive healthy subjects with minimal prior exposure to spirometry makes this unlikely.
These reference data are potentially limited by the fact that we have not addressed the issue of whether FEV1 is the most appropriate outcome during early childhood. Young children have relatively large airways compared with their lung volumes such that, during forced expiration, emptying may be virtually complete within 1 second. In such cases, FEV1 largely reflects the FVC, suggesting that FEV0.75 may be a more appropriate measure for young children (24, 25). As part of the current exercise, NHANES III was reanalyzed to calculate FEV0.75, but because these data were not available from the other three datasets, they are currently limited to children older than 8 years, rather than the younger age group where they are most likely to be clinically useful. With the exception of some very recent reports on preschool spirometry (24, 25), reference equations for FEV0.75 in children remain limited and outdated. We are currently undertaking an international collaborative study to collate spirometric data in very young children, including FEV0.75 (www.growinglungs.org.uk), which we plan to incorporate into the current dataset in the future.
This modeling technique provides an elegant solution to a complex and longstanding problem: fitting age and height trends to all-age lung function data. Furthermore, we provide a biologically plausible and statistically robust means of developing continuous reference ranges from early childhood to old age. These models have the potential to be a platform from which future studies can be developed to continue to improve the accuracy of reference data for pulmonary function tests.
Supported by Asthma UK (S.S.) and UK Medical Research Council grant G9827821 (T.J.C.).
Originally Published in Press as DOI: 10.1164/rccm.200708-1248OC on November 15, 2007
Conflict of Interest Statement: None of the authors has a financial relationship with a commercial entity that has an interest in the subject of this manuscript.