Conceived and designed the experiments: TL LB JDK MHO TK TD RL DWJ. Performed the experiments: TL LB JDK MHO TK TD RL DWJ. Analyzed the data: LB. Wrote the paper: TL LB JDK MHO TK TD RL DWJ.
In developing countries, pneumonia is one of the leading causes of death in children under five years of age and hence timely and accurate diagnosis is critical. In North America, pneumonia is also a common source of childhood morbidity and occasionally mortality. Clinicians traditionally have used the chest radiograph as the gold standard in the diagnosis of pneumonia, but they are becoming increasingly aware that it is not ideal. Numerous studies have shown that chest radiography findings lack precision in defining the etiology of childhood pneumonia. There is no single test that reliably distinguishes bacterial from non-bacterial causes. These factors have resulted in clinicians historically using a combination of physical signs and chest radiographs as a ‘gold standard’, though this combination of tests has been shown to be imperfect for diagnosis and assigning treatment. The objectives of this systematic review are to: 1) identify and categorize studies that have used single or multiple tests as a gold standard for assessing accuracy of other tests, and 2) given the ‘gold standard’ used, determine the accuracy of these other tests for diagnosing childhood bacterial pneumonia.
Search strategies were developed using a combination of subject headings and keywords adapted for 18 electronic bibliographic databases, searched from inception to May 2008. Published studies were included if they: 1) included children one month to 18 years of age, 2) provided sufficient data regarding diagnostic accuracy to construct a 2×2 table, and 3) assessed the accuracy of one or more index tests as compared with other test(s) used as a ‘gold standard’. The literature search revealed 5,989 references, of which 256 were screened for inclusion, resulting in 25 studies that satisfied all inclusion criteria. The studies examined a range of bacterial types and assessed the accuracy of several combinations of diagnostic tests. Eleven different gold standards were used in the 25 included studies, and criterion validity was calculated for fourteen different index tests. The most common gold standard was blood culture, used in six studies. Procalcitonin (PCT) was the most commonly measured index test, evaluated in five studies, each with a different gold standard.
We have found that studies assessing the diagnostic accuracy of clinical, radiological, and laboratory tests for bacterial childhood pneumonia have used a heterogeneous group of gold standards, and found, at least in part because of this, that index tests have widely different accuracies. These findings highlight the need for identifying a widely accepted gold standard for diagnosis of bacterial pneumonia in children.
In developing countries, pneumonia is one of the leading causes of death in children under five years of age, and hence timely and accurate diagnosis is critical. In North America, pneumonia is also a common source of childhood morbidity and occasionally mortality. A study from Israel has also shown that community-acquired pneumonia can impose significant economic burdens on children and their families, as well as a significant reduction in their quality of life.
Viruses, atypical bacteria, and typical bacteria cause the vast majority of childhood pneumonia. The distribution of pathogens varies with age and clinical setting. Atypical bacteria, such as Mycoplasma and Chlamydia, usually occur in children between the ages of five and 15 years, while the incidence of viral infections typically decreases with age. In hospitalized children, the most frequently diagnosed bacteria are typical pathogens such as Streptococcus pneumoniae. It can be difficult to determine whether pneumonia in a given patient is bacterial or nonbacterial. Classic signs characteristic of bacterial or nonbacterial pneumonia can help in reaching a diagnosis. However, these signs and symptoms are often subjective and are ultimately imprecise for determining whether antibiotics are truly warranted.
A clinically acceptable gold standard for the diagnosis of bacterial pneumonia has not yet been developed. The most readily available means of diagnosing pneumonia are often observation of physical signs and radiological evidence. The World Health Organization has developed diagnostic guidelines for pneumonia, which are generally used in developing countries or where laboratory tests are not quickly accessible. Other diagnostic tests have been used with variable accuracy, such as chest radiographs, laboratory tests (white blood cell count [WBC] with differential, C-reactive protein [CRP], and erythrocyte sedimentation rate [ESR]), blood cultures and serology, and lung puncture. The ideal surrogate marker for bacterial pneumonia should be accurate, minimally invasive, and readily available. To date, there is no such gold standard that a physician can rely on to confidently diagnose and subsequently treat bacterial pneumonia.
Clinicians have traditionally used the chest radiograph as the gold standard in the diagnosis of pneumonia, but they are becoming increasingly aware that it is not ideal. Numerous studies have shown that chest radiography findings lack accuracy in defining the etiology of childhood pneumonia. No single test reliably distinguishes bacterial from non-bacterial causes. As a result, clinicians have historically used a combination of physical signs and chest radiographs as a ‘gold standard’, though this combination of tests has been shown to be imperfect for diagnosis and for guiding treatment.
The objectives of this systematic review are to: 1) identify and categorize studies that have used single or multiple tests as a gold standard for assessing accuracy of other tests, and 2) given the ‘gold standard’ used, determine the accuracy of these other tests for diagnosing childhood bacterial pneumonia.
This review was carried out using methods defined for rigorous systematic reviews. The aim was to use these guidelines and other methodological criteria to produce a comprehensive systematic review that summarizes the data collected (see PRISMA Checklist S1).
Data for this study were acquired from previously published work; no patient or hospital data were accessed. Therefore, written consent and institutional ethical review were not required for this research.
Search strategies were developed using a combination of subject headings and keywords, including: “pneumonia”, “bacteria”, “community acquired pneumonia”, “lower respiratory tract infection”, “pneumococcal”, “diagnosis”, “accuracy”, “sensitivity”, “reliability”, “specificity”, “false/true positive/negative”, “predictive value”, “observer variation”, “likelihood functions/ratios”, “ROC curve”, “receiver operating characteristic”, “child”, “adolescent”, “infant”, “minors”, “pediatrics”, “nurseries”, “youth”, “nursery”, “toddler”, “clinical trials”, “cohort studies”, “case-control studies”, “comparative”, “evaluation studies”, “prospective”, “retrospective”, and “follow up”.
These keywords were adapted for each of the 18 electronic bibliographic databases, searched from inception to May 2008 (see Table 1 for full listing). Extended systematic search methods (e.g., hand searches of non-indexed journals, reference list tracking, and contact with experts) were also used (see Table 2 for full listing). No language or date restrictions were applied to the search strategy.
Inclusion criteria were assessed independently by at least two reviewers (LB and RL), and the primary reason for excluding each article was documented. Scientific publications were included if they: (1) involved children between 1 month and 18 years of age, (2) provided diagnostic accuracy data sufficient to construct a 2×2 table, and (3) compared a gold standard and an index test that were both used to diagnose bacterial pneumonia, defined to include both typical and atypical pneumonia. Gold standard and index test categories included radiographic, hematologic, immunologic, microbiologic, virologic, and clinical variables (signs and symptoms). Because no defined gold standard reliably differentiates bacterial from non-bacterial pneumonia, all combinations of tests assessing the diagnostic accuracy of bacterial pneumonia were included. To assess study quality, two independent reviewers (LB and RL) applied the Quality Assessment of Studies of Diagnostic Accuracy included in Systematic Reviews (QUADAS) tool.
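Inclusion criterion (2) matters because many studies report only summary percentages. As a minimal sketch, assuming a study reports sensitivity, specificity, and the number of children classified positive and negative by its gold standard, the four cells of the 2×2 table can be back-calculated. The names `TwoByTwo` and `reconstruct_table` are hypothetical, and the input numbers are invented for illustration, not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test result cross-classified against the gold standard."""
    tp: int  # index positive, gold standard positive
    fp: int  # index positive, gold standard negative
    fn: int  # index negative, gold standard positive
    tn: int  # index negative, gold standard negative


def reconstruct_table(sensitivity: float, specificity: float,
                      n_ref_pos: int, n_ref_neg: int) -> TwoByTwo:
    """Back-calculate 2x2 cell counts from reported accuracy and group sizes.

    Published percentages are usually rounded, so counts are rounded to the
    nearest whole patient (Python's round() sends exact .5 ties to the even integer).
    """
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a proportion in [0, 1], got {value}")
    tp = round(sensitivity * n_ref_pos)
    tn = round(specificity * n_ref_neg)
    return TwoByTwo(tp=tp, fp=n_ref_neg - tn, fn=n_ref_pos - tp, tn=tn)


# Hypothetical report: 80% sensitivity in 50 reference-positive children,
# 90% specificity in 100 reference-negative children.
table = reconstruct_table(0.80, 0.90, n_ref_pos=50, n_ref_neg=100)
print(table)
```

Rounding is needed because a reported value such as 90.9% is itself rounded; a study that reports only percentages without group sizes cannot be reconstructed this way, which is why criterion (2) excludes it.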
Data were extracted by one reviewer and checked for accuracy and completeness by a second reviewer. Any disagreements were resolved through discussion with the clinical leaders.
Data analysis was based on a published methodological review. The primary outcome was the accuracy of the index test (i.e., sensitivity, specificity, and positive and/or negative predictive values, with corresponding 95% confidence intervals calculated using standard formulas). For each study, we reconstructed a standard 2×2 table, and if multiple studies had used the same index test and gold standard, weighted averages of the sensitivities, specificities, or predictive values were computed.
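The accuracy measures above all come from the same 2×2 table. A minimal sketch follows, assuming the "standard formulas" are the Wald (normal-approximation) interval for a proportion and that studies are weighted by their denominators (e.g., the number of gold-standard-positive children for sensitivity); the review does not state which interval or weights were used, and the helper names and counts here are hypothetical.

```python
import math
from dataclasses import dataclass

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # index positive, gold standard positive
    fp: int  # index positive, gold standard negative
    fn: int  # index negative, gold standard positive
    tn: int  # index negative, gold standard negative


def proportion_ci(successes: int, total: int, z: float = Z_95) -> tuple[float, float, float]:
    """Point estimate with a Wald (normal-approximation) interval, clipped to [0, 1]."""
    if total == 0:
        raise ValueError("denominator is zero")
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def accuracy(t: TwoByTwo) -> dict[str, tuple[float, float, float]]:
    """Sensitivity, specificity, PPV and NPV, each as (estimate, lower, upper)."""
    return {
        "sensitivity": proportion_ci(t.tp, t.tp + t.fn),
        "specificity": proportion_ci(t.tn, t.tn + t.fp),
        "ppv": proportion_ci(t.tp, t.tp + t.fp),
        "npv": proportion_ci(t.tn, t.tn + t.fn),
    }


def weighted_sensitivity(tables: list[TwoByTwo]) -> float:
    """Per-study sensitivities averaged with weights = reference-positive counts.

    With these weights the result equals pooling the tp and fn counts.
    """
    weights = [t.tp + t.fn for t in tables]
    sens = [t.tp / w for t, w in zip(tables, weights)]
    return sum(w * s for w, s in zip(weights, sens)) / sum(weights)


table = TwoByTwo(tp=40, fp=10, fn=10, tn=90)  # hypothetical study
for name, (est, lo, hi) in accuracy(table).items():
    print(f"{name}: {est:.3f} (95% CI {lo:.3f} to {hi:.3f})")

# Two hypothetical studies sharing the same index test and gold standard
print(weighted_sensitivity([table, TwoByTwo(tp=9, fp=5, fn=21, tn=60)]))
```

The Wald interval is the simplest choice but behaves poorly near 0% or 100% and in small samples, which are common in this literature; a Wilson or exact interval is a reasonable substitute.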
The literature search revealed 5,989 references, of which 256 were screened for inclusion. As shown in Figure 1, this resulted in 25 studies that satisfied all inclusion criteria.
The studies examined a range of bacterial types and assessed the accuracy of several combinations of diagnostic tests. Detailed characteristics of each study appear in Tables S1 and S2. These studies were published between 1986 and 2007 in 12 different countries. The majority of included studies originated from high-income countries (Australia, Italy, Spain, France, United States, Switzerland, Japan, and Finland), as defined by The World Bank, with 7 studies from middle- to low-income countries (China/Taiwan, Argentina, Brazil, and Bangladesh). All subjects were children between the ages of one month and 17 years, with a mean age of 6.56 years (based on 14 studies reporting a mean or median age). Gender was reported in 12 of 25 studies and was evenly distributed (52.3% male). Most studies collected patient data prospectively (21/25) from a single site (24/25). Eleven studies examined atypical bacterial pneumonia, six examined typical bacteria, and seven combined both atypical and typical varieties. One study defined its target condition only as ‘bacterial pneumonia’.
To be included in our review, studies needed to clearly describe both the index test and the gold standard used. A specific gold standard was not defined a priori; therefore, all combinations of index tests and gold standards were included, provided the studies met all inclusion criteria. For ease of comparison, we broadly categorized the diagnostic tests (both gold standard and index) as radiographic, hematologic, immunologic, microbiologic, virologic, or clinical variables (signs/symptoms). The 25 included articles yielded 23 distinct combinations of these categories. Because of the wide range of testing modalities, it was not possible to combine studies or compute weighted accuracy data. We therefore conducted a qualitative review of this literature and summarized the major findings narratively. Individual study tests and results can be found in Table S3. All 25 articles were assessed using the QUADAS tool; scores ranged from 8 to 14, with a mean score of 10.44 (see Table S4 for quality assessment of individual studies).
Eleven different gold standards were used in the 25 included studies. The most common gold standard was blood culture, used in six studies. These studies measured the criterion validity of nine different index tests: signs/symptoms, hematologic tests, chest radiography, nested polymerase chain reaction (PCR), procalcitonin (PCT), CRP, latex agglutination, immunochromatographic membrane assay, and lung aspirate. Sensitivities ranged from 10% for lung aspirate as the index test to 100% for urine latex agglutination for Haemophilus influenzae type b (Hib). Specificities ranged from 63.2% for the chest radiograph as the index test to 100% for nested PCR.
Five studies used a chest radiograph, either alone or with other variables, as the gold standard, measuring the validity of seven index tests. These index tests included PCT at three cutoff points, WBC count, CRP, serology by complement fixation (in two studies), latex particle agglutination, and nested PCR. With the chest radiograph as the gold standard, sensitivities ranged from 14.3% (radiograph showing air trapping) to 77.8% (PCT > 0.5 ng/ml), and specificities ranged from 34.8% (PCT > 0.5 ng/ml) to 100% (nested PCR).
The one study that used pleural fluid culture as the gold standard found that the immunochromatographic membrane assay for urinary pneumococcal antigen detection had a sensitivity of 90.9% and a specificity of 68.8%. Other gold standards used in the studies included hematologic, microbiologic, hematologic/immunologic, serologic, immunologic, and clinical signs and symptoms.
Fourteen different tests were measured as index tests. PCT was the most commonly measured, in five studies, each with a different gold standard. As many as four separate cutoff points were used for PCT levels. The sensitivity of PCT ranged from 40%, with a cutoff of 0.5 ng/ml and chest radiography as the gold standard, to 95.4% in two studies with a cutoff above 0.5 ng/ml and blood culture as the gold standard. In all studies, sensitivity decreased and specificity increased as the cutoff was raised.
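The cutoff trade-off reported for PCT follows from how a threshold rule works. A minimal sketch, assuming "test positive" means the marker exceeds the cutoff (as in "PCT > 0.5 ng/ml"): raising the cutoff can only move children from test-positive to test-negative, so sensitivity cannot rise and specificity cannot fall. The PCT values and function names below are hypothetical and not taken from any included study.

```python
def sens_spec_at_cutoff(ref_pos: list[float], ref_neg: list[float],
                        cutoff: float) -> tuple[float, float]:
    """Sensitivity and specificity when 'test positive' means marker > cutoff."""
    tp = sum(v > cutoff for v in ref_pos)   # gold-standard positives correctly flagged
    tn = sum(v <= cutoff for v in ref_neg)  # gold-standard negatives correctly cleared
    return tp / len(ref_pos), tn / len(ref_neg)


def sweep(ref_pos: list[float], ref_neg: list[float],
          cutoffs: list[float]) -> list[tuple[float, float, float]]:
    """(cutoff, sensitivity, specificity) for each cutoff, in ascending order."""
    return [(c, *sens_spec_at_cutoff(ref_pos, ref_neg, c)) for c in sorted(cutoffs)]


# Hypothetical PCT concentrations (ng/ml); not taken from any included study.
pct_ref_pos = [0.3, 0.6, 0.9, 1.2, 2.5, 4.0, 8.0, 12.0]
pct_ref_neg = [0.05, 0.08, 0.1, 0.2, 0.3, 0.4, 0.6, 1.1]

for cutoff, sens, spec in sweep(pct_ref_pos, pct_ref_neg, [0.5, 1.0, 2.0]):
    print(f"PCT > {cutoff} ng/ml: sensitivity {sens:.3f}, specificity {spec:.3f}")
```

Because each included study used a different gold standard, the same cutoff can still yield very different accuracy across studies; the monotonic trade-off holds only within a single study.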
An additional four studies used clinical variables, PCR, and CRP as their index tests. PCR was used as the index test with six different gold standards. Sensitivities ranged from 36.4% against complement fixation (Mycoplasma pneumoniae) to 95.7% against Mycoplasma serology. Specificities ranged from 79.7% against Mycoplasma serology to 100% against positive blood cultures or clinical and radiological evidence of pneumonia.
Clinical variables and CRP each showed broad ranges of sensitivity and specificity, reflecting the array of clinical variables and the different CRP cutoff points measured. The chest radiograph's accuracy was measured as an index test in four of the studies. Its sensitivity ranged from 0% to 75%, depending on the radiological definition used, while its specificity ranged from 50% to 100%. Four studies used the total WBC count as an index test, with three different thresholds. When the total WBC count was above 15,000 × 10⁶/l, sensitivities ranged from 20% to 65.1% and specificities ranged from 53.1% to 79.3%.
Other index tests used in the studies included interleukin-6 at three different levels, immunologic, microbiologic, virologic, and hematologic tests, and lung aspirate.
Diagnostic testing provides physicians with information about the likelihood of certain diseases. Ideally, these diagnostic tests have been validated against an agreed-upon gold/reference standard. The objective of the Standards for Reporting of Diagnostic Accuracy (STARD) initiative is to improve the accuracy and completeness of reporting of studies of diagnostic accuracy. STARD defines the gold/reference standard as “the best available method for establishing the presence or absence of the condition of interest.” It further adds that “the reference standard can be a single method, or a combination of methods, to establish the presence of the target condition. It can include laboratory tests, imaging tests, and pathology, but also dedicated clinical follow-up of subjects.” This systematic review has demonstrated that diagnostic tests used for pediatric pneumonia have not been truly validated and that there is little agreement as to which tests should be used as a gold standard. It is, therefore, difficult to recommend any of the reference standards used in the reviewed studies as “the best available method” given these limitations.
This review underscores the fundamental problem with diagnosing pneumonia in children when there is no proven and accurate gold standard. Since the standards used to define pneumonia are variable and inconsistent, it is difficult to know whether the reported criterion validity of these diagnostic tests is accurate. A problem of ‘circularity’ exists, because each test is judged against a reference that has not itself been validated, and there is no easy solution.
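The circularity problem can be made concrete with a standard calculation for imperfect reference standards. A minimal sketch, assuming that the index test and the gold standard err independently given each child's true status (an assumption that may fail, for example when both tests detect the same pneumococcal antigen); the function name and all numbers are hypothetical. It shows how a reference with low sensitivity but high specificity, such as blood culture, which is positive in only a small percentage of children with bacterial pneumonia, distorts the measured accuracy of an index test.

```python
def apparent_accuracy(prevalence: float, se_index: float, sp_index: float,
                      se_ref: float, sp_ref: float) -> tuple[float, float]:
    """Index-test sensitivity/specificity as measured against an imperfect gold standard.

    Assumes the index test and the reference err independently given the
    child's true (bacterial vs non-bacterial) status.
    """
    p = prevalence
    # Probability the reference calls a child positive / negative
    ref_pos = p * se_ref + (1 - p) * (1 - sp_ref)
    ref_neg = p * (1 - se_ref) + (1 - p) * sp_ref
    # Probability both tests are positive / both are negative
    both_pos = p * se_index * se_ref + (1 - p) * (1 - sp_index) * (1 - sp_ref)
    both_neg = p * (1 - se_index) * (1 - se_ref) + (1 - p) * sp_index * sp_ref
    return both_pos / ref_pos, both_neg / ref_neg


# Hypothetical index test with true sensitivity and specificity of 0.90,
# in a population where 30% of children truly have bacterial pneumonia.
perfect = apparent_accuracy(0.30, 0.90, 0.90, se_ref=1.0, sp_ref=1.0)
insensitive = apparent_accuracy(0.30, 0.90, 0.90, se_ref=0.30, sp_ref=1.0)
print(f"perfect reference:         sens {perfect[0]:.3f}, spec {perfect[1]:.3f}")
print(f"low-sensitivity reference: sens {insensitive[0]:.3f}, spec {insensitive[1]:.3f}")
```

With these hypothetical numbers, the index test's measured specificity falls from 0.90 to about 0.715 although the test itself is unchanged: children with bacterial pneumonia whom the reference misses are counted as false positives. Sensitivity is unaffected here only because the reference is assumed to be perfectly specific.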
An additional problem is that the included studies did not all focus on the same type of bacterial disease. Eleven studies dealt with atypical pneumonia, six with typical pneumonia, and seven with both typical and atypical pneumonia. Even among studies that focused on the same bacterial etiology (e.g., pneumococcal pneumonia), each study defined the disease differently. For example, a patient with a positive blood culture for pneumococcus is likely clinically different from a patient with a negative blood culture.
Another challenge is that most of the studies were performed in high-income countries, with only seven performed in middle- to low-income countries. This is in contrast to the disease burden, as most pneumonia mortality occurs in low-income countries. Future research should try to redress this imbalance.
Of the eleven different gold standards used, blood culture and chest radiography were the most common. Chest radiography was used as the gold standard in five studies, while in three other studies it was measured as an index test. When employed as an index test, its sensitivity was generally low while its specificity was generally high. This sub-par performance as an index test suggests that the use of chest radiography as a gold standard is potentially flawed. The use of ten other gold standards in the remaining twenty studies highlights considerable disagreement among researchers worldwide about whether the chest radiograph should be used as a gold standard or an index test. In most academic emergency departments, a chest radiograph is considered the standard of care and is readily obtained for pediatric patients with clinically suspected bacterial pneumonia. Clinically similar patients with potential ambulatory pneumonia presenting to a clinic or private office are less likely to undergo chest radiography.
From a clinical perspective, blood culture is somewhat invasive, and results are generally not available for several hours. Furthermore, only a relatively small percentage of patients with bacterial pneumonia have a positive blood culture (resulting in low sensitivity), and with the now widespread use of conjugate pneumococcal vaccine, the yield of blood cultures is likely to be even lower.
The criterion validity of fourteen different index tests was measured. Comparing the results of these index tests further illustrated the heterogeneity of the studies. Although the overall criterion validity of PCR was reasonably consistent, most other index tests (e.g., clinical variables, total WBC count, interleukin-6, and CRP) had highly variable accuracy.
As an example, in one study of PCT as an index test, Don et al. concluded, in contrast to other studies, that serum PCT could not reliably distinguish bacterial from non-bacterial pneumonia. However, Don et al. used as the gold standard a chest radiograph that was inconclusive in 34% of their patients. This example illustrates that, given the diversity of the diagnostic methods used, current evidence is potentially inaccurate and highly misleading.
There is a critical need for experts in childhood pneumonia to develop an accepted gold standard. While it would be optimal for such a test to be cheap and readily available to practicing clinicians, even a more complex gold standard for use in research studies would be a major advance. As suggested in the STARD initiative, one approach to developing a more complex standard is to combine methods, including imaging tests, laboratory tests available both immediately and long after the fact, and clinical features obtained not only at presentation but also during dedicated follow-up of subjects. The problem then becomes how each of these individual items should be weighted relative to the others. Given the highly variable results we found for most reference tests, a fixed algorithmic approach to combining methods is not possible. Alternatively, an expert panel could use standard consensus methods to weigh the results of chest radiography, standard and specialized laboratory tests, bacterial and viral diagnostic tests, and the clinical course of patients to classify patients as having bacterial or non-bacterial pneumonia. The development of such a gold standard would greatly aid future evaluation of the accuracy of diagnostic tests.
Although we conducted a comprehensive electronic and hand search of the literature, as well as verification of all extracted data, this review is not without limitations. The main limitation is that Latin American databases such as the Latin American and Caribbean Health Sciences Literature (LILACS) and Scientific Electronic Library Online (SciELO) could not be included in the electronic search strategy. At the outset of this review, we were unable to identify a clinical expert fluent in Spanish to participate in the identification of search terms and in the screening, inclusion/exclusion, and extraction phases of the systematic review. African and Asian databases were also not included for similar reasons. We acknowledge this as a limiting factor of this review, but given the breadth of other databases searched, we do not believe this has altered the results.
In conclusion, we have found that studies assessing the diagnostic accuracy of clinical, radiological, and laboratory tests for bacterial childhood pneumonia have used a heterogeneous group of gold standards, and found, at least in part because of this, that index tests have widely different accuracies. These findings highlight the need for identifying a widely accepted gold standard for diagnosis of bacterial pneumonia in children.
Criterion Validity for the Diagnosis of Bacterial Pneumonia.
Quality Assessment of Studies of Diagnostic Accuracy included in Systematic Reviews (QUADAS).
PRISMA Checklist of items to include when reporting a systematic review or meta-analysis (diagnostic review consisting of cohort studies).
Ethics approval was not required for the conduct of this systematic review. The authors would like to acknowledge the contributions of Yuanyuan Liang, PhD for statistical consultation, Donna Dryden, PhD, and Lisa Hartling, PhD for methodological expertise. The authors would also like to thank the Alberta Research Centre for Health Evidence (ARCHE) and Pediatric Emergency Research Canada (PERC) for their support.
Competing Interests: The authors have declared that no competing interests exist.
Funding: This study was funded by the Canadian Institutes of Health Research Team Grants Program (CIHR Team in Pediatric Emergency Medicine) (grant #G118160601). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.