Tests for ocular Chlamydia trachomatis have not been well characterized, because there is no gold standard test. Latent class analysis (LCA) was performed to estimate the sensitivity and specificity of laboratory and clinical tests for trachoma in the absence of a gold standard.
Individual data from pretreatment, hyperendemic areas in Ethiopia were used. A clustered LCA was performed for three diagnostic tests: PCR and WHO simplified criteria grades of follicular trachoma (TF) and intense trachomatous inflammation (TI).
Data from 2111 subjects in 40 villages were available. TF was estimated to be 87.3% (95% CI, 83.3–90.1) sensitive and 36.6% (95% CI, 23.6–40.3) specific; TI was estimated to be 53.6% (95% CI, 46.1–88.0) sensitive and 88.3% (95% CI, 83.3–92.0) specific; and PCR was estimated to be 87.5% (95% CI, 79.9–97.2) sensitive and 100% (95% CI, 69.3–100) specific.
LCA allows for an estimate of test characteristics without prior assumption of their performance. TF and TI were found to act in a complementary manner: TF is a sensitive test and TI is a specific test. PCR is highly specific but lacks sensitivity. The performance of these tests may be due to the time course of ocular chlamydial infection, and for this reason, results may differ in areas of low prevalence or recent mass treatment (ClinicalTrials.gov number, NCT00221364).
Trachoma is the leading infectious cause of blindness worldwide, with an estimated 1.3 million blind in 2002.1 The responsible infectious agent, Chlamydia trachomatis, responds to antibiotic treatment,2 but the infection is difficult to diagnose accurately.3 The World Health Organization (WHO) bases its treatment recommendations on the prevalence of clinically active trachoma,4 and our group has demonstrated that treatment strategies may have to be altered depending on the prevalence of chlamydia infection in the community.5–7 Elimination efforts require accurate surveillance of disease to assess when treatments can be discontinued and to detect reemergence in previously treated areas.
Difficulties in diagnosis stem from the lack of an accepted gold standard for active C. trachomatis infection. Disease surveys typically rely on the prevalence of findings from the WHO simplified grading criteria.8 However, clinical examination findings and chlamydial infection are frequently discordant,3 in part because the clinical signs of trachoma persist for many weeks after infection has been cleared.9,10 Culture, although specific for viable organisms, is expensive and technically demanding and is thought to have low sensitivity. DNA-based polymerase chain reaction (PCR) is sometimes assumed to be a gold standard,11 but it can be negative in the presence of infection identified by RNA-based PCR.10,12,13 Although the RNA test is thought to be the most sensitive method for detecting chlamydia, it has not been used frequently in trachoma studies.
When determining the sensitivity and specificity of diagnostic tests, using an imperfect test as a gold standard leads to biased estimates of the comparison tests. It also causes the sensitivity and specificity of tests to vary with prevalence, a phenomenon that has been reported with trachoma.14–19 Latent class analysis (LCA) allows comparison between observed data and a parameter-optimized latent gold standard. In effect, the latent gold standard acts as a composite of all available data and permits the estimation of sensitivity and specificity of each test individually. Those estimates are not based on preconceived notions that we have about test performance; rather, they arise completely from trends in the data. In this study, we applied a clustered LCA to baseline village data from the Trachoma Elimination Follow-up (TEF) study to evaluate diagnostic tests for ocular chlamydia infection, including the clinical signs of follicular trachomatous inflammation (TF) and intense trachomatous inflammation (TI) and DNA-PCR assay (Amplicor PCR; Roche Diagnostics, Indianapolis, IN).
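The prevalence dependence described above can be illustrated with a short calculation. The following is a minimal sketch, not part of the study's analysis: it assumes that the index test and the imperfect reference test are independent given true infection status, and all numerical values are hypothetical.

```python
def apparent_accuracy(prev, se_index, sp_index, se_ref, sp_ref):
    """Apparent sensitivity and specificity of an index test when it is
    scored against an imperfect reference test.

    Assumes the two tests are independent given true infection status.
    """
    # Probability that the reference is positive, and that both are positive
    ref_pos = prev * se_ref + (1 - prev) * (1 - sp_ref)
    both_pos = prev * se_index * se_ref + (1 - prev) * (1 - sp_index) * (1 - sp_ref)
    # Probability that the reference is negative, and that both are negative
    ref_neg = 1 - ref_pos
    both_neg = prev * (1 - se_index) * (1 - se_ref) + (1 - prev) * sp_index * sp_ref
    return both_pos / ref_pos, both_neg / ref_neg


# Hypothetical index test (Se 0.85, Sp 0.60) scored against a
# hypothetical imperfect reference (Se 0.90, Sp 0.95)
for prev in (0.1, 0.5, 0.9):
    app_se, app_sp = apparent_accuracy(prev, 0.85, 0.60, 0.90, 0.95)
    print(f"prevalence={prev:.1f}  apparent Se={app_se:.3f}  apparent Sp={app_sp:.3f}")
```

Although the true sensitivity and specificity of the index test are held fixed, the apparent values shift as prevalence changes; with a perfect reference (sensitivity and specificity of 1), the function returns the true values at any prevalence.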
We randomly selected 40 villages in the Gurage zone of Ethiopia for enrollment in the TEF study. All children aged 1 to 5 years in these villages were offered screening before any study interventions were implemented. Screening consisted of conjunctival examination and swabbing. The upper right tarsal conjunctiva was graded by trained personnel using WHO simplified grading criteria,8 which include TF and TI. TF is defined as the presence of five or more follicles at least 0.5 mm in size in the upper tarsal conjunctiva. TI is present when inflammatory thickening obscures more than 50% of the deep tarsal vessels.8 Graders were certified if they had greater than 80% concordance with an expert ophthalmologist in field testing. Dacron swabs of the upper right tarsal conjunctiva were obtained and evaluated with a DNA-PCR assay (Amplicor; Roche Diagnostics; referred to as PCR in this report). Laboratory workers were masked to patient information and clinical examination results, and laboratory testing was unavailable at the time of clinical examination. Multiple controls were used in PCR testing, including positive and negative laboratory controls and negative and duplicate field controls.7 The study protocol adhered to the tenets of the Declaration of Helsinki, informed consent was obtained from all participants, and the study had institutional review board approval from all participating centers. The methodology is described in more detail elsewhere.6,7,11
We used an LCA model20 parameterized to account for village-level clustering. Our model introduces a latent gold standard, which is a categorical variable representing a latent or unknowable true disease state. The latent gold standard divides the population into latent classes representing “disease present” and “disease absent.” In the LCA model, all test results are due to the interaction of latent class status prevalence with the sensitivity and specificity of the tests. In our case, we had 6 sensitivity/specificity parameters and 40 prevalence parameters. Using test values for each of these parameters, we directly calculated the expected frequency tables for each village. To optimize the parameters, we minimized the sum of the Kullback-Leibler discrepancies:
$$\sum_{j=1}^{J} \sum_{i=1}^{I} f_{ij} \ln\left(\frac{f_{ij}}{F_{ij}}\right)$$
for all J villages, where f_{ij} represents the observed count, F_{ij} the expected count of a particular combination of test results in a village, and I the number of entries in each village's frequency table. The Kullback-Leibler discrepancy is a measure of information gained by introduction of the model,21 and this approach yields the maximum likelihood estimate.20 Note that the LCA model that we implemented requires an assumption of test independence, conditional on the latent class. Latent class methods are described in more depth elsewhere.22
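The calculation of a village's expected frequency table and its discrepancy from the observed table can be sketched as follows. This is an illustrative Python sketch under the conditional independence assumption, not the Mathematica code used in the study; the function names, the village size, and the observed counts are hypothetical, and the sensitivity and specificity values are the point estimates reported in this paper.

```python
from itertools import product
import math


def expected_counts(n, prev, sens, spec):
    """Expected count of every test-result pattern in a village of n subjects.

    Tests are assumed independent given the latent class. A pattern is a
    tuple of 1 (positive) or 0 (negative), one entry per test.
    """
    table = {}
    for pattern in product((1, 0), repeat=len(sens)):
        p_present = prev      # contribution of the "disease present" class
        p_absent = 1 - prev   # contribution of the "disease absent" class
        for result, se, sp in zip(pattern, sens, spec):
            p_present *= se if result else 1 - se
            p_absent *= 1 - sp if result else sp
        table[pattern] = n * (p_present + p_absent)
    return table


def kl_discrepancy(observed, expected):
    """Sum over cells of observed * ln(observed / expected).

    Cells with an observed count of zero contribute nothing.
    """
    return sum(f * math.log(f / expected[k]) for k, f in observed.items() if f > 0)


# Pattern order is (TF, TI, PCR); counts are hypothetical for one village
observed = {(1, 1, 1): 12, (1, 1, 0): 3, (1, 0, 1): 11, (1, 0, 0): 13,
            (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 2, (0, 0, 0): 7}
expected = expected_counts(50, 0.531, (0.873, 0.536, 0.875), (0.366, 0.883, 1.0))
print(kl_discrepancy(observed, expected))
```

In a full fit, this quantity would be summed over all villages and minimized over the shared sensitivity and specificity parameters together with one prevalence parameter per village.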
In addition, we performed three comparative analyses: TF versus PCR, TI versus PCR, and clinical activity (defined as TF and/or TI) versus PCR, using the clustered LCA with only two diagnostic tests. Typically, an LCA with m dichotomous tests and one dichotomous latent class is identifiable when m ≥ 3.20 Note that, in this case, the degrees of freedom gained by having multiple villages of different prevalence allowed these two-test models to be identifiable.
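The degrees-of-freedom argument can be made concrete with a simple count. The sketch below is illustrative and uses a hypothetical helper function: it compares the number of free cells in the observed frequency tables with the number of model parameters. Having at least as many free cells as parameters is a necessary condition for identifiability, not a sufficient one; the villages must also differ in prevalence.

```python
def lca_parameter_count(n_tests, n_villages):
    """Return (free data cells, model parameters) for a clustered two-class LCA.

    Each village contributes 2**n_tests - 1 free cells, because the cell
    counts must sum to the village total. The model has one sensitivity and
    one specificity per test, plus one prevalence per village.
    """
    free_cells = n_villages * (2 ** n_tests - 1)
    parameters = 2 * n_tests + n_villages
    return free_cells, parameters


for n_tests, n_villages in ((2, 1), (3, 1), (2, 40), (3, 40)):
    cells, params = lca_parameter_count(n_tests, n_villages)
    print(f"{n_tests} tests, {n_villages} village(s): "
          f"{cells} free cells, {params} parameters")
```

With a single population, two tests give 3 free cells for 5 parameters, whereas three tests give 7 for 7. With 40 villages, two tests give 120 free cells for 44 parameters, and three tests give 280 for the 46 parameters of the primary model.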
Parameters were optimized by using a downhill simplex (Nelder-Mead) method (Mathematica 7.0 software; Wolfram Research, Champaign, IL). Percentile confidence intervals were obtained by iterating 999 bootstrap resamples at the village level (to account for clustering). P values were computed by comparing estimates from each model to estimates from the primary model for each resample. To make the density plot for each test, we created a beta distribution for each village's test results and calculated the arithmetic mean.
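Village-level resampling with percentile intervals can be sketched as follows. This is a minimal illustration in Python rather than the Mathematica code used in the study. For brevity, a pooled proportion stands in for refitting the full LCA on each resample; the function names and the village data are hypothetical.

```python
import random


def village_bootstrap_ci(villages, estimator, n_resamples=999, level=0.95, seed=0):
    """Percentile confidence interval from a bootstrap at the village level.

    Whole villages are resampled with replacement, so subjects from the
    same village always stay together (this accounts for clustering).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(villages) for _ in villages]
        estimates.append(estimator(resample))
    estimates.sort()
    # With 999 resamples these are the 25th and 975th ordered estimates
    lower = round((1 - level) / 2 * (n_resamples + 1)) - 1
    upper = round((1 + level) / 2 * (n_resamples + 1)) - 1
    return estimates[lower], estimates[upper]


def pooled_proportion(villages):
    """Stand-in estimator: pooled proportion of positives across villages."""
    positives = sum(pos for pos, _ in villages)
    examined = sum(n for _, n in villages)
    return positives / examined


# Hypothetical data: (number positive, number examined) per village
villages = [(30, 50), (22, 48), (41, 55), (18, 52), (35, 49), (27, 51)]
print(village_bootstrap_ci(villages, pooled_proportion))
```

In the actual analysis, the estimator would be the full parameter optimization applied to each resampled set of villages, which yields an interval for each sensitivity and specificity parameter.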
Clinical examination and PCR data were available for 2111 subjects in 40 villages. The median village prevalence of TF among 1- to 5-year-olds was 73.5% (interquartile range [IQR] 67.9–83.3). Median TI prevalence was 29.9% (IQR 21.1–42.9). Median PCR prevalence was 46.2% (IQR 34.1–60.5). Median latent class prevalence was 53.1% (IQR 38.8–69.9). Figure 1 shows the distributions of prevalence among the villages, separated by test.
Figure 2 compares the prevalence of TF, TI, or PCR with the LCA prevalence for each village. The diagonal line represents the performance of a true gold standard; points in the region above the line correspond to overdiagnosis and points below to underdiagnosis relative to the latent class. There were various levels of correlation between each of the test prevalences and the latent class prevalence, with an R² of 0.24 for TF, 0.55 for TI, and 0.97 for PCR.
We performed a clustered form of LCA to estimate the sensitivity and specificity of three tests of ocular chlamydia infection (TF and TI for clinical trachoma and PCR for chlamydial DNA) and provide village-level estimates of the trachoma prevalence. In making these estimates, we did not assume a gold standard. We estimated TF to be 87.3% sensitive and 36.6% specific, TI to be 53.6% sensitive and 88.3% specific, and PCR to be 87.5% sensitive and 100% specific.
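As a rough consistency check on these estimates, the proportion of subjects expected to test positive at a given latent prevalence follows directly from sensitivity and specificity. The sketch below is illustrative only: it applies the reported point estimates at the median latent class prevalence of 53.1%, and the function name is hypothetical.

```python
def apparent_prevalence(true_prev, sensitivity, specificity):
    """Expected proportion testing positive: true positives plus false positives."""
    return true_prev * sensitivity + (1 - true_prev) * (1 - specificity)


# (sensitivity, specificity) point estimates reported in this study
estimates = {"TF": (0.873, 0.366), "TI": (0.536, 0.883), "PCR": (0.875, 1.0)}
median_latent_prev = 0.531

for test, (se, sp) in estimates.items():
    expected = apparent_prevalence(median_latent_prev, se, sp)
    print(f"{test}: expected proportion positive = {expected:.3f}")
```

The resulting proportions are approximately 76% for TF, 34% for TI, and 46% for PCR, which are close to the observed median village prevalences of 73.5%, 29.9%, and 46.2%. Exact agreement is not expected, because a median across villages need not equal the value computed at the median latent prevalence.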
TF, the diagnostic test used in the WHO trachoma guidelines, has its advantages. It is inexpensive, yields instant results, and is sensitive; 87.3% of latent class positives would be expected to be TF positive. However, this analysis suggests that it is poorly specific (36.6%) and tends to overestimate infection rates (Fig. 2), which could lead to unnecessary treatment. Low specificity may be the result of the kinetics of infection in which follicles may persist long after the infection has been cleared23 or in which follicles may recur without the presence of chlamydia.14 Despite low specificity, the advantages of TF, namely low cost and immediate results, ensure that it will remain critical in disease surveys.
TI is not currently used in the WHO treatment protocols. TF is the preferred test, because it is more closely related to the historical MacCallan classification and because TI is often overdiagnosed in the presence of redness or scarring (Taylor H, personal communication, 2011). Nevertheless, this study suggests that TI is far more specific (88.3%) than TF (36.6%). TI has been shown to correlate more strongly with PCR results10,15 and to have higher chlamydial loads by quantitative PCR.24 Our results suggest that it lacks sensitivity (53.6%) and tends to underestimate prevalence (Fig. 2). Again, this may be due to infection kinetics; TI tends to resolve sooner in the course of infection than does TF.25
Another trachoma study used a hidden Markov model, which shares traits with latent class models, to analyze longitudinal data. Briefly, their model had a hidden true disease state, analogous to our latent class, and diagnostic tests approximated this disease state with parameterized sensitivity and specificity. They estimated clinical activity (presence of TF and/or TI) to be 97% sensitive and 93% specific.26 Their results are difficult to directly compare with ours for two main reasons. First, their model used longitudinal data and was designed with the goal of estimating duration of infection, not test characteristics. Second, we used PCR as a diagnostic test, whereas they used an immunoassay.
Our estimates of PCR sensitivity (87.5%) and specificity (100%) closely agree with the best current estimate in a recent review in which sensitivity was estimated at 90% to 100% and specificity at 95% to 100%.3 Although the Amplicor test (Roche Diagnostics) is officially indicated only for urogenital use, it is commonly used in trachoma studies.11,18 In the urogenital literature, similar performance has been reported: sensitivities of 90% to 92% and specificities of 99% to 100%.27–29 Our estimates come from macrolide-naive areas with high prevalence. The temporal relationship between infection and clinical examination could lead to different results in other settings, particularly in recently treated or low-prevalence areas. Our estimates should be re-evaluated as posttreatment data accumulate from ongoing mass-treatment studies.
PCR has limitations, particularly that it may have a false-negative rate as high as 20% (Table 1). The Amplicor test evaluated here detects the cryptic plasmid present in most C. trachomatis. Others have isolated C. trachomatis that lacks the plasmid,30 suggested that other species of Chlamydia can cause trachomatous inflammation,31 and shown that a broadened spectrum of PCR targets increases sensitivity.32 The Amplicor test evaluated here would fail to detect other species or variants with an altered or missing plasmid, which could be critical in the face of selection pressure. Differences between conjunctival and epithelial specimens, human conjunctival cell yield, DNA extraction efficiency, and removal of molecular inhibitors may also affect test performance33; those factors were not examined in this study and may partially account for PCR's imperfect sensitivity.
Determining test characteristics in the absence of a gold standard is difficult, and latent class analysis has several limitations.34,35 The most relevant limitation in our case is that we assume diagnostic tests are independent, given their underlying latent class disease state. Positive correlation between tests may cause LCA to overestimate sensitivity and specificity,36 and so it is possible that estimates are optimistic. There are methods that directly calculate correlation between diagnostic tests.37–39 These methods all require more than three diagnostic tests for the model to be identifiable. For our data, we were able to perform a two-test LCA comparing TF versus PCR and TI versus PCR (comparative analyses 1 and 2). By ignoring one of the clinical examinations in each of these comparative analyses, we eliminated any possible effect of correlation between TF and TI. The two-test LCA comparing TF and PCR produced estimates for the sensitivity of PCR and the specificity of TF that lay just outside the 95% CIs for the primary analysis. Neither of these differences was significant when accounting for error in each estimate. The results of the two-test LCA comparing TI and PCR lay within the primary analysis CIs. Taken together, these findings suggest that correlation between TF and TI may be playing a small role in our model. Quantifying the relationship further requires more diagnostic tests.
LCA may be particularly well suited to trachoma because of the unclear definition of a case of trachoma. The WHO defines cases based on clinical examination, whereas research studies, such as the source of these data, frequently use laboratory tests such as PCR. These two approaches are fundamentally different; examination detects inflammation, whereas PCR detects the causative organism. Examination and laboratory tests are frequently discordant,14,23 possibly due to infection kinetics23 and age-dependent manifestations of infection.40 Latent class analysis is appropriate for trachoma because the trachoma latent class in the model is never specifically defined. Instead, it acts as an unbiased composite of all available data, which is more appropriate than defining cases based on unverified assumptions. This raises the question of what the latent trachoma class actually represents. In our case, the high sensitivity and almost 100% specificity of PCR suggest that the trachoma class represents chlamydial infection as determined by PCR more than examination findings.
Our LCA approach both reaffirms and challenges some traditionally held views about trachoma. PCR appears to have the specificity of a true gold standard (100%) but lacks sensitivity (87.5%). TF, the diagnosis used in the WHO protocol, is quite sensitive (87.3%) but poorly specific (36.6%). Although TI is no longer used in the WHO protocol, our findings suggest that it could play a role due to its specificity (88.3%) and its strong correlation with LCA prevalence (0.74, consistent with the coefficient of determination of 0.55 reported for TI). The next step is to compare these estimates with those from ongoing clinical trials, particularly those in areas of different prevalence and in areas after treatment.
Supported by National Institutes of Health Grants NIH/NEI K12EX017269 and R21 AI055752 for the Trachoma Elimination Follow-up Study, with additional support from International Trachoma Initiative, Pfizer International, the Bernard Osher Foundation, That Man May See, the Harper Inglis Trust, the Bodri Foundation, the South Asia Research Fund, and Research to Prevent Blindness. Mr. See's research was supported by a grant from the Doris Duke Charitable Foundation.
Disclosure: C.W. See, None; W. Alemayehu, None; M. Melese, None; Z. Zhou, None; T.C. Porco, None; S. Shiboski, None; B.D. Gaynor, None; J. Eng, None; J.D. Keenan, None; T.M. Lietman, None