The Centers for Disease Control and Prevention currently recommends a 2-tier serologic approach to Lyme disease laboratory diagnosis, comprised of an initial serum enzyme immunoassay (EIA) for antibody to Borrelia burgdorferi followed by supplementary IgG and IgM Western blotting of EIA-positive or -equivocal samples. Western blot accuracy is limited by subjective interpretation of weakly positive bands, false-positive IgM immunoblots, and low sensitivity for detection of early disease. We developed an objective alternative second-tier immunoassay using a multiplex microsphere system that measures VlsE1-IgG and pepC10-IgM antibodies simultaneously in the same sample. Our study population comprised 79 patients with early acute Lyme disease, 82 patients with early-convalescent-phase disease, 47 patients with stage II and III disease, 34 patients post-antibiotic treatment, and 794 controls. A bioinformatic technique called partial receiver-operator characteristic (ROC) regression was used to combine individual antibody levels into a single diagnostic score with a single cutoff; this technique enhances test performance when a high specificity is required (e.g., ≥95%). Compared to Western blotting, the multiplex assay was equally specific (95.6%) but 20.7% more sensitive for early-convalescent-phase disease (89.0% versus 68.3%, respectively; 95% confidence interval [95% CI] for difference, 12.1% to 30.9%) and 12.5% more sensitive overall (75.0% versus 62.5%, respectively; 95% CI for difference, 8.1% to 17.1%). As a second-tier test, a multiplex assay for VlsE1-IgG and pepC10-IgM antibodies performed as well as or better than Western blotting for Lyme disease diagnosis. Prospective validation studies appear to be warranted.
Lyme disease (LD) is the most common vector-borne disease in the United States, with a reported incidence of nearly 35,000 new cases annually (10, 21). There are three disease stages: stage I is the early acute phase, characterized by a rash (erythema migrans [EM]) that occurs in at least 70% of patients; stage II represents early disseminated infection, including lymphocytic meningitis, cranial neuropathy, radiculopathy, and Lyme carditis; and stage III represents late disseminated infection, such as Lyme arthritis, axonal peripheral neuropathy, and encephalomyelitis (39). Diagnosis of stage I disease is based on clinical, not serological, criteria, while stages II and III typically require serologic confirmation (37). Despite the predominance of stage I disease, more than 3.4 million tests for LD were ordered in 2008 in the United States (A. Hinckley, Centers for Disease Control and Prevention [CDC], personal communication). Overuse of serology has led to significant problems with false-positive results and misdiagnosis (38).
When first introduced for LD diagnosis, whole-cell enzyme immunoassays (EIAs) and indirect immunofluorescence assays (IFAs) for serum antibodies to Borrelia burgdorferi suffered from a lack of standardization, poor reproducibility, and high false-positive rates (11, 25). Following the Second National Conference on the Serologic Diagnosis of Lyme Disease (27 to 29 October 1994; Dearborn, MI), a 2-tier serologic approach was recommended, comprised of an initial serum EIA or IFA for antibody to B. burgdorferi followed by supplementary IgG and IgM Western blotting of positive or indeterminate samples (9). Furthermore, only IgG blots were recommended for serologic diagnosis more than 30 days after disease onset. Although Western blotting is very sensitive for stage II and III disease, multiple limitations to blot accuracy have been identified: a low sensitivity for stage I disease, false-positive IgM immunoblots, and subjective interpretation of weakly positive bands (1, 5). Western blotting is also labor-intensive and expensive. The goal of the current study was to develop an objective alternative to Western blotting as a second-tier assay.
Diagnostic serology has evolved and now utilizes recombinant and synthetic peptide antigens, such as C6, the 26-mer invariant portion of VlsE1 (variable major protein-like sequence 1); recombinant VlsE1 itself; and pepC10, a 10-mer conserved portion of OspC (2). These surface antigens are expressed by Borrelia burgdorferi during the early phase of mammalian infection (39). The predominant immune responses to C6 and VlsE1 are IgG mediated, even in early disease, while pepC10 generates an early and sometimes lasting IgM response (2, 5, 28). While more specific than whole-cell EIAs, these new assays might not be as specific as Western blotting (5, 40). Although the results are preliminary, microarrays for serologic detection of products of expressed open reading frames represent a promising new technology (3). Diagnostic alternatives to serology remain limited. Cultures of blood and body fluids for B. burgdorferi demonstrate low sensitivity (1). PCR assays for B. burgdorferi DNA from synovial fluid and skin are often positive prior to antibiotic therapy but require invasive procedures to obtain suitable samples and are prone to false-positive results if contamination risk is not rigorously controlled (1). At present, no assays for direct detection of B. burgdorferi have been approved by the Food and Drug Administration (19).
Given the complex nature of the host immune response to B. burgdorferi infection, the use of multiple serologic assays has been proposed to enhance either test sensitivity or specificity (2, 6, 12, 17, 34, 35). Tests derived from continuous data generate binary (positive or negative) results when a cutoff value is chosen to achieve a desired specificity (e.g., 99%). Combining binary test results by using Boolean “OR” logic, such as detecting either IgG antibody to VlsE1 or IgM antibody to pepC10 by kinetic EIA, can generate a more sensitive but less specific assay than the individual test components (2). In contrast, combining tests by using Boolean “AND” logic may produce a more specific but less sensitive assay (35).
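The effect of Boolean combination can be illustrated with a minimal sketch. The antibody levels below are hypothetical and are not data from this study; the cutoffs are the individual VlsE1-IgG and pepC10-IgM cutoffs reported for the multiplex assay, used here only for illustration. The function names are our own.

```python
# Hypothetical (IgG, IgM) antibody levels in AU/ml; not data from this study.
cases = [(50, 10), (20, 40), (60, 30), (10, 5)]
controls = [(5, 30), (40, 3), (4, 2), (6, 1)]

IGG_CUTOFF = 31  # VlsE1-IgG cutoff, AU/ml
IGM_CUTOFF = 24  # pepC10-IgM cutoff, AU/ml


def positive_by_or(igg, igm):
    """Positive when either antibody meets its cutoff."""
    return igg >= IGG_CUTOFF or igm >= IGM_CUTOFF


def positive_by_and(igg, igm):
    """Positive only when both antibodies meet their cutoffs."""
    return igg >= IGG_CUTOFF and igm >= IGM_CUTOFF


def sensitivity(rule, case_levels):
    """Fraction of cases that the rule calls positive."""
    return sum(rule(igg, igm) for igg, igm in case_levels) / len(case_levels)


def specificity(rule, control_levels):
    """Fraction of controls that the rule calls negative."""
    return sum(not rule(igg, igm) for igg, igm in control_levels) / len(control_levels)


or_sens = sensitivity(positive_by_or, cases)
or_spec = specificity(positive_by_or, controls)
and_sens = sensitivity(positive_by_and, cases)
and_spec = specificity(positive_by_and, controls)
print(f"OR : sensitivity {or_sens:.2f}, specificity {or_spec:.2f}")
print(f"AND: sensitivity {and_sens:.2f}, specificity {and_spec:.2f}")
```

With these toy values, the "OR" rule detects more cases but misclassifies more controls, while the "AND" rule does the reverse.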
Some potentially useful information about specific antibody levels is lost in creating a binary test. For some antibody combinations, multivariate regression models can outperform standard binary assays: they can identify the most important diagnostic tests among multiple options and weight their individual contributions when calculating an overall diagnostic score (31). Regression models can be used for either disease classification (i.e., disease or no disease) or prediction (i.e., probability of disease). Our analyses focus on using receiver-operator characteristic (ROC) regression models to improve disease classification. ROC curves plot the tradeoff between sensitivity and specificity as the test cutoff is varied; in general, the greater the area under the ROC curve (AUC), the better the test. Recent publications have explored the use of ROC curves to compare and optimize the performance of diagnostic tests (30, 32). Full ROC regression analysis optimizes classifier performance by maximizing the area under the entire ROC curve (29). Partial ROC regression is a relatively new bioinformatic technique that can augment test performance within clinically significant portions of the ROC curve (e.g., 95% specificity or higher) (14). We chose to evaluate partial ROC regression models for LD diagnosis because of the need for high test specificity.
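As a rough illustration of the quantities involved, the following sketch computes an empirical ROC curve and the area under it, either in full or restricted to a high-specificity region. It assumes that higher scores indicate disease and uses trapezoidal integration with linear interpolation at the boundary; the scores are hypothetical. It only evaluates the partial area for a given score and does not perform the regression step that selects the antibody weights.

```python
def roc_points(case_scores, control_scores):
    """Empirical ROC as (FPR, TPR) points, sweeping the cutoff from high to low."""
    thresholds = sorted(set(case_scores) | set(control_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in case_scores) / len(case_scores)
        fpr = sum(s >= t for s in control_scores) / len(control_scores)
        points.append((fpr, tpr))
    return points


def partial_auc(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Interpolate the segment linearly up to the FPR limit.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical diagnostic scores; not data from this study.
cases = [0.9, 0.8, 0.7, 0.4]
controls = [0.6, 0.3, 0.2, 0.1]

points = roc_points(cases, controls)
full_auc = partial_auc(points)                # area under the entire curve
pauc_95 = partial_auc(points, max_fpr=0.05)   # 95% to 100% specificity only
print(full_auc, pauc_95)
```

Because the partial area is bounded by the width of the specificity window (0.05 here), it is often reported after normalization; that step is omitted from this sketch.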
A multiplex microsphere system can perform multiple antibody assays simultaneously on the same serum sample (20), making this platform attractive for LD diagnosis. We developed a new multiplex immunoassay for VlsE1-IgG and pepC10-IgM antibodies, interpreted using partial ROC regression techniques, as an objective alternative to Western blotting.
Data set A consisted of the following: (i) 79 prospectively collected sera from patients with culture-proven, early-acute-phase LD (stage I; EM) and 78 early-convalescent-phase sera from the same patients; (ii) 4 retrospectively collected convalescent-phase samples from patients with culture-proven EM; (iii) 47 prospectively collected sera from patients with stage II and III LD (16 with early neurological disease, 2 with myocarditis, and 29 with Lyme arthritis); and (iv) 34 retrospectively collected sera obtained following treatment for extracutaneous disease (n = 16) and erythema migrans (n = 18) (A. Steere, Boston, MA, and the CDC, Fort Collins, CO). Of the 16 patients with early neurological disease, most had more than one disease manifestation, including facial palsy (n = 11), meningitis (n = 7), radiculopathy (n = 4), and optic neuritis (n = 2). All prospectively collected sera were obtained during prior investigations and evaluated retrospectively for this study. All patients from data set A had histories consistent with exposure to North American B. burgdorferi and met the CDC case definition for Lyme disease (7, 8).
Data set B consisted of 446 consecutive uncharacterized samples submitted to the New York State Department of Health (NYSDOH) for routine LD serology between 2006 and 2007 (S. J. Wong, Albany, NY); no clinical data were available for the patients. Of these samples, 164 were positive by standard 2-tier serology.
Samples were collected with informed consent during previous studies (set A) or for nonresearch purposes (set B). This research was approved by the institutional review boards of Saint Francis Medical Center, Trenton, NJ, and the NYSDOH, Albany, NY; samples were deidentified prior to testing, and requirements for additional informed consent were waived. Laboratory personnel were blind to the multiplex diagnostic score when performing other assays.
Uninfected controls included 300 healthy blood donors from New Mexico (where LD is not endemic), 300 healthy blood donors from New England (where LD is endemic), 99 patients from New Mexico undergoing routine screening examinations, and 95 patients with potentially cross-reacting conditions from an area where LD is endemic. The latter conditions included Epstein-Barr virus infection (20), toxoplasmosis (10), rheumatoid arthritis (10), anti-nuclear antibody-positive status (10), leptospirosis (10), syphilis (10), rubella (10), and other conditions (15).
Initial samples collected less than 6 months after the start of treatment for stage II or III disease were considered representative of the maximal immune response (group 1), while samples collected 6 or more months after the beginning of treatment were considered representative of a waning immune response (group 2) (33). Similarly, early-convalescent-phase samples collected less than 30 days after the start of antibiotic treatment were considered representative of the maximal immune response (group 1), while samples collected 30 or more days after the beginning of treatment were considered representative of a waning immune response (group 2) (18). The immune responses of those with untreated early acute disease were considered separately.
Recombinant VlsE1 protein was produced using Escherichia coli Sure2 with a pVlsE1-His3 fusion protein plasmid construct (supplied by S. Norris, University of Texas Medical School, Houston, TX) and was purified using His and heparin affinity columns (27). Synthetic pepC10 (PVVAESPKKP-OH) was obtained from NeoMPS, Inc. (San Diego, CA).
The AtheNA Multi-Lyte test system (Zeus Scientific, Inc., Branchburg, NJ) is a sandwich immunoassay based on flow cytometric separation of fluorescent microparticles by use of Luminex xMAP technology (Luminex Corp., Austin, TX) (20). Briefly, multiple sets of 5.6-μm polystyrene beads are each impregnated with fluorescent dyes that give them distinct spectral signatures (20). For this study, VlsE1 and pepC10 antigens were covalently bound to the surfaces of separate sets of beads. All patient samples and assay controls were diluted 1:21 by combining 10 μl of specimen with 200 μl of the specimen diluent and mixing them for 30 s on a shaker plate at 800 rpm. A 50-μl mixture of bead sets containing VlsE1-conjugated microspheres, 4 calibrators, and a bead set to detect nonspecific binding was added to each filtration well, resuspended by vortexing and sonication for 30 s each, and then incubated with 10 μl of diluted specimen for 30 min. All incubation steps required mixing on a shaker plate as described above, followed by vacuum washing with 200 μl of phosphate-buffered saline (PBS) three times. The bead sets were incubated with 150 μl of phycoerythrin (PE)-labeled goat anti-human IgG gamma antibody (Moss Inc., Pasadena, MD) for 30 min and then vacuum washed. A 50-μl mixture containing 2 additional bead sets was added to each filtration well: 1 set was conjugated with VlsE1, and the other was conjugated with pepC10. The bead sets were resuspended as described above, and another 10 μl of diluted specimen was added to each well and incubated for 30 min. The bead sets were vacuum washed, and 150 μl of PE-labeled goat anti-human IgM mu (Moss, Inc.) was added to each well and incubated for 30 min. After being vacuum washed, the bead sets were resuspended in 150 μl of PBS and the results were read by a flow cytometer. Using a proprietary method, IgG and IgM levels were measured simultaneously, using a single excitation laser and a single reporter molecule (PE). 
Antibody levels were measured in AtheNA units (AU), but the test combination (AtheNA bioinformatic score [BIS]) was computed using the algorithm described below and rescaled such that a score of ≥1.0 was considered positive; interpretive software is available through the corresponding author. All patient specimens were processed by S. J. Wong at the NYSDOH; control specimens were processed by both Zeus Scientific, Inc., and the NYSDOH.
All specimens were tested with the following assays: (i) Zeus whole-cell EIA, (ii) IgG and IgM Western blotting for whole-cell EIA-positive and -equivocal specimens (MarDx Diagnostics, Inc., Carlsbad, CA), and (iii) C6 IgG/IgM EIA (Immunetics, Inc., Boston, MA). All assays were performed in accordance with the manufacturers' instructions. Western blots were interpreted in accordance with current CDC guidelines for 2-tier testing (9).
For standardization purposes, individual cutoffs for VlsE1-IgG and pepC10-IgM antibodies were determined in AU by Zeus Scientific, Inc. (Branchburg, NJ). The cutoff for VlsE1-IgG, 31 AU/ml, was 8 standard deviations above the mean for 25 healthy controls from an area where LD is not endemic, while the cutoff for pepC10-IgM, 24 AU/ml, was 4 standard deviations above the mean. There were 3 testing sites and 2 technicians at each site. Five serum standards were prepared, with antibody concentrations ranging from negative and near the cutoff to highly positive. Each standard was run in triplicate twice each day for 5 consecutive days by each technician at each location.
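A cutoff defined as a number of standard deviations above the control mean can be computed as in the sketch below. The control values are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption; the text does not state which estimator was used.

```python
import statistics


def sd_cutoff(control_levels, n_sd):
    """Cutoff set n_sd sample standard deviations above the control mean."""
    mean = statistics.mean(control_levels)
    sd = statistics.stdev(control_levels)  # sample SD (n - 1 denominator); an assumption
    return mean + n_sd * sd


# Hypothetical antibody levels (AU/ml) for a small set of healthy controls.
control_levels = [1, 2, 3, 4, 5]

igg_style_cutoff = sd_cutoff(control_levels, 8)  # 8 SD, as for VlsE1-IgG
igm_style_cutoff = sd_cutoff(control_levels, 4)  # 4 SD, as for pepC10-IgM
print(round(igg_style_cutoff, 2), round(igm_style_cutoff, 2))
```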
A broad selection of second-tier classifiers was created to provide alternatives to Western blotting. Each multivariate classifier, including partial ROC regression models, generated a composite score for each sample by using a weighted linear combination of pepC10-IgM and VlsE1-IgG antibody levels measured by the multiplex assay. Examples of second-tier classifiers included VlsE1-IgG alone, pepC10-IgM alone, binary combinations of pepC10-IgM and VlsE1-IgG using individual antibody cutoffs, and combinations of antibody levels by logistic likelihood regression, full ROC regression, partial ROC regression trained using the 95% to 100% specificity portion of the ROC curve (95% pROC), and partial ROC regression trained using the 60% to 100% specificity portion of the ROC curve (60% pROC). See Appendix A for a detailed description of the partial ROC regression technique. The 95% pROC analysis focused on achieving high specificity, while the 60% pROC analysis emphasized high sensitivity.
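The form of a composite score can be sketched as follows. The weights and raw cutoff are hypothetical placeholders, not the fitted values from this study, and dividing by the raw cutoff is only one possible way to rescale a score so that values of 1.0 or greater are read as positive. Whether the antibody levels are transformed (e.g., to a log scale) before weighting is not specified here, so the sketch uses the levels as given.

```python
# Hypothetical weights and cutoff; the fitted values are not given in the text.
W_IGG = 0.6
W_IGM = 0.4
RAW_CUTOFF = 30.0


def composite_score(igg_level, igm_level, w_igg=W_IGG, w_igm=W_IGM):
    """Weighted linear combination of the two antibody levels."""
    return w_igg * igg_level + w_igm * igm_level


def rescaled_score(raw_score, raw_cutoff=RAW_CUTOFF):
    """Rescale so that a single threshold of 1.0 separates positive from negative."""
    return raw_score / raw_cutoff


def is_positive(igg_level, igm_level):
    """Apply the single cutoff to the rescaled composite score."""
    return rescaled_score(composite_score(igg_level, igm_level)) >= 1.0


score_a = rescaled_score(composite_score(50, 10))  # strong IgG, weak IgM
score_b = rescaled_score(composite_score(20, 20))  # both levels modest
print(round(score_a, 3), is_positive(50, 10))
print(round(score_b, 3), is_positive(20, 20))
```

The different classifiers described above would differ in how the weights are chosen, not in the form of the score.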
To determine the optimal training set, each classifier was trained on each disease stage from data set A, using all samples from a given stage. A classifier trained on one disease stage from data set A was then tested against the other disease stages in data set A and against data set B. Training controls consisted of 249 sera from an area where LD is not endemic, and testing controls consisted of 545 samples from blood donors from both areas where LD is endemic and those where it is not endemic, as well as samples from patients with potential cross-reacting conditions. Partial ROC areas were used to identify the optimal disease stage for training purposes; otherwise, classifier sensitivity and specificity were used to compare performances. A median bootstrap method was used to generate 95% confidence intervals for the differences between classifier sensitivities, specificities, and partial ROC areas (16).
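The general idea of a bootstrap confidence interval for a difference in sensitivities can be sketched as below. This is a plain percentile bootstrap on paired results, which may differ from the median bootstrap method cited above; the paired results are hypothetical, and the function name, number of resamples, and seed are our own choices.

```python
import random


def bootstrap_sensitivity_difference(results_a, results_b, n_boot=2000, seed=0):
    """Percentile 95% CI for sensitivity(A) - sensitivity(B) on paired case results."""
    rng = random.Random(seed)
    n = len(results_a)
    diffs = []
    for _ in range(n_boot):
        # Resample cases with replacement, keeping each case's paired results together.
        idx = [rng.randrange(n) for _ in range(n)]
        sens_a = sum(results_a[i] for i in idx) / n
        sens_b = sum(results_b[i] for i in idx) / n
        diffs.append(sens_a - sens_b)
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    return lower, upper


# Hypothetical paired results for 100 cases: test A positive in 90, test B in 70,
# with every B-positive case also A-positive (an assumption for illustration).
results_a = [1] * 90 + [0] * 10
results_b = [1] * 70 + [0] * 30

observed_difference = (sum(results_a) - sum(results_b)) / len(results_a)
lower, upper = bootstrap_sensitivity_difference(results_a, results_b)
print(observed_difference, lower, upper)
```

Resampling whole cases preserves the pairing between the two tests, which matters when both assays are run on the same sera.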
Classifier sensitivities were compared at both Western blot specificity (95.6%) and 99.0% specificity among 545 testing controls, corresponding to 69.2% specificity and 93.6% specificity, respectively, among the 78 EIA-reactive testing controls (the target population for second-tier assays). Classifier performance was compared at 99% specificity because the latter value generates a higher test accuracy in low-incidence settings (2, 15). Our primary end point was to determine if the overall sensitivity of the multiplex assay used as a second-tier test was noninferior to Western blotting (margin, ≤10%; two-tailed α = 0.05) at 95.6% specificity. As a secondary end point, the overall specificity of the multiplex assay was compared to that of Western blotting by bootstrapping at an equivalent sensitivity. Post-antibiotic-treatment sera (group 2) were excluded from the analysis of primary and secondary end points because their significance was uncertain.
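The correspondence between overall specificity and specificity among EIA-reactive controls follows from simple counting, as sketched below. The sketch assumes that every second-tier false positive falls among the 78 EIA-reactive controls (as it must in a 2-tier algorithm) and that false-positive counts are rounded to whole samples; the function name is our own.

```python
N_TESTING_CONTROLS = 545
N_EIA_REACTIVE_CONTROLS = 78


def specificity_among_eia_reactive(overall_specificity):
    """Convert overall 2-tier specificity to specificity among EIA-reactive controls."""
    # Number of false positives implied by the overall specificity, as whole samples.
    false_positives = round((1.0 - overall_specificity) * N_TESTING_CONTROLS)
    return 1.0 - false_positives / N_EIA_REACTIVE_CONTROLS


at_western_blot = specificity_among_eia_reactive(0.956)
at_99_percent = specificity_among_eia_reactive(0.99)
print(f"{100 * at_western_blot:.1f}%")
print(f"{100 * at_99_percent:.1f}%")
```

Under these assumptions, 95.6% overall specificity implies 24 false positives among 545 controls, and 99.0% implies 5.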
Within-site variance was determined using one-way analysis of variance (ANOVA). Because only three test sites were utilized, calculating between-site variance using 2 degrees of freedom might significantly overestimate its true value. Therefore, between-site variance was estimated by dividing the sums of squares between sites by 60, the number of test repetitions at each site.
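One literal reading of this calculation is sketched below: within-site variance is the usual one-way ANOVA mean square within sites, and between-site variance is the between-site sum of squares divided by the number of repetitions per site. The measurements are hypothetical, and a balanced design (equal repetitions at each site) is assumed. In the study design described above, each site contributes 2 technicians × 3 replicates × 2 runs per day × 5 days = 60 repetitions per standard.

```python
def variance_components(site_values):
    """Return (within-site variance, between-site variance, grand mean).

    site_values holds one list of repeated measurements per site; a balanced
    design is assumed.
    """
    all_values = [v for site in site_values for v in site]
    grand_mean = sum(all_values) / len(all_values)
    site_means = [sum(site) / len(site) for site in site_values]

    ss_within = sum(
        (v - m) ** 2 for site, m in zip(site_values, site_means) for v in site
    )
    ss_between = sum(
        len(site) * (m - grand_mean) ** 2 for site, m in zip(site_values, site_means)
    )

    # One-way ANOVA mean square within sites.
    within_var = ss_within / (len(all_values) - len(site_values))
    # Between-site sum of squares divided by the repetitions per site.
    between_var = ss_between / len(site_values[0])
    return within_var, between_var, grand_mean


# Hypothetical measurements: 3 sites with 2 repetitions each.
sites = [[10, 12], [14, 16], [18, 20]]
within_var, between_var, grand_mean = variance_components(sites)

within_cv = within_var ** 0.5 / grand_mean    # coefficient of variation
between_cv = between_var ** 0.5 / grand_mean
print(within_var, between_var, round(within_cv, 3), round(between_cv, 3))
```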
In addition to the bootstrap technique, differences in classifier performance were assessed using the Wilcoxon rank sum test where appropriate. All reported P values were calculated using the latter test unless stated otherwise. MATLAB software was used to estimate regression models, perform the ANOVA, and make statistical comparisons (MathWorks, Natick, MA).
Table 1 evaluates the impact of using training data from different disease stages to maximize the AUC between 95% and 100% specificities and between 60% and 100% specificities. We observed no significant advantage to using training data from one disease stage to maximize the AUC for any other disease stage for a given partial ROC classifier; the same observation held true for logistic likelihood regression and full ROC regression classifiers (data not shown). Therefore, we were not able to identify disease stage-specific classifiers. By training the 95% pROC classifier using stage II and III data and the 60% pROC classifier using early-acute-phase data, we were able to validate both classifiers against early-convalescent-phase data with little risk of overfitting (i.e., overestimating the true performance).
We evaluated the following specificity quantiles to train partial ROC regression classifiers: 60% to 100%, 80% to 100%, 90% to 100%, and 95% to 100% specificities. The overall sensitivity at Western blot specificity (95.6%) was the same using either the 60% to 100% or 80% to 100% specificity quantile but fell as the specificity quantile narrowed to between 90% and 100% (data not shown). A more detailed comparison of the 60% pROC and 95% pROC classifiers is described below.
The log-log scatterplot in Fig. 1 demonstrates VlsE1-IgG and pepC10-IgM antibody levels by disease stage and illustrates the ability of the 95% pROC and 60% pROC regression classifiers to distinguish EIA-reactive case-patients from controls; these samples represent the target population of a second-tier assay.
The ROC curves in Fig. 2A provide a heuristic means by which to compare classifier performances among EIA-reactive early-convalescent-phase sera (76 EIA-positive and 2 EIA-equivocal sera) and controls (57 EIA-positive and 21 EIA-equivocal sera). Between 80% and 100% specificities, the 95% pROC and logistic regression models outperformed other classifiers, including single-antibody assays. Between 60% and 80% specificities, the 60% pROC and VlsE1-IgG classifiers demonstrated greater sensitivity than the other models, including Western blotting. Potential binary combinations of VlsE1-IgG and pepC10-IgM antibodies are represented by golden dots in each panel of Fig. 2; by varying the cutoff for each antibody separately, we could generate a range of sensitivities while maintaining the same specificity (or vice versa). The partial ROC regression models displayed in Fig. 2A appear more sensitive than most possible binary combinations at any given specificity.
Figure 2B compares classifier performances among 2-tier test-positive samples from data set B. Because no classifier for data set B could generate a sensitivity that exceeded that of Western blotting, we could determine only the relative sensitivities of alternative classifiers for that data set; real differences between classifiers may be muted by this sensitivity ceiling. Data set B (Fig. 2B) demonstrated the same relative sensitivities among classifiers between 80% and 100% specificities as those with data set A (Fig. 2A). The partial ROC regression classifiers in Fig. 2B also appear more sensitive than most possible binary combinations at any given specificity.
To identify the best model(s), we compared classifier sensitivities for two data sets at two different specificities (Table 2). Because the proposed multiplex assay is part of a 2-tiered approach, it was necessary to consider all early-convalescent-phase sera from data set A in calculating overall test performance. Among early-convalescent-phase sera, the 95% pROC and logistic regression classifiers provided the best sensitivity at 99% specificity; combining antibody levels using these regression techniques generated 65.9% sensitivity, compared to 53.7% for VlsE1-IgG alone and 48.8% for pepC10-IgM alone (P < 0.05 by bootstrapping for each antibody).
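Comparing classifiers at a fixed specificity amounts to choosing each classifier's cutoff from the control scores and then counting the cases above it. The sketch below shows one way to do this; the scores are hypothetical, the function names are our own, and the rule (positive when the score exceeds the cutoff) is an assumption chosen so that the achieved specificity is never below the target.

```python
import math


def cutoff_at_specificity(control_scores, target_specificity):
    """Lowest control score such that at least the target fraction of controls
    scores at or below it (controls at or below the cutoff are negative)."""
    ordered = sorted(control_scores)
    # Small tolerance guards against floating-point error in the product.
    k = max(1, math.ceil(target_specificity * len(ordered) - 1e-9))
    return ordered[k - 1]


def sensitivity_at_specificity(case_scores, control_scores, target_specificity):
    """Fraction of cases scoring above the cutoff fixed by the controls."""
    cutoff = cutoff_at_specificity(control_scores, target_specificity)
    return sum(score > cutoff for score in case_scores) / len(case_scores)


# Hypothetical composite scores; not data from this study.
control_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
case_scores = [3, 8, 9.5, 12, 15]

sens_at_90 = sensitivity_at_specificity(case_scores, control_scores, 0.90)
sens_at_50 = sensitivity_at_specificity(case_scores, control_scores, 0.50)
print(sens_at_90, sens_at_50)
```

Applying the same target specificity to each classifier places them on a common footing, so that differences in sensitivity reflect the classifiers rather than the choice of cutoff.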
At Western blot specificity (95.6%), the 60% pROC and full ROC classifiers provided optimal sensitivity among early-convalescent-phase sera and were statistically superior to (i) Western blotting (difference in sensitivity, 20.7%; 95% confidence interval [95% CI], 12.1% to 30.9%), (ii) the 95% pROC model (difference in sensitivity, 9.7%; 95% CI, 4.0% to 16.2%), and (iii) the logistic model (difference in sensitivity, 11.0%; 95% CI, 4.8% to 17.7%). The performance of the VlsE1-IgG assay alone at 95.6% specificity was inconsistent between data sets: it was equal to that of the 60% pROC model with data set A but significantly less sensitive than this model with data set B (P < 0.05 by bootstrapping); in contrast, the performance of the 60% pROC model appeared robust to the choice of data set. IgM antibody to pepC10 was the least sensitive assay with both data sets at 95.6% specificity.
Differences in classifier performance can also be expressed in terms of specificity at a fixed sensitivity. Comparing the specificity of regression classifiers to that of binary combinations is difficult because there is a range of separate cutoffs for VlsE1-IgG and pepC10-IgM antibodies that together can generate the same sensitivity but produce different specificities. To aid in comparisons, we identified a single cutoff value in AtheNA units for both antibodies, such that the overall sensitivity of the binary combination was equal to that of the 95% pROC classifier among samples from data set A; although the test sensitivity was 70.2% for both classifiers, the specificity of the 95% pROC model was 1.3% higher than that of the binary combination (95.6% versus 94.3%; 95% CI for the difference, 0.6% to 2.2%). The latter difference translated to a 9% improvement in specificity among EIA-reactive controls. When the sensitivity of the 95% pROC classifier was statistically equivalent to that of Western blotting (124/208 samples [59.6%] versus 130/208 samples [62.5%]; 95% CI for the difference, −8.3% to +2.4% among stages I through III combined), the regression model was 3.5% more specific than Western blotting (99.1% versus 95.6%; 95% CI for the difference, 1.9% to 5.1%); this difference translated to a 24.4% improvement in specificity among EIA-reactive controls.
For VlsE1-IgG, the within-site coefficient of variation ranged from 19.2% for negative samples to 5.8% for highly positive samples; the between-site coefficient of variation was 11.6% for negative samples, peaked at 13.4% for moderately positive samples, and fell to 5.4% for highly positive samples. For pepC10-IgM, the within-site coefficient of variation ranged from 15.3% for negative samples to 9.8% for highly positive samples; the between-site coefficient of variation was 7.5% for negative samples, peaked at 10.6% for low-positive samples, and fell to 1.3% for highly positive samples. The dynamic range for each antibody was approximately 4 log AtheNA units (data not shown).
The standard 2-tier model enjoys a specificity advantage over single-tier assays: because only EIA-reactive samples are evaluated further, sera positive by Western blotting but negative by EIA are eliminated from consideration. Some studies suggest that Western blot specificity is improved 8% to 9% by including a first-tier evaluation (11, 18, 25). The same phenomenon was observed when the multiplex assay utilized a 60% pROC classifier as the second tier of a 2-tiered approach: an initial EIA improved the overall specificity of the multiplex assay by 10.3% (i.e., from 85.3% to 95.6%).
Employing the 60% pROC classifier as the second tier of a 2-tiered model, the multiplex assay was 20.7% more sensitive than Western blotting for early-convalescent-phase disease (Table 3) and 12.5% more sensitive for stages I through III combined (156/208 samples [75%] versus 130/208 samples [62.5%]; 95% CI for the difference, 8.1% to 17.1%; P = 0.008). Because the early-acute-phase and convalescent-phase sera in our study were not from independent groups, we also evaluated classifier performance by using only early-convalescent-phase and stage II/III sera (constituting all group 1 samples). The multiplex assay was 16.3% more sensitive than Western blotting with the latter group (120/129 samples [93.0%] versus 99/129 samples [76.7%]; 95% CI for the difference, 10.0% to 23.4%; P = 0.014).
The false-positive rates of both assays were identical (Table 4). Although positive assays among healthy blood donors from areas of endemicity might be related to past B. burgdorferi infection, the majority of false-positive results in our study were due to IgM rather than IgG blots; prior infection might have resulted in more positive IgG blots than we observed (41).
Four multiplex-positive samples from patients with stage II/III disease were EIA positive in our laboratory but Western blot negative by standard 2-tier criteria (MarDx, Carlsbad, CA) (9). There was enough serum remaining to retest 3 of the 4 samples at a second reference laboratory (A. Steere, Massachusetts General Hospital, Boston, MA), using a Western blot with a VlsE stripe from a different manufacturer (Viralab, Oceanside, CA). On retesting, sera from 2 patients with early neurological disease were positive by EIA and IgM blotting but failed to meet standard 2-tier criteria because they were collected 45 and 64 days after disease onset (37); it is likely that both patients had Lyme disease because (i) recent studies suggest that IgM blots may be useful for diagnosis of neurological disease within 6 weeks of onset (40) and (ii) the first patient demonstrated an IgG-VlsE band and the second patient had EM. A third patient, seen in 1982, had EM and flu-like symptoms followed 1 month later by the onset of facial palsy and meningitis; serologic testing was not available at that time. Serology in the current study was positive only by EIA, but additional serum was not available for retesting by the second reference laboratory. The fourth sample came from a patient with arthritis and was positive by both EIA and IgG blotting on repeat testing. It is possible that differences in Western blot reagents could have contributed to the discordant results between reference laboratories. On the whole, our results suggest no significant difference in test performance between the multiplex assay and Western blotting for the 47 group 1 sera from patients with stage II/III disease (Table 3). Although the multiplex assay was marginally more sensitive than Western blotting for post-antibiotic-treatment sera (group 2), persistently positive serology by either method was not indicative of treatment failure.
The multiplex assay was slightly more sensitive than the C6 IgG/IgM EIA among Western blot-positive sera from data set B (93% versus 86%) and among early-convalescent-phase sera from data set A (89% versus 85%), although neither difference was statistically significant. The multiplex assay was otherwise equivalent to C6 IgG/IgM EIA and was equally specific (96%).
The current study evaluated a multiplex microsphere assay for LD diagnosis using VlsE1-IgG and pepC10-IgM antibodies. Because multiplex systems can perform multiple tests simultaneously in the same sample well, this technology lends itself particularly to the study of LD, an illness with a complex multiantibody host immune response (39). We explored the use of regression classifiers to generate a single diagnostic score from two separate antibody levels; given the importance of high specificity for diagnostic tests for Lyme disease (39), partial ROC regression models were utilized to maximize multiplex performance at specificities of ≥95%. The multiplex assay used in this study performed as well as or better than Western blotting as a second-tier test.
When the sensitivities of the 95% pROC regression model and Western blotting were equivalent, the 95% pROC model was 3.5% more specific than Western blotting (95% CI, 1.9% to 5.1%). When the specificities of the 60% pROC regression model and Western blotting were equivalent, the 60% pROC regression model was significantly more sensitive than Western blotting, being 20.7% more sensitive for early-convalescent-phase disease (95% CI, 12.1% to 30.9%) and 12.5% more sensitive overall (95% CI, 8.1% to 17.1%); about 2/3 of the improvement in overall sensitivity was related to better detection of early-convalescent-phase disease.
No one classifier was superior under all conditions. If the objective of testing is to rule out Lyme disease in a low-risk setting, then the 95% pROC model is a reasonable choice because of its high specificity; in some instances, the pretest risk of LD may be low enough to justify deferring testing altogether (Appendix B). If the clinical picture suggests stage II or III Lyme disease, then the 60% pROC model may be preferred because of its high sensitivity.
Logistic likelihood regression analysis is one of the most commonly used statistical methods for both disease classification and prediction; it has been used to interpret the antibody response to B. burgdorferi by Western blotting (24), kinetic EIA (34), and flagellin-based EIA (13). Unlike logistic models, which maximize the likelihood of disease at a given specificity, ROC regression methods maximize the AUC (29). Pepe et al. (31) demonstrated that regression models that optimize the AUC can offer advantages over logistic models when selected biomarkers are combined. Data from set A demonstrated that full ROC regression and 60% partial ROC regression models were significantly more sensitive than the logistic model at Western blot specificity (95.6%), reinforcing the value of AUC optimization methods for classifier development.
Western blotting was only 69.2% specific among our EIA-reactive control sera, reducing its overall specificity to 95.6%; this specificity is lower than that reported by other investigators (5, 40). Of all false-positive Western blots in the current study, 79% were due solely to IgM antibody, illustrating the limitations associated with that assay (36). If achieving 99% specificity among the healthy population is an important benchmark from a public health perspective (2), then specificity among our EIA-reactive controls would need to be improved to at least 93.6%. Second-tier approaches have been proposed that eliminate IgM blotting by using IgG Western blots in conjunction with a VlsE band (5). The multiplex assay described above offers another alternative.
There are multiple limitations to the current study. The number of patients with stage II and III disease was insufficient to detect significant differences in assay performance. Some samples from data set A and all samples from data set B were collected retrospectively, potentially biasing the study population (30). We were unable to produce stage-specific classifiers, but it is possible that expanding the number of antibodies assayed might help to achieve that goal (e.g., IgGs to DbpA and BmpA). Although we detected benefits from utilizing VlsE1-IgG and pepC10-IgM antibodies together, we cannot extrapolate our results to other antibody combinations. Each test panel requires careful evaluation of classifier performance over a range of acceptable specificities. Because there was no clinical information accompanying the sera in data set B, we cannot be certain how many 2-tier test-positive patients actually had Lyme disease. We did not assess the role of paired acute- and convalescent-phase serology; demonstrating expansion of the IgG immune response by Western blotting may be helpful in diagnosing recent disease (40).
A full decision-analytic evaluation comparing the multiplex assay to Western blotting is warranted but is beyond the scope of this study. We did not calculate costs or benefits related to different clinical outcomes, nor did we provide a means to determine the pretest probability of LD. Formal decision analysis requires these elements, along with knowledge of intrinsic test performance, to help guide test utilization (Appendix B). Integrating pretest risk assessment into clinical workflow is a goal that has not yet been realized; computer-based decision support systems may soon assist with that assessment (23).
A prospective study using the current multiplex assay, particularly for patients with stage II or III disease, would address the above methodological issues and provide guidance to help integrate clinical with laboratory information. The score generated by our regression model can be expressed as a likelihood ratio and used in a Bayesian context (i.e., to convert a pretest probability into a posttest probability). We believe that the multiplex platform, combined with ROC regression techniques, offers substantial promise for improving Lyme disease diagnosis.
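The Bayesian use of a likelihood ratio mentioned above can be illustrated with a minimal sketch. It assumes a dichotomized test, so that the positive likelihood ratio is sensitivity/(1 − specificity); the study's score would instead yield a score-specific likelihood ratio, which is not reproduced here. The function names and the 10% pretest probability in the usage note are hypothetical and chosen only for illustration.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ for a dichotomized test: P(test+ | disease) / P(test+ | no disease)."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must lie in [0, 1]")
    if not 0.0 <= specificity < 1.0:
        raise ValueError("specificity must lie in [0, 1)")
    return sensitivity / (1.0 - specificity)


def posttest_probability(pretest_prob: float, likelihood_ratio: float) -> float:
    """Apply Bayes' rule on the odds scale: posttest odds = pretest odds x LR."""
    if not 0.0 < pretest_prob < 1.0:
        raise ValueError("pretest_prob must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood_ratio must be non-negative")
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)
```

For example, with the overall sensitivity (75.0%) and specificity (95.6%) reported for the multiplex assay and an assumed pretest probability of 10%, `posttest_probability(0.10, positive_likelihood_ratio(0.75, 0.956))` returns approximately 0.65.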
We thank Yingying Fan of the Marshall School of Business, University of Southern California, Los Angeles, CA, for her statistical contributions to this work; Hoshang Batliwala of Zeus Scientific, Branchburg, NJ, and Karen Hughes of the Massachusetts General Hospital, Boston, MA, for their laboratory support; and Brad Biggerstaff of the Centers for Disease Control and Prevention, Fort Collins, CO, for his statistical advice.
The findings and conclusions in this article are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention.
Financial support was provided by SBIR-AT-NIAID grants 1R43AI069564-01 and -01S1 to Infectious Disease Consultants, PC, from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, and by a grant from Zeus Scientific, Inc., to Health Research, Inc., Menands, NY, a nonprofit organization which supported the work performed by S.J.W. and K.K. at the NYSDOH, Albany, NY, for this study.
R.B.P. holds 2 patents on bioinformatic methods for Lyme disease diagnosis and has affiliations with Zeus Scientific, Inc., and Infectious Disease Consultants, PC. C.G.H., L.L., and J.F. were consultants to Infectious Disease Consultants, PC. M.K. is an employee of Zeus Scientific, Inc. B.J.B.J., A.C.S., K.K., and S.J.W. report no conflicts of interest.
Partial ROC regression classifier. The score function is derived from a linear combination of test results, $\beta^{T}Y$, where D is the disease, D′ is the absence of D, $Y_1, \ldots, Y_k$ is a set of k diagnostic tests for D, Y is a vector of diagnostic test results $y_1, \ldots, y_k$, $\beta$ is a vector of coefficients $\beta_1, \ldots, \beta_k$ for Y, and $\beta^{T}$ is the transpose of $\beta$. For a given cutoff value, c, a test is positive if $\beta^{T}Y \ge c$.
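With ROC regression, the test panel and β coefficients are chosen simultaneously to maximize the AUC of the empirical ROC, as approximated by the following equation:

$$\widehat{\mathrm{AUC}}(\beta) = \frac{1}{n_D n_H} \sum_{i \in D} \sum_{j \in H} I\left(\beta^{T}Y_i > \beta^{T}Y_j\right)$$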
where I is the indicator function, N is the total number of study subjects, $n_D$ is the number of patients with disease D, $n_H$ is the number of healthy controls, $n_D + n_H = N$, $i = 1, \ldots, n_D$, $i \in D$ are patients with disease, $j = 1, \ldots, n_H$, and $j \in H$ are healthy controls. The ROC curve is smoothed using the sigmoid function as follows:

$$S(x) = \frac{1}{1 + e^{-x}}$$
wherein bias related to values of x close to zero is reduced by introducing a series of positive numbers, $\sigma_n$, such that $S_n(x) = S(x/\sigma_n)$ and $\sigma_n$ approaches zero as n approaches infinity (29). An optimal set of β coefficients is determined by an iterative gradient descent algorithm using the sigmoid maximum rank correlation estimator (SMRC) described by Ma and Huang (22, 29), as follows:

$$\hat{\beta} = \arg\max_{\beta} \frac{1}{n_D n_H} \sum_{i \in D} \sum_{j \in H} S_n\left(\beta^{T}Y_i - \beta^{T}Y_j\right)$$
Raw test results are transformed into a likelihood ratio, and a logistic likelihood model is used to select the initial β coefficients and anchor marker. If feature selection is desired, then a gradient LASSO is applied to the SMRC; for tuning, an L1 constraint of ≤u is chosen using a V-fold cross-validation technique (26, 29). If the regression features are already known, as in the current study, then the SMRC alone is used to optimize β.
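The SMRC optimization described above can be sketched in a few lines of Python. This is a simplified illustration, not the implementation used in the study: it assumes the first marker is the anchor with its coefficient fixed at 1, uses a fixed smoothing parameter `sigma` and step size `step` (both hypothetical values), performs plain gradient ascent on the smoothed AUC (equivalent to gradient descent on its negative), and omits the likelihood-ratio transformation, logistic initialization, and gradient LASSO.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function S(x) = 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def score(beta: Sequence[float], y: Sequence[float]) -> float:
    """Linear score beta^T y."""
    return sum(b * v for b, v in zip(beta, y))


def empirical_auc(beta, diseased, healthy) -> float:
    """Fraction of (diseased, healthy) pairs ranked correctly by the score."""
    wins = sum(
        1
        for yd in diseased
        for yh in healthy
        if score(beta, yd) > score(beta, yh)
    )
    return wins / (len(diseased) * len(healthy))


def fit_smrc(diseased, healthy, sigma=0.1, step=0.05, iterations=100) -> List[float]:
    """Gradient ascent on the sigmoid-smoothed AUC.

    Assumption: beta[0] is the anchor coefficient and stays fixed at 1.
    """
    k = len(diseased[0])
    beta = [1.0] + [0.0] * (k - 1)
    n_pairs = len(diseased) * len(healthy)
    for _ in range(iterations):
        grad = [0.0] * k
        for yd in diseased:
            for yh in healthy:
                diff = [a - b for a, b in zip(yd, yh)]
                s = sigmoid(score(beta, diff) / sigma)
                # d/d(beta_m) of S(beta^T diff / sigma) = S(1 - S) * diff_m / sigma
                weight = s * (1.0 - s) / sigma
                for m in range(1, k):
                    grad[m] += weight * diff[m]
        for m in range(1, k):
            beta[m] += step * grad[m] / n_pairs
    return beta
```

Each marker vector is a sequence of k test results, for example `(vlse1_igg, pepc10_igm)` (hypothetical names). Comparing `empirical_auc` before and after `fit_smrc` shows whether the fitted coefficients improve the ranking of diseased over healthy subjects.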
If $t_0$ is the maximum false-positive rate permitted by a physician interpreting the tests and is a multiple of $1/n_H$, then the β coefficients and test panel are chosen simultaneously through partial ROC regression in order to generate the largest area below the partial ROC curve, using only those controls whose scores lie at or above the $(1 - t_0)$ quantile of individuals without disease. The score cutoff, c, is chosen such that $S_H(c) = t_0$, where $S_H(c)$ is the survival function of the score among individuals without disease, i.e., the proportion of controls with $\beta^{T}Y_j \ge c$. The features are fitted to this truncated set of controls by using the above sigmoid maximum rank correlation estimator and gradient LASSO (14, 26, 29, 34). If the features are already known, then the SMRC estimator alone is used to optimize β; we observed estimator convergence within 100 iterations.
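As a rough illustration of the quantities in this paragraph, the sketch below selects a cutoff from control scores for a given maximum false-positive rate and evaluates the empirical partial AUC for scores that have already been computed. It does not perform the fitting itself, and the function names and tie handling are assumptions made for this example.

```python
import math
from typing import Sequence


def false_positive_rate(healthy_scores: Sequence[float], c: float) -> float:
    """Empirical S_H(c): proportion of controls with score >= c."""
    return sum(1 for s in healthy_scores if s >= c) / len(healthy_scores)


def cutoff_for_fpr(healthy_scores: Sequence[float], t0: float) -> float:
    """Lowest observed control score c whose false-positive rate is at most t0.

    Returns infinity if even the highest control score would exceed t0.
    """
    best = math.inf
    for c in sorted(set(healthy_scores), reverse=True):
        if false_positive_rate(healthy_scores, c) <= t0 + 1e-12:
            best = c
        else:
            break
    return best


def sensitivity_at(diseased_scores: Sequence[float], c: float) -> float:
    """Proportion of diseased subjects with score >= c."""
    return sum(1 for s in diseased_scores if s >= c) / len(diseased_scores)


def partial_auc(diseased_scores, healthy_scores, t0: float) -> float:
    """Empirical partial AUC over false-positive rates in [0, t0].

    Only controls at or above the cutoff (the truncated set) contribute,
    so the maximum attainable value is t0 rather than 1.
    """
    c = cutoff_for_fpr(healthy_scores, t0)
    truncated = [s for s in healthy_scores if s >= c]
    wins = sum(1 for d in diseased_scores for h in truncated if d > h)
    return wins / (len(diseased_scores) * len(healthy_scores))
```

Because the partial AUC is normalized by the full number of controls, a classifier that ranks every diseased subject above every truncated control attains a value of `t0`.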
$$\text{Test/no-test cutoff} = \left[1 + \frac{B \times \text{sensitivity}}{C \times (1 - \text{specificity})}\right]^{-1}$$
where B is the regret associated with failing to treat disease, C is the regret associated with treating someone without disease, and the sensitivity and specificity of a given test are known (4).
Published ahead of print on 2 March 2011.