We propose new classification criteria for Sjögren’s Syndrome (SS), which are needed considering the emergence of biological agents as potential treatments and their associated co-morbidity. These criteria target individuals with signs/symptoms suggestive of SS.
Criteria are based on expert opinion elicited using the Nominal Group Technique, and analyses of data from the Sjögren’s International Collaborative Clinical Alliance. Preliminary criteria validation included comparisons with classifications based on the American-European-Consensus-Group (AECG) criteria, a model-based “gold standard” obtained from Latent Class Analysis (LCA) of data from a range of diagnostic tests, and a comparison with cases and controls collected from sources external to the population used for criteria development.
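Validation results indicate high levels of sensitivity and specificity for the criteria. Case definition requires at least 2 of the following 3 objective findings: 1) positive serum anti-SSA/B, or positive RF together with ANA ≥ 1:320; 2) an ocular staining score ≥ 3; and 3) focal lymphocytic sialadenitis with a focus score ≥ 1 focus/4 mm² in a labial salivary gland biopsy.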
Observed agreement with the AECG criteria is high when these are applied using all objective tests. However, AECG classification based on allowable substitutions of symptoms for objective tests results in poor agreement with the proposed and LCA-derived classifications.
These classification criteria developed from registry data collected using standardized measures are based on objective tests. Validation indicates improved classification performance relative to existing alternatives, making them more suitable for application in situations where misclassification may present health risks.
Sjögren’s syndrome (SS) is a multisystem autoimmune disease characterized by hypofunction of the salivary and lacrimal glands. It is among the group of diseases overseen by rheumatologists; however, its diagnosis and management require three areas of specialty practice: rheumatology, ophthalmology and oral medicine. The multidisciplinary aspect of the disease represents a challenge for definition and validation of classification criteria because there is no single gold standard test for diagnosing SS, and it is not feasible to use a single clinician’s diagnosis for case/control definition. The closest substitute is based on expert assumptions about the characteristics of SS, specifically that it: 1) is a systemic, multi-organ autoimmune disease; 2) has a chronic or progressive course; and 3) is characterized by, but not limited to, secretory dysfunction.
While there have been 11 classification or diagnostic criteria published for SS since 1965 (1–11), none have been endorsed by the American College of Rheumatology (ACR) or European League Against Rheumatism (EULAR). The American-European Consensus Group (AECG) criteria (11) have better specificity than their predecessor (9), as they require evidence of autoimmunity from positive anti-SSA/B serology or focal lymphocytic sialadenitis (FLS) with focus score (FS) ≥ 1 in a labial salivary gland (LSG) biopsy. However, they have been criticized for including subjective tests (symptoms), physiologic measures that lack specificity, and alternate objective tests that are not diagnostically equivalent. For example, the Schirmer test may be used instead of rose Bengal ocular stain, even though they differ in sensitivity and specificity (11). Further, the inclusion of symptoms of dry mouth and/or eyes can lead to misclassification of asymptomatic patients. In addition, physiologic measures, such as unstimulated whole salivary (UWS) flow, unanesthetized Schirmer test and salivary scintigraphy are useful for assessment of salivary or tear function, but lack specificity for SS.
The need for new classification criteria is clear considering the current lack of standardization inherent to use of multiple older criteria in the field, and the emergence of biological agents as potential treatments. Considering the potentially serious adverse effects and co-morbidities of these agents, criteria used for enrollment into clinical trials will need to be clear, easy to apply, and have high specificity. They also must: 1) rely upon well established objective tests that are clearly associated with the systemic/autoimmune, oral, and ocular characteristics of the disease; and 2) include alternate tests only when they are diagnostically equivalent. Furthermore, it is desirable for new classification criteria for SS to be endorsed by professional rheumatology organizations across the world (such as ACR and EULAR) to increase their credibility and maximize standardization when enrolling participants into clinical trials.
The Sjögren’s International Collaborative Clinical Alliance (SICCA) is funded by the National Institutes of Health (12) to develop new classification criteria for SS, better define the SS phenotype, and collect/store clinical data and biospecimens to support future research. We propose new classification criteria for SS, following the ACR guidelines (13) to the extent possible for a condition requiring multiple clinical specialties for diagnosis. Below we describe our approach to criteria development and validation.
We used a consensus methodology derived from the Nominal Group Technique (14) to:
We then engaged in a series of validation exercises. Our overall approach relied on analyses of current SICCA data, and consisted of 4 phases.
We first identified panel members who: 1) were experts in the relevant clinical specialties (rheumatology, ophthalmology, or oral medicine); and 2) constituted a heterogeneous group with respect to geographic area, gender and seniority/level of expertise. The panel included 20 experts: 7 rheumatologists, 6 ophthalmologists, and 7 experts in oral medicine. Nine members (45%) were from the US, and the rest were from four countries on three continents. All of the panel experts practiced their specialty within a university-affiliated medical center. Sixteen (80%) were senior investigators at the professor level, 20% at the associate-professor level, and 40% were women. All investigators had been selected for their experience with SS within their clinical specialty, and for geographic representation.
In February 2004, the panel of experts gathered for a face-to-face meeting moderated by a statistician (SCS) and an epidemiologist (CHS). The goal of this meeting was to obtain consensus (at least 80%) on the target population to whom the classification criteria would apply, and the initial list of variables or criteria items that would be collected as part of SICCA. The meeting began with presentation of a comprehensive literature review by one of the senior investigators (TED) of the 11 previous classification and diagnostic criteria for SS that had been published in the past 40 years, none of which had been endorsed by the ACR or EULAR.
There was consensus among the panel that the criteria should apply to the population of patients who may be referred to a specialist because of signs and/or symptoms possibly suggesting SS. Recruitment strategies and eligibility criteria are described below. The rationale for selecting this target population is that a given patient would not be evaluated for SS unless she/he had signs or symptoms suggesting this diagnosis. There was also consensus that if asked to select cases and controls for validation of new classification criteria, panel members would use objective tests (e.g., specific serum measures of autoimmunity, ocular staining reflecting lacrimal hypofunction, and LSG biopsy reflecting FLS) that would likely be part of the new classification criteria, leading to circularity. Therefore, it was agreed that no diagnostic labels would be used for enrollment, and that all participants would undergo the same set of standardized objective tests, and questionnaires capturing various signs and symptoms.
The panel agreed upon examinations and tests used to assess ocular and oral signs and symptoms, tear and salivary function, LSG biopsy results and various serum measures of autoimmunity. The list created was based both on published results and on the clinical experience of panel members. There was discussion among the rheumatologists regarding which extra-glandular manifestations possibly associated with SS should be captured, and a consensus was achieved regarding a list of signs/symptoms that would be measured through a targeted rheumatologic examination, review of systems, careful medical history and serologic laboratory measures. Similarly, the oral medicine specialists agreed on a list of tests measuring salivary function (both stimulated parotid and UWS flow rates), and salivary gland expression of autoimmunity through biopsy of LSG, examining them for the presence of FLS, and measuring FS accordingly as described in detail elsewhere (15). The ophthalmologists agreed on tests evaluating participants for the presence of keratoconjunctivitis sicca (KCS). There was consensus that, while rose Bengal had been widely used for grading conjunctival and corneal damage in patients with KCS, it is inherently toxic to epithelial cells and very painful for patients. Therefore, fluorescein was selected to grade the cornea and lissamine green the bulbar conjunctiva. Effectiveness for grading KCS is established for both (16). They agreed on a standardized quantitative grading system that would be easily reproducible and could be used in clinical practice in the future (17). Ocular staining score (OSS) is the sum of a 0–6 score for fluorescein staining of the cornea and a 0–3 score for lissamine green staining of both nasal and temporal bulbar conjunctivae, yielding a total score ranging from 0 to 12. Alternative established tests for dryness used in prior criteria, such as tear break-up time (TBUT) and unanesthetized Schirmer test, were also included.
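The arithmetic of the ocular staining score described above can be sketched in a few lines. This is a minimal illustration only: the function and parameter names are hypothetical, and the range checks simply encode the 0–6 and 0–3 component ranges stated in the text.

```python
def ocular_staining_score(cornea_fluorescein: int,
                          nasal_lissamine: int,
                          temporal_lissamine: int) -> int:
    """Sum the three staining components into a total score of 0 to 12.

    Names are illustrative, not taken from the SICCA protocol.
    """
    # Corneal fluorescein staining is graded 0-6.
    if not 0 <= cornea_fluorescein <= 6:
        raise ValueError("corneal fluorescein score must be in 0..6")
    # Each bulbar conjunctival zone (nasal, temporal) is graded 0-3.
    for name, value in (("nasal", nasal_lissamine),
                        ("temporal", temporal_lissamine)):
        if not 0 <= value <= 3:
            raise ValueError(f"{name} lissamine green score must be in 0..3")
    return cornea_fluorescein + nasal_lissamine + temporal_lissamine


# Example: corneal score 2, nasal 1, temporal 0 gives a total of 3.
total = ocular_staining_score(2, 1, 0)
```

The maximum of 12 follows directly from 6 + 3 + 3.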
The final list of criteria items that was agreed upon by the end of the first meeting included nearly all those previously reported in the relevant literature. It has been described previously (12) and is available at http://sicca.ucsf.edu.
Following 2 years of standardized data collection including the criteria components selected in phase 1, another face-to-face panel meeting was convened in April 2006. Data-analysis summaries were presented to the group by the epidemiologist and statistician who moderated the initial meeting, and the panel was divided into small specialty-specific focus groups to review evidence-based results for each clinical specialty. The goals of these exploratory analyses included understanding the relationship between variables representing the oral/salivary, ocular and systemic features of the disease, defining cut-off values for particular tests that could be used as components of the classification criteria, and assessing the value of tests that could serve as surrogates for primary objective tests in alternate criteria sets. Methods used included frequency tables, binary regression and classification trees. We also used area-proportional Venn diagrams to visualize the overlapping relationships of three variables simultaneously, each representing one of the primary disease features (18).
The panel met in May 2009 to decide on preliminary classification criteria. Additional analyses were presented, including longitudinal assessment of the stability of criteria components over time (based on results from scheduled 2-year follow-up visits that mirrored baseline assessments). Results from a statistical classification based on latent class analysis (LCA) (19) were presented (approach further described below). The data presented to the panel represented a subset of participants (n=1107) consecutively enrolled as of April 1, 2009.
Following presentation of results by the statistician, a discussion among panel members was moderated by the epidemiologist and statistician. Its goal was for members to select, based on their understanding of the data analyses presented, and on their clinical experience, which objective test(s) they felt was/were the most specific for SS within their own specialty. Furthermore, various preliminary classification criteria were discussed, and panel members were asked to select which criteria they felt would best classify patients with SS. Following this third face-to-face meeting, a report summarizing the data analyses was circulated among panel members, and a questionnaire distributed by email. The questionnaire was designed to assess consensus among the expert panel members regarding the preliminary classification criteria. It also measured the level of consensus within each specialty regarding which criteria component was felt to have the highest level of face validity within that specialty. The response to each question was on a scale of 1 to 5, with 1 indicating strong disagreement and 5 indicating strong agreement.
The participants in the SICCA cohort have been enrolled since 2004 at five collaborating academically-based research groups, located in Argentina, China, Denmark, Japan and the United States, and directed from the University of California, San Francisco (12) (Table 1). Subsequently, additional research groups joined the SICCA project: in 2007, from the United Kingdom and in 2009, from India and two additional sites in the United States.
To be eligible for the SICCA registry, participants must be at least 21 years of age and have at least one of the following: symptoms of dry eyes or dry mouth; a previous suspicion or diagnosis of SS; elevated serum antinuclear antibodies (ANA), positive rheumatoid factor (RF), or anti-SSA/B; bilateral parotid enlargement in a clinical setting of SS; a recent increase in dental caries; or have diagnoses of rheumatoid arthritis or systemic lupus erythematosus and any of the above. The rationale for these eligibility criteria is that only patients with such characteristics would be evaluated for SS or considered for enrollment in a clinical trial designed to evaluate a potential therapeutic agent for SS. Therefore our classification criteria target individuals with signs and symptoms that may be suggestive of SS, not the general population.
Participants are recruited through local or national SS patient support groups, healthcare providers, public media, and populations served by all nine SICCA research groups. Exclusion criteria include known diagnoses of: hepatitis C, HIV, sarcoidosis, amyloidosis, active tuberculosis, graft versus host disease, autoimmune connective tissue diseases other than rheumatoid arthritis or lupus; past head and neck radiation treatment; current treatment with daily eye drops for glaucoma; corneal surgery in the last 5 years to correct vision; cosmetic eyelid surgery in the last 5 years; or physical or mental condition interfering with successful participation in the study. Contact lens wearers are asked to discontinue wear for 7 days before the SICCA examination. We do not exclude participants taking prescription drugs that may affect salivary or lacrimal secretion, but record their use and all other medications currently taken.
In the absence of a gold standard diagnostic test for SS, conventional methods of validation based on direct estimation of quantities such as sensitivity and specificity are not directly applicable. The practice of defining a gold standard based on a series of cases and controls identified by expert clinicians is also not practical for SS, because diagnosis must rely on three clinical specialties. Further, since such diagnoses rely on the same tests that form the basis of proposed criteria, estimates of sensitivity and specificity will be inherently biased.
Acknowledging these difficulties, we based initial evaluation and validation of preliminary criteria primarily on a data-based approach including:
Results reported here are based on complete participant data on key diagnostic features from 6 SICCA sites collected through March 2010. In addition, for external validation we utilized data from approximately 300 participants not included in the data set used for criteria development, and representing 10 months of additional recruitment. We excluded participants with rheumatoid arthritis, systemic lupus, scleroderma, or other connective tissue disease from these analyses since there were only 87 (6%) such participants. Thus the proposed classification criteria apply to a target population of individuals who do not have SLE, RA, or other connective tissue diseases. The methodological approaches for each of the steps outlined follow:
We considered alternate sets of criteria, based on substituting simpler and/or less invasive tests for the preliminary criteria. These included: 1) substitution of UWS flow rate for the LSG biopsy with FLS and FS ≥ 1 focus/4 mm²; 2) substitutions of TBUT < 10 seconds, or unanesthetized Schirmer test ≤ 5 mm/5 min for OSS ≥ 3; 3) positive RF, ANA ≥ 1:320, positive serum anti-SSA/B, and each of the three used individually to represent the serologic component of the disease. Performance was assessed via sensitivity and specificity estimated taking the preliminary criteria as a “gold standard”, and summarizing results with exact 95% binomial confidence intervals (CI). The results of these analyses were used to evaluate possible effects of such substitutions on classification performance of the preliminary criteria.
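As an illustration of the performance summary described above, the following sketch computes sensitivity and specificity against a reference classification, each with an exact (Clopper-Pearson) 95% binomial interval. It is a sketch under stated assumptions: the text says only that exact binomial intervals were used, so the choice of Clopper-Pearson, the bisection solver, and all function names are assumptions made for illustration.

```python
from math import exp, lgamma, log


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); terms are computed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log(1.0 - p))
        total += exp(log_term)
    return total


def _solve_decreasing(f, target: float) -> float:
    """Find p in (0, 1) with f(p) == target, for f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies at larger p
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_binomial_ci(x: int, n: int, alpha: float = 0.05):
    """Clopper-Pearson interval for x successes out of n trials."""
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1.0 - alpha / 2.0)
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2.0)
    return lower, upper


def sensitivity_specificity(reference, candidate):
    """Return (sensitivity, CI) and (specificity, CI) versus a reference."""
    pairs = list(zip(reference, candidate))
    tp = sum(1 for r, c in pairs if r and c)
    fn = sum(1 for r, c in pairs if r and not c)
    tn = sum(1 for r, c in pairs if not r and not c)
    fp = sum(1 for r, c in pairs if not r and c)
    sens = (tp / (tp + fn), exact_binomial_ci(tp, tp + fn))
    spec = (tn / (tn + fp), exact_binomial_ci(tn, tn + fp))
    return sens, spec


# Toy data (hypothetical): reference classification vs. an alternate criteria set.
reference = [True, True, True, False, False]
alternate = [True, True, False, False, True]
(sens, sens_ci), (spec, spec_ci) = sensitivity_specificity(reference, alternate)
```

The function assumes both classes are present in the reference; with no reference cases or no reference controls, the corresponding ratio is undefined.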
An assessment of plausible levels of sensitivity and specificity of the preliminary and alternate criteria was provided using latent class analysis (LCA) (19). LCA provides a model-based clustering of individuals into a specified number of “disease” classes based on the observed patterns of a series of binary predictor variables representing presence or absence of important diagnostic features. The resulting classes can then be related to disease status based on the class-specific patterns of diagnostic features. Because LCA methods rely on the restrictive assumption that component test results are independent conditional on the true classification (20), we applied a variant of LCA that relaxes this assumption (21). We fitted a series of models, presuming as few as one and as many as four true disease classes. Models were based on ten predictor variables encompassing the major ocular, oral/salivary and systemic features of the disease. We also used a standard multivariate clustering procedure known as k-means (22). We used the results of LCA and clustering analyses to provide alternate model-based disease classifications to which the preliminary criteria (and alternate versions of these criteria) could be compared. This allowed estimation of sensitivity and specificity using the model-based classification as a “gold standard” (23, 24). In the absence of knowledge of the true disease classification, the accuracy of these estimates cannot be assessed. However, comparison of results between alternate versions of criteria allows an assessment of a plausible range of sensitivity and specificity of proposed criteria. Further, consistency of results between alternate methods of deriving model-based standards helps establish stability of conclusions and reveals possible dependence of conclusions on assumptions inherent in the models.
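For orientation, the conventional latent class model with conditionally independent tests can be written as follows. This is the standard textbook form, not the relaxed variant applied here, and the symbols are introduced only for illustration: $y_j \in \{0,1\}$ is the result of test $j$ out of $J$ (ten in this analysis), $\pi_c$ is the prevalence of class $c$ out of $C$, and $p_{jc}$ is the probability that test $j$ is positive in class $c$.

```latex
\[
P(Y_1 = y_1, \dots, Y_J = y_J)
  = \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} p_{jc}^{\,y_j} (1 - p_{jc})^{1 - y_j},
\qquad \sum_{c=1}^{C} \pi_c = 1 .
\]
% Posterior probability of class membership, used to assign each individual to a class:
\[
P(c \mid y_1, \dots, y_J)
  = \frac{\pi_c \prod_{j=1}^{J} p_{jc}^{\,y_j} (1 - p_{jc})^{1 - y_j}}
         {\sum_{c'=1}^{C} \pi_{c'} \prod_{j=1}^{J} p_{jc'}^{\,y_j} (1 - p_{jc'})^{1 - y_j}} .
\]
```

With two classes labelled case and control, $p_{j,\text{case}}$ plays the role of the sensitivity of test $j$ and $1 - p_{j,\text{control}}$ that of its specificity relative to the model-based classification.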
Supplementary analyses included use of random forest classification as a means of assessing importance of individual tests in predicting the model-based gold standard. The random forest approach is a generalization of standard classification trees (25). It is applied to a collection of predictor variables measured on individuals with known outcome classification to build a nonparametric classification rule that predicts the outcome as accurately as possible. One of the outputs of this analysis is a variable importance ranking for the predictors. We applied this approach to the classification produced by the latent class model, using the same predictor variables as inputs.
To investigate how preliminary criteria compare to previous criteria, we classified participants in the validation sample using both the preliminary criteria and the AECG criteria (11). The 2002 AECG criteria for SS (11) are a modification of the 1993 European criteria (9) based on re-analysis of 180 cases selected from the original data set. They apply six types of clinical signs or tests: ocular symptoms, oral symptoms, ocular signs (Schirmer test or ocular staining), histopathology (FLS in minor salivary glands), salivary involvement (reduced UWS flow, parotid sialography or salivary scintigraphy) and serum auto-antibodies (anti-Ro, anti-La or both). Primary SS is indicated in the presence of any four of these six as long as either the histopathology or serology is present and none of seven exclusions are present. The AECG criteria were defined for SICCA participants using the specified oral/salivary, ocular and systemic components, substituting the SICCA OSS for rose Bengal staining, and a definition of participant-reported ocular and oral symptoms based on questions most closely matching the corresponding questions used in the AECG criteria. Because of the flexibility inherent in the definition of the AECG criteria, we considered alternate classifications using: a) all available tests, b) restricting the ocular test to be based on the Schirmer only, c) restricting the oral/salivary test to be based on the UWS only, and d) restricting both salivary and ocular tests to be the UWS and Schirmer, respectively. Consistency of the two approaches is summarized by estimated proportions of agreement and disagreement, with 95% CIs, and using the Kappa statistic.
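The AECG rule as summarized above (any four of six items, with histopathology or serology required, and no exclusions) and the Kappa statistic can be sketched as follows. The item keys and function names are hypothetical labels chosen for this example, and the Kappa shown is the usual two-rater Cohen's kappa for binary classifications, which is assumed to be the statistic meant.

```python
AECG_ITEMS = (
    "ocular_symptoms", "oral_symptoms", "ocular_signs",
    "histopathology", "salivary_involvement", "autoantibodies",
)


def aecg_primary_ss(findings: dict, has_exclusion: bool = False) -> bool:
    """Apply the four-of-six rule with the histopathology/serology requirement."""
    n_positive = sum(bool(findings.get(item, False)) for item in AECG_ITEMS)
    # At least one of the two objective autoimmunity items must be positive.
    anchor = bool(findings.get("histopathology", False)
                  or findings.get("autoantibodies", False))
    return (not has_exclusion) and n_positive >= 4 and anchor


def cohen_kappa(a, b) -> float:
    """Cohen's kappa for two binary classifications of the same individuals."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if bool(x) == bool(y)) / n
    pa = sum(bool(x) for x in a) / n  # proportion positive under rule a
    pb = sum(bool(y) for y in b) / n  # proportion positive under rule b
    expected = pa * pb + (1 - pa) * (1 - pb)  # agreement expected by chance
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Symptoms plus two functional tests, but neither biopsy nor serology positive:
symptom_only = {"ocular_symptoms": True, "oral_symptoms": True,
                "ocular_signs": True, "salivary_involvement": True}
classified = aecg_primary_ss(symptom_only)  # False: no objective anchor

kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])  # 0.5
```

In the toy kappa example, observed agreement is 0.75 and chance agreement is 0.5, giving (0.75 − 0.5) / (1 − 0.5) = 0.5.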
Validation of proposed criteria with a set of expert-defined disease cases and disease-free controls collected from sources external to the study population used for criteria development is a common means of validating classification criteria. Despite the potential for circularity in the expert assessments arising from use of the diagnostic variables comprising the proposed criteria, this type of validation can potentially yield complementary information to the other approaches just described. To provide a preliminary assessment of this type, we obtained a series of disease cases from two sites recently added to the registry. The directors of the Johns Hopkins University (JHU) site (ANB) and of the University of Pennsylvania (UPenn) site (FV) were asked to identify patients whom they (or their Rheumatology faculty practice colleagues) had diagnosed as having SS, using standard clinic procedures, prior to entry into the SICCA registry. We could not use clinical diagnosis to identify controls, since only people with signs/symptoms suggestive of SS are referred to the SICCA Registry (as in real clinical practice, only those with signs/symptoms suggestive of SS would receive a work-up to confirm/rule out the disease). Therefore, controls were selected among participants observed to be negative according to the AECG criteria (described above) and recruited subsequent to the final date for inclusion in the sample of participants considered for criteria development. We compared the case/control classification to that obtained using the preliminary criteria, taking the former as the “gold standard” for the purpose of estimating sensitivity and specificity.
To examine temporal stability of the preliminary classification criteria, we compared individual classifications made using test results from enrollment visits with classifications made on two-year follow-up visits. Results are summarized by estimated proportions of agreement and disagreement, with 95% CI.
A total of 1,618 participants were enrolled in the SICCA Registry as of March 8, 2010. Demographic characteristics and phenotypic features of SS are summarized in Table 1.
High proportions of participants in the SICCA registry complained of symptoms of dry mouth, dry eyes or both (Table 1). However, as reported previously, dry eyes/mouth symptoms did not show statistically significant association with the presence of FLS, serum anti-SSA/B, or an OSS ≥ 3 (12, 26). Numbers of participants not complaining of dry eyes, dry mouth, or either were 247, 154 and 62, respectively. While these individuals represent no more than 15% of the cohort, 39% to 49% of these asymptomatic patients had positive anti-SSA/B, 36% to 41% had LSG biopsies with FLS and FS ≥ 1, and 63% to 73% had OSS ≥ 3.
Analyses investigating the associations between phenotypic features of SS found that the odds of having positive anti-SSA/B serology were 12 times higher among those with FS ≥ 1 than among those with FS < 1 or without FLS (95%CI: 9.3; 15.5). Those with FLS and FS ≥ 1 were 4 times more likely to have an OSS ≥ 3 than those with FS < 1 or without FLS (95%CI: 3.1; 5.3). The association between OSS ≥ 3 and positive anti-SSA/B serology was also strong (odds ratio (OR) = 4.8; 95%CI: 3.6; 6.4), but much less so than the association between FLS with FS ≥ 1 and positive anti-SSA/B serology. The relationships among these three measures, as depicted by Figure 1, defined a large group of participants who had KCS without other components of SS (KCS-only) representing a clinical entity distinct from the KCS associated with SS (17).
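For readers less familiar with how an odds ratio and its interval are obtained from a 2×2 table, a minimal sketch follows. The interval method used in these analyses is not stated in the text, so the Wald (log odds ratio) interval shown here is an assumption, and the counts in the example are hypothetical rather than SICCA data.

```python
from math import exp, log, sqrt


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Wald interval for a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimator")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: outcome present in 20 of 30 "exposed" and 10 of 30 "unexposed".
or_hat, ci_low, ci_high = odds_ratio_wald_ci(20, 10, 10, 20)
```

Here the point estimate is (20 × 20) / (10 × 10) = 4, and the interval is symmetric around it on the log scale.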
Diagnostic confirmation of participants’ histories of thyroid, liver or kidney diseases or lymphoma was sought from the diagnosing physician and obtained for 78% of those reported histories. Confirmed diagnoses of thyroid, liver and kidney extra-glandular manifestations (EGM) and of lymphoma included 18 of Graves disease, 43 of Hashimoto’s thyroiditis, 15 of primary biliary cirrhosis, 3 of renal tubular acidosis, 2 of interstitial nephritis, and 5 of lymphoma. We found strong associations between phenotypic features of SS and serologic characteristics of autoimmunity. For example, participants with FLS and FS ≥ 1 (compared to those with FS < 1 or without FLS) were 9 times more likely to be RF positive (95%CI: 7.0; 11.4) and to have ANA titers ≥ 1:320 (95%CI: 7; 11.3), 14 times more likely to have hypergammaglobulinemia with IgG > 2013 (95%CI: 9.3; 21.1), and 2.4 times more likely to have hypocomplementemia with C4 < 16 (95%CI: 1.7; 3.3). Similarly, participants with OSS ≥ 3 (compared to those with OSS < 3) were 4 times more likely to be RF positive (95%CI: 3.0; 5.3), 5 times more likely to have ANA ≥ 1:320 (95%CI: 3.5; 6), 7 times more likely to have hypergammaglobulinemia (95%CI: 4; 11.3), and twice as likely to have hypocomplementemia (95%CI: 1.2; 2.7).
As part of Phases 2 and 3 of our Consensus Methodology, earlier versions of the analyses summarized above were presented and discussed, in addition to classification tree analyses and various iterations of the Venn diagram presented in Figure 1. These demonstrated the strong interrelationship between the 3 main serologic, ocular, and oral/salivary phenotypic features of SS measured by objective tests (anti-SSA/B positive serology, FLS with FS ≥ 1, and OSS ≥ 3). The rheumatologists discussed potential roles for RF and ANA titers in the absence of anti-SSA/B positive serology in the classification criteria. The consensus was that a positive RF or ANA in the absence of a positive anti-SSA/B would be too non-specific. However, there was strong support for substituting both positive RF and high titer ANA in the absence of anti-SSA/B as a way to capture participants who have negative anti-SSA/B serology, but strong expression of autoimmunity from these 2 other tests. The relative importance of each of the 3 main serologic, ocular, and oral/salivary phenotypic features of SS measured by objective tests was discussed within each sub-group of the panel (rheumatology, ophthalmology, and oral medicine). Furthermore, various combinations of the 3 main phenotypic features of SS measured by objective tests, such as at least 1 out of 3, 2 out of 3, or 3 out of 3 were discussed by the entire panel.
Results from a questionnaire administered following Phase 3 revealed high consensus among each of the clinical specialties. More specifically, 6 of the 7 rheumatologists (86%) either agreed or strongly agreed that positive anti-SSA/B serology represented the most specific serologic marker of SS, and 86% also felt that positive RF and ANA ≥ 1:320 represented a satisfactory substitute for a negative anti-SSA/B serology. All 6 ophthalmologists either agreed or strongly agreed that an OSS ≥ 3 (using lissamine green and fluorescein) represented the most specific way to diagnose the ocular component of SS. Only one ophthalmologist (17%) agreed that TBUT represented the next best substitute, and none agreed with the use of the Schirmer test as a specific measure of the ocular component of SS. All 7 oral medicine specialists strongly agreed that presence of FLS in a LSG biopsy with a FS ≥ 1 focus/4mm2 was the most specific test to determine the presence of the salivary component of SS. There was also 100% consensus in that group that neither UWS nor stimulated parotid flow rate would represent specific measures of the salivary component of SS. Among the entire panel, 86% agreed or strongly agreed that the preliminary criteria for SS should be at least 2 out of 3 of the following objective tests:
Thirteen panel members (62%) either disagreed or strongly disagreed, while 4 (19%) agreed that the preliminary criteria for SS should be 3 out of 3 of the above objective tests. There was 100% consensus that preliminary classification criteria for SS could not be limited to only one of the 3 objective tests.
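The "at least 2 out of 3" rule can be expressed compactly. The sketch below assumes that the three objective items are those named in the consensus discussion: positive anti-SSA/B serology (or positive RF together with ANA ≥ 1:320 as the agreed substitute), FLS with FS ≥ 1 on LSG biopsy, and OSS ≥ 3. Function and parameter names are hypothetical, and exclusion criteria are not modelled.

```python
from typing import Optional


def meets_preliminary_criteria(anti_ssa_ssb_positive: bool,
                               rf_positive: bool,
                               ana_titer_reciprocal: int,
                               focus_score: Optional[float],
                               oss: int) -> bool:
    """Return True when at least 2 of the 3 objective items are positive.

    ana_titer_reciprocal: 320 encodes a titer of 1:320.
    focus_score: None encodes a biopsy without FLS.
    """
    # Serologic item: anti-SSA/B, or the RF plus high-titer ANA substitute.
    serology = anti_ssa_ssb_positive or (rf_positive
                                         and ana_titer_reciprocal >= 320)
    # Oral/salivary item: FLS with focus score >= 1 focus/4 mm^2.
    biopsy = focus_score is not None and focus_score >= 1
    # Ocular item: ocular staining score >= 3.
    ocular = oss >= 3
    return (int(serology) + int(biopsy) + int(ocular)) >= 2


# Anti-SSA/B negative, but RF positive with ANA 1:640, and OSS of 5:
example = meets_preliminary_criteria(False, True, 640, None, 5)  # True
```

A participant positive on only one item, such as an isolated OSS ≥ 3, does not meet the rule, which is consistent with the panel's view that a single test could not define a case.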
Exclusion criteria included those initially defined in our methods section. It was also agreed that IgG4-related disease would be among the exclusion criteria. IgG4-related disease is a relatively new clinical entity characterized by increased serum IgG4 (>135mg/dl) and marked infiltration of IgG4 positive plasma cells in various organs, especially pancreas (so called autoimmune pancreatitis), lacrimal, submandibular and parotid glands (27).
For consistency, results for this and subsequent validation analyses are based on a subset of 1,362 participants with complete data for ten individual tests listed in the section on validation using LCA (i.e., participants with pending test results due to batch shipping of specimens from international sites were not included in these analyses). These tests were selected based on our preliminary analyses to represent the range of oral/salivary, ocular and systemic features that characterize the disease, and also because they encompass characteristics used in previously developed criteria. The cases and controls defined according to the preliminary criteria were first used to explore possible sensitivity and specificity of alternate sets of criteria, each defined by substituting one component with an alternate test (Table 2).
Based on preliminary analyses, UWS was the only alternate oral/salivary measure considered that demonstrated a strong association with other objective tests such as positive anti-SSA/B serology or a FS ≥ 1. Classification based on substituting this for the LSG biopsy had a sensitivity of 89.8 (95% CI 87.2–92.0), but a low specificity of 74.3 (95% CI 71.0–77.5). Stimulated parotid flow rate was found to have a high number of missing observations (mostly because of technical difficulty encountered by examiners across multiple sites). We therefore did not include this variable in our analyses. Alternate classifications obtained by substituting TBUT or the Schirmer test for the OSS, yielded high sensitivity and specificity (94.8 and 94.4, respectively) for the former, and low sensitivity (74.8) for the latter.
A series of LCA models were fitted to results of ten diagnostic variables representing a wide range of ocular, oral and systemic features of the disease for the 1,362 participants in the validation sample. Results indicated that a model with two latent classes fit adequately, with no significant improvement observed with the addition of a third class. Assignment of the disease “case” and “control” status was based on examination of observed patterns of results from the ten component tests used as predictors in model fitting. Cases had clearly higher observed prevalence of positive results for the majority of these tests. Table 3 lists estimated sensitivity and specificity values from ten component tests used for fitting the random effects LCA. These estimates provide an indication of the importance of individual test results in predicting the overall disease classification provided by the model. Results indicate that a FS of at least 1, and positive serology for anti-SSA/B provide the best overall combinations of sensitivity and specificity. The OSS with a cut-off of 3 yielded relatively high sensitivity but low specificity, and the alternate measure based on TBUT performed similarly. Indicators of presence of ocular or oral symptoms were very non-specific, while alternate systemic measures based on ANA and RF performed similarly to anti-SSA/B, with somewhat lower values for sensitivity and specificity. A companion analysis using random forest classification (25) ranked the component tests in the following order of importance in determining the LCA results: FLS with FS ≥ 1, positive serum anti-SSA(Ro) and/or anti-SSB(La), ANA ≥ 1:320, positive RF, OSS ≥ 3, UWS < 0.1 mL/min, Schirmer test ≤ 5 mm/5 min, TBUT < 10 sec, symptoms of dry mouth, and symptoms of dry eyes. Restricting component tests to exclude symptoms had no discernible effect on results of the LCA.
Table 4 presents estimated sensitivity and specificity values for the alternate sets listed in Table 2 and the preliminary criteria, compared to LCA classification. Results indicate that the preliminary criteria provide the best overall levels of both sensitivity (96.3; 95% CI: 94.3–97.7) and specificity (83; 95% CI: 80.3–85.5) relative to alternate sets. Alternate model-based classification approaches, including conventional LCA and K-means clustering, yielded estimates (not shown) similar to those displayed in Table 4.
In Table 5 we compare 4 versions of the AECG classification against the preliminary SICCA criteria, taking the latter as the “gold standard”. In the case where all diagnostic tests are available, classification by the AECG depends more heavily on the results of the LSG and anti-SSA/B status. In this situation, we would expect results comparable to the preliminary SICCA criteria. This is confirmed by the results for sensitivity, specificity and overall agreement (as measured by the Kappa statistic) in Table 5. The level of agreement decreases for alternate versions of the AECG criteria defined by substituting alternate tests for the ocular and oral/salivary components of the disease. As noted previously, requiring the presence of dry eye/mouth symptoms will exclude some asymptomatic patients. Estimated sensitivity and specificity for the full AECG criteria for predicting the “gold standard” classification based on LCA were 88.6 (95% CI: 85.6–91.1) and 81.8 (95% CI: 79.2–84.4), respectively (Table 6). These results indicate somewhat lower sensitivity than the preliminary criteria, but similar specificity. Analogous results for alternate AECG criteria showed overall less agreement.
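As an illustration of the agreement measure used above, the sketch below computes Cohen's Kappa from a 2×2 cross-classification of two sets of criteria. The function name and the cell counts are hypothetical and chosen only to keep the arithmetic easy to follow; they are not taken from Table 5.

```python
def cohen_kappa(both_pos, a_only, b_only, both_neg):
    """Cohen's Kappa for two binary classifications of the same participants.

    both_pos: classified as SS by both criteria
    a_only:   classified as SS by criteria A only
    b_only:   classified as SS by criteria B only
    both_neg: classified as non-SS by both criteria
    """
    n = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / n
    # Agreement expected by chance, from the marginal totals.
    a_pos, b_pos = both_pos + a_only, both_pos + b_only
    expected = (a_pos * b_pos + (n - a_pos) * (n - b_pos)) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical counts: 85% observed agreement, 50% expected by chance.
kappa = cohen_kappa(both_pos=40, a_only=10, b_only=5, both_neg=45)
print(round(kappa, 2))
```

Kappa discounts the agreement that two classifications would reach by chance alone, which is why it can be well below the raw percentage agreement when one category dominates.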
The AECG criteria, published in 2002, are likely the most commonly used in practice. As a result, expert-clinician selection of SS cases and controls would almost certainly involve their use. Therefore, we also explored sensitivity and specificity of the SICCA preliminary criteria as compared to the AECG criteria used as “gold standard”. When the AECG criteria were applied using all available tests, the sensitivity and specificity of the SICCA preliminary criteria were high: 94.7 (95% CI: 92.6–96.3) and 93.3 (95% CI: 91.3–95.0), respectively.
In an external data set obtained from two sites recently added to the registry, whose participants were not included in the data set used to develop the preliminary classification criteria, we identified 40 participants who had been diagnosed as having SS by a JHU or UPenn rheumatologist prior to, and independently of, entry into the SICCA registry. These clinical diagnoses were made by university-based rheumatologists (mainly ANB and FV) with expertise in SS prior to their involvement with SICCA. We also identified 263 controls, defined as participants who did not satisfy the AECG criteria as described in the Methods. In this external data set of 303 participants, we found the SICCA classification criteria to have a sensitivity of 92.5% (95% CI: 80%–98.4%) and a specificity of 95.4% (95% CI: 92.2%–97.6%).
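The sketch below shows how sensitivity and specificity estimates of this kind can be paired with an exact binomial (Clopper-Pearson) confidence interval. The text does not state which interval method was used, and the counts 37 of 40 cases and 251 of 263 controls are back-calculated here from the reported percentages rather than quoted from the data, so both are assumptions. The helper names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f, target, increasing):
    """Solve f(p) = target on [0, 1] for a monotone function f."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for a proportion k / n."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else bisect_root(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2, increasing=True)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else bisect_root(
        lambda p: binom_cdf(k, n, p), alpha / 2, increasing=False)
    return lower, upper


# Assumed counts, inferred from the reported 92.5% and 95.4%.
for label, k, n in (("sensitivity", 37, 40), ("specificity", 251, 263)):
    low, high = clopper_pearson(k, n)
    print(f"{label}: {k / n:.1%} (95% CI: {low:.1%} to {high:.1%})")
```

Under these assumed counts, the exact interval for sensitivity comes out close to the reported bounds. The interval is wide because it rests on only 40 externally diagnosed cases.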
To investigate stability of the preliminary criteria over time, we applied the criteria at both enrollment and follow-up to 236 participants who had completed two-year follow-up visits. Results were concordant in 92% of participants. Among the 8% with discordant results (20 participants), 12 (60%) showed signs of progression from a disease-free classification at enrollment to classification as diseased at follow-up. The remaining 8 (40%) exhibited the reverse pattern. Among these, two reported taking a corticosteroid medication and one reported taking a TNF alpha inhibitor at baseline. However, none of the 8 participants were taking these or any other immunomodulating medication at the 2-year recall visit. These results indicate general stability of disease status over a two-year period. Additional analyses comparing the validation results above in participants recruited prior to September 8, 2009 with those recruited between September 8, 2009 and March 8, 2010 yielded remarkably similar results for all comparisons and are not reproduced here.
The SICCA Registry represents a unique resource for establishing universally acceptable classification criteria based on: 1) the large size and international nature of the cohort with accordingly diverse ethnic backgrounds; 2) the international and multidisciplinary team of experts including the three clinical specialties involved in the management of SS, epidemiologists, and statisticians; and 3) the standardized data collection procedures combining questionnaires, clinical examinations, and specimen collections performed by calibrated investigators. Using a consensus methodology derived from the Nominal Group Technique among 20 experts, and analyses involving 1,362 participants with complete data on ten individual tests, we first developed preliminary classification criteria for SS. These relied on a combination of objective tests that assess the three main components of SS (serologic, ocular, and salivary). Accompanying analyses showed that symptoms of dry mouth and dry eyes had poor specificity due to lack of association with objective phenotypic features. We found strong associations between the main objective phenotypic features, in particular focal lymphocytic sialadenitis (FLS) with focus score (FS) ≥ 1 and positive anti-SSA/B. Odds ratios measuring the association between ocular staining score (OSS) ≥ 3 and each of these two features were less than half the magnitude of the corresponding odds ratios measuring the association between them. A proportional Venn diagram (Figure 1) provides a good illustration of the lower specificity of OSS in relation to the salivary and serologic components.
Using a data-driven, consensus-based approach we defined preliminary criteria for SS as at least 2 out of 3 objective tests. We then performed a series of validation analyses. The first assessed sensitivity and specificity of alternate sets of criteria, each defined by substituting one item with an alternate test. The inclusion of alternate tests is important because: 1) classification criteria should be applicable in a wide variety of settings, and some tests may not be available in certain settings; and 2) some tests like the labial salivary gland (LSG) biopsy may be perceived as invasive, and cannot easily be performed by clinicians from specialties outside oral medicine/surgery. However, results did not identify any suitable alternate tests for the salivary and ocular phenotypic features of SS. While unstimulated whole saliva (UWS) < 0.1 mL/min had good sensitivity, it had low specificity compared to the LSG biopsy to measure FLS and FS ≥ 1. It also had both low sensitivity and specificity in comparison to model-based latent class analysis (LCA) validation results. While tear break-up time (TBUT) < 10 seconds was found to have high sensitivity and specificity when substituted for OSS ≥ 3, it was found to have very low specificity in LCA results. Although the specificity of OSS ≥ 3 was also low in the LCA comparison, it was almost twice as high as that of TBUT. Finally, the ophthalmologists in the panel all agreed that TBUT is decreased in many diseases with tear surface abnormality, thus supporting a lack of specificity for SS. Furthermore, TBUT also requires the use of fluorescein and a slit lamp, and thus is not thought to be easier to administer than the OSS. With respect to serologic tests, positive anti-SSA/B had the highest sensitivity and specificity based on the LCA comparison. Positive RF and ANA ≥ 1:320 had reasonable specificity but lower sensitivity, suggesting that either test alone would not be a good substitute for anti-SSA/B.
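The "at least 2 out of 3" rule can be sketched as a small function. This is a simplified illustration using only the three items and cut-offs named in this section (positive anti-SSA/B, OSS ≥ 3, and FLS with FS ≥ 1). The function and parameter names are hypothetical, and any further detail in the formal case definition is not represented.

```python
def meets_preliminary_criteria(anti_ssa_or_ssb_positive, ocular_staining_score,
                               fls_focus_score):
    """Return True when at least 2 of the 3 objective items are positive.

    anti_ssa_or_ssb_positive: bool, positive serum anti-SSA and/or anti-SSB
    ocular_staining_score:    OSS value; the item is positive when OSS >= 3
    fls_focus_score:          focus score from an LSG biopsy showing FLS,
                              or None when FLS is absent; positive when >= 1
    """
    items = [
        bool(anti_ssa_or_ssb_positive),
        ocular_staining_score >= 3,
        fls_focus_score is not None and fls_focus_score >= 1,
    ]
    return sum(items) >= 2


# Illustrative (made-up) participants.
print(meets_preliminary_criteria(True, 4, None))    # serology + OSS
print(meets_preliminary_criteria(False, 5, None))   # OSS alone
```

Because symptoms are not inputs, an asymptomatic participant with two positive objective items is still classified as a case, which is the behaviour the criteria intend.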
Although we did not identify any suitable alternate tests for the salivary and ocular phenotypic features of SS to be used in our proposed classification criteria for SS, UWS < 0.1 mL/min and TBUT < 10 seconds may be suitable alternatives for diagnostic criteria. While classification criteria need to be stringent to prevent any misclassification because they are used to select participants for entry into clinical trials, diagnostic criteria that are used in clinical practice may allow for more flexibility.
Our analysis comparing the American-European Consensus Group (AECG) criteria and proposed SICCA criteria revealed that if the Schirmer test was used in place of the OSS and the UWS rate in place of the LSG biopsy, the level of agreement between the two sets of criteria was low (52%). However, when all objective tests were available to define the AECG criteria, the level of agreement between the SICCA classification criteria and the AECG criteria was high (88%). Also, if the AECG criteria were used as “gold standard”, which is a likely scenario if experts were asked at this time to select SS cases and controls, and all objective tests were available, the sensitivity and specificity of our SICCA criteria would be very high. In reality, because the AECG criteria allow for substitution of criteria components, they are almost never applied using objective tests only, which is one of their inherent weaknesses. Finally, in an external data set of 303 participants who were not included in the data set used to develop the preliminary classification criteria, we found the SICCA criteria to have a sensitivity of 93% and a specificity of 95%.
Until recently, because few therapeutic agents were being considered for the systemic management of SS, classification criteria were developed mainly for epidemiologic studies estimating the prevalence of the disease. However, the development of new biologic immunomodulating agents that are being considered in the treatment of SS increases the need for stringent classification criteria that can be used in the context of clinical trials. The consequence of misclassifying someone without SS as a case would be serious given the potentially toxic side effects of these agents. The results of the various validation analyses described herein indicate that the preliminary classification criteria we initially developed using a consensus methodology are stringent enough to be used as entry criteria into clinical trials. The SICCA classification criteria were found to perform very similarly to the AECG criteria when all objective tests are available for the AECG. However, the SICCA criteria do not have the weakness inherent to the AECG criteria, which allow for the use of alternate tests like the Schirmer test, or reported symptoms of dry mouth and/or eyes, that we have shown to have poor specificity. Not only do the proposed classification criteria rely on a combination of objective tests, but they also require evidence of autoimmunity by serologic and/or histopathologic measures. Histopathologic examination of an LSG sample provides high disease specificity, wide availability, prediction of non-Hodgkin lymphoma development with the presence of lymphoid germinal centers in the glands (28), and unparalleled insights into the autoimmune disease-active cells within an SS target organ. LSG biopsy has been criticized as being invasive and difficult to apply in all settings. However, the performance and analysis of nearly 1,400 biopsies as part of the SICCA protocol suggest otherwise.
When LSG biopsy is skillfully and conservatively performed, it is a minimally invasive 15-minute procedure that yields unique information about the extent and nature of the disease process.
The distinction between primary and secondary forms of SS is based on an early definition of the disease and may now be obsolete. The initial definition and diagnostic criteria for SS were the presence of “keratoconjunctivitis sicca (‘dry eyes’), xerostomia (‘dry mouth’) and rheumatoid arthritis or other connective tissue disease” and “two of the three are generally considered sufficient for the diagnosis” (1). Patients who developed the dry eyes/mouth components of SS without developing RA were initially labeled as having the “sicca syndrome” and later “primary SS”, while those with RA, who usually developed the dry eyes/mouth components after onset of their joint disease, were labeled “secondary SS” (29). Subsequently, objective measures were adopted for assessing lacrimal and salivary hypofunction, systemic components of primary SS were identified, and we learned that various organ-specific (e.g. thyroid, liver, kidneys and lungs) autoimmune conditions can occur in SS patients, in other autoimmune connective tissue diseases, and independently. While the details of autoimmune pathogenesis remain elusive, many diseases have now been identified as having autoimmune mechanisms, mostly distinguished by the target organ(s) affected, and genetic causes or susceptibilities are emerging. It has also become clear that some individuals with one autoimmune disease have enhanced susceptibility to develop others. Therefore, distinguishing one autoimmune disease as secondary to another in a given patient seems of little use and risks confusion. Accordingly, the diagnosis of SS should be given to all who fulfill these criteria, while also diagnosing any concurrent organ-specific or multi-organ autoimmune diseases, without distinguishing primary or secondary.
In summary, the SICCA classification criteria developed from registry data collected using standardized measures are easy to apply even though they may require the involvement of at least two clinical specialties, and are based entirely on objective tests. A series of validation exercises indicate improved classification performance relative to existing alternatives, making them more suitable for application in situations where misclassification may present health risks.
In addition to the listed authors, many other professional collaborators in the Sjögren’s International Collaborative Clinical Alliance have been essential to the conduct of this project. Their names and roles follow:
University of California, San Francisco, USA: Oral Pathology D Cox and R Jordan, Rheumatology D Lee, Operations Director Y DeSouza, Clinical Coordinator / Phlebotomy D Drury, Clinical Coordinator A Do, Clinical Assistant L Scott, Statistician/Programmer M Lam, Data Manager J Nespeco, Finance Director J Whiteford, Administrative Assistant M Margaret
University of Buenos Aires and German Hospital, Buenos Aires, Argentina: Stomatology I Adler, AC Smith, AM Bisio, MS Gandolfo, Oral Pathology AM Chirife, A Keszler, Specimen processing S Daverio, Group Coordinator V Kambo
Peking Union Medical College Hospital, Beijing, China: Rheumatology Y Jiang, D Xu, J Su, Stomatology/ Pathology D Du, Stomatology/ LSG biopsies H Wang, Z Li, J Xiao, Specimens / Rheumatology Q Wu, Phlebotomy C Zhang, W Meng, Project Assistant J Zhang
Rigshospitalet, Copenhagen, Denmark: Ophthalmology S Johansen, S Hamann, Oral Medicine Julie Schiødt, Helena Holm, Oral Pathology P Ibsen, Group Coordinators/Specimen Handling AM Manniche, SP Kreutzmann, J Villadsen.
Kanazawa Medical University, Ishikawa, Japan: Rheumatology Y Masaki, T Sakai, Ophthalmology N Shibata, Stomatology M Honjo, Oral Pathology N Kurose, T Nojima, Specimen processing T Kawanami, Hematology/Immunology T Sawaki, Group Coordinator K Fujimoto.
King’s College London, UK: Pathology E Odell, P Morgan, Specimen processing L Fernandes-Naglik, Oral Medicine B Varghese-Jacob, S Ali, Group Coordinator M. Adamson
University of Pennsylvania, Philadelphia, USA: Rheumatology S Seghal, R Mishra, Ophthalmology V Bunya, M. Massaro-Giordano, Otolaryngology SK Abboud, Oral Medicine A Pinto, YW Sia, Group Coordinator K. Dow
Johns Hopkins University, Baltimore, Maryland, USA: Ophthalmology E Akpek, S Ingrodi, Oral Medicine W Henderson, Otolaryngology C Gourin, Group Coordinator A Keyes
Aravind Eye Hospital, Madurai, India: Group Director M Srinivasan, co-Directors J Mascarenhas, M Das, A Kumar, Ophthalmology Pallavi Joshi, Physician R Banushree, Surgeon U Kim, Oral Medicine B Babu, Administration A Ram, Saravanan, Kannappan, Group Coordinator N Kalyani
We would like to also express our gratitude to all the participants in the Sjögren’s International Collaborative Clinical Alliance, and to Drs. Pamela McInnes, Jane Atkinson, and Xavier Mariette for their review of the manuscript and valuable input.
Supported by NIH (National Institute for Dental and Craniofacial Research, National Eye Institute, and Office of Research on Women’s Health) contract N01 DE32636.
Frederick Vivino has received consultant fees and/or honoraria (less than $10,000 each) from Daiichi-Sankyo, Inc, and Parion Sciences, Inc. John Greenspan has received consultant fees and/or honoraria (less than $10,000 each) from Glaxo-Smith-Kline.