The aim of systematic reviews is to identify and evaluate all available research evidence relating to a particular objective. An essential part of any systematic review is the quality assessment of individual studies. Aspects such as study design, methods of sample recruitment, the execution of the tests and the completeness of the study report relate to the overall quality. The results of a systematic review will be biased if the results of individual studies are synthesised without any consideration of quality in terms of potential for bias, lack of applicability and the quality of reporting.
The QUADAS tool [30] was the first systematically developed, evidence-based quality assessment tool to be used in systematic reviews of diagnostic accuracy studies. The QUADAS tool [30] contains a detailed explanation of the intention of each item, situations when the item does not apply and how to score items. This allows for minor adaptations in specific areas. The QUADAS tool [30] does not incorporate quality scores to assess the level of evidence. Since the importance of individual items and their potential biases may vary according to the context in which they are applied, incorporation of quality scores into the results of a review may generate different magnitudes of bias and lead to different conclusions regarding the effect of study quality on estimates of diagnostic accuracy. Instead, it has been proposed that a systematic review should involve a component approach, where the association of each quality item with test accuracy is investigated individually [55].
The QUADAS tool [30] was used in the present systematic review since it is a standardised approach to quality assessment and since the criteria needed to assess the quality of diagnostic test evaluations differ from those needed to assess evaluations of therapeutic interventions [56].
In the present review, the QUADAS tool [30] was modified to better correspond to the objectives of the study. Two questions (numbers 6 and 7) were excluded from the 14 questions of the final QUADAS tool [30] and two were added from the original list of 28. In addition, question number 3 was split into two sub-questions.
When interpreting the studies with the aid of the QUADAS tool [30], it could be established that some studies showed shortcomings in describing the selection criteria clearly and in describing the test method or the reference method in enough detail for it to be reproduced. A sufficient description of the tests is important since variations in measures of diagnostic accuracy can be traced back to differences in the execution of the tests. A clear description is also needed in order to implement the test in another setting. None of the included studies reported uninterpretable/intermediate results or data on observer or instrument variation. A diagnostic test can produce uninterpretable/intermediate results with varying frequency. If these results are removed from the analysis, the assessment of the test characteristics may be biased. Furthermore, it can be questioned whether some of the studies presented appropriate results.
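The effect of removing uninterpretable results can be illustrated with a minimal Python sketch. All counts below are hypothetical and are not taken from any of the included studies. The "worst case" handling, in which every uninterpretable result is counted as a test error, is only one of several possible conventions and is assumed here purely for contrast.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of subjects with the condition that the test detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of subjects without the condition that the test clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical index test results, grouped by reference standard outcome.
with_condition = {"positive": 40, "negative": 10, "uninterpretable": 10}
without_condition = {"positive": 5, "negative": 45, "uninterpretable": 10}

# Analysis A: uninterpretable results are silently dropped.
sens_excluded = sensitivity(with_condition["positive"], with_condition["negative"])
spec_excluded = specificity(without_condition["negative"], without_condition["positive"])

# Analysis B (assumed worst case): each uninterpretable result counts as an error,
# i.e. a missed case among the diseased and a false alarm among the non-diseased.
sens_worst = sensitivity(
    with_condition["positive"],
    with_condition["negative"] + with_condition["uninterpretable"],
)
spec_worst = specificity(
    without_condition["negative"],
    without_condition["positive"] + without_condition["uninterpretable"],
)

print(f"excluded:   sensitivity={sens_excluded:.2f}, specificity={spec_excluded:.2f}")
print(f"worst case: sensitivity={sens_worst:.2f}, specificity={spec_worst:.2f}")
```

With these assumed counts, dropping the uninterpretable results yields higher estimates than the worst-case analysis, which is why a study report needs to state how many such results occurred and how they were handled.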
When QUADAS items [30] are scored as unclear, it is difficult to be certain whether this indicates poor methods, with the attendant consequences for bias and/or variation, or simply poor reporting of a methodologically sound study. The STARD initiative [57] has proposed standards for the reporting of diagnostic accuracy studies. If these standards are widely adopted, reviewers might be able to assess methodological quality rather than the quality of reporting.
The aim of test accuracy studies is to assess how well a test can distinguish between subjects with and without the disease/condition of interest. There are two basic types of test accuracy study: (1) The single-gate design includes participants in whom the disease status is unknown and compares the results of the index test with those of a reference standard used to confirm diagnosis. This design is broadly representative of the setting in which the test would be used in practice. (2) The two-gate design compares the results of the index test in patients with an established diagnosis of the target condition with its results in healthy controls or controls with another diagnosis. This design has inherent problems that may lead to bias. The inclusion of healthy controls is likely to lead to overestimation of specificity, and the selective inclusion of cases with more advanced disease is likely to lead to overestimation of sensitivity [58]. Two-gate studies can, however, be useful in the earlier phases of test development.
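The direction of the bias described for the two-gate design can be made concrete with a short Python sketch of the standard 2×2 table calculations. The `TwoByTwo` type and every count in it are hypothetical, chosen only to show how a selected case mix can shift the estimates; they do not represent data from any study in this review.

```python
from typing import NamedTuple


class TwoByTwo(NamedTuple):
    """Index test results cross-classified against the reference standard."""
    tp: int  # index test positive, condition present
    fn: int  # index test negative, condition present
    tn: int  # index test negative, condition absent
    fp: int  # index test positive, condition absent


def sensitivity(table: TwoByTwo) -> float:
    return table.tp / (table.tp + table.fn)


def specificity(table: TwoByTwo) -> float:
    return table.tn / (table.tn + table.fp)


# Hypothetical single-gate study: consecutive patients with suspected disease,
# so cases include mild disease and non-cases include other conditions.
single_gate = TwoByTwo(tp=30, fn=20, tn=120, fp=30)

# Hypothetical two-gate study of the same test: advanced cases versus healthy controls.
two_gate = TwoByTwo(tp=45, fn=5, tn=95, fp=5)

for name, table in [("single-gate", single_gate), ("two-gate", two_gate)]:
    print(f"{name}: sensitivity={sensitivity(table):.2f}, "
          f"specificity={specificity(table):.2f}")
```

In this assumed scenario the two-gate estimates are higher on both measures, although the test itself is unchanged; only the selection of participants differs.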
Addressing the question formulated to specify the problem, it can be concluded that whilst a variety of tests to diagnose oral dryness have been examined, only a few have been validated in terms of diagnostic accuracy. Eight of the included studies presented their results as the percentage of correct diagnoses. Four of these studies used the European Community Study Group classification criteria for Sjögren’s syndrome [48] as the reference method. The European classification criteria for Sjögren’s syndrome were developed and validated between 1989 and 1996 and have received broad acceptance by the scientific community. Since the reference standard is an important determinant of the diagnostic accuracy of a test, this raises the question of why not all of the studies evaluating tests for the diagnosis of Sjögren’s syndrome used the same reference method.
Although these criteria have received broad acceptance, some criticism has been raised concerning the inclusion of subjective tests (symptoms), physiologic measures that lack specificity and alternate objective tests that are not diagnostically equivalent.
Recently, the American College of Rheumatology [59] proposed new classification criteria for Sjögren’s syndrome. These criteria are based on expert opinion elicited using the nominal group technique and analyses of data from the Sjögren’s International Collaborative Clinical Alliance [60]. The proposed criteria are: 1) positive serum anti-SSA and/or anti-SSB or (positive rheumatoid factor and antinuclear antibody titer ≥1:320), 2) ocular staining score ≥3, or 3) presence of focal lymphocytic sialadenitis with a focus score ≥1 focus/4 mm² in salivary gland biopsy samples. Case definition requires at least 2 of the 3 above-mentioned objective features. Thus, only objective tests and not subjective tests (symptoms) are included, since relying on symptoms of dry mouth and/or eyes can lead to misclassification of asymptomatic patients. For the salivary and ocular phenotypic features of Sjögren’s syndrome, the results did not identify any suitable alternate tests besides labial salivary gland biopsy. While an unstimulated salivary flow rate <0.1 ml/min had good sensitivity, it had low specificity compared to the labial salivary gland biopsy to measure focal lymphocytic sialadenitis with a focus score ≥1 [59].
Seven of the studies interpreted in this review evaluated different tests for determining decreased salivary flow and used sialometry as a reference method. These studies were heterogeneous with respect to the source of secretion and whether it was unstimulated or stimulated. Cut-off values defining salivary gland hypofunction also varied. As stated earlier, without proper individual baseline information, it is almost impossible to ascertain whether a patient’s salivary flow rate is below ‘normal’. When sialometry is used for diagnosing salivary dysfunction, it is debatable whether the method serves as a diagnostic tool or rather as verification of an already established condition. Sreebny [61] proposed that the low cut-off values should be viewed as values which “flag” or “raise suspicion” about the presence of a disease. They do not indicate that the person who demonstrates such values definitely has a disease.
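The "at least 2 of 3" case definition in the proposed American College of Rheumatology criteria [59] can be written out as a small Python sketch. The `Findings` class, its field names and the encoding of the antinuclear antibody titer as a reciprocal (320 for 1:320) are assumptions made for illustration. The sketch covers only the three objective features quoted above and is not a clinical tool.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """Hypothetical container for the objective findings of one patient."""
    anti_ssa_or_ssb_positive: bool
    rheumatoid_factor_positive: bool
    ana_titer_reciprocal: int     # assumed encoding: 320 means a titer of 1:320
    ocular_staining_score: int
    focus_score: float            # foci per 4 mm^2 of biopsy tissue


def serology_positive(f: Findings) -> bool:
    # Feature 1: anti-SSA and/or anti-SSB, or (rheumatoid factor and ANA >= 1:320).
    return f.anti_ssa_or_ssb_positive or (
        f.rheumatoid_factor_positive and f.ana_titer_reciprocal >= 320
    )


def meets_case_definition(f: Findings) -> bool:
    features = [
        serology_positive(f),          # feature 1
        f.ocular_staining_score >= 3,  # feature 2
        f.focus_score >= 1,            # feature 3
    ]
    # Case definition: at least 2 of the 3 objective features.
    return sum(features) >= 2


# Hypothetical patient: positive serology and ocular staining, negative biopsy.
patient = Findings(
    anti_ssa_or_ssb_positive=True,
    rheumatoid_factor_positive=False,
    ana_titer_reciprocal=80,
    ocular_staining_score=4,
    focus_score=0.0,
)
print(meets_case_definition(patient))
```

Note that symptoms do not appear anywhere in the function, which mirrors the decision to base the classification on objective tests only.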
There is no global consensus regarding the terminology of oral dryness, although many authors distinguish between xerostomia, denoting the subjective feeling, and hyposalivation, denoting a decreased salivary flow rate. This lack of consensus creates a problem for research, diagnosis, and therapy. As for research, the problem is illustrated when using Medical Subject Headings (MeSH). MeSH is the National Library of Medicine’s controlled vocabulary thesaurus used for indexing articles for PubMed. The MeSH database defines xerostomia as decreased salivary flow, which is incorrect since a sensation of oral dryness can occur in subjects with a normal salivary flow. Nederfors [12] proposed dividing the term “salivary gland hypofunction” into 3 different entities: xerostomia, denoting the subjective feeling; hyposalivation, denoting the decreased salivary flow rate; and altered saliva composition. This classification accepts that xerostomia may exist without signs of hyposalivation, that hyposalivation may be a symptomless condition and that an altered saliva composition may exist even if the saliva secretion rate is unaffected and without subjective symptoms. These three entities are inter-related and can influence each other in different ways.
Over the last decade, advances have been made in proteomic and genomic approaches to identify potential biomarkers that may be used in the detection of different diseases, e.g. Sjögren’s syndrome [62]. Saliva is a biofluid that is readily accessible via noninvasive methods and is therefore well suited for monitoring health status, disease onset and progression, and treatment outcome. Salivary diagnostic technologies identifying specific biomarkers associated with disease may in the future be used to verify general diseases behind salivary gland hypofunction [63]. It should also be mentioned that in the absence of an efficient treatment, a diagnostic method has little value. The basic causes of oral dryness are difficult to treat, and many methods have been tested to stimulate saliva secretion and ease the patient’s discomfort, e.g. saliva-stimulating tablets and artificial saliva. Several studies have evaluated the efficacy of such preparations, but there is no documented evidence of their effect on oral health [64].
Currently, diagnostic methods address the quantity and content of saliva in bulk, and few qualitative tests of saliva, whether in bulk or as an adsorbed thin film, are available to date for describing its protective functions.
Since changes in the protective functions of saliva may occur, there is a need for effective diagnostic criteria and functional tests in order to discern which individuals with oral dryness will require oral treatment, such as alleviation of discomfort and/or prevention of diseases.
An important component in determining the usefulness of a test is the evaluation of its diagnostic accuracy, but the clinical value lies in improving a patient’s condition or health. The clinical value, i.e. how the results of a test affect clinical decision-making, and the effect on the patient’s wellbeing are important factors when evaluating diagnostic tests or methods. A method with high diagnostic accuracy may not always be efficient and useful for the patient. Studies that investigate the value of diagnostic interventions are scarce and seldom available for new test methods. In addition, appropriate reference standards for many disorders are lacking.