This study investigated the intra- and interobserver reliability of assessment of the change in ST T cells, plasma cells, and macrophages quantified by DIA. Tissue samples were obtained from RA patients participating in a single-centre, placebo-controlled clinical trial with prednisolone. There were no significant differences in measurement of the mean change in T cells, plasma cells and macrophages between the three observers, or for different measurements by one observer. ICCs revealed good agreement between measurements. All observers and all measurements identified, on average, significant reductions in T cells and macrophages but not in plasma cells in the prednisolone group compared with placebo.
It can be anticipated that there will be an upsurge in randomized controlled trials investigating novel biological agents and small molecules in terms of their safety and efficacy. Thus, sensitive, validated and reliable measurements to screen for potential efficacy in an early phase of drug development are clearly needed. Clinical outcome measures have historically been used as primary end-points, but their reliability may be limited in small proof-of-principle studies. For clinical measurements such as the tender and swollen joint count, ICCs have been reported to vary between 0.15 and 0.85 for inter-rater variability and between 0.67 and 0.95 for intrarater variability [21]. Radiographic measurements, with the use of conventional X-ray films, show good reliability in most studies but they are not useful in short-term clinical trials [21]. The use of magnetic resonance images is promising, with acceptable inter-rater ICC for global synovitis scores and bone erosions, although optimal scoring systems are yet to be developed [22].
In light of the need to screen various compounds for potential efficacy in small numbers of patients and because of recent technical developments, we believe that our thinking about clinical trials is about to change dramatically. Clinical studies conducted during early phases of drug development will increasingly consist of small trials with a high density of biological data [23]. Consistent with this notion, serial ST analysis with evaluation of biomarkers was recently included in several randomized clinical trials of both disease-modifying anti-rheumatic drugs and biological agents [6]. These and other studies showed consistent relationships between the magnitude of synovial changes and clinical response. In particular, the change in infiltrating sublining macrophages was identified as a potent and sensitive synovial biomarker [6].
ST can easily and safely be obtained as a result of the introduction of small-bore arthroscopes and the development of local and regional anaesthesia protocols. Despite heterogeneity in the ST within a single joint, it has been shown that representative measures of synovial inflammation can be obtained by examining a limited area of tissue [15]. Previous work [10] has also shown that DIA is a sensitive, time-efficient method for quantifying both the number of stained cells and the staining intensity, with good correlations with both manual counting and semiquantitative scoring.
Although DIA is described as reliable and objective, little is known about the variability and reliability of this tool. Variation in measurements may result from a limited number of factors with this approach. In our system the observer selects three different areas, each of six high-power fields, from one slide, which is composed of six biopsy samples from six different sites in the joint. This is done in such a way that a representative area is selected, and this requires extensive training and experience with the histopathological morphology of ST. After scanning the representative high-power fields, the images are analyzed by setting threshold values for the stained antigen, nuclear staining and background staining [10]. These thresholds are kept constant for all measurements with the same marker within a study, but could theoretically give rise to variation when set by different observers or by one observer at different times. The present study showed that these variables did not result in different outcomes. There were good ICCs when the findings of three experienced observers or the findings of the same observer at different times were compared. Analysis by Bland–Altman plots showed no systematic differences with regard to the intra-observer measurements, and the SDCs showed good discriminatory power when applied to the treatment groups. In addition, all observers and all measurements identified the same cell types (T cells and macrophages) as decreasing significantly in the active treatment group compared with placebo. All measurements also identified a consistent trend toward reduced plasma cell numbers after corticosteroid treatment, which did not reach statistical significance, possibly because of the relatively small number of patients included. Although this method does exhibit good agreement in detecting changes in histological markers, this does not necessarily mean that these results can be extrapolated to the expression of a given marker at a given time point, as used in cross-sectional studies of ST. In addition, it remains to be seen whether the same reliability holds true for determination of changes in secreted proteins, such as cytokines and chemokines.
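To illustrate the agreement statistics referred to above, the following minimal sketch computes a Bland–Altman bias with 95% limits of agreement for two sets of measurements, and an ICC for several observers scoring the same subjects. It is not the analysis code of this study. The ICC form used here (two-way random effects, absolute agreement, single measurement) is an assumption, because this section does not state which form was applied, and the function names are hypothetical.

```python
from statistics import fmean, stdev


def bland_altman(first, second):
    """Return (bias, lower, upper) for paired measurements.

    bias is the mean difference; lower and upper are the 95% limits
    of agreement, bias -/+ 1.96 * SD of the differences.
    """
    diffs = [a - b for a, b in zip(first, second)]
    bias = fmean(diffs)
    spread = 1.96 * stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - spread, bias + spread


def icc_2_1(ratings):
    """ICC, two-way random effects, absolute agreement, single measurement.

    ratings: list of rows, one row per subject, one column per observer.
    The choice of this ICC form is an assumption for illustration.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of observers
    values = [x for row in ratings for x in row]
    grand = fmean(values)

    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(col) for col in zip(*ratings)]

    # Two-way ANOVA sums of squares without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for x in values)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

In this setting, each row passed to `icc_2_1` would hold the change in cell counts for one patient as measured by each observer, and `bland_altman` would receive the first and second measurement series of a single observer. A bias close to zero with narrow limits of agreement corresponds to the absence of systematic intra-observer differences.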