We have demonstrated that a wide range of HFUS measures of ST and PDUS measures of synovial vascularity at the MCPJs are reproducible and can detect the treatment effects of oral prednisone (15 mg and 7.5 mg daily) after one week, and, for two US measures, after only one day, in small panels of subjects (n = 18 and n = 27, respectively) with moderately to severely active RA. By contrast, DAS28(CRP) detected a significant treatment effect only after two weeks, and only in the 15 mg cohort. US may therefore be a leading indicator of therapeutic response, registering change before a clinical response is evident. At present, more than 50% of drugs tested fail at phase III, and the expense of the traditional drug development pathway has become prohibitive for many of the novel compounds developed to selectively inhibit the range of potential therapeutic targets identified in RA. Our study has extensively investigated the sensitivity and reliability of a diverse range of two-dimensional ultrasonographic endpoints at the MCPJ, and their potential to provide an early and objective indication of response to a therapeutic intervention in RA. We have confirmed that ultrasonography of the MCPJs is an early, reliable indicator of therapeutic response in RA, and it therefore has the potential to reduce both the number of patients required and the duration of clinical trials designed to give a preliminary indication of efficacy. Such an approach to early drug development in RA might increase the chances of success in later-phase studies designed to meet the regulatory endpoints required for approval.
In the present study, correlations between most US endpoints and DAS28(CRP) during prednisone treatment ranged from 0.4 to 0.8, suggesting that they measure somewhat different constructs. Combining US endpoints with DAS28(CRP) increased effect sizes at all time points and identified treatment effects earlier. In Panel A, composite endpoints increased sensitivity to the 15 mg dose. DAS28(CRP) alone had an effect size of about 1.0, which would require 13 subjects per group to detect a treatment difference (alpha 0.1; 80% power). When DAS28(CRP) was combined with a US endpoint, giving a combined effect size of approximately 1.5, the required sample size fell to six per group. Some combinations of US endpoints, with effect sizes of approximately 2.0, would require only four subjects per group. Likewise, single-dose effects of 15 mg prednisone were identifiable with some combinations of endpoints. These findings suggest that such composite endpoints may be valuable in future small prospective studies designed to establish an early indication of efficacy. However, the composite endpoints were selected from Panel A on the basis of their performance, and were then tested in Panel B in a predefined but limited way. They therefore need to be tested in future studies to confirm their utility.
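The per-group sample sizes quoted above can be reproduced with the standard normal-approximation formula for comparing two group means, n = 2(z_{1-α/2} + z_{1-β})^2 / d^2. The Python sketch below is an illustration under stated assumptions, not the study's own calculation: it assumes that "alpha 0.1" is two-sided and that d is a standardized mean difference (a t-based calculation would give slightly larger numbers). It also includes a hypothetical helper, `composite_effect_size`, giving the effect size of an unweighted sum of two standardized endpoints with within-group correlation r; this is one common way to form a composite and is not necessarily the one used here.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.1, power: float = 0.8) -> int:
    """Subjects per group for a two-sided, two-sample comparison of means
    (normal approximation): n = 2 * (z_{1-alpha/2} + z_{power})**2 / d**2."""
    std_normal = NormalDist()  # mean 0, sd 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    # Round up: a fractional subject means one more subject is needed.
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


def composite_effect_size(d1: float, d2: float, r: float) -> float:
    """Hypothetical helper: effect size of the unweighted sum of two
    standardized endpoints with effect sizes d1 and d2 and within-group
    correlation r. The sum has mean shift d1 + d2 and variance 2 * (1 + r)."""
    return (d1 + d2) / math.sqrt(2 * (1 + r))


for d in (1.0, 1.5, 2.0):
    print(f"effect size {d}: {n_per_group(d)} subjects per group")
```

Under these assumptions the loop prints 13, 6 and 4 subjects per group, matching the figures above. The composite formula also shows why moderately correlated endpoints add information: two endpoints that each have an effect size of 1.0 give a composite effect size of about 1.20 when r = 0.4, but only about 1.05 when r = 0.8.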
Both 15 mg and 7.5 mg prednisone are relatively low corticosteroid doses, and it would be notable if an endpoint could differentiate their effects. Overall, there was a trend towards a dose-response, and a larger sample might have discriminated between the two doses. Other factors that may have reduced the study's ability to differentiate the two doses include the use of two centers, whose scanning rooms may, for example, have been kept at different temperatures, and of two ultrasonographers, with the first scans performed by each at their respective center being used to determine the treatment effect.
In Panel B, the 10MCP Trans PDA endpoint demonstrated a significant treatment effect earlier than in Panel A. This may be because more subjects in Panel B received active treatment, albeit at a lower dose. Consistent with this, at Day 8 more US endpoints registered a significant treatment effect in Panel B than in Panel A.
In Panel A, seven of the eight US endpoints demonstrated a consistent time-response to 15 mg prednisone. In Panel B, more US endpoints registered a significant effect size at Day 8 than at Day 15, perhaps owing to a waning therapeutic response to low-dose corticosteroid in some subjects. The transient response of the US endpoints to 7.5 mg prednisone was mirrored in the effect sizes of DAS28(CRP), even though the latter did not reach significance at any time point. We postulate that, for some subjects in Panel B, 7.5 mg prednisone may be just below the threshold dose for a sustained anti-inflammatory effect. At low doses (≤7.5 mg/day prednisone or equivalent), the ability of prednisone to induce and sustain an anti-inflammatory effect in RA is not reliably predictable [29]. Had we used larger doses of prednisone, for example 40 mg, we would undoubtedly have seen more consistent time-responses, but this would have weakened the study, because it would not have demonstrated the sensitivity of US to detect change.
The Long STA endpoint performed especially well in the current study. Our previous investigations found HFUS gray-scale ST to be inferior to power Doppler vascularity in detecting a treatment effect, with respect to both the kinetics and the extent of change [27]. However, those studies measured synovial thickening semi-quantitatively, and in the transverse plane only. Semi-quantitative indices may limit the detection of change: when synovial thickening greatly exceeds the threshold for the highest grade, a genuine reduction that can be detected quantitatively may still yield an unchanged score. The larger area captured by the longitudinal view, compared with the transverse view, may also have helped the Long STA endpoint register a treatment effect. The data in the current study support these explanations: semi-quantitative measures of synovial thickening had smaller effect sizes than quantitative measures (the only exception was Day 2, Panel B), and transverse measures of synovial thickening had smaller effect sizes than longitudinal measures (the only exception was Day 2, Panel A; Figure ). Because the treatment effect was smaller at Day 2, these factors would have had less influence at this early time point.
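The ceiling effect of semi-quantitative scoring described above can be illustrated with a short Python sketch. The grade cut-offs, the function name `semi_quantitative_grade` and the two areas are hypothetical values chosen for illustration; they are not the scoring system or data used in this study.

```python
# Hypothetical cut-offs (mm^2) separating grades 0/1, 1/2 and 2/3 of a 0-3
# semi-quantitative synovial thickening scale; illustrative values only.
CUTOFFS = (5.0, 10.0, 20.0)


def semi_quantitative_grade(area_mm2: float) -> int:
    """Map a continuous synovial thickening area to a 0-3 grade."""
    # Each cut-off exceeded adds one grade, so any area above 20 mm^2 is grade 3.
    return sum(area_mm2 > cutoff for cutoff in CUTOFFS)


# Hypothetical areas for one joint before and after treatment.
baseline_area, day8_area = 45.0, 28.0
print("quantitative change (mm^2):", day8_area - baseline_area)
print("grades:", semi_quantitative_grade(baseline_area), semi_quantitative_grade(day8_area))
```

Here the quantitative area falls by 17 mm², yet both time points are scored as grade 3, which is the static-score behaviour that can mask a genuine reduction in synovial thickening.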
Most US studies have investigated reliability on a joint-by-joint basis; few have assessed the reliability of summed scores for a selected group of joints. Naredo et al. assessed the within-scan intra-reader reliability of a summated 4-point semi-quantitative PDUS score for 28 joints, termed the 'overall US joint index for power Doppler signal', and reported an excellent ICC of 0.99 [10]. Backhaus et al. developed a composite US score, the 'German US7 score', in which HFUS synovitis and PD synovitis were graded on 4-point semi-quantitative scales in seven joints; the within-scan inter-reader reliability kappa value was 0.6. Arguably the most robust measure of reliability is parallel scan inter-reader reliability (included in our study), because it compares two ultrasonographer-readers who each scan the same patient. The images are read independently, as might be the case in multi-site clinical trials using the same model of US machine and the same settings. As expected, overall parallel scan inter-reader reliability was lower than within-scan inter-reader reliability; the difference between these two methods most likely reflects the loss of concordance attributable to image acquisition. A similar observation was reported by Kamishima et al. Despite this shortfall, good overall agreement was observed for parallel scan inter-reader reliability in the current study. Overall parallel scan intra-reader reliability was the strongest, demonstrating the potential advantage of a single ultrasonographer acquiring and reading the scans at a single site.
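Reliability of a summed score is commonly expressed as an intraclass correlation coefficient computed on each patient's total across the selected joints. The Python sketch below computes one common form, ICC(2,1) (two-way random effects, absolute agreement, single rater, after Shrout and Fleiss). The choice of this ICC form and the function name `icc_2_1` are assumptions made for illustration; this section does not state which ICC form was used.

```python
def icc_2_1(ratings: list[list[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings[i][j] is the summed score (e.g. across 10 MCPJs) that reader j
    assigned to patient i. Requires at least 2 patients and 2 readers.
    """
    n = len(ratings)     # number of patients
    k = len(ratings[0])  # number of readers
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)  # between readers
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

For example, if reader 2 scored every patient exactly one point higher than reader 1, `icc_2_1([[1, 2], [2, 3], [3, 4]])` returns 2/3 rather than 1, because the absolute-agreement form penalizes systematic offsets between readers; a consistency form such as ICC(3,1) would return 1 for the same data.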
Quantitative ultrasonographic measures of synovitis demonstrated better overall reliability than semi-quantitative measures, although the difference was not statistically significant. Therefore, although there may still be a place in future studies for the more time-consuming quantitative measures of synovitis derived by computationally quantifying pixel counts, quicker semi-quantitative scales may be an acceptable substitute. Power Doppler measures of synovitis were significantly more reproducible than gray-scale measures, and we advocate that future US studies include power Doppler vascularity endpoints to deliver optimum reliability.
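The pixel-count approach to quantitative power Doppler scoring mentioned above can be sketched as follows. This is a minimal illustration, assuming that the PD signal is rendered as coloured pixels over a gray-scale image, that the ROI is supplied as a boolean mask, and that the image calibration (`mm2_per_pixel`) is known. The function name and the `min_channel_spread` threshold are hypothetical, and the software used in the study may quantify pixels differently.

```python
Pixel = tuple[int, int, int]  # (R, G, B), each 0-255


def pd_pixel_area(
    rgb: list[list[Pixel]],
    roi: list[list[bool]],
    mm2_per_pixel: float,
    min_channel_spread: int = 30,
) -> float:
    """Estimate power Doppler area inside an ROI by counting coloured pixels.

    Gray-scale pixels have (near-)equal R, G and B values, so a pixel is
    counted as Doppler signal when max(R, G, B) - min(R, G, B) exceeds
    min_channel_spread (a hypothetical threshold).
    """
    count = 0
    for pixel_row, mask_row in zip(rgb, roi):
        for (r, g, b), inside in zip(pixel_row, mask_row):
            if inside and max(r, g, b) - min(r, g, b) > min_channel_spread:
                count += 1
    # Convert the pixel count to an area using the image calibration.
    return count * mm2_per_pixel
```

Counting only pixels inside the ROI mirrors the manual step of outlining the synovial region; a semi-quantitative scale replaces this count with a reader's visual grade, which is quicker but coarser.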
The dimensions of the transducers available for this study may have contributed to the weak inter-reader reliability (within-scan and parallel scan) of the 10MCP Trans STi and the 10MCP Trans STA. Because the transducer is broad relative to the deepest point of the triangular structure, which is a narrow, precise location, more than one hyperechoic line representing bone is often visible on the saved gray-scale image. MS and SK may therefore have chosen different ROIs depending on which line they selected as the lower border of the triangular structure, even though it had been agreed from the outset to use the lowest hyperechoic line.
Another limitation of our study is that the two prednisone doses were tested sequentially rather than in parallel; therefore, although comparisons can be made between treatment groups, firm conclusions cannot be drawn. This is especially relevant when commenting on the dose-response of the US endpoints to prednisone.
Because of time constraints on scanning, we restricted our US evaluation to the dorsal aspect of the MCPJs. Endpoints derived from imaging over the palmar surface might also have been valuable.