In the UK in 2004, 97% of farmers reported lame sheep in their flock, with an average within-flock prevalence of 10% [1]. Footrot is the main cause of lameness and foot lesions in sheep in the UK [2]. Dichelobacter nodosus is the essential causative organism of footrot, although other organisms, especially Fusobacterium necrophorum, are thought to play an important role in its pathogenesis [3]. The clinical presentation of footrot is highly variable and ranges from mild interdigital inflammation (benign footrot) to under-running of horn with a characteristic smell (virulent footrot). Long-term disease with footrot [3] and poor foot trimming [4] can alter foot integrity.
A diagnosis of footrot can be made using culture or PCR from swabs taken from the hoof horn junction [6]. However, these laboratory methods are not completely reliable. D. nodosus requires complex media and strict anaerobic conditions for culture [6], and while 16S rRNA PCR is more rapid and sensitive than culture, it is still far from 100% sensitive [7]. As a consequence, once D. nodosus is endemic in a flock, researchers and clinicians commonly diagnose footrot by visual observation of the foot without further laboratory tests. Visual diagnosis may include a system to score the severity of the footrot lesion. A commonly used system is an Australian one with five ordinal scores [8] (Table ). In the UK, in addition to scoring footrot, a 4-point ordinal scoring method for foot integrity has been used [4]. These scoring systems have been used by researchers [5] to study the epidemiology, pathogenesis, treatment, control and economic losses attributable to footrot. However, the between- and within-observer reliability of a scoring method for foot integrity has not been formally tested. One study [11] investigated agreement of a footrot scoring system between two trained observers and reported a high level of agreement, but 85% of the 100 sheep in that study had a lesion score of 0 (no lesion). The study provided no information on when the observers disagreed or where (i.e. at which scores) the disagreement lay.
Footrot scoring scale from Egerton and Roberts (1971)
The reliability of a numeric scoring system is the generalizability (in the sense of generalizability theory) of the results across scoring situations and judges [12]. To evaluate this, reproducibility (a measure of between-observer variability) and repeatability (a measure of within-observer variability) are estimated [13]. In both the medical and veterinary fields, an ordinal score is often used to evaluate the severity of a disease [14]. Observer agreement for such ordinal data is commonly reported as a single measure of agreement, e.g. weighted kappa coefficients [15] or Kendall’s coefficient of concordance [16]. These do not provide information on components of disagreement such as observer bias (i.e. the tendency for an observer to give higher or lower ratings than others) or differences in thresholds, and therefore category widths, for the ordinal scale. One study, by Thomsen et al. [17], tested whether the category widths used by observers for an ordinal scale were equidistant by calculating a polychoric correlation. However, this approach compared only two observers and did not provide an estimate of observer bias.
Modeling techniques have been described to evaluate observer agreement for ordinal scores. These include log linear models [18], association models [19] and latent trait and latent class models [20]. Both log linear and association models have been designed to compare only two observers, and there are issues with interpreting the relative magnitude of some of the parameters used [14]. Latent trait and latent class models have been designed for multiple observers and have been used in the medical field [21] to quantify agreement between multiple observers. These models explore agreement by testing whether there is observer bias and give a visual representation of the observers’ perceived impressions of the scores on a continuum, thus indicating the threshold and width of each score category. For example, for a 0 to 3 category scale, the first threshold is the point from which an observer applies score 1 (below it, the observer applies score 0), the second threshold is the point from which an observer applies score 2, and so on. To our knowledge, such modeling approaches have not been used to evaluate observer agreement for ordinal categories in the veterinary field. In the current paper, observer agreement of scoring systems for footrot (using photographs and videos) and foot integrity (using post mortem feet) in sheep is evaluated and explored using two approaches: weighted kappa and located latent class modeling.
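The idea of observer-specific thresholds on a continuum can be illustrated with a short sketch. It is not the located latent class model itself and involves no model fitting; it only shows how thresholds turn an underlying severity into an ordinal score, and how two observers with different thresholds can score the same foot differently. All threshold and severity values below are hypothetical.

```python
from bisect import bisect_right


def score_from_thresholds(latent_severity, thresholds):
    """Map a latent severity to an ordinal score.

    `thresholds` is a sorted list; the score is the number of thresholds
    at or below the latent value, so the first threshold is the point
    from which score 1 is applied, the second the point for score 2, etc.
    """
    return bisect_right(thresholds, latent_severity)


def category_widths(thresholds):
    """Widths of the interior categories (distance between thresholds)."""
    return [hi - lo for lo, hi in zip(thresholds, thresholds[1:])]


# Hypothetical thresholds for a 0 to 3 scale (three thresholds each).
observer_a = [1.0, 2.0, 3.0]  # equally spaced categories
observer_b = [0.6, 1.4, 3.2]  # applies scores 1 and 2 earlier, score 3 later

# Hypothetical latent severities of five feet.
latent_values = [0.5, 0.8, 1.5, 2.5, 3.1]

scores_a = [score_from_thresholds(x, observer_a) for x in latent_values]
scores_b = [score_from_thresholds(x, observer_b) for x in latent_values]

widths_a = category_widths(observer_a)
widths_b = category_widths(observer_b)
```

Here observer A assigns scores 0, 0, 1, 2, 3 and observer B assigns 0, 1, 2, 2, 2 to the same five feet. The disagreement arises entirely from where each observer places the thresholds, which is the kind of information a single agreement coefficient cannot show.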