This study has shown that, despite the large quantity of screening data provided by the BVA/KC hip scoring scheme, progress against hip dysplasia appears minimal, equivalent to that projected from avoiding only the worst 15% of animals for breeding. The introduction of EBV alone would be projected to increase the rate of progress by 19% through additional accuracy of selection, even if selection intensity remained unchanged. Barring costly and painful surgery, hip dysplasia is incurable due to the development of osteoarthritis as a consequence of malformation, so genetic selection presents the only effective method of reducing the prevalence. Therefore, as hip dysplasia is one of the most serious diseases in larger breeds of dog, the need for the most efficient and effective genetic selection is clear.
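The link between selection intensity, accuracy and rate of progress can be illustrated with a minimal sketch. It assumes the standard prediction that response is proportional to the product of selection intensity and accuracy, and it assumes truncation selection on a normally distributed criterion; neither assumption is stated in this section, and the function names are hypothetical.

```python
from statistics import NormalDist


def selection_intensity(proportion_selected: float) -> float:
    """Standardised selection differential for truncation selection.

    Assumes a normally distributed selection criterion (an assumption of
    this sketch, not a result of the study).
    """
    std_normal = NormalDist()
    # Truncation point that leaves `proportion_selected` of animals as parents.
    threshold = std_normal.inv_cdf(1.0 - proportion_selected)
    return std_normal.pdf(threshold) / proportion_selected


def relative_response(intensity_ratio: float, accuracy_ratio: float) -> float:
    """Ratio of two predicted responses when response is proportional to i * r."""
    return intensity_ratio * accuracy_ratio


# Avoiding only the worst 15% means breeding from the best 85%.
i_current = selection_intensity(0.85)

# Unchanged intensity with the 19% gain in accuracy quoted in the text.
gain_from_ebv = relative_response(1.0, 1.19)

print(f"intensity when avoiding worst 15%: {i_current:.3f}")
print(f"relative progress with EBV at unchanged intensity: {gain_from_ebv:.2f}")
```

Under these assumptions the intensity achieved by discarding only the worst 15% is small, which is why a gain in accuracy alone cannot compensate for weak selection pressure.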
The presented results have demonstrated that the availability of EBV through routine evaluations of the hip score data would hasten progress in alleviating the problem of hip dysplasia via increases in selective accuracy compared to selection based on phenotype alone. However, the benefits of EBV extend beyond the simple comparisons of accuracy for a recently scored dog: (i) the EBV for an individual, unlike its phenotypic score, will further increase in accuracy over time by utilising all the available information and being updated as additional information becomes available, e.g. from offspring or siblings; (ii) the EBV will provide predictors for those animals that do not have a phenotypic record, hence increasing selection opportunities and intensity, which again enhances the rate of improvement; (iii) the EBV will be available from the moment of birth for selection (although newborn littermates will have identical EBV) and, in this case, the accuracy (and hence rate of improvement) from using EBV increases by 31% compared to the parental average phenotype; (iv) the EBV will have been corrected for other fixed effects, such as sex and age, which bias phenotype as a predictor of genetic merit; and (v) it may be argued that taking account of a sustainable rate of inbreeding as well as disease prevalence would restrict the selection pressure that can be applied; however, this only serves to place a greater emphasis on the accuracy of the selection that does take place. Finally, with the availability of sequence and dense canine SNP chips, the development of a genomic EBV (an EBV informed by additional information from dense SNP genotypes) would help to distinguish littermates and further increase accuracy, increasing the potential rate of improvement, and might also lead subsequently to scientific benefits through identifying the major QTL. The intention is to make public the EBV for hip score for all KC registered Labrador Retrievers so that all these benefits may be realised.
The analyses of transformations for calculation indicated consistently that a logarithmic transformation was more suitable for the data, despite the observation of a higher estimate of heritability for untransformed H. Two important reasons argue for the use of the transformation. Firstly, the Box-Cox analysis optimises the transformation on the basis of assumptions implicit in the model, namely normality, lack of heterogeneity in subclass variance, and additivity of model terms. Whilst the procedure has been used here in the context of mixed models rather than the fixed models for which it was developed, the underlying principles behind the optimisation may be assumed to hold. These implicit assumptions underpin substantial parts of quantitative genetic theory, and therefore it seems a wise precaution to use the transformed scale. A second justification may be found in a more detailed examination of additivity in a genetic context, where similarity between offspring and parent is fundamental, and where, in selection theory, linearity in the regression of offspring on mid-parent is an important tenet. This examination was possible because of the relatively stable temporal trends for evaluation of hip score, which testifies to good quality control by the BVA. The implication is that focussing on the upper tail of the distribution alone is unlikely to have the benefit that may be anticipated in reducing the population mean, and that genetic progress needs to be generated by influencing selection within the wider population that is less extreme. Transformation alone does not alter this: the transformation is monotonic and so does not change ranking. However, the more linear relationship between offspring and mid-parent will underpin a more predictable response to selection.
The lack of linearity and monotonicity in the lower tail of the relationship for log_e(1+H) is not influential as it affects only the lower 5% of the distribution and that part of the distribution which will always be selected. A possible explanation for the results concerning the lower tail may be lower precision of evaluation, for which there may be some support from the apparent excess of zero individual scores (see ).
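The point that the transformation is monotonic, and so leaves the ranking of animals unchanged while compressing the upper tail, can be checked with a short sketch. The hip scores below are illustrative values chosen for this example, not data from the study, and the function name is hypothetical.

```python
import math


def transform(hip_score: float) -> float:
    """Logarithmic transformation log_e(1 + H) of a total hip score H."""
    return math.log1p(hip_score)


# Illustrative total hip scores (hypothetical, not taken from the study).
scores = [12, 0, 45, 7, 15, 90, 3]

# Rank animals from best (lowest) to worst on each scale.
ranking_raw = sorted(range(len(scores)), key=lambda k: scores[k])
ranking_transformed = sorted(range(len(scores)), key=lambda k: transform(scores[k]))

# A 10-point difference matters far more at the low end than at the high end.
gap_low = transform(10) - transform(0)
gap_high = transform(90) - transform(80)

print("same ranking on both scales:", ranking_raw == ranking_transformed)
print(f"gap between scores 0 and 10 on the log scale:  {gap_low:.3f}")
print(f"gap between scores 80 and 90 on the log scale: {gap_high:.3f}")
```

Because ranks are preserved, the same animals would be chosen under truncation selection on either scale; what changes is the spacing between animals, and therefore the linearity of relationships such as the offspring on mid-parent regression.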
A further caveat of the BVA/KC scheme arises from the possible under-reporting of extremely poor hip scores, since submission of the radiograph to the BVA is voluntary. In such cases, it would be hoped that a prospective hip score severe enough to warrant saving the cost of evaluation by the BVA panellists would dissuade the owner from using the dog in question for breeding. Directed removal of data from one end of the scale is expected to under-estimate heritability and consequently bias estimates of EBV, in particular for sires with the poorer breeding values, since they are expected to have more offspring with missing (and bad) records. However, selection progress from the existing data will still be expected, and potentially faster than predicted here as a result of the underestimation of sires with poorer breeding values. Therefore, whilst submission of all radiographs would be better, the BVA/KC scheme remains of high quality, scoring a large number of dogs with the majority of breeding stock. Such biases are present in many recording schemes, for example in preferential treatment of cows in dairy breeding.
This study clarified that the total hip score (of both left and right hips) was the appropriate statistic for genetic evaluation, since investigation showed that left and right hips had near identical genetic parameters. The optimum weighting for individual hips, in principle, favours the hip that is richer in genetic information, and this is related to heritabilities and phenotypic variances for each hip. However, the demonstration of near perfect genetic concordance across hips indicates that analysis of total hip score averages out the environmental influences that differentiate the individual left and right hip score (note the environmental correlation was only 0.57). Furthermore, given the extent to which genetic influences are shared by both hips (the genetic correlation was 0.999), recording the worst hip only will add bias by recording the hip that has suffered from the most extreme deleterious environmental impact. This was supported by additional analysis which indicated that EBV for a measure of mean hip score was a better predictor of both mean and worst hip score than EBV for worst hip score (see Supplementary Material S1 and Table S1).
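The way in which summing the two hips averages out environmental influences can be sketched numerically. The example assumes that both hips have the same phenotypic variance and the same heritability, and it uses a single-hip heritability of 0.3 purely as a hypothetical placeholder (it is not an estimate from this study); the genetic and environmental correlations are those quoted above.

```python
def heritability_of_total(h2_single: float, r_genetic: float, r_environmental: float) -> float:
    """Heritability of the sum of two hip scores.

    Assumes equal phenotypic variance and equal heritability for the left
    and right hip (a simplifying assumption of this sketch).
    With unit phenotypic variance per hip:
      genetic variance of the sum       = 2 * h2 * (1 + r_g)
      environmental variance of the sum = 2 * (1 - h2) * (1 + r_e)
    """
    genetic = 2.0 * h2_single * (1.0 + r_genetic)
    environmental = 2.0 * (1.0 - h2_single) * (1.0 + r_environmental)
    return genetic / (genetic + environmental)


H2_SINGLE_HIP = 0.3      # hypothetical placeholder, not from the study
R_GENETIC = 0.999        # genetic correlation quoted in the text
R_ENVIRONMENTAL = 0.57   # environmental correlation quoted in the text

h2_total = heritability_of_total(H2_SINGLE_HIP, R_GENETIC, R_ENVIRONMENTAL)
print(f"single-hip heritability (assumed): {H2_SINGLE_HIP:.3f}")
print(f"total-score heritability:          {h2_total:.3f}")
```

Whenever the genetic correlation exceeds the environmental correlation, the total score is more heritable than either hip alone under these assumptions, which is consistent with the argument for evaluating total hip score.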
The rate of genetic improvement estimated in this study is modest, equivalent to 0.43 genetic standard deviations p.a., lower than that reported in Swedish Rottweilers from 1992–2002 (0.67), but higher than that in US Labradors from 1970–2005 (0.37). Comparison between the US and UK studies is more straightforward since they consider the same breed/gene pool and, given the low selection intensity observed in the UK, the disparity in progress may be due to the recording schemes, since the more refined scale of the BVA/KC was estimated to be 1.6-fold more heritable than the 7-point OFA scheme. Comparison with Swedish Rottweilers, however, is across gene pools, and the reported heritability for the 5-point FCI scheme was marginally greater, though with greater standard errors. Thus, whether the superior rate of improvement reported by the Swedish study is due to breed, to recording scale or to greater selection pressure is unclear.
To date, any selection against hip dysplasia in the UK will have been accomplished using phenotypic hip score, and the BVA endorses such practice by recommending breeding from dogs with scores clearly below the breed mean, which is currently 15 (http://www.bva.co.uk/public/documents/chs_hip_scheme_breed_mean_scores.pdf). However, it is clear that the BVA recommendations are only just short of being met, since a score of 15 corresponds to the 81st percentile in our data, equivalent to avoiding only the worst 18–19% of animals for breeding, close to what is observed. At the current rate of progress (an improvement in mean EBV of 1.36×10⁻² per annum), an ambitious but realistic target of a reduction in the median hip score from 10 to 5 would take over 44 years using phenotypic selection, or just over 37 years if selection was on EBV of dogs with a phenotypic hip score. Therefore, despite the improved accuracy of selection enabled by EBVs, there remains very slow selective progress against hip dysplasia. Even breeding below the median phenotype, i.e. from the best 50% of animals, would have resulted in progress over 2.5-fold greater than has been observed over the period 1996–2006. The selection intensity from breeding from the best 50% of scored animals using EBVs would be over 3 times higher than that achieved over 1996–2006, where breeding from dogs with scores less than the current breed mean was the guideline. These figures indicate that, even with the improved efficiencies afforded by EBV, adequate selection pressure is also vital in improving progress against hip dysplasia, and that this could be achieved with more challenging guidelines.
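The time horizons quoted above can be approximated with a short calculation. This sketch assumes that the annual improvement in mean EBV is expressed on the log_e(1+H) scale, that the shift required in the median equals the difference between the transformed start and target scores, and that the 19% gain in accuracy from EBV translates directly into a 19% faster rate; the variable and function names are hypothetical.

```python
import math

RATE_PER_YEAR = 1.36e-2  # improvement in mean EBV per annum, from the text
ACCURACY_GAIN = 1.19     # 19% faster progress with EBV, from the text


def years_to_target(start_median: float, target_median: float, rate_per_year: float) -> float:
    """Years needed to move the median hip score, assuming a constant annual
    shift on the log_e(1 + H) scale (an assumption of this sketch)."""
    shift_needed = math.log1p(start_median) - math.log1p(target_median)
    return shift_needed / rate_per_year


years_phenotypic = years_to_target(10, 5, RATE_PER_YEAR)
years_ebv = years_to_target(10, 5, RATE_PER_YEAR * ACCURACY_GAIN)

print(f"phenotypic selection: {years_phenotypic:.1f} years")
print(f"selection on EBV:     {years_ebv:.1f} years")
```

Under these assumptions the calculation gives between 44 and 45 years for phenotypic selection and between 37 and 38 years for selection on EBV, in line with the figures stated in the text.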