As an illustrative example of the applied Deming method, the scatter plot, the residual plot and the corresponding numerical output for alanine aminotransferase (ALAT) measurements are shown in Figure . The means of the triplicates of tube 11 are compared with those of tube 22. Figure shows that the identity line and the weighted Deming regression line lie close to each other. Figure illustrates that the spread of the standardized residuals increases strongly at lower values, indicating that a weighted rather than an unweighted procedure should be applied. The lower part of Figure displays the numerical results for the constant and proportional bias together with their p values. Ideally, the constant bias, which corresponds to the intercept in regression analysis, should be 0.00, and the proportional bias, which corresponds to the slope, should be 1.00. The p value of 0.57 for the constant bias and that of 0.87 for the proportional bias indicate that neither the intercept nor the slope differs significantly from its ideal value.
Figure 1 Deming fit. This figure shows a typical report of the Deming fit by Analyse-it for Excel. Graph A shows the scatter plot with the regression line, the identity line and the limits of a 5% bias. Graph B plots the standardized residuals.
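The relation between the Deming fit and the two bias estimates can be sketched in a few lines of Python. This is a minimal illustration only: it implements the simple, unweighted closed-form Deming estimator, not the weighted procedure used in this study and not the Analyse-it implementation. The function name `deming_fit` and the parameter `error_ratio` (ratio of the error variance of the test method to that of the reference method, assumed to be 1) are hypothetical.

```python
import math


def deming_fit(x, y, error_ratio=1.0):
    """Unweighted Deming regression (illustrative sketch).

    x, y        : paired results of the reference and the test method
    error_ratio : assumed ratio var(errors in y) / var(errors in x)
    Returns (intercept, slope), i.e. the constant and proportional bias.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired measurements")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sample variances and covariance of the paired results
    sxx = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("covariance is zero; the slope is undefined")

    # Closed-form Deming slope; both methods are allowed to carry error
    diff = syy - error_ratio * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4 * error_ratio * sxy ** 2)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return intercept, slope
```

For two methods that agree perfectly, the function returns an intercept of 0.00 and a slope of 1.00, the ideal values referred to above. A weighted fit additionally down-weights observations with larger variance, which this sketch does not attempt.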
A compilation of the constant and proportional bias over all statistical tests, including all parameters and all centrifugation conditions, is given in additional file 1, table S1, and an excerpt is shown in table . The leftmost column shows the parameter analyzed. For each parameter, the seven statistical comparisons displayed in the adjacent columns were made, resulting in 357 Deming tests (51 parameters × 7 comparisons). One tube 22 under condition 2 was excluded because several analytical parameters showed deviations exceeding 3 standardized residuals, and we concluded that this tube did not contain a sample of appropriate quality. Twenty-three analytical parameters could not be evaluated, either because they were not quantitative tests or because most patient samples did not contain measurable amounts, so that a reliable quantitative comparison could not be performed. They are listed as footnotes in additional file 1, table S1.
An illustrative excerpt of all Deming fits containing the data of some clinical chemistry and immunology analytes.
Table and additional file 1, table S1 list the statistical evaluations of all investigated parameters, including the constant bias, the proportional bias and their 95% confidence intervals. The 95% confidence intervals included the ideal values (i.e. 0.00 for the constant bias and 1.00 for the proportional bias) in 688 of the 714 results (357 each for constant and proportional bias). The remaining 26 results (18 constant and 8 proportional biases) did not include the ideal value. These correspond to 3.6% of the 714 results, or 5.0% of the constant and 2.2% of the proportional biases. These percentages do not surpass the expected 5%, as a 95% confidence interval by definition covers only 95% and not 100% of random variation. Moreover, the deviations from the ideal values were minor in these cases, supporting the interpretation that they arose purely by chance.
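The proportions quoted above follow directly from the counts given in the text, as the following short Python check shows. Only the counts are taken from the text; the variable names are illustrative.

```python
# Counts taken from the text
parameters = 51
comparisons = 7
tests = parameters * comparisons          # Deming tests
results = 2 * tests                       # one constant and one proportional bias per test

# Confidence intervals that did not include the ideal value
excluded = {"constant": 18, "proportional": 8}
total_excluded = sum(excluded.values())
total_included = results - total_excluded

# Share of results outside the ideal value, overall and per bias type (in percent)
share_all = 100 * total_excluded / results
share_by_type = {name: 100 * count / tests for name, count in excluded.items()}

print(f"{total_included} of {results} intervals include the ideal value")
print(f"overall: {share_all:.1f}%")
for name, share in share_by_type.items():
    print(f"{name}: {share:.1f}%")
```

With 95% confidence intervals, roughly 5% of intervals are expected to miss the ideal value by chance alone even if the methods are identical, which is the benchmark against which these shares are read.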
To further analyze the results, we studied the distribution of the proportional biases, which should vary randomly around the value 1.00 (Figure ). This analysis showed that 50% of all proportional biases (slopes) were located between 0.990 and 1.010, and 99% of them between 0.924 and 1.086. The extreme values were 0.90 and 1.15. The parameters with proportional biases outside the 99% range were sodium, bicarbonate and CKMB below 0.924, and bicarbonate and CKMB above 1.086. Apparently, the Deming fit did not estimate the slope appropriately in some of these outliers, such as CKMB. From the scatter plot, we assume that relatively large variations in low normal values influenced this estimate. In others, such as bicarbonate or sodium, the analytical imprecision combined with a small measurement range resulted in a deviation from the 95% confidence interval.
Distribution of proportional bias: Histogram showing the distribution of the proportional biases (slopes). The slopes scatter around the ideal value of 1.00. For comparison, a normal distribution is depicted.
As the values of the constant biases (intercepts) depend on the measurement range of the analyte, no similar analysis could be performed. Instead, we analyzed whether deviations accumulated under a particular centrifugation condition. The 95% confidence interval of the constant bias did not include the ideal value of 0.00 in 18 instances: twice in the comparison between tube 11 and tube 22; in tube 11, once between centrifugation conditions 1 and 2, 3 times between conditions 1 and 3 and 3 times between conditions 2 and 3; in tube 22, 4 times between conditions 1 and 2, twice between conditions 1 and 3 and 3 times between conditions 2 and 3. Thus, the aberrant values did not cluster under any centrifugation condition.
In six instances, the confidence intervals of both the constant and the proportional bias did not include the ideal value. These tests were therefore considered potentially significantly aberrant and are listed in table . In all but one case, condition 3 was involved as the test method, with either condition 1 or condition 2 as the reference. Because in all instances the confidence intervals of the proportional and the constant bias missed the ideal values only marginally, we concluded that these outliers arose by chance. To allow readers to judge the significance of these deviations for themselves, the scatter plots of all six conditions, including the slopes, their confidence intervals, the identity lines and, where appropriate, the upper and lower limits of reference, are displayed in Figure .
Tests in which the confidence intervals of both the proportional and the constant bias did not include the ideal value, indicating a possible lack of identity between test and reference method.
Figure 3 Scatter plots of the Deming fits deviating most from identity. The upper five diagrams (group A) show those tests with a proportional bias outside the 99% distribution range (see Figure 2). The lower six diagrams (group B) show tests ...
The procedure discussed so far calculates the probability of identity between the test method and the reference method (alpha error). For the purpose of this study, however, the probability of overlooking a deviation between the two methods, i.e. the beta error, is at least as relevant as the alpha error. The use of patient samples covering a sufficiently large measurement range increased the statistical power.
The first estimate of the beta error, obtained as described in paragraph 2.4.2, is listed per parameter in table . All parameters except chloride were below the allowable limits of the bias; the problem with chloride was the physiologically narrow range of sample values. Second, we tested whether a specified bias could be detected at the limits of the reference values. As illustrated in Figure , a bias, labeled allowable bias in the figure, was pre-specified, and the probability that such a bias could be detected was calculated. In this example, the upper limit of reference is 41 U/L. Five percent of 41 is 2.1, indicated in the figure as the bias goal. The bias calculated from the data is 0.2, which is much smaller than 2.1. The null hypothesis that the bias is equal to or larger than the bias goal can therefore be rejected with high probability, which excludes a relevant deviation with high probability.
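The arithmetic behind the bias goal in this example can be written out as a short Python sketch. The numbers 41 U/L, 5% and 0.2 are taken from the text; the helper names `bias_goal` and `bias_at_level` are hypothetical. The sketch only compares point estimates, whereas the test described above also takes the uncertainty of the estimated bias into account.

```python
def bias_goal(decision_level, allowable_fraction):
    """Allowable bias at a decision level, e.g. 5% of a reference limit."""
    return decision_level * allowable_fraction


def bias_at_level(intercept, slope, decision_level):
    """Bias predicted by a regression fit at a given decision level.

    Combines the constant bias (intercept) and the proportional
    bias (slope) into one difference between test and reference.
    """
    return intercept + (slope - 1.0) * decision_level


# Values from the ALAT example in the text
upper_reference_limit = 41.0   # U/L
allowable_fraction = 0.05      # 5% allowable bias
observed_bias = 0.2            # bias calculated from the data

goal = bias_goal(upper_reference_limit, allowable_fraction)
within_goal = abs(observed_bias) < goal

print(f"bias goal: {goal:.2f} U/L, observed bias: {observed_bias} U/L")
print("observed bias is below the goal" if within_goal else "observed bias reaches the goal")
```

`bias_at_level` shows how a bias at a reference limit could be derived from an intercept and a slope; it is not called with study data here because the fitted ALAT coefficients are not given in this section.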
An illustrative sample of the 539 reference limits tested is given in table , and the full information is listed in additional file 2, table S2. Of those tests, 479 (88.9%) would detect a 5% bias. Interestingly, chloride was within this group. If the allowable bias was set to 10%, 20%, 30% and 40%, such biases would be excluded at a further 48, 8, 3 and 1 reference limits, respectively. These figures correspond to a cumulative frequency of 97.8%, 99.2%, 99.8% and 100% of all levels tested. Of all tests, cortisol required the highest allowable biases for falsifying the null hypothesis, namely once 40%, three times 30%, twice 20% and once 10%. This indicates that the number of specimens tested was insufficient for cortisol to exclude a bias with sufficient certainty.
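The cumulative frequencies can be retraced from the counts given above with a few lines of Python. The counts per allowable bias are taken from the text; the variable names are illustrative.

```python
from itertools import accumulate

# Number of reference limits at which a bias of the given size
# (in percent) could first be excluded; counts taken from the text
newly_excluded = {5: 479, 10: 48, 20: 8, 30: 3, 40: 1}

total_limits = sum(newly_excluded.values())
cumulative_counts = list(accumulate(newly_excluded.values()))

# Cumulative share of all reference limits, in percent
cumulative_percent = [100 * count / total_limits for count in cumulative_counts]

for allowable, count, percent in zip(newly_excluded, cumulative_counts, cumulative_percent):
    print(f"allowable bias {allowable:>2}%: {count} of {total_limits} limits ({percent:.2f}%)")
```

The five counts add up to the 539 reference limits tested, so every limit is covered once the allowable bias reaches 40%.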
Excerpt of the list of probabilities of detecting a 5% bias with 95% certainty at the limits of the reference ranges, restricted to the comparison of tube 11 with tube 22.
We conclude that the conditions used, including the number of measurements, the analytical range and the analytical imprecision, were sufficient to detect a relevant bias with adequate probability, i.e. to keep the beta error acceptably low.