For the 16 serum samples included in the final data, the total numbers of recorded MMT measurements and their percentages of the total assayed titers for laboratories A to F were 14 (2.6%), 42 (7.7%), 38 (7.0%), 45 (8.2%), 47 (16.0%), and 18 (4.7%), respectively. The numbers of OPA titers at or below the MMT were as follows: for serotype 1, 43 (21.5%); for serotype 3, 11 (6.5%); for serotype 4, 19 (7.9%); for serotype 5, 14 (7.0%); for serotype 6A, 11 (5.5%); for serotype 6B, 30 (12.4%); for serotype 7F, 2 (1.0%); for serotype 9V, 6 (2.5%); for serotype 14, 0 (0.0%); for serotype 18C, 9 (3.7%); for serotype 19A, 6 (3.0%); for serotype 19F, 16 (6.6%); and for serotype 23F, 37 (15.3%). In addition, laboratory A recorded eight titers as “>16,570” for serotype 14, and these titers were set to twice that amount (33,140) for analysis purposes.
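The substitution rule for titers reported above the assay's upper limit can be sketched in a few lines. This is only an illustration: the function name `parse_titer` and the string format of the input are assumptions, and only the doubling rule itself comes from the text.

```python
def parse_titer(raw: str) -> float:
    """Convert a reported titer string to a number for analysis.

    Assumption: values above the upper limit are reported with a leading
    ">" (for example ">16,570"). Following the rule stated in the text,
    such values are set to twice the stated limit.
    """
    text = raw.strip().replace(",", "")
    if text.startswith(">"):
        return 2.0 * float(text[1:])
    return float(text)


# Hypothetical reported values, not data from the study.
reported = ["128", ">16,570", "4,096"]
print([parse_titer(t) for t in reported])
```

Any other rule for censored values (for example, treating them as missing) would change the downstream estimates, so the choice should be recorded alongside the data.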
ANOVA model laboratory-predicted and consensus OPA titers.
ANOVA models were used to estimate a single predicted OPA titer for each serum by laboratory and serotype from the duplicate titers submitted for analysis and to estimate the consensus OPA titer for each serum by serotype. Scatter plots comparing these predicted serum OPA titers between laboratories, aggregated over all serotypes, are presented in Fig. 1. The line of identity represents perfect agreement (intercept = 0; slope = 1). In general, most laboratory-to-laboratory comparisons yielded clusters of points centered over the line of identity, indicating good agreement. However, titers for laboratory E were lower than those for the other laboratories, as indicated by the shift in the point clusters away from the line. In addition, some laboratories were able to produce measurable titers for samples that other laboratories recorded as MMTs. This is shown by the line of points with titers near 2 along the x and y axes of the comparisons.
FIG. 1. Scatter plots of pairwise comparisons between laboratories aggregated over serotype. Predicted OPA titers derived from ANOVA random-effects models were used for each set of assay duplicates. The solid line indicates perfect agreement (intercept 0 and slope ...)
Figure 2 displays the comparison of laboratory-predicted and consensus titers. Laboratory D exhibits slightly more overall variability than the other laboratories, and titers from laboratory E tend to be lower than the consensus values. Box plots displaying the distribution of fold differences between the individual laboratory-reported and consensus OPA titers by serotype are presented in Fig. 3. The distance of the mean (*) from the gray dotted line at a 1-fold difference is a direct measure of the mean bias within a laboratory for each serotype. The size of the box, coupled with the extensions of the vertical lines above and below the box, is a direct indicator of the intralaboratory or within-laboratory variability of the fold differences (repeatability). As an example, small boxes centered about the gray dotted line, with vertical lines extending between 1/2 and 2, indicate a distribution where the laboratory-observed titers were within ±1 titer (2-fold difference) of the consensus value. The positioning of the boxes about the gray dotted line for a given serotype across all laboratories is an indicator of the between-laboratory variability (reproducibility). Laboratory A has a substantial negative mean bias and high variability for serotype 14 and a moderate negative mean bias, with low variability, for serotype 5. In addition, laboratory A has positive mean biases for serotypes 19A and 23F, with moderate variability. Laboratory B has noticeable positive or negative mean biases for all serotypes except 19F and shows increased variability for serotypes 19A and 23F. Laboratory C has a moderate amount of mean bias but generally shows the smallest amount of variability around the consensus titers. Laboratory D exhibits a severe positive bias for serotype 9V and negative biases for serotypes 18C and 19A. Laboratory E has the greatest degree of overall mean bias, underestimating titers for serotypes 4, 9V, 19F, and 23F.
Overall, laboratory F exhibits a minor mean bias for serotypes 1 and 23F and, with the exception of serotype 23F, displays small amounts of variability in the fold differences for the remaining serotypes.
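The fold differences summarized in the box plots can be illustrated with a short calculation. The sketch below assumes that a fold difference is the observed titer divided by the consensus titer (so values below 1 indicate underestimation) and that the mean bias is computed on the log2 scale; neither convention is confirmed by this section, and the function names and example values are hypothetical.

```python
import math


def log2_fold_differences(observed_titers, consensus):
    """log2(observed / consensus) for each observed titer (assumed direction)."""
    return [math.log2(t / consensus) for t in observed_titers]


def mean_fold_bias(observed_titers, consensus):
    """Geometric mean fold difference; 1.0 means no mean bias."""
    logs = log2_fold_differences(observed_titers, consensus)
    return 2.0 ** (sum(logs) / len(logs))


# Hypothetical titers from one laboratory for one serotype and sample.
observed = [128, 256, 512, 64]
consensus = 256
print(log2_fold_differences(observed, consensus))  # -1, 0, 1, -2 on the log2 scale
print(mean_fold_bias(observed, consensus))         # about 0.71, i.e. a negative bias
```

Averaging on the log2 scale treats a 2-fold overestimate and a 2-fold underestimate as equal and opposite, which matches the symmetric layout of the box plots around the 1-fold line.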
FIG. 2. Scatter plots of pairwise comparisons between laboratories and consensus values aggregated over serotypes. Predicted OPA titers for each laboratory and consensus titers were derived from ANOVA random-effects models. The solid line indicates perfect agreement ...
FIG. 3. Box plots by serotype and laboratory for the fold differences between the consensus and observed OPA titers. Consensus OPA titers were estimated for each sample within a serotype using the random-effects ANOVA model. In these plots, the box is defined ...
Interlaboratory and laboratory-to-consensus agreement for OPA titers.
Table presents accuracy (Ca), precision (r), and concordance (rc) measures of agreement between pairs of laboratories and between laboratories and consensus OPA titers. While laboratory E has a definite systematic bias, laboratories A, B, C, D, and F all perform comparably to each other for precision, accuracy, and concordance. Laboratory E consistently underestimated OPA titers compared to the other laboratories, as seen in Fig. 1 and 2, and this is reflected in Table , where laboratory E has the lowest values for Ca and rc. The rc values for laboratory E are less than 0.80 (range, 0.67 to 0.78), whereas the rc values are >0.80 for the remaining five laboratories. Similarly, comparison of the results for the laboratories with the consensus OPA titers (Table ) reveals that laboratory E has the lowest accuracy (0.92) and the lowest concordance (0.85) with the consensus values. In contrast, all other laboratories have accuracy values close to 1.0, with concordance values of >0.90.
Comparison of OPA titers between laboratories and laboratory-to-consensus OPA titers
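The symbols Ca, r, and rc match the usual notation for Lin's concordance correlation coefficient, in which concordance is the product of precision and accuracy (rc = r × Ca). The sketch below computes the three quantities for two sets of paired values under that assumption. Whether the study used exactly this estimator, and whether titers were log transformed first, is not stated in this section, so the function name `agreement_measures`, the log2 scale, and the example values are all illustrative.

```python
import math


def agreement_measures(x, y):
    """Return (Ca, r, rc) for paired values x and y.

    r  : Pearson correlation (precision)
    Ca : bias correction factor (accuracy), 1.0 when means and variances match
    rc : concordance correlation coefficient, equal to r * Ca
    Variances use the 1/n form, as in Lin's original definition.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    r = sxy / math.sqrt(sxx * syy)
    ca = 2.0 * math.sqrt(sxx * syy) / (sxx + syy + (mx - my) ** 2)
    return ca, r, r * ca


# Hypothetical log2 titers: the second laboratory reads one dilution lower throughout.
lab_x = [3.0, 5.0, 7.0, 9.0]
lab_y = [2.0, 4.0, 6.0, 8.0]
ca, r, rc = agreement_measures(lab_x, lab_y)
print(f"Ca={ca:.3f} r={r:.3f} rc={rc:.3f}")
```

In this example the two laboratories are perfectly correlated (r = 1), yet the constant offset lowers Ca and therefore rc. A laboratory with a systematic bias shows the same pattern: good precision but reduced accuracy and concordance.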
Repeatability and reproducibility.
Intralaboratory (repeatability) and interlaboratory (reproducibility) variances are shown in Fig. 4. Serotype 9V shows the greatest interlaboratory variability, which is influenced by the extreme positive bias in laboratory D for this serotype (Fig. 3). The interlaboratory variance for serotype 9V is reduced by >85% when laboratory D is removed from the analysis, but the intralaboratory variance remains virtually unchanged. If laboratory E is removed from the analysis, then the interlaboratory variability levels are reduced by >82%, >34%, and >58% for serotypes 4, 19F, and 23F, respectively. Repeatability values are similar across serotypes, indicating that each laboratory's ability to replicate its own results is stable across serotypes.
FIG. 4. Plots of within-laboratory (repeatability) and between-laboratory (reproducibility [•]) variability by serotype.
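To make the distinction between within-laboratory and between-laboratory variance concrete, the sketch below applies the textbook method-of-moments estimator for a balanced one-way random-effects layout (laboratories as groups, replicate log2 titers within each group). This is a simplified stand-in and not the study's model, which is not specified in this section and probably includes additional terms such as sample effects. The function name and the example values are hypothetical.

```python
def variance_components(groups):
    """Estimate (within, between) variance from balanced replicate data.

    groups: one list of replicate values per laboratory, all of equal length.
    within  = mean square within groups
    between = (mean square between - mean square within) / replicates,
              truncated at zero.
    """
    k = len(groups)
    n = len(groups[0])
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need at least 2 groups with equal replicate counts >= 2")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (k * (n - 1))
    return ms_within, max(0.0, (ms_between - ms_within) / n)


# Hypothetical duplicate log2 titers; the third laboratory reads about 2 dilutions low.
labs = [[8.0, 8.0], [8.0, 9.0], [6.0, 6.0]]
print(variance_components(labs))       # between-laboratory variance dominates
print(variance_components(labs[:2]))   # dropping the outlying laboratory
```

Dropping the outlying laboratory shrinks the between-laboratory estimate sharply and changes the within-laboratory estimate only slightly, the same qualitative pattern reported when laboratory D is removed for serotype 9V.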
Within-laboratory bias per serotype was estimated from the random-effects ANOVA models and illustrated using box plots of titer fold differences (Fig. 3). The mean bias varied greatly across serotypes and laboratories. Bias quantified by serotype and laboratory revealed that for the 71 possible laboratory-serotype combinations, 13 (18.3%) had a mean bias greater than a 2-fold difference and 2 (2.8%) had a mean bias greater than a 4-fold difference (Fig. ). There were no systematic patterns, as these combinations represented all laboratories and 8 serotypes. Across all serotypes, laboratories A, C, and D exhibit the lowest degrees of bias, with deviations from consensus titers of 0.01, 0.03, and −0.04, respectively. Laboratories B and F exhibit moderately higher mean biases, with deviations of 0.40 and 0.17, respectively. Laboratory E has a relatively high systematic mean bias compared to the other laboratories, with a deviation of −1.0 from the consensus titers. Within a serotype, the expectation is that the mean bias is 0, and this is one of the assumptions of the random-effects ANOVA models. The average absolute bias is generally less than 0.5 deviations from the consensus titers, with a range of 0.25 to 0.88. Serotypes 6B and 9V have the lowest and highest absolute differences, respectively.
Consensus OPA titers and intervals.
Four separate intervals were formed about the consensus values for each of the 16 samples and 13 serotypes: two prediction intervals (PIs) and two fold-difference intervals. These intervals may be used to judge whether future assays generate results comparable to those produced for this study. The 95% and 80% PIs were derived from the ANOVA models and reflect the variability of the titers reported in the present study. We also constructed two nonparametric intervals, representing ±2- and ±4-fold differences from the consensus values, which may also be used as guides for future assays. We calculated the percentages of observed OPA titers that fell within each of the intervals (Table ). The results illustrate that the ±2-fold-difference interval captured a smaller percentage of the observed data than is desirable. Overall, the 80% and 95% prediction intervals and the ±4-fold-difference interval all perform similarly. The 80% prediction interval captured >85% of the observed data for all serotypes and exhibits little variability among the serotypes in the percentages captured (range, 85.3% to 96.4%). In contrast, the ±4-fold-difference interval captured an adequate percentage but exhibited more variability among the serotypes, with a range of 72.1% to 98.2%.
Overall percentages of observed OPA titers that fall within the defined interval aggregated over serotype and sample
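The coverage of a fold-difference interval can be computed directly from pairs of observed and consensus titers. The sketch below assumes that a ±k-fold interval runs from consensus/k to consensus×k with both endpoints included; the endpoint convention, the function name `percent_within_fold`, and the example values are assumptions and do not come from the study.

```python
def percent_within_fold(pairs, k):
    """Percentage of (observed, consensus) pairs with observed inside
    [consensus / k, consensus * k], endpoints included (assumed convention)."""
    if not pairs:
        raise ValueError("no titers supplied")
    inside = sum(1 for obs, cons in pairs if cons / k <= obs <= cons * k)
    return 100.0 * inside / len(pairs)


# Hypothetical (observed, consensus) titer pairs.
pairs = [(128, 256), (256, 256), (1000, 256), (60, 256), (512, 256)]
print(percent_within_fold(pairs, 2))  # coverage of the ±2-fold interval
print(percent_within_fold(pairs, 4))  # coverage of the ±4-fold interval
```

The prediction intervals cannot be reproduced this way because their widths depend on the variance estimates from the fitted ANOVA models, which vary by serotype. Only the fixed-width fold-difference intervals reduce to a simple ratio check.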