Assessment of Noise and Stability of Experimental Data Acquisition
Our ability to accurately quantify trace species using a c(s) analysis depends strongly on the theoretical concentration profiles faithfully representing the data within the noise of data acquisition. To have meaningful estimates for the detection and quantitation limits, it is therefore important to assess and understand the magnitude and sources of noise in the experiment. In addition to the statistical noise, there are radial-invariant (RI) and time-invariant (TI) noise components. In our experience the RI noise occurs only in interference data acquisition, and reflects uniform changes in the pathlength difference of the sample and reference beams across the entire solution column, as well as the lack of an absolute assignment of a zero fringe shift. The calculated TI noise can account for the radially-dependent sources of systematic signal offsets, such as those arising from surface roughness of the windows and optical elements in the interference optics, and uneven transmission profiles of the absorbance windows. In our experience, TI noise is unavoidable, even with absorbance optics, and accounting for it invariably leads to a significant improvement in the quality of fit. A key assumption in this regard is that the TI signals are truly constant throughout the entire experimental time needed for macromolecular sedimentation.
The magnitude and stability of the experimental noise were tested first in experiments where distilled water was placed in both the sample and the reference compartment. This excludes any concentration gradients or sedimentation process contributing to the measured signal. The SV experiment and data acquisition were conducted as in a standard macromolecular sedimentation experiment (data not shown). The appropriate model for these data is one that accounts only for noise. With the absorbance system, the root-mean-square deviation (rmsd) was 0.0039 OD. (Without modeling the TI noise, the rmsd was 0.0041 OD and the residuals bitmap showed clearly discernible features constant in time.)
A histogram of the residual values in the absence of sedimentation can be described well with a normal distribution, although the residuals bitmap exhibited some unaccounted systematic errors (Fig. a). For the interference data, the rmsd was 0.0019 fringes (Fig. b). The residuals bitmap showed a low level of systematic deviations, discernible from the horizontal lines, that may be caused by a transient oscillating angular deflection of some optical element, which is likely responsible for the slightly elevated tails of the histogram compared to the normal distribution. When conducting analogous experiments with PBS buffer placed in both sectors instead of water, rmsd values of 0.0039 OD and 0.0034 fringes were observed for the absorbance and interference acquisition, respectively. The origin of the slightly increased rmsd in the interference optics data using PBS in comparison to water is unclear, but may be a result of slight errors in the geometry of the sectors leading to unmatched optical signals from the sedimenting buffer salts.
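The effect of modeling TI noise in a blank (water versus water) run can be illustrated with a minimal sketch. It assumes that, in the absence of any sedimenting signal, the least-squares TI estimate at each radius is simply the mean over all scans. The scan counts, the TI amplitude, and the function names are hypothetical and chosen only to give noise of the order reported above; this is not the authors' implementation.

```python
import math
import random


def simulate_blank_scans(n_scans, n_radii, ti_sigma, noise_sigma, seed=0):
    """Simulate blank scans: a fixed radial offset profile (TI) plus random noise."""
    rng = random.Random(seed)
    ti_profile = [rng.gauss(0.0, ti_sigma) for _ in range(n_radii)]
    return [
        [ti_profile[j] + rng.gauss(0.0, noise_sigma) for j in range(n_radii)]
        for _ in range(n_scans)
    ]


def rmsd(residuals):
    """Root-mean-square deviation over all scans and radii."""
    flat = [x for row in residuals for x in row]
    return math.sqrt(sum(x * x for x in flat) / len(flat))


def fit_ti_noise(scans):
    """Least-squares TI estimate for a blank: the mean over scans at each radius."""
    n_scans = len(scans)
    return [sum(column) / n_scans for column in zip(*scans)]


# Hypothetical settings (not taken from the source).
scans = simulate_blank_scans(n_scans=50, n_radii=200,
                             ti_sigma=0.0012, noise_sigma=0.0039)

rmsd_without_ti = rmsd(scans)  # model: zero signal everywhere
ti = fit_ti_noise(scans)
rmsd_with_ti = rmsd([[v - t for v, t in zip(row, ti)] for row in scans])

print(f"rmsd without TI: {rmsd_without_ti:.5f}")
print(f"rmsd with TI:    {rmsd_with_ti:.5f}")
```

Because the per-radius mean minimizes the sum of squared residuals at that radius, the rmsd with the TI term can never exceed the rmsd without it; the size of the improvement depends on how large the systematic radial offsets are relative to the random noise.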
Such experiments show that the data acquisition is sufficiently stable and its noise components are reasonably well described by the theoretical model. Based on this observation, one could set a first absolute lower limit of the possible trace component sensitivity at the rmsd divided by the square root of the number of data points, resulting for the absorbance optics in values of 4 × 10⁻⁵ OD, or 0.004% of a hypothetical loading concentration of 1 OD (the reference concentration in the following). Similarly, for the interference optics, this value would be 1 × 10⁻⁵ fringes, or 0.001%. Clearly, these values are unrealistically low, because (1) not all data points collected during a sedimentation velocity experiment of a purified antibody would report on the dimer fraction, (2) the use of the c(s) model will increase uncertainties because of parameter correlations and bias, and (3) the presence of any systematic experimental errors in the modeling of the sedimentation profiles of a macromolecule will decrease confidence in the analysis. These points will be examined in the following.
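The naive lower limit described above is a one-line calculation. The sketch below assumes a hypothetical total of 10,000 data points, since the actual number of points is not stated here; with that assumption the absorbance rmsd of 0.0039 OD gives a value of the same order as the quoted limit.

```python
import math


def naive_sensitivity_limit(rmsd, n_points):
    """Idealized sensitivity limit: rmsd divided by sqrt(number of data points)."""
    if n_points <= 0:
        raise ValueError("n_points must be positive")
    return rmsd / math.sqrt(n_points)


N_POINTS_ASSUMED = 10_000  # hypothetical; not given in the text
LOADING_OD = 1.0           # reference loading concentration used in the text

limit_od = naive_sensitivity_limit(0.0039, N_POINTS_ASSUMED)
limit_percent = 100.0 * limit_od / LOADING_OD

print(f"limit: {limit_od:.1e} OD ({limit_percent:.4f}% of {LOADING_OD} OD)")
```

This treats every data point as equally informative about the trace species, which is exactly the assumption the three caveats above call into question.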
Fig. 3 Error statistics of the absorbance and interference optical SV data. a A histogram of the residuals of an absorbance experiment with water inserted in the cells, and modeled with only TI noise. The solid line shows the best-fit Gaussian distribution ( (more ...)
Sedimentation Analysis of Experimental Data
We next proceeded with a second analysis of the buffer data, mimicking the c(s) analysis of macromolecular sedimentation. The signal densities of the derived distributions should ideally be zero, but may deviate from this as a result of the noise in the data. For the absorbance optical data, in the preset expected s-value range of the dimer, a signal density of 0.0001 OD loading concentration was obtained with the standard ME regularization, corresponding to 0.01% of a hypothetical 1 OD signal, while with the Bayesian regularization approach the signal density obtained in the dimer range of c(P)(s) was <10⁻⁵. For the interference data, the calculated signal density in the c(s) was 0.0006, corresponding to 0.06%, in the ME approach and 0.0009, corresponding to 0.09%, in the Bayesian approach. It is noteworthy that decreasing smin from 4 S (used for the simulations) to 0.1 S, in order to account for sedimenting buffer salts due to a composition mismatch between reference and sample sectors, slightly improves the quality of the fit, but the additional flexibility in the model results in more signal density deposited within the trace integration range: 0.0020, or 0.20%, by ME and 0.0023, or 0.23%, via Bayesian regularization. This observation was not made when the same procedure was applied to the absorbance data, which points to correlation of the combination of RI and TI noise parameters with the sedimentation coefficient distribution.
Although these estimates for the detection sensitivity do take into account which data points carry information on the dimer fraction, and we regard such preliminary buffer experiments as highly useful in establishing confidence in the experimental protocol, this analysis does not yet fully represent the correlation that would be encountered in the c(s) model when applied to the analysis of real macromolecular sedimentation, because so far the non-negativity requirements are actively constraining the possible distribution values. Therefore, we approached the determination of theoretical limits of detection and quantitation by using simulated spike recovery experiments, generating ‘in silico’ data for macromolecular sedimentation profiles mimicking those that would be observed experimentally.
To this end, we sought to establish first that experimental macromolecular sedimentation profiles can indeed be modeled with similar quality of noise and stability of systematic TI signal offsets. Figure shows sedimentation profiles of a purified IgG sample acquired with the absorbance system. The rmsd is 0.0059 OD, somewhat higher than in the absence of protein, and slight systematic residuals are observed when the boundary transitions through the radial region of 6.75–6.85 cm. The origin of this is unclear. Further, despite the attempted chromatographic separation, the calculated dimer fraction is 0.42% with ME and 0.59% with the Bayesian estimate. This correlated well with the observation of a small dimer peak when repurifying the sample chromatographically, pointing to some dynamic dimerization process in the IgG batch used.
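The trace fractions quoted in this section are obtained by integrating the distribution over a preset s-range and relating the result to the total signal. A minimal sketch of that bookkeeping follows, using trapezoidal integration on a discrete s-grid. The distribution is a made-up pair of Gaussians; the monomer position, peak widths, grid, and integration limits are all hypothetical, and only the 9.2 S dimer position is taken from the text.

```python
import math


def integrate_range(s_grid, c_values, s_lo, s_hi):
    """Trapezoidal integral over grid intervals lying fully inside [s_lo, s_hi]."""
    total = 0.0
    for i in range(len(s_grid) - 1):
        s0, s1 = s_grid[i], s_grid[i + 1]
        if s0 >= s_lo and s1 <= s_hi:
            total += 0.5 * (c_values[i] + c_values[i + 1]) * (s1 - s0)
    return total


def gaussian(s, center, sigma, area):
    """Gaussian peak normalized so that its integral equals `area`."""
    norm = area / (sigma * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-0.5 * ((s - center) / sigma) ** 2)


# Hypothetical distribution: 99% monomer (6.5 S, assumed) and 1% dimer (9.2 S).
s_grid = [4.0 + 0.02 * i for i in range(401)]  # 4 S to 12 S
c_values = [gaussian(s, 6.5, 0.15, 0.99) + gaussian(s, 9.2, 0.15, 0.01)
            for s in s_grid]

total_signal = integrate_range(s_grid, c_values, 4.0, 12.0)
dimer_signal = integrate_range(s_grid, c_values, 8.5, 10.0)  # assumed trace range
dimer_percent = 100.0 * dimer_signal / total_signal

print(f"dimer fraction: {dimer_percent:.2f}%")
```

In a real analysis any noise-driven signal density that falls inside the integration range is counted as trace material, which is why the buffer-only runs above give small but non-zero values.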
A different purified IgG stock was used for the collection of interference data, and a discrete component was added to the c(s) analysis to account for any buffer composition mismatch. Despite the appearance of small systematic trends in the residuals (note the scales on the residuals and data plots in Fig. ), the c(s) model using ME is still able to describe well the evolution of the macromolecular concentration gradient to an rmsd of 0.0043 fringes from a loading concentration of 1.6 fringes. Again, chromatographic re-injection of the ‘purified’ material showed the presence of dynamically generated dimers in these samples.
Overall, these sedimentation analyses show that very good fits can be achieved to experimental data, although minor instabilities and low level systematic errors do occur. Since the intrinsic data acquisition is robust (see above), we attribute these adventitious deviations from the theoretical predictions to the effects of convection arising from imperfections in the centerpieces and temperature control (see discussion). Nevertheless, the very good quality of fit and close to random distribution of residuals supports the use of simulated ‘in silico’ mixing experiments to establish theoretical lower limits for the detection sensitivity. Using these as a benchmark for real mixing experiments should facilitate the identification of the experimentally limiting factors.
Fig. 4 Analysis of experimental sedimentation velocity data for an IgG solution in phosphate buffered saline. Displayed in the upper panel are the raw time-dependent radial concentration profiles that were observed at a rotor speed of 50,000 rpm and (more ...)
Fig. 5 Analysis of experimental sedimentation velocity data for an IgG solution in phosphate buffered saline. Displayed in the upper panel are the raw time-dependent radial concentration profiles that were observed at a rotor speed of 50,000 rpm and (more ...)
c(s) Analysis with Standard ME Regularization
To explore what the theoretical limits are for the detection and quantitation of trace components, we generated theoretical SV data sets mimicking a highly purified antibody monomer solution that contains discrete amounts of trace levels of irreversibly aggregated dimer. The data sets covered a wide range of possible experimental conditions including several rotor speeds and acquisition noise levels as well as two different solution column lengths. Figure , derived from the analysis of absorbance data simulated for a 12 mm solution column and 0.005 OD normally distributed noise, illustrates the problem encountered with the standard ME regularization: (1) the peak of the dimer is not at a constant position, but moves towards higher s-values with lower concentration; (2) there appears an additional, artificial peak at the highest s-limit, growing in amplitude with lower trace concentration.
The distortions of the ME c(s) relative to the expected peak positions can be understood from the effect of monomer-dimer correlation. The regularization tends to attribute a fraction of the dimer signal towards the high s-value tail of the broadened monomer peak in order to produce a more parsimonious monomer peak. Compensatory shifts in the dimer s-value then occur.
As a result, in some analyses, inspection of the c(s) distribution at the s-value at which the dimer is known to sediment (9.2 S) may erroneously lead one to believe that no dimer is present in this formulation but rather that there are larger aggregates present with s-values in excess of 10 S or that there are rapid kinetic associations present. This may lead some investigators to assume that their formulation is more contaminated or unstable than it really is. Further, the integration of c(s) consistently underestimates the concentration of aggregate in the sample (Table ). Interestingly, as shown in Fig. , the underestimate is nearly independent of the trace concentration, but dependent on the regularization level. At a one standard deviation confidence level (P = 0.68), the underestimate is constant at ~0.1%, whereas at a regularization level of P = 0.95 the trace concentration is underestimated by a close to constant ~0.15%. If using our cutoff limit of 20% relative error for the quantitation of the trace amount, this would lead to LOQ values for standard ME c(s) analysis of ~0.6% or 2% when using regularization levels of P = 0.68 and P = 0.95, respectively. Similarly, if we define the LOD as the concentration limit at which the confidence interval of the estimated trace amount does not include zero, then the LOD of the standard c(s) ME analysis appears to be slightly above 0.2%, at least for this experimental condition (Table ).
In summary, the standard ME c(s) analysis works well at characterizing the monomer s-value (Table ), but fails to accurately estimate the s-value and loading concentration of trace quantities of material below an LOQ of ~0.6–2%.
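The two limits used here can be read off a spike recovery table once the criteria are fixed. The sketch below assumes that the LOQ is the lowest known concentration down to which the point estimate stays within 20% relative error, and that the LOD is the lowest known concentration down to which the confidence interval excludes zero. The recovery values are hypothetical placeholders, not the entries of the tables in this article, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Recovery:
    known: float     # % trace present in the simulation
    estimate: float  # % trace returned by the analysis
    ci_low: float    # lower bound of the confidence interval (%)
    ci_high: float   # upper bound of the confidence interval (%)


def _lowest_passing(results, passes):
    """Walk from high to low concentration; stop at the first failure."""
    limit = None
    for r in sorted(results, key=lambda r: r.known, reverse=True):
        if not passes(r):
            break
        limit = r.known
    return limit


def limit_of_quantitation(results, max_rel_error=0.20):
    """Lowest concentration down to which the relative error stays within bounds."""
    return _lowest_passing(
        results, lambda r: abs(r.estimate - r.known) / r.known <= max_rel_error)


def limit_of_detection(results):
    """Lowest concentration down to which the confidence interval excludes zero."""
    return _lowest_passing(results, lambda r: r.ci_low > 0.0)


# Hypothetical recovery data with a constant underestimate of 0.1%.
results = [
    Recovery(2.0, 1.90, 1.82, 1.98),
    Recovery(1.0, 0.90, 0.82, 0.98),
    Recovery(0.6, 0.50, 0.42, 0.58),
    Recovery(0.4, 0.30, 0.22, 0.38),
    Recovery(0.2, 0.10, 0.02, 0.18),
    Recovery(0.1, 0.00, -0.08, 0.08),
]

print("LOQ (%):", limit_of_quantitation(results))
print("LOD (%):", limit_of_detection(results))
```

With a constant absolute bias, the relative error grows as the known concentration falls, so the LOQ is set by the bias while the LOD is set by the width of the confidence interval.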
Fig. 6 Overlay of the trace and monomer (inset) peaks in the c(s) distributions with standard ME with uniform prior probability, calculated from simulated data with absorbance optics, at 50,000 rpm, a solution column length of 12 mm, 1 OD total (more ...)
Fig. 7 Effect of the regularization level on the estimation of the area of the trace peaks in ME c(s). Plotted here is the % trace detected in the analysis vs. the % trace known to be present in the simulation. The solid black line represents perfect quantitation (more ...)
Calibration of the Amplitudes of the Prior Expectation for the Bayesian Approach
We performed the calibration procedure described in the Theory section for simulated sample data from a 12 mm solution column at 50,000 rpm, with scan parameters as obtained when using the absorbance optics with 0.005 OD of random noise. Some variation of the area of the dimer peak returned from c(P)(s) was observed dependent on the amplitude of the prior expectation. For example, for the data with a known 1% dimer concentration, c(P)(s) returned values from 0.893% to 1.090% when changing the amplitude of the dimer prior peak between 0.035 and 354. A similar observation was made using the data with other known trace concentrations, where generally an error of ±0.1% was observed for the trace estimate returned by c(P)(s) when using extreme values for the amplitude of the dimer peak in the prior expectation. The best balance was found with the consensus values of 1,000 for the delta-function of the monomer and 0.177 for the dimer Gaussian. Similarly, the amplitude was calibrated for the IF data for the same experimental condition; the consensus values there were 1,000 for monomer and 0.071 for dimer areas, respectively. The strong emphasis on the monomer prior eliminates the peak broadening of the monomer, which in the ME regularization partially detracts from the estimated dimer signal. The small dimer prior amplitude is sufficient to pin the dimer peak to the predicted s-range. This provided the basis for the results shown below.
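To make the shape of such a prior expectation concrete, the following sketch lays one out on a discrete s-grid: a uniform baseline, a single-point spike for the monomer, and a Gaussian for the dimer. Several things here are assumptions made for illustration only: the baseline of 1, the interpretation of the amplitudes as peak heights relative to that baseline, the Gaussian width, the grid, and the monomer s-value. Only the amplitudes 1,000 and 0.177 and the 9.2 S dimer position come from the text, and the actual construction is the one given in the Theory section.

```python
import math


def build_prior(s_grid, s_monomer, s_dimer, dimer_sigma,
                monomer_amp=1000.0, dimer_amp=0.177, baseline=1.0):
    """Prior expectation: uniform baseline + monomer spike + dimer Gaussian."""
    prior = [baseline] * len(s_grid)

    # Monomer: a delta function, placed on the grid point nearest s_monomer.
    i_mono = min(range(len(s_grid)), key=lambda i: abs(s_grid[i] - s_monomer))
    prior[i_mono] += monomer_amp

    # Dimer: a Gaussian whose peak height is dimer_amp (assumed interpretation).
    for i, s in enumerate(s_grid):
        prior[i] += dimer_amp * math.exp(-0.5 * ((s - s_dimer) / dimer_sigma) ** 2)
    return prior


# Hypothetical grid and peak parameters.
s_grid = [4.0 + 0.1 * i for i in range(81)]  # 4 S to 12 S
prior = build_prior(s_grid, s_monomer=6.5, s_dimer=9.2, dimer_sigma=0.3)

i_mono = prior.index(max(prior))
i_dimer = min(range(len(s_grid)), key=lambda i: abs(s_grid[i] - 9.2))
print(f"monomer spike at {s_grid[i_mono]:.1f} S, prior = {prior[i_mono]:.1f}")
print(f"dimer peak at {s_grid[i_dimer]:.1f} S, prior = {prior[i_dimer]:.3f}")
```

The large disparity between the two amplitudes mirrors the reasoning above: the monomer term is meant to dominate and suppress peak broadening, while the dimer term only needs to bias the peak position gently.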
Bayesian c(P)(s) Analysis
The c(P)(s) distributions calculated for the same data as shown above for the standard ME c(s) approach are shown in Fig. . First, it is striking that the localization of the dimer peak is much more accurate, and independent of trace concentration. Further, the monomer peak is much sharper, following the design of the prior expectation. Some signal density at higher s-values is obtained at low trace concentration, which is a result of the uniform prior (resembling the standard ME) in that range. However, the signal at large s-values is relatively uniform and most of the signal density is correctly assigned to the dimer peak. As shown in Fig. , the estimated trace populations do not show the systematic offset of −0.1% exhibited by ME. The statistical analysis of the data is given in Table . Using our criterion for a 20% relative error as defining the LOQ, we find the quantitation limit at 0.2%. Similarly, the Bayesian analysis enhances the LOD down to about 0.1%, considering the possible range of error mentioned above arising from the calibration. At very low trace concentrations, the Bayesian analysis tends to overestimate the amount of trace.
Fig. 8 Overlay of the trace aggregate (dimer) and monomer (inset) peaks in the Bayesian c(P)(s) distributions of the same data as analyzed in Fig. . The simulated dimer concentrations percentages (relative to 1 OD total) are 0.6% (red), 0.4% (orange (more ...)
The Analysis of Simulated Trace Aggregate Data Using the Bayesian c(P)(s) for Experimental Condition 1
Other Experimental Conditions
We have systematically examined a grid of experimental conditions (Table ). In order to summarize these results and allow a concise comparison, Table shows only the LOQ (arbitrarily set at 20% relative error) obtained at each condition. However, this reflects well the general tendencies.
An obvious trend that holds true under all conditions is that a longer solution column gives significantly better results than a shorter one. This is to be expected simply due to the larger amount of data available, but also due to the better hydrodynamic separation of monomer and dimer, which reduces the monomer-dimer correlation. Likewise, an obvious trend to be expected is that lower noise in the data acquisition leads to significantly better results. This is particularly apparent in the absorbance data. The LOQ is more sensitive to the experimental noise in the standard ME c(s), and less so for the Bayesian c(P)(s) approach.
The effect of rotor speed is more complex. For the standard ME c(s), a higher rotor speed improves the LOQ throughout. One can conclude from this that the advantage of providing for better hydrodynamic separation and lower correlation of the monomer and dimer signal outweighs the disadvantage of having fewer data scans and data points, except at the noisiest conditions (absorbance optics with 0.01 OD noise). Similarly, another consequence of the diminishing monomer-dimer correlation at 60,000 rpm is that the systematic error in the trace estimate shown in Fig. is reduced. For example, for absorbance data with 0.005 OD noise it is reduced from −0.1% to −0.03% going from 50,000 rpm in Fig. to 60,000 rpm (data not shown).
For the Bayesian c(P)(s), two effects are apparent. First, at low rotor speeds, c(P)(s) suppresses the mis-assignment of the dimer to monomer signal that occurred in ME due to the higher correlation of the monomer and dimer signal. For this reason, c(P)(s) appears to work particularly well at the lower rotor speeds, taking full advantage of the larger number of data points.
Second, in particular for the interference optics, at the highest rotor speed of 60,000 rpm under low noise conditions, where the monomer and dimer are better hydrodynamically resolved, it appears in Table that Bayesian c(P)(s) performs worse than the standard maximum entropy c(s) analysis. Under these conditions, a systematic overestimate of the dimer population occurs (for example by an approximately constant 0.07% for interference data at 60,000 rpm with 0.002 fringes noise, data not shown). We attribute this to the breakdown of the validity of the calibration values of the prior probabilities, which were derived for conditions of 50,000 rpm. In fact, a trivial exercise in performing the calibration at 60,000 rpm provides significantly better results.
Limits of Quantitation (LOQ) for Measuring Trace Amounts of Aggregate Using ME Regularization or Bayesian Approach for Different Experimental Conditions
Effect of Meniscus Position and Frictional Ratio
In order to assess what impact variation of the frictional ratio or the meniscus position would have on the ability to accurately quantify trace aggregates, we explored the statistical tolerance of these parameters. The IF data with 0.4% trace aggregate, simulated for the experimental condition of 50 krpm, 12 mm solution column length, and 0.002 fringes rmsd noise, were used for this assay. With the meniscus position fixed at the known value (6.0 cm), we systematically fixed the frictional ratio parameter at non-optimal values and determined what effect that had on the estimated trace concentration. Using F-statistics we could calculate the relative increase in the rmsd that would result in a statistically significant decrease in the quality of fit. This value was used as a cutoff for accepting estimated trace concentration values. Similarly, we varied the meniscus position to find the limits of statistically acceptable values and observed the effect on trace quantitation. Figure illustrates the results of this exploration of the error surface. In the statistically tolerable range of rmsd variation, the trace estimate varied by ±0.02% for floating values of either the weight-average frictional ratio or the meniscus position. The magnitude of this variance is insignificant when considering that it is well below the LOD. Therefore, it seems that the trace concentration estimate is not sensitive to these floated parameters as long as the parameters used are at the global minimum in the error surface. It should be noted that the effect on the trace estimates increases outside the statistically acceptable range (data not shown), such that one can expect that fixing these parameters to incorrect values would potentially lead to significantly larger deviations in the trace estimates than those reported here for the confidence intervals of meniscus and frictional ratio.
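The acceptance rule described above can be sketched as follows. It assumes the usual relation between the variance ratio and the rmsd ratio for fits to the same data set, so that the rmsd cutoff is the best-fit rmsd multiplied by the square root of the critical F value. The critical value is passed in as a number rather than computed, and both it and the scan values below are hypothetical placeholders, not results from this study.

```python
import math


def acceptable_trace_range(scan, f_critical):
    """Return the rmsd cutoff and the spread of trace estimates within it.

    `scan` holds (fixed_parameter_value, rmsd, trace_percent) tuples from
    fits with one parameter held at a series of non-optimal values.
    """
    best_rmsd = min(rmsd for _, rmsd, _ in scan)
    # A variance ratio of F corresponds to an rmsd ratio of sqrt(F).
    cutoff = best_rmsd * math.sqrt(f_critical)
    accepted = [trace for _, rmsd, trace in scan if rmsd <= cutoff]
    return cutoff, min(accepted), max(accepted)


# Hypothetical scan over the frictional ratio (values for illustration only).
scan = [
    (1.40, 0.002020, 0.35),
    (1.45, 0.002006, 0.38),
    (1.50, 0.002000, 0.40),  # best fit
    (1.55, 0.002005, 0.42),
    (1.60, 0.002022, 0.46),
]
F_CRITICAL_ASSUMED = 1.01  # hypothetical; depends on confidence level and degrees of freedom

cutoff, trace_lo, trace_hi = acceptable_trace_range(scan, F_CRITICAL_ASSUMED)
print(f"rmsd cutoff: {cutoff:.6f}")
print(f"trace estimate within cutoff: {trace_lo:.2f}% to {trace_hi:.2f}%")
```

The same function applies unchanged to a scan over the meniscus position; only the first element of each tuple changes meaning.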
Fig. 9 Exploring the effect of the statistical variance of the weight average frictional ratio and of the meniscus position, respectively, on the reliability of trace detection by ME for the sample containing 0.4% trace. Plotted on the left axis is the effect (more ...)