Sedimentation velocity analytical ultracentrifugation (SV-AUC) has become an important tool for the characterization of the purity of protein therapeutics. The work presented here addresses a need for methods orthogonal to size-exclusion chromatography for ensuring the reliable quantitation of immunogenic oligomers, for example, in antibody preparations. Currently the most commonly used approach for SV-AUC analysis is the diffusion-deconvoluted sedimentation coefficient distribution c(s) method, previously developed by us as a general purpose technique and implemented in the software SEDFIT. In both practical and theoretical studies, different groups have reported a sensitivity of c(s) for trace oligomeric fractions well below the 1% level. Here we present a variant of c(s) designed for trace detection, with customized Bayesian regularization. The original c(s) method relies on maximum entropy regularization providing the most parsimonious distribution consistent with the data. In the present paper, we use computer simulations of an antibody system as an example to demonstrate that the standard maximum entropy regularization, due to its design, leads to a theoretical lower limit for the detection of oligomeric traces and a consistent underestimate of the trace populations by ~0.1% (dependent on the level of regularization). This can be overcome with a recently developed Bayesian extension of c(s) (Brown et al., Biomacromolecules, 8:2011–2024, 2007), utilizing the known regions of sedimentation coefficients for the monomer and oligomers of interest as prior expectation for the peak positions in the distribution. We show that this leads to more clearly identifiable and consistent peaks and lower theoretical limits of quantitation by approximately an order of magnitude for some experimental conditions. Implications for the experimental design of SV-AUC and practical detection limits are discussed.
The safety and efficacy of administered protein therapeutics depends on the absence of potentially immunogenic aggregates or oligomeric degradation products (1–3). Therefore, methods for reliably detecting trace contaminations of oligomeric and aggregate species are very important in the purification and formulation stages, as well as for quality control and comparability studies in the biopharmaceutical industry (4). Due to potential limitations of the standard tool of size exclusion chromatography (SEC) for the detection of aggregates, increasing interest has arisen for techniques orthogonal to SEC that can provide accurate information on the protein size distribution in solution in the absence of a chromatographic matrix. Both asymmetric flow field-flow fractionation (AF4) and sedimentation velocity analytical ultracentrifugation (SV-AUC) can achieve hydrodynamic separation of different size species in free solution (5–7) and are being applied for the quantitation of trace aggregates (8–13).
SV-AUC is a first-principle technique of physical biochemistry that allows the real-time monitoring (2,3) of the evolution of macromolecular concentration gradients in free solution arising in a high gravitational field (14). In the last decade the field experienced a transformation, as modern computational abilities made it possible to solve very efficiently the fundamental equations of motion underlying the sedimentation/diffusion process, which had previously been a bottleneck for detailed data analysis. Further, modern techniques were developed for the modeling of SV-AUC data directly in the original data space (15), taking maximum advantage of the rich information contained in the shape of the sedimentation boundaries and the excellent signal-to-noise ratio of the ultracentrifugal detection systems (for recent reviews, see (5,16)). Several years ago, we introduced the sedimentation coefficient distribution c(s) method, which builds on these developments. The key approach of c(s) analysis is the inversion of a Fredholm integral equation of Lamm equation solutions, stabilized by maximum entropy regularization, to obtain high-resolution, diffusion-deconvoluted sedimentation coefficient distributions from direct modeling of the experimental data (17–19). It has found application in many SV-AUC studies (5,20), and has become the method of choice in most laboratories for the determination of trace populations of aggregates (10–13,21,22), even in the presence of dynamic density gradients created by sedimenting co-solutes (23,24).
Figure 1 illustrates where the information about the population of trace oligomers arises in the experimental SV-AUC data. Faster-sedimenting oligomeric species contribute signal both to deformations in the shape of the leading edge of the sedimentation boundaries and to the solution plateau. Given the signal-to-noise ratios of ~200 and ~1,000 for the absorbance optical and interferometric detection, respectively, and the high number of data points reporting from these regions, it is clear that, in principle, an exquisite sensitivity for traces far below 1% is possible. Recent literature has explored the limits of detection (LOD) and limits of quantitation (LOQ) of SV-AUC from both practical (11,22) and theoretical viewpoints (12,25). Theoretical limits were estimated to be approximately 0.6% for the quantitation of antibody dimer and 0.2% for larger tetrameric aggregates (12,25), and practical studies have reported values ranging from a few tenths of a percent for aggregates sedimenting near the main peak and 0.05% for larger aggregates (22), to a 0.2% LOD and ~2.8% LOQ for dimeric antibody (11).
In the present work, we have re-examined the limits of quantitation and the role played by the regularization procedure (see the illustration in Fig. 2). Regularization is a computational technique that stabilizes otherwise ill-conditioned inversion problems, such as Fredholm integral equations. The latter have the property that a large number of different distributions fit the experimental data indistinguishably well, opening the possibility of noise being amplified until it dominates the major features of the overall best-fit distribution (e.g., solid black line in Fig. 2). Maximum entropy regularization (ME) is the strategy of choosing, from the set of possible distributions, the one that is most parsimonious in the sense that it has the least detail (or highest information entropy), so as to suppress statistically unwarranted features (blue dashed line in Fig. 2). This approach is consistent with the principle of Occam's razor, and the maximum entropy principle has been used very successfully in many scientific disciplines (26–30), as well as for many SV-AUC studies (20). However, as we found in the present work, close to the limit of detection, the bias towards uniform distributions by ME becomes limiting and tends to slightly and systematically under-report the signals assigned to specific peaks (red dashed line in Fig. 2). This bias towards parsimony results in a broadening of peaks in the distribution, which clashes with the expectation that the highly purified samples common to trace aggregation studies be represented by sharp, narrow peaks. On the other hand, analyses using models constrained strictly to a few discrete species are too restrictive and typically do not lead to satisfactory fits of the data. To address this problem, we have developed a regularization procedure custom-tailored to the problem of oligomeric trace detection, based on the new tool of Bayesian analysis of sedimentation coefficient distributions c(P)(s) (31).
The Bayesian approach allows us to reduce the bias toward uniform distributions, and to actively incorporate our knowledge of the purity and the predetermined sedimentation coefficients of monomer and oligomers into the regularization (red solid line in Fig. 2), while maintaining the flexibility of the model to honor the complete (including unexpected) information in the data. The method is implemented in the public domain software SEDFIT (32). In the present work we have systematically analyzed the limits of quantitation and have used a 'bottom-up' approach to identify where errors can be introduced.
In the standard c(s) method, the sedimentation coefficient distributions are determined by numerical inversion of the Fredholm integral equation

a(r,t) ≅ ∫_{smin}^{smax} c(s) χ1(s,D(s),r,t) ds + b(r) + β(t)    (1)

(17,18,33), where a(r,t) denotes the evolution of the experimentally measured signal profiles, b(r) and β(t) denote the time-invariant (TI) and radial-invariant (RI) signal contributions, and χ1(s,D,r,t) denotes the normalized solution to the Lamm equation (34) for a single species

∂χ/∂t = (1/r) ∂/∂r [ r D (∂χ/∂r) − s ω² r² χ ]    (2)
using the symbol χ(r,t) for the macromolecular concentration distribution as a function of radial position and time, ω for the rotor angular velocity, and s and D for the macromolecular sedimentation and diffusion coefficient, respectively. After discretization of the range of sedimentation coefficients, Lamm equation solutions were calculated for each s-value with the new high-precision Claverie method with dynamic non-equidistant grid described recently (35). The systematic baseline signals b(r) and β(t) were determined explicitly using the algebraic method described in (15). Obviously, we require that c(s) is positive throughout. The relationship between D and s was approximated with the hydrodynamic scaling law

D(s) = (√2/18π) kT s^{−1/2} (η(f/f0)w)^{−3/2} ((1 − v̄ρ)/v̄)^{1/2}    (3)

where k denotes the Boltzmann constant, T the absolute temperature, v̄ the protein partial specific volume, and ρ and η the solvent density and viscosity (17,18,33). It is based on a weight-average frictional ratio (f/f0)w of all sedimenting species. Since it is a weight (or signal) average, for the application to the problem of trace determination (f/f0)w can be assumed to be virtually identical to that of the monomer (assuming that is the predominant species). Further, for oligomeric species at low abundance, their boundary shape is not well defined by the experimental data (in contrast to their signal amplitude), and can therefore be approximated well with the frictional ratio of the monomer, even though their true frictional ratio may be different. In principle, other scale relationships are possible and have been implemented in SEDFIT, and even a completely user-defined relationship can be used without further complication and without affecting the conclusions presented in the present paper.
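As a concrete illustration, the scaling law of Eq. 3 can be evaluated numerically. The sketch below (an illustration only, not part of SEDFIT; CGS units assumed) uses the monomer parameters of the simulations described later in this paper and the software default solvent values:

```python
import math

def D_from_s(s, f_over_f0, T=293.15, eta=0.01002, vbar=0.73, rho=1.0):
    """Diffusion coefficient (cm^2/s) from the hydrodynamic scaling law, Eq. 3,
    in CGS units: s in seconds, eta in Poise, vbar in mL/g, rho in g/mL."""
    k_B = 1.380649e-16  # Boltzmann constant (erg/K)
    return (math.sqrt(2.0) / (18.0 * math.pi)) * k_B * T * s ** -0.5 \
        * (eta * f_over_f0) ** -1.5 * ((1.0 - vbar * rho) / vbar) ** 0.5

# the 6.5 S antibody monomer with (f/f0)w = 1.566 used in the simulations below
D_monomer = D_from_s(6.5e-13, 1.566)   # ~3.9e-7 cm^2/s
```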
The numerical solution of Eq. 1 using ME regularization is obtained on a grid of s-values between smin and smax (denoted as si, with i=1,…,N):

Min_{ci≥0} { Σ_{r,t} [ a(r,t) − Σ_{i=1..N} ci χ1(si,D(si),r,t) − b(r) − β(t) ]² + α Σ_{i=1..N} ci ln ci }    (4)

with the indices r and t denoting the discrete radial and time points for which experimental data exist, and χ1(si,D(si),r,t) the normalized Lamm equation solution at r and t for a species with si and D(si). The values ci(si) represent the discrete form of c(s), which are computed as the solution of this constrained optimization problem. N should be chosen such that at least two or three grid points are provided to describe each peak (usually N ~100–300). In Eq. 4, the first term represents the least-squares optimization of the fit to the data points, while the second term represents the ME penalty term.
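To make the structure of Eq. 4 concrete, the following toy sketch fits a non-negative, ME-regularized distribution to simulated data. This is an illustration only, not SEDFIT's solver: Gaussian basis functions stand in for the Lamm equation solutions χ1, the TI/RI offsets are omitted, and a fixed regularization parameter α is assumed rather than the Fisher-statistics adjustment described next:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy kernel: Gaussian "boundaries" stand in for Lamm equation solutions
s_grid = np.linspace(4, 15, 60)        # discretized s-values (S)
x_obs = np.linspace(0, 1, 400)         # stands in for the (r, t) data axis
centers = (s_grid - 4) / 11            # map each s-value into [0, 1]
kernel = np.exp(-0.5 * ((x_obs[:, None] - centers[None, :]) / 0.05) ** 2)

# "true" distribution: monomer plus a 1% dimer trace
c_true = np.zeros_like(s_grid)
c_true[np.argmin(np.abs(s_grid - 6.5))] = 1.0
c_true[np.argmin(np.abs(s_grid - 9.2))] = 0.01
data = kernel @ c_true + rng.normal(0, 0.005, x_obs.size)

def objective(c, alpha):
    resid = data - kernel @ c                           # least-squares term of Eq. 4
    entropy = np.sum(c * np.log(np.maximum(c, 1e-12)))  # ME penalty: sum c_i ln c_i
    return np.sum(resid ** 2) + alpha * entropy

res = minimize(objective, np.full(s_grid.size, 0.01), args=(1e-3,),
               bounds=[(0.0, None)] * s_grid.size, method="L-BFGS-B")
c_fit = res.x   # discrete, non-negative analogue of c(s)
```

Setting α to zero in the same fit reproduces the noise-amplified oscillations that the regularization is designed to suppress.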
The scaling parameter α is iteratively adjusted to a value that provides maximum regularization but ensures that the goodness of fit is still not significantly affected by the regularization term, as judged by Fisher statistics for a pre-determined confidence level (e.g. P=0.7 for the conventional one standard deviation), using the absolute best-fit solution obtained in the absence of regularization (α=0) as a reference. Unfortunately, the latter is not usable as a physically reasonable solution because it is susceptible to noise amplification and artificial high-frequency oscillation, as shown by the Riemann-Lebesgue lemma (36). In contrast, the ME regularization biases the solution towards that with the minimum information content required to explain the experimental data. In the limit of very little (or very noisy) data, this generates a uniform distribution.
In the Bayesian extension of the method, the ME penalty term of Eq. 4 is replaced by

α Σ_{i=1..N} ci ln(ci/pi)    (5)

with the prior expectation (or 'prior' for short) pi. (Analogous modifications leading to similar results can be made in the alternate regularization method on the basis of the Tikhonov-Phillips approach (31).) The resulting distribution is labeled c(P)(s) to indicate its dependence on the prior expectation. In this form, following Bayesian principles (37), the analysis of data with very little signal will reproduce the expected distribution pi. For data with a significant signal/noise ratio, the resulting distribution will be the one that deviates as little as possible from the prior expectation while not sacrificing the quality of fit. In the extreme case where the prior expectation is grossly inconsistent with the data, c(P)(s) will be completely different from the pi values.
For the determination of trace oligomers or aggregates, we constructed the prior expectation pi in the following way: we placed a discretized analogue of a Dirac δ-function (31) at the s-value of the monomer, s1. As a systematic series of simulations showed (see Table I below), s1 could be obtained by integrating the monomer peak of a standard c(s) ME analysis of the protein material under investigation, because the dimer peak is sufficiently baseline separated. Assuming that from experience (or from the analysis of SV experiments with sufficiently pure oligomeric material, at concentrations above trace levels) the s-value s2 of the dimer would be known approximately, we additionally placed a Gaussian at s2 with a width of 0.2 S. The finite width of the dimer peak allows better accommodation of potentially different dimer conformations, or of errors in the estimate of s2. For the simulated data, we used values of s1=6.5 S and s2=9.2 S. Finally, in order to obtain regularization outside these regions of enhanced signal probability, a uniform baseline was added to pi.
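The construction of the prior can be sketched as follows. The amplitudes 1,000 and 0.177 are the ABS-calibration values quoted below; treating the 0.2 S width as a Gaussian standard deviation is an assumption of this sketch:

```python
import numpy as np

s = np.linspace(4, 15, 200)        # discretized s grid, as in the analyses below
ds = s[1] - s[0]
s1, s2 = 6.5, 9.2                  # monomer and dimer s-values (S)

prior = np.full(s.size, 1e-4)      # uniform baseline: plain ME-like behavior elsewhere
prior[np.argmin(np.abs(s - s1))] += 1000.0 / ds          # discretized delta-function at the monomer
prior += (0.177 / (0.2 * np.sqrt(2 * np.pi))) \
         * np.exp(-0.5 * ((s - s2) / 0.2) ** 2)          # dimer Gaussian, area 0.177, width 0.2 S
```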
As described in (31), it is very difficult and impractical to derive suitable amplitudes of these peaks (i.e. the relative strength of the regularization objectives) from theoretical considerations. In fact, we found the optimal values to be sensitive to the actual conditions of the ultracentrifugation experiment. Therefore, the amplitudes were empirically calibrated using the following bootstrapping procedure: Using the SV data to be analyzed as a template regarding the radial range, time-interval of scans, rotor speed, and signal/noise ratio, one can simulate theoretical SV data with a series of trace concentrations. These data can then be analyzed with c(P)(s) using different monomer-to-dimer ratios of area under the respective prior peaks. The consensus peak areas that best reproduce the underlying simulated trace fractions overall should be used for the analysis of the experimental data. To accomplish such a calibration procedure, suitable utility functions for the automated serial simulation and analysis of a large number of data sets were implemented in SEDFIT. The values used in the present study were an area of 1,000 for the monomer and 0.177 for the dimer when analyzing the ABS data and 0.071 when analyzing IF data.
The properties of the new analysis method can be tested best with simulated data that mimic the experimental data in all regards, but have known, well-defined fractions of trace oligomers. To this end, and to study the influence of different experimental configurations, we generated 360 data sets which represented simulated data for sedimentation of a highly pure antibody monomer with trace quantities of an aggregate dimer under a variety of experimental conditions (see below). The model protein was an IgG antibody solution with a mass of 150 kDa and sedimentation coefficient of 6.5 S (f/f0=1.566) at a total concentration of 1.0 signal unit. The aggregate was simulated to sediment as a more asymmetric irreversibly aggregated dimer with a sedimentation coefficient of 9.2 S (f/f0=1.756) present in 10 different trace amounts from 2% down to 0.02% of the total signal. These values correspond approximately to those observed experimentally in some studies (10,12). Data were simulated at rotor speeds of 40, 50 and 60 krpm and included a rotor acceleration phase of 200 rpm/s. The scan rate and radial density of points were set at values consistent with acquisition of data using an eight-hole rotor by either the ABS optics (300 s, 0.003 cm) or IF detection system (90 s, 0.0007 cm), and randomly distributed noise was added at levels of 0.01, 0.005, and 0.002 signal units for solution column lengths of either 6 or 12 mm. The total number of scans simulated was adapted to the experimental condition.
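The conditions above combine as 10 trace levels × 3 rotor speeds × 3 noise levels × 2 column lengths × 2 detection systems. A sketch of the enumeration (the intermediate trace levels between 2% and 0.02% are assumed for illustration, since the full series is not listed here):

```python
from itertools import product

trace_pct = [2, 1, 0.5, 0.2, 0.1, 0.05, 0.04, 0.03, 0.025, 0.02]  # 10 levels (intermediate values assumed)
speeds_rpm = [40_000, 50_000, 60_000]
noise = [0.01, 0.005, 0.002]                        # signal units
columns_mm = [6, 12]
optics = [("ABS", 300, 0.003), ("IF", 90, 0.0007)]  # (system, scan interval s, radial spacing cm)

conditions = list(product(trace_pct, speeds_rpm, noise, columns_mm, optics))
print(len(conditions))  # 360 simulated data sets
```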
SEDFIT (version 10.3c September 2007) was used to analyze the data either by the standard c(s) with ME regularization or by c(P)(s) using the Bayesian prior. In order to ensure a fair comparison, it was necessary to use the same fitting parameters in both analysis methods. Among the different 'experimental' conditions, the criterion used for choosing the total number of scans analyzed was to load all scans until the boundary midpoint of the last scan reached the pre-defined fitting limit of 7.1 cm. As a consequence, more scans, and therefore more data points, were included when analyzing the data from the 40,000 rpm experiments than for the 60,000 rpm experiments, which enables a realistic comparison.
For all analyses of the simulated data, the distribution was discretized with 200 s-values from 4 to 15 S, the meniscus position was fixed (unless noted otherwise) at the known value of 6.0 or 6.6 cm for the 12 and 6 mm solution columns, respectively, and the bottom position was fixed for all data sets at 7.2 cm. The lower fitting limit was chosen 0.01 cm beyond the meniscus position to exclude the region usually obscured by optical artifacts, and the upper fitting limit was chosen as 7.1 cm to exclude back-diffusion. The weight-average frictional ratio was fixed at the simulated monomer value of 1.566 (unless noted otherwise). Software default values were used for the vbar (0.73 mL/g), density (1.0 g/mL), and viscosity (0.01002 P). The regularization was set to a confidence level of one standard deviation (P=0.68).
Because the peak for the trace species in the c(s) analysis with ME can drift over a large s-range, particularly at low signal/noise (as we show later), there is no unambiguous way to determine the values for the weight-average sedimentation coefficient (sw) and an estimate of the concentration. In order to make a fair comparison of the two methods at all concentrations, it was necessary to assign an integration range for both the monomer and the dimer. The range chosen for the monomer was from 5.5 to 7.5 S and for the dimer aggregate from 8 to 10.4 S. The integrals over the calculated distribution in these ranges were used as the monomer and dimer values, respectively. To this end, a utility function for the consistent automatic use of the same integration ranges was implemented in SEDFIT.
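The fixed-range integration can be sketched as a simple sum over the discretized distribution. The toy Gaussian peaks below stand in for an actual c(s) result, with assumed peak widths:

```python
import numpy as np

def integrate_peak(s, c, lo, hi):
    """Signal within a fixed s-range, as a Riemann sum over the uniform grid."""
    mask = (s >= lo) & (s <= hi)
    return float(np.sum(c[mask]) * (s[1] - s[0]))

s = np.linspace(4, 15, 200)
# toy distribution: 99% monomer at 6.5 S, 1% dimer at 9.2 S (widths assumed)
c = (0.99 / (0.15 * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((s - 6.5) / 0.15) ** 2) \
  + (0.01 / (0.15 * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((s - 9.2) / 0.15) ** 2)

monomer = integrate_peak(s, c, 5.5, 7.5)   # integration range for the monomer
dimer = integrate_peak(s, c, 8.0, 10.4)    # integration range for the dimer
```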
In order to assess the precision of the obtained parameters, we conducted a Monte-Carlo analysis for each fit. The reported values for concentration and weight average sw represent the average of 500 iterations, and the error estimate corresponds to 1 standard deviation. Convergence of the Monte-Carlo statistics at 500 iterations was shown in a series of analyses with different numbers of iterations. For example, increasing the number of iterations to 1,000 did not significantly change the results.
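The Monte-Carlo error estimate can be illustrated with a much-reduced stand-in: a single-amplitude fit to an idealized boundary shape, rather than the full c(s) model, with noise re-added to the best-fit profile in each iteration:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_amplitude(data, model):
    """Least-squares amplitude of a known, fixed boundary shape."""
    return float(model @ data / (model @ model))

x = np.linspace(0, 1, 500)
model = 0.5 * (1 + np.tanh((x - 0.5) / 0.05))   # idealized sedimentation boundary
true_amp = 0.01                                  # a 1% trace signal
best_fit = model * true_amp                      # noise-free best-fit profile

# Monte-Carlo: re-add noise at the experimental level and refit, 500 times
estimates = [fit_amplitude(best_fit + rng.normal(0, 0.005, x.size), model)
             for _ in range(500)]
mean, sd = float(np.mean(estimates)), float(np.std(estimates))  # reported value and 1-SD error
```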
Analytical ultracentrifugation experiments were conducted with a ProteomeLab XL-I protein characterization system (Beckman-Coulter, Palo Alto, CA). For the experiments determining the baseline stability and magnitude of noise, 400 μl of ddH2O were loaded into both the reference and sample sectors of standard AUC cells with charcoal-filled double-sector Epon centerpieces sandwiched between two sapphire windows within aluminum cell housings. The water samples were centrifuged at 50,000 rpm, 20°C, and interference fringe displacement was imaged every 90 s. The first 100 scans were loaded, covering a time range of scans similar to the range analyzed in the real or simulated sedimentation experiments (~9,500 s). For the experiments with human IgG, the sample was obtained from Jackson ImmunoResearch Laboratories, Inc. (ChromPure human IgG lot #60910) and stored at 4°C in 10 mM sodium phosphate buffer (pH 7.6), 500 mM NaCl at a stock concentration of 11 mg/mL. Partial separation of oligomers was achieved by gel filtration through a Sephacryl 300 16/60 chromatography column using phosphate buffered saline as the mobile phase. Monomer-enriched fractions were combined and diluted to a final concentration of ~1.0 OD280. AUC experiments were conducted as described above.
Our ability to accurately quantify trace species using a c(s) analysis depends strongly on the theoretical concentration profiles faithfully representing the data within the noise of data acquisition. To have meaningful estimates for the detection and quantitation limits, it is therefore important to assess and understand the magnitude and sources of noise in the experiment. In addition to the statistical noise, there are radial-invariant (RI) and time-invariant (TI) noise components. In our experience the RI noise occurs only in interference data acquisition, and reflects uniform changes in the pathlength difference of the sample and reference beams across the entire solution column, as well as the lack of an absolute assignment of a zero fringe shift. The calculated TI noise can account for the radially-dependent sources of systematic signal offsets, such as those arising from surface roughness of the windows and optical elements in the interference optics, and uneven transmission profiles of the absorbance windows. In our experience, TI noise is unavoidable, even with absorbance optics, and accounting for it invariably leads to a significant improvement in the quality of fit. A key assumption in this regard is that the TI signals are truly constant throughout the entire experimental time needed for macromolecular sedimentation.
The magnitude and stability of the experimental noise was tested first in experiments where distilled water was placed in both the sample and the reference compartment. This excludes any concentration gradients or sedimentation process contributing to the measured signal. The SV experiment and data acquisition were conducted as in a standard macromolecular sedimentation experiment (data not shown). The appropriate model for these data is one that accounts only for noise. With the absorbance system, the root-mean-square deviation (rmsd) was 0.0039 OD. (Without modeling the TI noise, the rmsd was 0.0041 OD and the residuals bitmap showed clearly discernable features constant in time.) A histogram of the residual values in the absence of sedimentation can be described well with a normal distribution, although the residuals bitmap exhibited some unaccounted systematic errors (Fig. 3a). For the interference data, the rmsd was 0.0019 fringes (Fig. 3b). The residuals bitmap showed a low level of systematic deviations discernable from the horizontal lines that may be caused by a transient oscillating angular deflection of some optical element, which likely is responsible for the slightly elevated tails of the histogram compared to the normal distribution. When conducting analogous experiments with PBS buffer placed in both sectors instead of water, rmsd values of 0.0039 OD and 0.0034 fringes were observed for the absorbance and interference acquisition, respectively. The origin of the slightly increased rmsd in the interference optics data using PBS in comparison to water is unclear, but may be a result of slight errors in the geometry of the sectors leading to unmatched optical signals from the sedimenting buffer salts.
Such experiments show that the data acquisition is sufficiently stable and its noise components are reasonably well-described with the theoretical model. Based on this observation, one could set a first absolute lower limit of the possible trace component sensitivity at the rmsd divided by the square-root of the number of data points, resulting for the absorbance optics in values of 4×10−5 OD, or 0.004% of a hypothetical loading concentration of 1 OD (the reference concentration in the following). Similarly, for the interference optics, this value would be 1×10−5 fringes or 0.001%. Clearly, these values are unrealistically low, because (1) not all data points collected during a sedimentation velocity experiment of a purified antibody would report on the dimer fraction, (2) the use of the c(s) model will increase uncertainties because of parameter correlations and bias, and (3) the presence of any systematic experimental errors in the modeling of the sedimentation profiles of a macromolecule will decrease confidence in the analysis. These points will be examined in the following.
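These floor estimates follow from rmsd/√N. In the sketch below the rmsd values are those measured in the water blanks, while the data-point counts are back-calculated from the quoted floors and are therefore assumptions of the sketch, not measured counts:

```python
import math

def trace_floor(rmsd, n_points, loading_signal=1.0):
    """Absolute statistical floor on a detectable signal fraction: rmsd / sqrt(N)."""
    return rmsd / math.sqrt(n_points) / loading_signal

abs_floor = trace_floor(0.0039, 9.5e3)   # absorbance: ~4e-5 of a 1 OD loading (0.004%)
if_floor = trace_floor(0.0019, 3.6e4)    # interference: ~1e-5 of the loading (0.001%)
```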
We next proceeded with a second analysis of the buffer data, mimicking the c(s) analysis of macromolecular sedimentation. The signal densities of the derived distributions should ideally be zero, but may deviate from this as a result of the noise in the data. For the absorbance optical data, in the preset expected s-value range of the dimer, a signal density of 0.0001 OD loading concentration was obtained with the standard ME regularization, corresponding to 0.01% of a hypothetical 1 OD signal, while with the Bayesian regularization approach the signal density obtained in the dimer range of c(P)(s) is <10−5. For the interference data, the calculated signal density in the c(s) was 0.0006, corresponding to 0.06%, in the ME approach, and 0.0009, corresponding to 0.09%, in the Bayesian approach. It is noteworthy that when decreasing smin from 4 S (used for the simulations) to 0.1 S in order to account for sedimenting buffer salts due to a composition mismatch between the reference and sample sectors, the quality of the fit improves slightly, but the additional flexibility in the model results in more signal density deposited within the trace integration range: 0.0020, corresponding to 0.20%, by ME, and 0.0023, corresponding to 0.23%, by Bayesian regularization. This observation was not made when the same procedure was applied to the absorbance data, which points to a correlation of the combination of RI and TI noise parameters with the sedimentation coefficient distribution.
Although these estimates for the detection sensitivity do take into account which data points carry information on the dimer fraction, and we regard such preliminary buffer experiments as highly useful in establishing confidence in the experimental protocol, this analysis does not yet fully represent the correlations that would be encountered in the c(s) model when applied to real macromolecular sedimentation, because so far the non-negativity requirement actively constrains the possible distribution values. Therefore, we approached the determination of theoretical limits of detection and quantitation by using simulated spike recovery experiments, generating 'in silico' data for macromolecular sedimentation profiles mimicking those that would be observed experimentally.
To this end, we sought to establish first that experimental macromolecular sedimentation profiles can indeed be modeled with similar quality of noise and stability of systematic TI signal offsets. Figure 4 shows sedimentation profiles of a purified IgG sample acquired with the absorbance system. The rmsd is 0.0059 OD, somewhat higher than in the absence of protein, and slight systematic residuals are observed when the boundary transitions through the radial region of 6.75–6.85 cm. The origin of this is unclear. Further, despite the attempted chromatographic separation, the calculated dimer fraction is 0.42% with ME and 0.59% with the Bayesian estimate. This correlated well with the observation of a small dimer peak when repurifying the sample chromatographically, pointing to some dynamic dimerization process in the IgG batch used.
A different purified IgG stock was used for the collection of interference data, and a discrete component was added to the c(s) analysis to account for any buffer composition mismatch. Despite the appearance of small systematic trends in the residuals (note the scales on the residuals and data plots in Fig. 5), the c(s) model using ME is still able to describe well the evolution of the macromolecular concentration gradient to an rmsd of 0.0043 fringes from a loading concentration of 1.6 fringes. Again, chromatographic re-injection of the ‘purified’ material showed the presence of dynamically generated dimers in these samples.
Overall, these sedimentation analyses show that very good fits can be achieved to experimental data, although minor instabilities and low level systematic errors do occur. Since the intrinsic data acquisition is robust (see above), we attribute these adventitious deviations from the theoretical predictions to the effects of convection arising from imperfections in the centerpieces and temperature control (see discussion). Nevertheless, the very good quality of fit and close to random distribution of residuals supports the use of simulated ‘in silico’ mixing experiments to establish theoretical lower limits for the detection sensitivity. Using these as a benchmark for real mixing experiments should facilitate the identification of the experimentally limiting factors.
To explore what the theoretical limits are for the detection and quantitation of trace components, we generated theoretical SV data sets mimicking a highly purified antibody monomer solution that contains discrete amounts of trace levels of irreversibly aggregated dimer. The data sets covered a wide range of possible experimental conditions including several rotor speeds, and acquisition noise levels as well as two different solution column lengths. Figure 6, derived from the analysis of absorbance data simulated for a 12 mm solution column and 0.005 OD normally distributed noise, illustrates the problem encountered with the standard ME regularization: (1) the peak of the dimer is not at a constant position, but moves towards higher s-values with lower concentration; (2) there appears an additional, artificial peak at the highest s-limit, growing in amplitude with lower trace concentration.
The distortions of the ME c(s) relative to the expected peak positions can be understood from the effect of monomer-dimer correlation. The regularization tends to attribute a fraction of the dimer signal towards the high-s-value tail of the broadened monomer peak in order to produce a more parsimonious monomer peak. Compensatory shifts in the dimer s-value then occur.
As a result, in some analyses, inspection of the c(s) distribution at the s-value at which the dimer is known to sediment (9.2 S) may erroneously lead one to believe that no dimer is present in this formulation, but rather that there are larger aggregates present with s-values in excess of 10 S, or that there are rapid kinetic associations present. This may lead some investigators to assume that their formulation is more contaminated or unstable than it really is. Further, the integration of c(s) consistently underestimates the concentration of aggregate in the sample (Table I). Interestingly, as shown in Fig. 7, the underestimate is nearly independent of the trace concentration, but dependent on the regularization level. At a one standard deviation confidence level (P=0.68), the underestimate is nearly constant at ~0.1%, whereas at a regularization level of P=0.95 the trace concentration is underestimated by a nearly constant ~0.15%. Using our cutoff limit of 20% relative error for the quantitation of the trace amount, this leads to LOQ values for the standard ME c(s) analysis of ~0.6% or 2% when using regularization levels of P=0.68 and P=0.95, respectively. Similarly, if we define the LOD as the concentration limit at which the confidence interval of the estimated trace amount does not include zero, then the LOD of the standard c(s) ME analysis appears to be slightly above 0.2%, at least for this experimental condition (Table I).
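The two criteria can be stated operationally as follows. The numbers here are illustrative stand-ins for a Table I-like recovery series (a constant 0.1% ME-style underestimate and a 1-SD error of 0.08% are assumed), not the paper's actual results:

```python
import numpy as np

# illustrative recovery series (percent of total signal)
true_pct = np.array([0.1, 0.2, 0.5, 1.0, 2.0])
est_pct = true_pct - 0.1                      # ME-like constant underestimate
est_sd = np.full_like(true_pct, 0.08)         # Monte-Carlo 1-SD errors

def loq(true_pct, est_pct, max_rel_err=0.20):
    """Smallest concentration at and above which the relative error stays within 20%."""
    ok = np.abs(est_pct - true_pct) / true_pct <= max_rel_err
    for i, t in enumerate(true_pct):
        if ok[i:].all():
            return float(t)
    return None

def lod(true_pct, est_pct, est_sd):
    """Smallest concentration at and above which the 1-SD interval excludes zero."""
    excludes_zero = est_pct - est_sd > 0
    for i, t in enumerate(true_pct):
        if excludes_zero[i:].all():
            return float(t)
    return None
```

With these illustrative numbers, the 0.1% offset pushes the LOQ up to the concentration at which the offset equals 20% of the true value, mirroring the ~0.6% figure discussed above.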
In summary, the standard ME c(s) analysis works well at characterizing the monomer s-value (Table I), but fails to accurately estimate the s-value and loading concentration of trace quantities of material below an LOQ of ~0.6–2%.
We performed the calibration procedure described in the Theory section for simulated sample data from a 12 mm solution column at 50,000 rpm, with scan parameters as obtained when using the absorbance optics with 0.005 OD of random noise. Some variation of the area of the dimer peak returned from c(P)(s) was observed, depending on the amplitude of the prior expectation. For example, for the data with a known 1% dimer concentration, c(P)(s) returned values from 0.893% to 1.090% when changing the amplitude of the dimer prior peak between 0.035 and 354. A similar observation was made with the data at other known trace concentrations, where generally an error of ±0.1% was observed for the trace estimate returned by c(P)(s) when using extreme values for the amplitude of the dimer peak in the prior expectation. The best balance was found with the consensus values of 1,000 for the delta-function of the monomer and 0.177 for the dimer Gaussian. Similarly, the amplitude was calibrated for the IF data for the same experimental condition: the consensus values there were 1,000 for the monomer and 0.071 for the dimer areas, respectively. The strong emphasis on the monomer prior eliminates the peak broadening of the monomer, which in the ME regularization partially detracts from the estimated dimer signal. The small dimer prior amplitude is sufficient to pin the dimer peak to the predicted s-range. This provided the basis for the results shown below.
The c(P)(s) distributions calculated for the same data as shown above for the standard ME c(s) approach are shown in Fig. 8. First, it is striking that the localization of the dimer peak is much more accurate and independent of the trace concentration. Further, the monomer peak is much sharper, following the design of the prior expectation. Some signal density at higher s-values is obtained at low trace concentrations, which is a result of the uniform prior (resembling the standard ME) in that range. However, the signal at large s-values is relatively uniform, and most of the signal density is correctly assigned to the dimer peak. As shown in Fig. 7, the estimated trace populations do not show the systematic offset of −0.1% exhibited by ME. The statistical analysis of the data is given in Table II. Using our criterion of a 20% relative error to define the LOQ, we find the quantitation limit at 0.2%. Similarly, the Bayesian analysis lowers the LOD to about 0.1%, considering the possible range of error mentioned above arising from the calibration. At very low trace concentrations, the Bayesian analysis tends to overestimate the amount of trace.
We have systematically examined a grid of experimental conditions (Table III). In order to summarize these results and allow a concise comparison, Table III shows only the LOQ (arbitrarily set at 20% relative error) obtained at each condition. However, this reflects well the general tendencies.
An obvious trend that holds true under all conditions is that a longer solution column gives significantly better results than a shorter one. This is to be expected simply due to the larger amount of data available, but also due to the better hydrodynamic separation of monomer and dimer, which reduces the monomer-dimer correlation. Likewise, an obvious trend to be expected is that lower noise in the data acquisition leads to significantly better results. This is particularly apparent in the absorbance data. The LOQ is more sensitive to the experimental noise in the standard ME c(s), and less so for the Bayesian c(P)(s) approach.
The effect of rotor speed is more complex. For the standard ME c(s), a higher rotor speed improves the LOQ throughout. One can conclude from this that the advantage of providing for better hydrodynamic separation and lower correlation of the monomer and dimer signal outweighs the disadvantage of having fewer data scans and data points, except at the noisiest conditions (absorbance optics with 0.01 OD noise). Similarly, another consequence of the diminishing monomer-dimer correlation at 60,000 rpm is that the systematic error in the trace estimate shown in Fig. 7 is reduced. For example, for absorbance data with 0.005 OD noise it is reduced from −0.1% to −0.03% going from 50,000 rpm in Fig. 7 to 60,000 rpm (data not shown).
For the Bayesian c(P)(s), two effects are apparent. First, at low rotor speeds, c(P)(s) suppresses the mis-assignment of dimer signal to the monomer that occurred in ME due to the higher correlation of the monomer and dimer signals. For this reason, c(P)(s) appears to work particularly well at the lower rotor speeds, taking full advantage of the larger number of data points. Second, in particular for the interference optics at the highest rotor speed of 60,000 rpm under low noise conditions, where the monomer and dimer are better hydrodynamically resolved, it appears in Table III that Bayesian c(P)(s) performs worse than the standard maximum entropy c(s) analysis. Under these conditions, a systematic overestimate of the dimer population occurs (for example, by an approximately constant 0.07% for interference data at 60,000 rpm with 0.002 fringes noise, data not shown). We attribute this to a breakdown of the validity of the calibration values of the prior probabilities, which were derived for conditions of 50,000 rpm. In fact, the trivial exercise of repeating the calibration at 60,000 rpm provides significantly better results.
In order to assess what impact variation of the frictional ratio or the meniscus position would have on the ability to accurately quantify trace aggregates, we explored the statistical tolerance of these parameters. The IF data with 0.4% trace aggregate was used for this assay, simulated for the experimental condition of 50 krpm, 12 mm solution column length, at 0.002 fringes rmsd noise. With the meniscus position fixed at the known value (6.0 cm), we systematically fixed the frictional ratio parameter at non-optimal values and determined the effect on the estimated trace concentration. Using F-statistics, we could calculate the relative increase in the rmsd that would result in a statistically significant decrease in the quality-of-fit. This value was used as a cutoff for accepting estimated trace concentration values. Similarly, we varied the meniscus position to find the limits of statistically acceptable values and observed the effect on trace quantitation. Figure 9 illustrates the results of this exploration of the error surface. In the statistically tolerable range of rmsd variation, the trace estimate varied by ±0.02% for floating values of either the weight-average frictional ratio or the meniscus position. The magnitude of this variation is insignificant considering that it is well below the LOD. Therefore, the trace concentration estimate is not sensitive to these floated parameters as long as the parameters used are at the global minimum in the error surface. It should be noted that the effect on the trace estimates increases outside the statistically acceptable range (data not shown), such that fixing these parameters at incorrect values can be expected to lead to significantly larger deviations in the trace estimates than those reported here for the confidence intervals of meniscus and frictional ratio.
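An rmsd cutoff of this kind follows from a variance-ratio F-test on the quality of fit. The form below is a common convention for projecting error surfaces; it is a sketch under assumed parameter values, and the exact convention used by the analysis software may differ:

```python
import numpy as np
from scipy.stats import f as f_dist

def rmsd_cutoff(rmsd_min, n_points, n_params, alpha=0.32):
    # Variance-ratio F-test: fits whose rmsd stays below this cutoff are
    # statistically indistinguishable from the best fit at confidence 1 - alpha
    # (alpha = 0.32 corresponds to a one-standard-deviation criterion).
    F_crit = f_dist.ppf(1.0 - alpha, n_params, n_points - n_params)
    return rmsd_min * np.sqrt(1.0 + n_params * F_crit / (n_points - n_params))

# e.g. 0.002 fringes best-fit rmsd, ~50,000 data points, 2 floated parameters
print(rmsd_cutoff(0.002, 50_000, 2))
```

With the very large number of data points typical of SV, the statistically tolerable rmsd increase is tiny in absolute terms, which is why the acceptable ranges for the frictional ratio and meniscus are so narrow.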
In the present work we have examined the limits of sedimentation velocity for the quantitation of oligomeric protein aggregates. Complementary to a recent experimental study (11), we have aimed at building up a theoretical model that can provide guidance for the experimental setup and provide realistic estimates as benchmarks to test the experimental performance against, and thereby identify the remaining limiting experimental factors. In the process, we have developed a customized data analysis approach that can significantly extend the sensitivity of the method.
The detection limits for trace aggregates seem to be the result of two factors: the number and noise of data points, and the hydrodynamic resolution. SV data have excellent statistical properties, with on the order of 10^4–10^5 data points with signal/noise ratios of up to 1,000:1 and quite stable baseline profiles throughout the experimental time. As a result, it is no surprise that the detection limit can routinely be lower than the rmsd noise of the data acquisition. Unfortunately, however, for small or medium sized proteins such as antibodies, the dimer sedimentation takes place in the leading edge of the diffusionally broadened boundary of the monomer, such that no separate dimer boundary can be directly discerned. Although the deconvolution of diffusion has become possible with the c(s) approach, the required inversion of an integral equation is intrinsically an ill-posed problem, and a residual correlation between the estimated monomer and dimer signals remains.
Regularization is the state-of-the-art technique for the suppression of noise amplification and artifactual spikes, and is indispensable for reliable data interpretation involving distributions and integral equations. In the standard maximum entropy approach, it biases the solution towards maximal information entropy, assuming uniform probabilities for the occurrence of species at all s-values. This principle yields the simplest distribution consistent with the raw data, in accord with Occam’s razor, and has been extremely useful in a large number of studies in a variety of biophysical disciplines. In the context of trace detection, unfortunately, it has some undesirable properties, most notably the tendency to overestimate the area of the monomer peak at the cost of a diminished calculated dimer fraction and an artificial shift of the dimer peak. Thus, ME provides a more parsimonious curve, but does not reflect what we know about the sample. In practice, this limits the ability of the c(s) method to quantify trace levels of aggregates. We estimate the LOQ to be ~1–2% for conditions close to those frequently applied in the literature for this problem, using the absorbance system with 0.005 OD–0.010 OD noise at 40,000 rpm (Table III). Our predicted theoretical limits are not far from the LOQ values observed experimentally in the work of Pekar & Sukumar (11) and those cited by Berkowitz (13).
The new Bayesian approach is a refinement of the ME regularization that takes into account different probabilities of signal densities in different regions of s-values in the interpretation of the experimental data. As shown here, it can be used effectively to eliminate the undesirable excessive broadening of the monomer peak and the shift of the dimer peak. For the same conditions using the absorbance system with 0.005 OD–0.010 OD noise at 40,000 rpm, the LOQ is improved by a factor of 5–10 (Table III). The implementation explored here uses prior information on the s-values of the monomer and dimer, in combination with the expectation that the monomer peak should be monodisperse (i.e., very sharp) due to the high purity and intrinsically discrete nature of the protein sample. For the prior expectation of the dimer, it should be noted that the dimer peak does not require a precise assignment, potentially allowing for different dimer conformations to be accounted for in the same slightly broadened peak. Other schemes for assigning prior probabilities to suppress the broadening of the monomer peak towards higher s-values, which do not require the assumption of a sharp monomer peak or knowledge of the precise s-value, may be possible, such as perhaps a linearly decreasing probability distribution in the range of the monomer. Further, the extension to include prior expectations for the localization of the peaks of trimers and higher oligomers, either derived from experiments or from hydrodynamic scale relationships, should be straightforward. These possibilities are provided for in the SEDFIT software.
A more traditional approach to using prior knowledge would be the implementation of ‘hard’ constraints, for example, the substitution of regions of s-values in the c(s) distribution with discrete sedimenting species, such as in the hybrid discrete/continuous (HDC) model applied in some trace aggregation studies (11). However, if the constraints do not perfectly reflect the information in the data, models with ‘hard’ constraints will give significantly worse fits, and as a consequence have to be rejected and/or produce uncertain compensatory effects on the estimated signal amplitudes. Interestingly, this expectation was corroborated by Pekar & Sukumar’s observation that the HDC model was less precise for trace detection than the standard ME c(s) approach. There is a fundamental difference between the Bayesian approach and ‘hard’ constraints: the Bayesian approach is much more subtle in that it allows the data to maintain the same degrees of freedom as the original distribution, and imperfections in the prior expectation will result in c(P)(s) displaying the additional features that are statistically necessary to maintain the same high quality of fit as with the standard c(s). Such features may be, for example, additional species in solution, micro-heterogeneity in the population of the monomer or oligomers, or slight imperfections in the s-values of the expected species.
A drawback of the Bayesian approach is that the amplitudes of the peaks assigned in the prior expectation should be calibrated to the experimental conditions for best results. However, performing the calibration procedure for the run conditions planned for the SV experiment presents no obstacle in principle, and to facilitate this process, a serial simulation and analysis tool has been provided in SEDFIT. Further, when examining the sensitivity of the results to extreme choices of the amplitudes, we found that the errors are within only 0.1% of the true dimer fraction that was known to be present from the simulations. Although we calibrated the prior probabilities with data from one particular condition (absorbance data or IF data from a 12 mm solution column at 50,000 rpm with 0.005 OD noise), the values were sufficiently robust to provide a significant improvement of c(P)(s) over c(s) under all conditions except those at 60,000 rpm. These results suggest that the calibration will also be sufficiently robust to permit application to real data (which will typically vary slightly in quality from cell to cell).
Our results indicate that when using the standard ME c(s) approach, it would be better to conduct the SV experiment at the highest rotor speed of 60,000 rpm, where the hydrodynamic resolution is best. This is contrary to current practice. Lower rotor speeds may be preferred because they produce lower strain on the centerpieces, provide double the number of samples per run (allowing an eight-hole rotor to be used), and likely offer a better opportunity to detect larger oligomers and aggregates, which at 60,000 rpm would reside only for a short time in the observation window. With the new Bayesian analysis method, however, the disadvantage of using lower rotor speeds is overcome by the use of the more appropriate regularization procedure. Also, our results suggest that interferometric detection should be significantly better than the absorbance system, considering also that for an IgG molecule the same sample concentration will provide at least twice the signal (or half the relative noise), compounding the advantages of the interference optics. In practice, however, interference detection can be more difficult when working with complex buffers, where optical and geometric matching of the sample with the buffer in the reference sector is hard to achieve. Our experimental results on the stability of the optical detection systems also point towards a greater correlation of systematic noise contributions in interference data (requiring both TI and RI offsets to be considered) with the estimated dimer concentration. Independent of rotor speed and detection system, the longest possible solution column clearly provides the best data.
Factors influencing the model that were not included in the present theoretical analysis are the frictional ratio and the meniscus position. It has been pointed out that the hydrodynamic scale relationship underlying both the c(s) and the new c(P)(s) approach, which states that the diffusion coefficients of all species can be approximated with a single frictional ratio, does not strictly apply to antibody oligomers (12,21,38). However, for the purpose of the determination of trace concentrations, it is obvious that the actual boundary shape of the trace species is very poorly defined in the experimental data (in contrast to the boundary amplitude of the trace), such that the approximation of a single frictional ratio by that of the monomer should be excellent. This was confirmed in the present work, which used experimentally determined s-values (10,12) as the basis for the simulation. However, there is no additional complexity in replacing the hydrodynamic scaling law with a user-defined relationship between s and M, as provided for in SEDFIT. Such a relationship could easily be established empirically by interpolation of the known s-values for the different oligomers.
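An empirical relationship of this kind reduces to simple interpolation over the known oligomer s-values. The numbers below are illustrative for an IgG-like antibody: only the 9.2 S dimer value appears above, and the monomer and trimer values are assumptions:

```python
import numpy as np

# illustrative oligomer s-values; 9.2 S matches the dimer value quoted above,
# the monomer and trimer values are assumed for illustration only
known_n = np.array([1, 2, 3])           # monomer, dimer, trimer
known_s = np.array([6.5, 9.2, 11.4])    # sedimentation coefficients (S)

def s_of_oligomer(n):
    # Empirical s(n) by interpolation of experimentally known s-values,
    # replacing the single-frictional-ratio hydrodynamic scaling law.
    return float(np.interp(n, known_n, known_s))

print(s_of_oligomer(2))                 # → 9.2
```

In practice the interpolation nodes would be taken from experimentally determined s-values for each oligomer, as done for the simulations in this work.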
Regarding the meniscus position, we fixed it in the present work at the known position. In the analysis of experimental data, it should be floated, since the meniscus position is obstructed from view by optical artifacts. There is no known relationship between the shape of the artifact and the true meniscus position, other than that the bounds of the optical artifact also represent bounds for the true meniscus position. It is questionable whether any empirically observed relationship would hold true, for example, for different wavelengths, buffer and solution refractive indices, and sample surface tension and concentration. As shown, for example, in the study by Pekar & Sukumar, and consistent with the experience from studies with different systems in our own laboratory, the meniscus position is typically well determined by the data (11). Fixing the meniscus position from visual inspection will likely introduce a systematic bias and may lead to a significantly lower quality of fit, and will therefore reduce the confidence in the results.
If the rmsd is much larger than the noise of data acquisition, or if the residuals show systematic behavior, then the fitted distribution should be assumed not to be an accurate representation of the material in the cell, and no confidence should be given to the integrated concentrations. In general, to what extent a misfit can bias the estimated trace components can be decided only empirically for specific systems. However, the literature on c(s) shows, and the present study confirms, that it is possible to obtain models for the measured concentration profiles that fit experimental data close to the noise of data acquisition, at least under dilute conditions where macromolecular hydrodynamic non-ideality is still absent and where dynamic density gradients from co-solvents at high concentrations are absent. As a consequence, the LOQ values reported here should serve as benchmarks that may be used to systematically identify (and hopefully then eliminate) further experimental limitations in the practice of trace detection with samples from biotechnology products. Pekar & Sukumar have identified possible bottlenecks in experimental practice (11), including the mechanical stability and smoothness of the centrifugal cells, and trace detection may also be impacted by convection arising from temperature gradients. In any case, however, we believe that the Bayesian approach may be more appropriate even with imperfect experiments, to eliminate the constant small underestimation expected from the standard c(s) and to counteract the known peak-deforming properties of the ME regularization when applied near the LOQ.
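The two rejection criteria named above (an rmsd well above the acquisition noise, or systematic residuals) can be checked mechanically as a gatekeeping step before trusting integrated trace concentrations. The thresholds below are illustrative choices, not established cutoffs:

```python
import numpy as np

def fit_acceptable(residuals, noise_sd, rmsd_factor=1.1, max_autocorr=0.2):
    # Reject the fit if its rmsd is well above the known acquisition noise,
    # or if the lag-1 autocorrelation of the residuals indicates systematic
    # (sign-correlated) rather than random deviations.
    residuals = np.asarray(residuals, float)
    rmsd = np.sqrt(np.mean(residuals**2))
    r1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]
    return rmsd <= rmsd_factor * noise_sd and abs(r1) <= max_autocorr

rng = np.random.default_rng(2)
good = rng.normal(0.0, 0.005, 2000)                   # noise-limited residuals
bad = good + 0.01 * np.sin(np.linspace(0, 30, 2000))  # systematic misfit
print(fit_acceptable(good, 0.005), fit_acceptable(bad, 0.005))
```

A lag-1 autocorrelation is only a crude stand-in for visual residual inspection, but it captures the qualitative distinction drawn above between noise-limited and systematically misfit data.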
This work was supported by the Intramural Research Program of the National Institutes of Health, NIBIB. The authors wish to thank A. Pekar and his colleagues for providing access to preprints of their recent publication concerning trace detection during the preparation of this manuscript.