Multidimensional nuclear magnetic resonance (NMR) experiments measure spin-spin correlations, which provide important information about bond connectivities and molecular structure. However, direct observation of certain kinds of correlations can be very time-consuming due to limitations in sensitivity and resolution. Covariance NMR derives correlations between spins via the calculation of a (symmetric) covariance matrix, from which a matrix-square root produces a spectrum with enhanced resolution. Recently, the covariance concept has been adapted to the reconstruction of non-symmetric spectra from pairs of 2D spectra that have a frequency dimension in common. Since the unsymmetric covariance NMR procedure lacks the matrix-square root step, it does not suppress relay effects and thereby may generate false positive signals due to chemical shift degeneracy. A generalized covariance formalism is presented here that embeds unsymmetric covariance processing within the context of the regular covariance transform. It permits the construction of unsymmetric covariance NMR spectra subjected to arbitrary matrix functions, such as the square root, with improved spectral properties. This formalism extends the domain of covariance NMR to include the reconstruction of non-symmetric NMR spectra at resolutions or sensitivities that are superior to the ones achievable by direct measurements.
Multidimensional nuclear magnetic resonance (NMR) is a powerful tool for probing molecular connectivity and structure by displaying magnetization transfer between nuclear spins due to their magnetic interactions as correlation peaks in a multidimensional spectrum.1 However, multidimensional NMR spectra with high resolution and sensitivity require the acquisition of a large number of scans, which demands substantial spectrometer time.2 Establishment of direct correlations between insensitive nuclei, such as 13C and 15N, requires particularly long measurement times.3
Indirect covariance NMR4 offers a linear algebraic approach to establish correlations between pairs of hetero-nuclei that are coupled to a common set of protons. Formally, the indirect covariance transform of the N1 × N2 NMR spectrum X produces the (symmetric) spectrum $C = (X \cdot X^{T})^{1/2}$ (where the superscripts T and 1/2 denote the matrix transpose and matrix-square root, respectively). Unsymmetric covariance NMR5-8 generates asymmetric spectra via matrix multiplication of two distinct spectra that share (at least) one common dimension. An example is the multiplication of a 13C-1H HSQC9 with a 1H-1H TOCSY10 to correlate all 1H and 13C nuclei in the same spin system. This reconstructs a 13C-1H HSQC-TOCSY spectrum from two standard 2D experiments without requiring additional measurement time and thereby yields additional 13C, 1H correlations, which can facilitate chemical shift assignment by linking unassigned 13C chemical shifts to already assigned 1H and 13C chemical shifts.6 Hyperdimensional NMR reconstructs high-dimensional spectra, which are often asymmetric, from lower dimensional spectra for the purpose of protein resonance assignment.11,12 COBRA13,14 and Burrow-Owl15 apply linear algebraic spectral manipulations for the same purpose.
An important property of unsymmetric covariance NMR is that the sensitivity of the covariance spectrum is limited only by the sensitivity of the experiments it combines.16 For example, unsymmetric covariance of a 13C-1H HMBC17 with a 13C-1H HSQC spectrum establishes carbon-carbon correlations with the enhanced sensitivity characteristic of inverse-detected 13C-1H heteronuclear spectra rather than that of a direct-detected 13C-13C correlation spectrum.4
A key difference between symmetric and unsymmetric covariance NMR is the applicability of the matrix-square root transform. The matrix-square root, which minimizes artifacts due to relay effects and chemical shift (near) degeneracy (“pseudo-relay effects”),4,18-20 is properly defined only for symmetric and positive semi-definite covariance spectra, e.g. when the product matrix is a regular covariance matrix.
In this paper, a general approach is presented for constructing a covariance matrix from multiple NMR spectra. Since the standard covariance transform is recovered as a special case when identical spectra are used as input, the generalized covariance matrix formalism reconciles symmetric and unsymmetric covariance processing. The generalized covariance matrix is symmetric, which makes it amenable to the extraction of arbitrary matrix functions, including the matrix-square root and other matrix powers λ. Depending on the types of spectra that are correlated, application of the square root suppresses false positives. It is found that the variation of covariance peak intensity as a function of λ is an effective indicator for the identification of false positives in unsymmetric covariance spectra. Covariation of a 13C-1H HMBC with a 1H-1H TOCSY spectrum to obtain reliable 13C,1H correlations not detectable in the HMBC experiment demonstrates the utility of this method. The generalized covariance formalism therefore expands the power of covariance NMR to the reconstruction of non-symmetric spectra.
Unsymmetric indirect covariance NMR5-8 takes an N1,1 × N2 2D spectrum X1 (matrix) and an N1,2 × N2 2D spectrum X2 and ‘concatenates’ them into a single N1,1 × N1,2 spectrum C via matrix multiplication:

$$C = X_1 \cdot X_2^{T} \qquad (1)$$
Matrix element Cij of C is a measure of the correlation between the pair (i,j) of spins belonging to the ith row vector of X1 and the jth row vector of X2. Such a correlation indicates either a direct interaction between the two spins, a mutual correlation to a common third spin, e.g. via spin-diffusion in NOESY spectra,18,21 or a pseudo-relay effect due to correlations to different spins with identical chemical shifts. In the symmetric case, i.e. X1 = X2, extraction of the matrix-square root effectively reduces both relay and pseudo-relay effects.18,19,22
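Generalized (indirect) covariance (GIC) NMR provides a framework in which unsymmetric covariance spectra are embedded in symmetric covariance spectra amenable to general matrix functions. GIC starts out with the construction of a stacked spectrum S from n 2D spectra Xi of dimensions N1,i × N2 (i = 1,…,n):

$$S = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} \qquad (2)$$

A generalized covariance matrix is then defined as

$$C = S \cdot S^{T} = \begin{pmatrix} X_1 X_1^{T} & \cdots & X_1 X_n^{T} \\ \vdots & \ddots & \vdots \\ X_n X_1^{T} & \cdots & X_n X_n^{T} \end{pmatrix} \qquad (3)$$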
Because of Parseval’s theorem, Eq. (3) yields (up to a constant prefactor) the same result irrespective of whether the direct dimensions of X1,…,Xn are in the time domain or in the frequency domain.18 Matrix C is symmetric and positive semi-definite, which permits the straightforward calculation of arbitrary matrix functions, including matrix roots. For n = 1, Eq. (3) reduces to the indirect covariance NMR spectrum.4 For n ≥ 2, C contains the unsymmetric covariance matrix given in Eq. (1) as an off-diagonal submatrix. For simplicity, the GIC spectrum from X1 and X2 (n = 2) is denoted by X1*X2 and, when raised to the matrix power λ, by [X1*X2]λ.
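The embedding described above can be checked numerically. The following is a minimal sketch, assuming the generalized covariance matrix is the product of the stacked spectrum with its transpose; the spectra are random placeholder matrices and the names `S`, `C`, and `off_diag` are illustrative only.

```python
import numpy as np

# Two hypothetical spectra that share the direct dimension (6 points).
rng = np.random.default_rng(0)
X1 = rng.random((4, 6))   # N1,1 = 4 rows
X2 = rng.random((3, 6))   # N1,2 = 3 rows

# Stack the spectra along the indirect dimension and form the
# generalized covariance matrix.
S = np.vstack([X1, X2])   # shape (7, 6)
C = S @ S.T               # shape (7, 7), symmetric

# The upper-right off-diagonal block is the unsymmetric covariance X1 . X2^T.
n1 = X1.shape[0]
off_diag = C[:n1, n1:]
print(np.allclose(off_diag, X1 @ X2.T))
```

The diagonal blocks of `C` are the auto-covariances of the individual spectra, so for a single input spectrum the construction collapses to the symmetric case.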
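Writing the singular value decomposition (SVD) of the stacked spectrum as U·D·V^T, with D the diagonal matrix of singular values, the generalized covariance matrix becomes $C = U \cdot D^{2} \cdot U^{T}$. For the matrix-square root, λ = ½, it follows that $C^{0.5} = U \cdot D \cdot U^{T}$, and for general powers

$$C^{\lambda} = U \cdot D^{2\lambda} \cdot U^{T} \qquad (5)$$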
Of practical importance, calculation of a series of spectra with different powers λ of C only requires a single SVD, which makes such calculations efficient.
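A short sketch of this reuse of a single SVD is given below. It assumes the generalized covariance matrix is the stacked spectrum times its transpose, so that its powers are U·D^(2λ)·U^T; the function name `covariance_powers` and the random input spectra are hypothetical.

```python
import numpy as np

def covariance_powers(spectra, lambdas):
    """Return {lam: C**lam} for the generalized covariance matrix of the
    stacked spectra, using one SVD for all requested powers."""
    S = np.vstack(spectra)
    # Economy-size SVD: only U and the singular values d are needed.
    U, d, _ = np.linalg.svd(S, full_matrices=False)
    # (U * d**(2*lam)) scales column k of U by d_k**(2*lam).
    return {lam: (U * d ** (2.0 * lam)) @ U.T for lam in lambdas}

rng = np.random.default_rng(1)
X1 = rng.random((5, 8))   # placeholder spectra sharing 8 direct points
X2 = rng.random((4, 8))
powers = covariance_powers([X1, X2], [0.25, 0.5, 1.0])
```

Each additional λ costs only one scaled matrix product, which is the efficiency gain the text refers to.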
The unsymmetric covariance matrix given by Eq. (1) constitutes an off-diagonal submatrix of the generalized covariance matrix C of Eq. (3). The same submatrix of $C^{\lambda}$ defines the λth power of the unsymmetric covariance matrix, including the matrix-square root of an unsymmetric covariance matrix.
GIC is applicable to a stack of spectra, X1,…,Xn, as long as each combination of covariance spectra $X_i X_j^{T}$ (i ≠ j) gives rise to non-diagonal blocks and thereby expands the block-diagonal parts stemming from the “auto-covariances” $X_i X_i^{T}$. GIC can reconstruct any spectrum that factors into individually measurable NMR experiments. For example, a [13C-1H-HMBC*1H-1H-TOCSY]λ covariance spectrum reconstructs a 2D 13C-1H HMBC-TOCSY spectrum while [13C-1H-HMBC*15N-1H-HSQC]λ yields a 2D through-bond 13C-15N correlation spectrum.23 Experiments probing spin-diffusion, relay, or multi-spin correlation effects (NOESY, TOCSY, HMBC) are particularly suitable for GIC analysis due to the analogy between the matrix (square) root operation of covariance NMR and the shortening of the experimental mixing time.18
In symmetric covariance, the matrix-square root minimizes artifacts due to pseudo-relay effects.18,19,22 Likewise, the square root of the generalized covariance matrix suppresses artifacts in sub-matrices belonging to the unsymmetric covariance spectra. Hence, the intensities of pseudo-relay correlation peaks are systematically weakened by the root operation as compared to the intensities of bona fide signals. Generally, the more rapidly the covariance cross-peak intensity Cij(λ) increases with λ, the less likely that peak is to be a valid signal. Hence, the slope of ln Cij(λ) as a function of λ serves as a useful metric that complements signal intensity alone for assessing the veracity of the signal for matrix element (i,j).
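Eq. (5) may be rewritten in terms of matrix elements (where Dk denotes the kth singular value and Uik the ith component of the kth singular vector)

$$C_{ij}(\lambda) = \sum_{k} U_{ik} D_k^{2\lambda} U_{jk}$$

Thus the slope of the natural logarithm of Cij(λ) is

$$\frac{d \ln C_{ij}(\lambda)}{d\lambda} = \frac{2 \sum_{k} U_{ik} U_{jk} D_k^{2\lambda} \ln D_k}{\sum_{k} U_{ik} U_{jk} D_k^{2\lambda}} \qquad (8)$$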
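The logarithmic slope can be evaluated directly from the SVD factors. The sketch below assumes the element form C_ij(λ) = Σ_k U_ik D_k^(2λ) U_jk, so that the slope is a weighted average of 2·ln D_k, and compares it with a central finite difference. The helper names `element` and `log_slope` and the random test matrix are hypothetical.

```python
import numpy as np

def element(U, d, lam, i, j):
    """C_ij(lam) assembled from singular vectors U and singular values d."""
    return np.sum(U[i] * U[j] * d ** (2.0 * lam))

def log_slope(U, d, lam, i, j):
    """d ln C_ij / d lam; requires strictly positive singular values."""
    w = U[i] * U[j] * d ** (2.0 * lam)   # contribution of each component k
    return 2.0 * np.sum(w * np.log(d)) / np.sum(w)

# Placeholder stacked spectrum (5 + 4 rows, 8 shared points).
rng = np.random.default_rng(2)
S = np.vstack([rng.random((5, 8)), rng.random((4, 8))])
U, d, _ = np.linalg.svd(S, full_matrices=False)

# Compare the analytic slope with a central finite difference at lam = 1,
# where all elements of C are positive for this positive-valued input.
i, j, lam, h = 0, 6, 1.0, 1e-5
analytic = log_slope(U, d, lam, i, j)
numeric = (np.log(element(U, d, lam + h, i, j))
           - np.log(element(U, d, lam - h, i, j))) / (2.0 * h)
print(analytic, numeric)
```

Because the same `U` and `d` serve every element and every λ, the slope map for a whole spectrum requires no further decompositions.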
2D 1H-1H-TOCSY10 (90 ms mixing time using MLEV-17 24) and 13C-1H-HMBC spectra17 were recorded at 18.8 T and 298 K for a mixture of seven common metabolites at natural 13C abundance (D-carnitine, D-glucose, L-glutamine, L-histidine, L-lysine, myo-inositol, and shikimic acid), each at a concentration of 10 mM in D2O. The direct 1H dimension of each spectrum was acquired with 2048 complex points and a spectral width of 8013 Hz. The indirect 1H dimension of the TOCSY was acquired with 1024 complex points and the same spectral width as the direct dimension. The indirect 13C dimension of the HMBC spectrum was acquired with 1024 complex points and a spectral width of 32206 Hz.
Additionally, 2D 1H-1H-TOCSY (50 ms mixing time using DIPSI-2 25) and 13C-1H HMBC spectra were also recorded at 298 K using a sample of the MDM2-binding p53 peptide construct with sequence ETFSDLWKLLPEN, described previously.26 The spectra were acquired with the same spectral widths as above but with half the number of complex points along each dimension, except for the indirect dimension of the TOCSY having only 256 complex points, and with a spectral width of 44643 Hz in the indirect (13C) dimension of the HMBC spectrum.
All spectra were recorded on a Bruker AVANCE 800 spectrometer equipped with a cryogenic probe and processed in NMRPipe.27 For the HMBC spectra, a magnitude spectrum was calculated after 2D FT.17 All other calculations were performed in Matlab.28
To demonstrate the approach, a generalized indirect covariance (GIC) HMBC*TOCSY spectrum for a 2-component mixture was calculated from a simulated 13C-1H HMBC spectrum (Fig. 1A) and 1H-1H TOCSY spectrum (Fig. 1B) with sharp lines. The mixture consists of two molecules represented by 2 different spin systems: the first has 3 linked 13C,1H pairs X-Y-Z and the second has 2 pairs U-V. To simulate the effects of overlap, the protons of pairs Y and U are assigned degenerate chemical shifts. Related models with different degenerate chemical shifts were explored, but all gave results similar to those reported here. λ = 1 gives rise to a false peak in the generalized indirect covariance spectrum between CX-HV, as indicated in Fig. 1C.
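The origin of such a false peak can be illustrated with a deliberately reduced toy model. This is not the authors' simulation: it keeps a single hypothetical HMBC row (CX) and two hypothetical TOCSY rows (HZ from the same spin system, HV from the other molecule), with unit-intensity delta peaks and a shared 1H axis in which HY and HU occupy the same column.

```python
import numpy as np

# Shared 1H axis (columns): H_X, H_Y/H_U (degenerate), H_Z, H_V
hmbc = np.array([[1.0, 1.0, 0.0, 0.0]])       # row C_X: peaks at H_X and H_Y
tocsy = np.array([[1.0, 1.0, 1.0, 0.0],       # row H_Z: same spin system as C_X
                  [0.0, 1.0, 0.0, 1.0]])      # row H_V: peaks at H_U and H_V

S = np.vstack([hmbc, tocsy])
U, d, _ = np.linalg.svd(S, full_matrices=False)

def gic_block(lam):
    """HMBC x TOCSY off-diagonal block of the generalized covariance
    matrix raised to the power lam."""
    C_lam = (U * d ** (2.0 * lam)) @ U.T
    return C_lam[:1, 1:]

for lam in (1.0, 0.5):
    true_peak, false_peak = gic_block(lam)[0]
    print(f"lambda = {lam}: false/true intensity ratio = {false_peak / true_peak:.3f}")
```

At λ = 1 the CX-HV element is nonzero solely because of the shared column, and in this toy the false-to-true intensity ratio drops when the square root is taken. The drop is modest here because the model contains only three rows.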
Figure 2A shows the suppression of the false positive CX-HV peak (red) achieved by varying the exponent λ in Eq. (5). This log-linear plot demonstrates the higher slope (Eq. (8)) associated with the false positive signal (red) relative to the true signals (black).
Figure 2B shows the analogous plot for a GIC HMBC*TOCSY spectrum derived from experimental 13C-1H-HMBC and 1H-1H TOCSY spectra of a metabolite mixture sample. The false positive signal, which incorrectly correlates a 13C resonance of myo-inositol to a 1H resonance of carnitine, exhibits a systematically stronger λ scaling compared to the true positive signals. Its intensity in the λ = 1 covariance matrix lies between the intensities of two true positive signals, a glucose cross-peak and a myo-inositol cross-peak, but when λ = 0.5, its intensity is only as high as the weaker of the two true signals and the slope of its intensity buildup as a function of λ is higher than the slope of the true signals. The higher slope and weaker intensity at λ = 0.5 provide a signature that this peak is a false positive.
Fig. 3 demonstrates the preferential suppression of artifact signals via the matrix-square root in two GIC HMBC*TOCSY covariance spectra calculated from two experimental pairs of 13C-1H-HMBC and 1H-1H TOCSY spectra recorded for the metabolite mixture (Fig. 3A,B) and the p53 peptide (Fig. 3C,D). Peak intensity better separates false peaks (red dots) from true peaks (black dots) in the λ = 0.5 spectrum than in the λ = 1 spectrum (Fig. 3A,C and Table 1). However, while intensity in the λ = 1 spectrum alone is a relatively poor indicator of peak veracity, deviations from the trend visible amongst the true peaks in Fig. 3A,C are indicative of peak authenticity: peaks lying on the upper left hand side of the distribution marked by the ellipse, i.e. peaks for which the matrix-square root reduces peak intensity by a large amount, are most likely to be false.
Plotting the slope (Eq. (8)) versus the intensity at λ = 0.5 also separates true from false peaks (Fig. 3B,D). Peaks characterized by especially high slopes relative to their intensity (above and to the left of the ellipse surrounding most peaks) are most likely to be false. In fact, plotting the slope versus the intensity at λ = 0.5 identifies false peaks more effectively than does plotting intensity at λ = 1 versus that at λ = 0.5.
The selection procedure can be formalized by applying principal component analysis (PCA) in two dimensions,29 which in good approximation reproduces the ellipses drawn in Fig. 3. The major axis of the ellipse is given by the first principal component and the minor axis by the second principal component. PCA transforms intensity and slope into a new variable pair of independent statistics that is a linear combination of the original pair. The first principal component adjusts peak intensity using slope information, while the second component combines intensity and slope information into a measure of peak quality. Under the assumption that the principal components are Gaussian distributed, the value for the second principal component calculated for a given peak can be transformed into a p-value that quantifies the probability that this peak is real rather than an artifact arising from spurious chemical shift degeneracy.
The following procedure allows one to edit peaks picked from a GIC derived spectrum: i) perform PCA as described above on (only) the peaks picked in the λ = 0.5 spectrum, ii) reject peaks for which the p-value calculated (as in a one-tailed test) from the second principal component is less than 5%. Application of this procedure cuts the false-positive rate (reported for the λ = 0.5 spectra in Table 1) in half while only rejecting one (p53 peptide) and two (metabolite mixture) true peaks. The peaks plotted in Fig. 3 include only those peaks reported in Table 1 whose line shapes do not qualitatively change as a function of λ as illustrated in Fig. 4. This figure shows a region of the metabolite mixture GIC [HMBC*TOCSY]λ spectrum for different λ values. The unsymmetric covariance spectrum (λ = 1) displays a noise ridge (cross-hatched box) 16 due to the covariance of a signal arising from the carnitine methyl groups with noise. This ridge is suppressed after application of the matrix roots using the GIC formalism.
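The peak-editing procedure described above (two-dimensional PCA on intensity and slope, followed by a one-tailed test on the second principal component) can be sketched as follows. Several choices here are assumptions not specified in the text: both variables are standardized before PCA, the second component is oriented so that a positive score means a slope above the trend, and the p-value is the upper-tail probability of a standard normal distribution. The function name and all peak values are hypothetical.

```python
import math
import numpy as np

def second_pc_p_values(intensity, slope):
    """One-tailed p-values derived from the second principal component
    of (intensity, slope) pairs, assuming Gaussian-distributed scores."""
    data = np.column_stack([intensity, slope]).astype(float)
    # Assumption: standardize both variables so neither dominates the PCA.
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(z, rowvar=False))  # ascending order
    pc2 = eigvecs[:, 0]               # minor axis of the ellipse
    if pc2[1] < 0:                    # orient: positive score = slope above trend
        pc2 = -pc2
    score = (z @ pc2) / math.sqrt(eigvals[0])   # scores scaled to unit variance
    # Upper-tail probability of a standard normal distribution.
    return np.array([0.5 * math.erfc(s / math.sqrt(2.0)) for s in score])

# Hypothetical picked peaks: eight follow a common trend, the last one has
# a high slope for its low intensity.
intensity = [1, 2, 3, 4, 5, 6, 7, 8, 2]
slope = [41.2, 41.9, 43.1, 44.0, 44.8, 46.1, 47.0, 47.9, 47.0]
p = second_pc_p_values(intensity, slope)
keep = p >= 0.05                      # reject peaks with p below 5%
print(np.round(p, 3), keep)
```

With these placeholder values only the last peak falls below the 5% threshold. On real data the Gaussian assumption and the choice of scaling would need to be checked against the observed distribution of true peaks.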
The decrease in intensity with decreasing λ for the false positive is again much more pronounced than for the other peaks: relative to the other peaks in panel A, peak (3) is quite strong whereas it is weak relative to the other peaks in panel C and negative in panel D. The slope given by Eq. (8) at λ = 0.5 for this peak is 52 while a slope of 45 is typical for this data set. This peak appears in the upper left of Fig. 3B (encircled in red) outside of the ellipse surrounding true peaks. Due to its high slope and low intensity at λ = 0.5, this peak can be easily identified and eliminated improving the analysis of the GIC HMBC*TOCSY spectrum.
Application of λ ≤ 0.5 also recovers the splitting present in the direct dimensions of the HMBC and TOCSY spectra of this mixture, which is lost by covariation of the direct dimension in the unsymmetric covariance process. However, the onset of distortions in line-shape (e.g. peak 2 in Fig. 3D) and signal reduction generally preclude the use of very low λ values (λ ≤ 0.25).
Fig. 5 shows a region of the GIC HMBC*TOCSY spectrum of the p53-peptide. Again, the matrix-square root suppresses a false positive peak and a ridge, demonstrating the applicability of generalized covariance to larger systems, such as peptides. Unlike an experimentally recorded HSQC-TOCSY, the GIC HMBC*TOCSY exhibits correlations connecting quaternary and other non-protonated carbons, such as carbonyl and carboxyl carbons as illustrated in Fig. 6. Thus, GIC provides a powerful representation of spectral information for the resonance assignment of small and large molecules, including peptides.
Many informative spin correlations are not directly accessible by multidimensional NMR experiments due to measurement and sensitivity considerations. For instance, correlations between insensitive nuclei can often be observed only indirectly, i.e. via correlations of those nuclei with protons. Other spectra, such as heteronuclear NOESY and TOCSY, which contain useful information for resonance assignment and structure determination of complex molecules, are often not collected due to limited sensitivity and spectrometer time constraints. However, unsymmetric covariance NMR can reconstruct heteronuclear TOCSY and NOESY spectra from homonuclear NOESY and TOCSY spectra and common heteronuclear 13C-1H HSQC or HMBC spectra.7
Similarly, the high-dimensional correlation information required to make chemical shift assignments in polypeptides can often only be practically measured by a series of lower dimensional spectra. A typical manual analysis of NMR spectra establishes higher order correlations via a comparison of strip plots. Visual assessment of a non-vanishing correlation of peaks between slices (strip plots) in two NMR spectra links the spin-systems associated with the strip plots being compared. Automated analysis methods, particularly those for protein backbone assignment,30-37 often work with peak lists rather than with the underlying spectra. However, such methods generally require high quality peak lists that are manually curated. Recently developed methods such as hyperdimensional NMR,11,12 COBRA13 and Burrow-Owl15 use unsymmetric covariance5,7 to automate the traditional manual approach of establishing spin correlations via comparison of strip plots, prior to peak picking. However, the application of such methods can confound downstream analysis due to the presence of spurious correlations between strip plots caused by (near-)degenerate chemical shifts and therefore may benefit from the generalized indirect covariance approach presented here. GIC establishes correlations between spectra rather than peak lists and thereby ‘delays’ the otherwise iterative and sometimes difficult process of peak picking until true peaks become self-evident.
The GIC formalism generalizes the use of the matrix-square root for the suppression of relay effects and pseudo-relay effects, originally demonstrated for symmetric covariance NMR spectra,18,19 to unsymmetric covariance spectra.6 Previous work in covariance reconstruction of unsymmetric spectra compared unsymmetric and indirect covariance results in order to identify artifacts in each.20 The generalized covariance matrix (Eq. (3)) presented here computes both unsymmetric and symmetric covariance spectra in the same step. Furthermore, the GIC formalism allows for the extraction of multiple roots in a single covariance calculation. For the examples used here, extraction of the square root via the generalized covariance matrix reduces the false positive count of a HMBC*TOCSY spectrum by about a factor of three. Removal of peaks characterized by weak intensity following extraction of the square root concomitant with a rapid intensity build up with λ further reduces the false positive rate.
The generalized covariance formalism addresses the issue of false positives in unsymmetric covariance spectra caused by resonance overlap and extends the applicability of unsymmetric covariance NMR to systems with an increased number of signals and greater resonance degeneracy, including complex mixtures, for example of metabolites, and biological macromolecules, such as peptides and proteins. By providing a mechanism to identify false positive correlations, generalized indirect covariance lays a linear-algebraic foundation for the accurate and sensitive identification of spin correlations that are distributed over multiple 2D NMR spectra. The establishment of spin correlations that are not easily observable experimentally, via an automated method analogous to the comparison of strip plots, marks a path toward the development of computer-based assignment procedures that are as robust as the most expert manual analyses of NMR data.
We thank Fengli Zhang and Scott Showalter for kindly providing us with the metabolite mixture and p53 peptide NMR spectra, respectively, and Wolfgang Bermel for useful discussion. This work was supported by the National Institutes of Health (Grant GM 066041). The NMR experiments were conducted at the National High Magnetic Field Laboratory (NHMFL) supported by cooperative agreement DMR 0654118 between the NSF and the State of Florida.