Anal Chim Acta. Author manuscript; available in PMC 2011 January 11.

Published in final edited form as:

PMCID: PMC2806822

NIHMSID: NIHMS157626

Donald A. Barkauskas: gro.hcraeseruc@saksuakrab.nod

The publisher's final edited version of this article is available at Anal Chim Acta

A common feature of many modern technologies used in proteomics—including nuclear magnetic resonance imaging and mass spectrometry—is the generation of large amounts of data for each subject in an experiment. Extracting the signal from the background noise, however, poses significant challenges. One important part of signal extraction is the correct identification of the baseline level of the data. In this article, we propose a new algorithm (the “BXR algorithm”) for baseline estimation that can be directly applied to different types of spectroscopic data, but also can be specifically tailored to different technologies. We then show how to adapt the algorithm to a particular technology—matrix-assisted laser desorption/ionization Fourier transform ion cyclotron resonance mass spectrometry—which is rapidly gaining popularity as an analytic tool in proteomics. Finally, we compare the performance of our algorithm to that of existing algorithms for baseline estimation.

The BXR algorithm is computationally efficient, robust to the type of one-sided signal that occurs in many modern applications (including NMR and mass spectrometry), and improves on existing baseline-estimation algorithms. It is implemented as the function baseline in the *R* package FTICRMS, available either from the Comprehensive *R* Archive Network (http://www.r-project.org/) or from the first author.

A common feature of many modern technologies used in proteomics—including nuclear magnetic resonance imaging and mass spectrometry—is the generation of large amounts of data for each subject in an experiment. Extracting the signal from the background noise, however, poses significant challenges. One important part of signal extraction is the correct identification of the baseline level of the data. In this article, we first generalize an algorithm of Xi and Rocke [1] which was developed for NMR baseline correction and show how it can be applied to data from generic spectroscopic technologies. We also indicate how it can be adapted to the unique qualities of different technologies and illustrate this by adapting it to a specific technology: matrix-assisted laser desorption/ionization Fourier transform ion cyclotron resonance mass spectrometry (MALDI FT-ICR MS). Finally, we compare the performance of our algorithm to that of existing algorithms for baseline estimation.

There are different possible interpretations of what exactly a “baseline” is in spectroscopic analysis. If the signal is assumed to be entirely positive, standing out from a (theoretically) zero baseline level, then some kind of (smoothed) running minimum would be an appropriate baseline. This is the approach taken in software packages such as Cromwell [2], LCMS-2D [3], LIMPIC [4], PrepMS [5], and PROcess [6]. Alternatively, if the noise is assumed to fluctuate about a baseline level (as in the independent, identically distributed (iid) normal case), then some measure of center (median, mean, etc.) is more appropriate. This is the approach taken in software packages such as msInspect [7]. Software packages such as LMS [8] have options to compute either type of baseline. (A third common type of analysis is continuous wavelet analysis, which has no separate baseline-correction step as such; the baseline is removed automatically as part of the wavelet transformation. This is the approach taken in software packages such as MassSpecWavelet [9] and OpenMS [10].)
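The practical difference between the two interpretations is easy to see on synthetic data. In the sketch below (NumPy; the window length, noise level, and peak are arbitrary choices of ours, not settings of any of the cited packages), the noise fluctuates about a flat baseline of 5: a running minimum settles well below that level, while a running median tracks it and is barely moved by a narrow peak.

```python
import numpy as np


def running_stat(y, window, stat):
    """Apply `stat` (e.g. np.min or np.median) over a centered window at each point."""
    half = window // 2
    return np.array([stat(y[max(0, i - half): i + half + 1]) for i in range(len(y))])


rng = np.random.default_rng(7)
level = 5.0
y = level + rng.normal(0.0, 1.0, 2000)   # noise fluctuating about a flat baseline
y[1000:1010] += 20.0                     # a narrow positive "peak"

min_baseline = running_stat(y, 101, np.min)         # "signal sits on a floor" view
median_baseline = running_stat(y, 101, np.median)   # "noise centered on baseline" view
print(np.mean(min_baseline), np.mean(median_baseline))
```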

The Xi-Rocke algorithm uses the second interpretation of baseline (measure of center of the noise), and as explained in Section 3, this is the appropriate way to analyze (in particular) MALDI FT-ICR MS spectra, and is arguably appropriate in other applications. In the remainder of this article we will concentrate on this type of baseline and compare our algorithm to LMS. (msInspect was designed for liquid chromatography mass spectrometry and is not directly comparable to our current algorithm.)

Suppose that the data have the form $(x_t, y_t)$ …

$$F(\{{b}_{t}\})=\sum _{t=1}^{n}{b}_{t}-{A}_{1}\sum _{t=2}^{n-1}{({b}_{t-1}-2{b}_{t}+{b}_{t+1})}^{2}-{A}_{2}\sum _{t=1}^{n}{[{({b}_{t}-{y}_{t})}_{+}]}^{2}$$

(1)

Here, $z_+ = \max\{z, 0\}$, $b_t$ represents the value of the baseline at the …

To make the analysis easier, we change notation. Let $\mathbf{b} = (b_1, \ldots, b_n)'$ and $\mathbf{y} = (y_1, \ldots, y_n)'$ …

$$F(\mathbf{b}) = \mathbf{1}'\mathbf{b} - \mathbf{b}'\mathbf{\Delta}_2\mathbf{b} - (\mathbf{b} - \mathbf{y})'\mathbf{N}(\mathbf{b} - \mathbf{y}),$$

(2)

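where $\mathbf{N}$ is an $n \times n$ diagonal matrix whose $t$th diagonal entry is $A_{2,t}$ if $b_t > y_t$ and 0 otherwise, $\mathbf{\Delta}_2 = \mathbf{M}_2'\mathbf{A}_1\mathbf{M}_2$, and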

$$\mathbf{M}_2 = \begin{bmatrix} 1 & -2 & 1 & 0 & \cdots & 0 \\ 0 & 1 & -2 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & -2 & 1 \end{bmatrix}$$

is an $(n-2) \times n$ matrix and $\mathbf{A}_1$ is an $(n-2) \times (n-2)$ diagonal matrix with entries $A_{1,t}$. We will refer to the process of maximizing this modified score function with respect to the baseline …
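The change of notation can be checked numerically. The sketch below (NumPy; the function names are hypothetical) evaluates the score elementwise, as in Equation (1) but with the weights allowed to vary with $t$, and again in the matrix form of Equation (2). It assumes $\mathbf{\Delta}_2 = \mathbf{M}_2'\mathbf{A}_1\mathbf{M}_2$ and takes $\mathbf{N}$ to be diagonal with entry $A_{2,t}$ where $b_t > y_t$ and 0 elsewhere, which is what makes the two forms agree for every $\mathbf{b}$.

```python
import numpy as np


def score_elementwise(b, y, a1, a2):
    """Score of Equation (1), with the weights allowed to vary with t."""
    b, y = np.asarray(b, float), np.asarray(y, float)
    second_diff = b[:-2] - 2.0 * b[1:-1] + b[2:]   # b_{t-1} - 2 b_t + b_{t+1}
    excess = np.maximum(b - y, 0.0)                # (b_t - y_t)_+
    return b.sum() - np.sum(a1 * second_diff**2) - np.sum(a2 * excess**2)


def score_matrix(b, y, a1, a2):
    """Score of Equation (2): 1'b - b' Delta_2 b - (b - y)' N (b - y)."""
    b, y = np.asarray(b, float), np.asarray(y, float)
    n = b.size
    m2 = np.diff(np.eye(n), n=2, axis=0)           # (n-2) x n second-difference matrix M_2
    delta2 = m2.T @ np.diag(np.broadcast_to(a1, (n - 2,))) @ m2
    # N holds A_{2,t} only where the baseline lies above the data.
    nmat = np.diag(np.where(b > y, np.broadcast_to(a2, (n,)), 0.0))
    r = b - y
    return b.sum() - b @ delta2 @ b - r @ nmat @ r
```

Both functions accept scalar weights (the original Xi and Rocke setting) or per-point arrays of length $n-2$ and $n$.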

Note that the only change that the BXR algorithm makes over Xi and Rocke’s original algorithm is allowing *A*_{1} and *A*_{2} to vary with *t*. This seemingly minor change has a profound impact on the effectiveness of the algorithm in the analysis of real-world data, however, as we show in Sections 2.3 and 3.

To maximize the function in Equation (2), we calculate the gradient and Hessian:

$$\nabla F(\mathbf{b}) = \mathbf{1} - 2\mathbf{\Delta}_2\mathbf{b} - 2\mathbf{N}(\mathbf{b} - \mathbf{y}), \qquad [H(F)](\mathbf{b}) = -2(\mathbf{\Delta}_2 + \mathbf{N}).$$

Note that $\nabla F$ is continuous everywhere, and $H(F)$ is continuous except for jump discontinuities where $b_t = y_t$ …
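With the gradient and Hessian above, and $\mathbf{N}$ taken to be the diagonal matrix holding $A_{2,t}$ at points where $b_t > y_t$ (0 elsewhere), the Newton step $\mathbf{b} - [H(F)]^{-1}\nabla F(\mathbf{b})$ simplifies to solving $(\mathbf{\Delta}_2 + \mathbf{N})\mathbf{b}_{\text{new}} = \tfrac{1}{2}\mathbf{1} + \mathbf{N}\mathbf{y}$. Because $F$ is concave, once the set of points with $b_t > y_t$ stops changing the gradient is zero and the iterate is the maximizer. The sketch below is our own minimal illustration under stated assumptions (dense linear algebra, a starting baseline above every data point, the hypothetical name `bxr_baseline`), not the code behind the FTICRMS `baseline` function; full-length spectra would call for a banded solver that exploits the pentadiagonal structure of $\mathbf{\Delta}_2 + \mathbf{N}$.

```python
import numpy as np


def bxr_baseline(y, a1, a2, max_iter=100):
    """Maximize the score of Equation (2) by Newton steps (illustrative sketch).

    y  : 1-D array of intensities at equally spaced points
    a1 : smoothness weight(s) A_{1,t}, scalar or length n - 2
    a2 : negativity-penalty weight(s) A_{2,t}, scalar or length n
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    a1 = np.broadcast_to(np.asarray(a1, dtype=float), (n - 2,))
    a2 = np.broadcast_to(np.asarray(a2, dtype=float), (n,))
    m2 = np.diff(np.eye(n), n=2, axis=0)          # second-difference matrix M_2
    delta2 = m2.T @ (a1[:, None] * m2)            # Delta_2 = M_2' A_1 M_2
    # Start above every data point so all negativity penalties are active
    # and Delta_2 + N is nonsingular on the first step.
    b = np.full(n, y.max() + 1.0)
    active = b > y
    for _ in range(max_iter):
        n_diag = np.where(active, a2, 0.0)        # diagonal of N
        # Newton step for the piecewise-quadratic score:
        # (Delta_2 + N) b_new = 0.5 * 1 + N y
        b = np.linalg.solve(delta2 + np.diag(n_diag), 0.5 + n_diag * y)
        new_active = b > y
        if np.array_equal(new_active, active):    # N consistent with b: gradient is zero
            break
        active = new_active
    return b
```

For example, on 600 points of $\sin(2\pi x)$ plus iid $\mathcal{N}(0, 0.2^2)$ noise, `bxr_baseline(y, a1=1e5, a2=np.sqrt(np.pi / 2) / 0.2)` uses $A_{2,t} = \sqrt{\pi/2}/\sigma$; the value of `a1` is picked by hand for this illustration.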

Since the estimated baseline should be linear in the data (i.e., for any constants $m$ and $c$, if $\mathbf{b}$ corresponds to …

$$\frac{\operatorname{Var}(\mathbf{b})}{\operatorname{Var}(\mathbf{y})}.$$

(3)

The optimal value of $A_1$ can then be estimated by calculating the baseline $\mathbf{b}$ using different choices for $A_1$ and seeing which one gives the best match to the ACF of the noise portion of the spectrum when substituted into Equation (3). (See Figure 3 in Section 3 for an example of this applied to a MALDI FT-ICR spectrum.)

To determine $A_{2,t}$, we set the …

(Of course, if $g(Y_t)$ were known, there would be no need to run the algorithm.) One obvious choice for …

$${A}_{2,t}=\frac{1}{2\mathbb{E}\{{(\mathbb{E}{Y}_{t}-{Y}_{t})}_{+}\}}=\frac{1}{2(\sigma /\sqrt{2\pi})}=\frac{\sqrt{\pi /2}}{\sigma},$$

which recovers the result in Xi and Rocke [1].
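One way to see where the expression for $A_{2,t}$ comes from, offered as our own reading rather than the original derivation: ignore the smoothness penalty, treat $y_t$ as the random variable $Y_t$, and require the expected pointwise score to be stationary at $b_t = \mathbb{E}Y_t$. For normal noise, the final step uses $\mathbb{E}\{(\mu - Y)_+\} = \sigma\varphi(0) = \sigma/\sqrt{2\pi}$, where $\varphi$ is the standard normal density.

$$\frac{\partial}{\partial b_t}\,\mathbb{E}\left\{ b_t - A_{2,t}\,[(b_t - Y_t)_+]^2 \right\} = 1 - 2A_{2,t}\,\mathbb{E}\{(b_t - Y_t)_+\} \overset{b_t = \mathbb{E}Y_t}{=} 0 \quad\Longrightarrow\quad A_{2,t} = \frac{1}{2\,\mathbb{E}\{(\mathbb{E}Y_t - Y_t)_+\}}.$$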

We observe that for an arbitrary random variable $Y$ with finite mean, we have $\mathbb{E}\{(\mathbb{E}Y - Y)_+\} = \mathbb{E}|Y - \mathbb{E}Y|/2$. Thus, if there is no information about the distribution of the random variables $\{Y_t\}$, then a reasonable choice might be …
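The identity $\mathbb{E}\{(\mathbb{E}Y - Y)_+\} = \mathbb{E}|Y - \mathbb{E}Y|/2$ holds because, with $\mu = \mathbb{E}Y$, the difference $(\mu - Y)_+ - (Y - \mu)_+ = \mu - Y$ has mean zero while the sum $(\mu - Y)_+ + (Y - \mu)_+$ equals $|Y - \mu|$. Substituting it into $A_{2,t} = 1/(2\,\mathbb{E}\{(\mathbb{E}Y_t - Y_t)_+\})$ gives $1/\mathbb{E}|Y_t - \mathbb{E}Y_t|$, which needs only a mean absolute deviation. The short standard-library simulation below checks that, for normal noise, both routes land near $\sqrt{\pi/2}/\sigma$; the sample size, seed, and helper name are arbitrary.

```python
import math
import random


def a2_from_sample(sample):
    """Return (A2 via the positive-part formula, A2 via mean absolute deviation)."""
    mu = sum(sample) / len(sample)
    pos_part = sum(max(mu - v, 0.0) for v in sample) / len(sample)   # E{(EY - Y)_+}
    mad = sum(abs(v - mu) for v in sample) / len(sample)             # E|Y - EY|
    return 1.0 / (2.0 * pos_part), 1.0 / mad


random.seed(2024)
sigma = 2.0
sample = [random.gauss(5.0, sigma) for _ in range(200_000)]
a2_pos, a2_mad = a2_from_sample(sample)
a2_normal = math.sqrt(math.pi / 2.0) / sigma   # closed form for iid normal noise
print(round(a2_pos, 4), round(a2_mad, 4), round(a2_normal, 4))
```

With the sample mean used as $\mu$, the first two values agree up to rounding; the third agrees with them only approximately, because it relies on the noise actually being normal.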

Figure 1 shows the effect of these choices on the estimated baseline. We ran two simulations of 973,720 observations (the number of observations in the “noise” spectrum analyzed in Section 3), each with $x$-coordinates equally spaced between zero and three and with baseline given by $y = \sin(2\pi x)$. In the first simulation we added iid $\mathcal{N}(0, 1)$ noise to the baseline, and in the second we used independent $\mathcal{N}(0, \sigma_x^2)$ noise, where $\sigma_x = 1 + 0.5\cos(4$ …

Figure 1. Baseline estimation for simulated homoscedastic and heteroscedastic normal data using each of the two possible choices for $A_{2,t}$ from Section 2.3.

For the data generated with homoscedastic noise, both versions of the BXR algorithm perform well, with only a small amount of bias near the extreme values of the baseline (Figure 1, top). The major advantage here is that the algorithm that assumes homoscedasticity runs much faster (using roughly 10–20% of the computing time, depending on the exact convergence criterion chosen). However, if the noise is actually heteroscedastic, then assuming homoscedasticity causes the BXR algorithm to badly misestimate the baseline, underestimating it where the variance is above average and overestimating it where the variance is below average (Figure 1, bottom). The distribution-free version of the BXR algorithm, however, still produces a result that is almost indistinguishable from the true baseline. Thus, if the noise can reasonably be assumed to be iid normal, then $A_{2,t} = \sigma^{-1}\sqrt{\pi/2}$ is a good choice, but if the noise is heteroscedastic with unknown distribution, then $A_{2,t} = ($ …

Of course, if information on the distribution of the noise for a particular technology is available, it would be advantageous to explore whether distribution-specific choices for $A_{1,t}$ and …

Matrix-assisted laser desorption/ionization Fourier transform ion cyclotron resonance mass spectrometry (MALDI FT-ICR MS) is a technique for high mass-resolution analysis of substances that is rapidly gaining popularity as an analytic tool in proteomics. Typically in MALDI FT-ICR MS, a sample (the *analyte*) is mixed with a chemical that absorbs light at the wavelength of the laser (the *matrix*) in a solution of organic solvent and water. The resulting solution is then spotted on a MALDI plate and the solvent is allowed to evaporate, leaving behind the matrix and the analyte. A laser is fired at the MALDI plate and is absorbed by the matrix. The matrix breaks apart and transfers a charge to the analyte, creating the ions of interest (with fewer fragments than would be created by direct ablation of the analyte with a laser). The ions are guided by a quadrupole ion guide into the ICR cell, where they undergo cyclotron motion in a magnetic field. While in the cell, the ions are excited and their cyclotron frequencies are measured. For a given magnetic field, the angular velocity, and therefore the frequency, of a charged particle is determined solely by its mass-to-charge ratio. Using Fourier analysis, the measured signal can be resolved into a sum of pure sinusoidal curves with given frequencies and amplitudes. The frequencies correspond to the mass-to-charge ratios and the amplitudes correspond to the concentrations of the compounds in the analyte. FT-ICR MS is known for high mass resolution, with separation thresholds on the order of $10^{-3}$ Daltons (Da) or better [13, 14].
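The link between frequency and mass-to-charge ratio is, in its idealized form, the cyclotron relation $f = zeB/(2\pi m)$ for an ion of mass $m$ and charge $ze$ in a uniform field $B$. The sketch below inverts that relation for a 7.0 T field (the magnet strength of the instrument described in the next paragraph). It ignores trapping-field and space-charge shifts, which real instruments correct with empirically calibrated equations, so its numbers are only approximate and are not the calibration used for the spectra in this article.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # C (exact SI value)
DALTON = 1.66053906660e-27            # kg (CODATA 2018)


def mz_from_frequency(freq_hz, field_tesla=7.0):
    """Ideal (unperturbed) cyclotron relation solved for m/z in Da."""
    # f = z e B / (2 pi m)  =>  m/z = e B / (2 pi f), converted from kg to Da
    return ELEMENTARY_CHARGE * field_tesla / (2.0 * math.pi * freq_hz) / DALTON


def frequency_from_mz(mz_da, field_tesla=7.0):
    """Ideal cyclotron frequency in Hz for an ion with the given m/z (Da)."""
    return ELEMENTARY_CHARGE * field_tesla / (2.0 * math.pi * mz_da * DALTON)


# The two instrumental spikes reported for the noise spectrum
for f_khz in (41.21, 42.21):
    print(f"{f_khz} kHz -> m/z ~ {mz_from_frequency(f_khz * 1e3):.1f}")
```

Because $f$ is inversely proportional to $m/z$, low frequencies correspond to heavy ions and high frequencies to light ones.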

As an application of the methods developed in Section 2, we use them on two MALDI FT-ICR spectra: a “noise” spectrum (one created with no analyte or matrix, pictured in Figure 2) and a spectrum prepared for a cancer study [15] with human blood serum as the analyte. The spectra analyzed in this article were recorded in the Lebrilla lab in the Chemistry Department at the University of California at Davis on an external source MALDI FT-ICR instrument (HiResMALDI, IonSpec Corporation, Irvine, CA) equipped with a 7.0 T superconducting magnet and a pulsed 355 nm Nd:YAG laser. The serum sample was collected at the University of California at Davis Cancer Center; the patient gave written informed consent under an IRB-approved protocol.

Figure 2. A typical noise spectrum. The spike extending off the top of the picture is actually two peaks, at frequencies of 41.21 kHz and 42.21 kHz, which extend upward to intensities of approximately 222.7 and 95.4, respectively.

The BXR algorithm is especially suited to analyzing MALDI FT-ICR spectra because of the following property, first observed in Barkauskas *et al*. [12]: the data obtained by dividing the noise portion of a MALDI FT-ICR spectrum by the expected value at each point can be closely modeled by a causal and invertible autoregressive moving-average (ARMA) time series with generalized gamma innovations. Thus, identifying the mean level of the noise determines the entire distribution of the noise, which leads to a natural method for classifying peaks in a MALDI FT-ICR spectrum as either noise or signal.

It follows that if we consider the random variables $Y_t' = Y_t/\mathbb{E}Y_t$, then $\{Y_t'\}$ should be identically distributed with mean 1. (Note that the $\{Y_t'\}$ are not independent; the autocorrelation function is non-trivial.) Thus, we get

$$\mathbb{E}\{(\mathbb{E}Y_t - Y_t)_+\} = \mathbb{E}Y_t \cdot \mathbb{E}\{(1 - Y_t')_+\}.$$

Using the spectrum in Figure 2 as $\{Y_t\}$ (and running means to estimate $\{\mathbb{E}Y_t\}$) …
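Combining $A_{2,t} = 1/(2\,\mathbb{E}\{(\mathbb{E}Y_t - Y_t)_+\})$ with the factorization above gives $A_{2,t} = 1/(2\,\mathbb{E}Y_t\,\mathbb{E}\{(1 - Y_t')_+\})$, so one constant estimated from a noise spectrum, together with an estimate of each $\mathbb{E}Y_t$, fixes every $A_{2,t}$. The sketch below (NumPy; the function name is hypothetical) takes the estimates of $\mathbb{E}Y_t$ as an input array, skips points marked missing with NaN, and, as a simplifying assumption of ours, averages $(1 - Y_t')_+$ over all remaining points.

```python
import numpy as np


def distribution_specific_a2(y, mean_est):
    """Return (c, a2): c estimates E{(1 - Y'_t)_+}; a2 holds one A_{2,t} per point.

    y        : noise-spectrum intensities Y_t (NaN marks a missing value)
    mean_est : estimates of E Y_t (e.g. running means), same length as y
    """
    y = np.asarray(y, dtype=float)
    mean_est = np.asarray(mean_est, dtype=float)
    y_scaled = y / mean_est                           # Y'_t = Y_t / E Y_t
    ok = np.isfinite(y_scaled)                        # drop missing points
    c = np.mean(np.maximum(1.0 - y_scaled[ok], 0.0))  # E{(1 - Y'_t)_+}
    a2 = 1.0 / (2.0 * mean_est * c)                   # A_{2,t}
    return c, a2
```

For scaled noise that is normal with mean 1 and standard deviation 0.5, $c$ should come out near $0.5/\sqrt{2\pi} \approx 0.1995$; for real MALDI FT-ICR noise the value has to be estimated from data.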

To choose an appropriate value of $A_1$, we tried values of $10^{-j}$ for …

For each of the two spectra, we calculated the baseline using four methods: a running Tukey’s biweight with $K = 9$ and bandwidth 8001; and the BXR algorithm with $A_1$ and $A_{2,t}$ chosen to be one of the three choices $\sigma^{-1}\sqrt{\pi/2}$ (the “iid normal method”, which is just Xi and Rocke’s original algorithm), or ( …
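The running Tukey’s biweight used for comparison can be sketched as a biweight (bisquare) M-estimate of location applied in a sliding window. We assume the common convention that residuals are scaled by $K$ times the median absolute deviation (MAD) about the median, so points more than $K$ MADs away get zero weight; the exact convention and windowing used for the comparison are not given here, and the helper names are hypothetical. The direct window loop is far too slow for a bandwidth of 8001 on nearly a million points and is meant only for short arrays.

```python
import numpy as np


def biweight_location(values, k=9.0, tol=1e-10, max_iter=50):
    """Tukey biweight (bisquare) M-estimate of location.

    Residuals are scaled by k * MAD; points beyond that distance get zero
    weight, so large peaks do not pull the estimate.
    """
    x = np.asarray(values, dtype=float)
    x = x[np.isfinite(x)]
    center = np.median(x)
    mad = np.median(np.abs(x - center))
    if mad == 0:
        return center
    for _ in range(max_iter):
        u = (x - center) / (k * mad)
        w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)   # bisquare weights
        new_center = np.sum(w * x) / np.sum(w)
        if abs(new_center - center) < tol * (1.0 + abs(center)):
            return new_center
        center = new_center
    return center


def running_biweight(y, bandwidth=8001, k=9.0):
    """Biweight location in a centered window around every point (slow but simple)."""
    y = np.asarray(y, dtype=float)
    half = bandwidth // 2
    return np.array([biweight_location(y[max(0, i - half): i + half + 1], k)
                     for i in range(y.size)])
```

Because the weights vanish far from the center, a handful of very large values barely move the estimate, but a right-skewed noise distribution still pulls it below the mean.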

For the noise spectrum, we used running means with bandwidth 8001. The noise spectrum has two spikes at frequencies of 41.21 kHz and 42.21 kHz which extend upward to intensities of approximately 222.7 and 95.4, respectively, and are apparently instrumental noise (they have no isotope peaks; if they were real compounds, the isotope peaks should be easily large enough to show above the noise). In the calculation of the running means, we set the values of the spectrum at frequencies corresponding to these two peaks to be missing.
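A centered running mean that treats missing values as absent can be computed in linear time with cumulative sums, which matters at bandwidth 8001 on nearly a million points. In this sketch (NumPy; the function name is hypothetical), NaN marks a missing value, as for the two instrumental spikes, and near the ends of the spectrum the window is simply truncated; how the ends were handled is not stated above, so that choice is an assumption.

```python
import numpy as np


def running_mean(y, bandwidth=8001):
    """Centered running mean over `bandwidth` points, ignoring NaN values.

    Windows are truncated at the ends of the array. Returns NaN where a
    window contains no observed values.
    """
    y = np.asarray(y, dtype=float)
    if bandwidth % 2 == 0:
        raise ValueError("bandwidth must be odd so the window is centered")
    half = bandwidth // 2
    observed = np.isfinite(y)
    # Cumulative sums with a leading zero: the sum over [lo, hi) is c[hi] - c[lo].
    csum = np.concatenate(([0.0], np.cumsum(np.where(observed, y, 0.0))))
    ccount = np.concatenate(([0], np.cumsum(observed)))
    idx = np.arange(y.size)
    lo = np.clip(idx - half, 0, y.size)
    hi = np.clip(idx + half + 1, 0, y.size)
    counts = ccount[hi] - ccount[lo]
    with np.errstate(invalid="ignore", divide="ignore"):
        return (csum[hi] - csum[lo]) / counts
```

Setting the spike frequencies to NaN before the call reproduces the “set to missing” step described above without any special-case code.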

For the serum spectrum, the presence of multiple large peaks would badly skew the running means. To get a reasonable estimate of the running means of the noise portion of the spectrum, we used a baseline calculated using the BXR algorithm with parameters $A_{1,t}$ and …

The results for the noise spectrum are displayed in Figure 4, and the results for the serum spectrum are displayed in Figure 5. Each algorithm behaves similarly on both spectra. Specifically, the iid normal method underestimates the baseline at low frequencies (where the noise variance is large) and overestimates it at high frequencies (where the noise variance is small), as expected. The distribution-free method does much better, but still consistently underestimates the baseline; furthermore, in the noise spectrum the bias increases in absolute value as the frequency decreases. The running Tukey’s biweight underestimates the running means by a fairly consistent amount (although by less than the distribution-free method does), which is not surprising, since a simple calculation shows that the distribution of the noise is right-skewed. Finally, the baseline estimated using the distribution-specific parameters is (on average) unbiased.

Thus, applying the distribution-specific BXR algorithm to a MALDI FT-ICR spectrum is roughly equivalent to calculating running means for the noise portion of that spectrum. However, the BXR algorithm has two main advantages over running means. The first is speed: the BXR algorithm uses roughly half the computing time of the running means. (Of course, optimizing each algorithm could change this. Also, as noted by Yang *et al*. [16], even aside from algorithm optimization, running times are only really comparable for programs written in the same language, so run-time comparisons should be taken only as a rough guideline.) More importantly, the negativity penalty $A_{2,t}$ in the BXR algorithm comes into play only when the baseline is above the data; when the data are above the baseline, it does not matter by how much. Thus, the extremely large values that constitute the signal are automatically ignored by the BXR algorithm, while extra work is needed to ignore the signal when calculating running means (as we had to do above when estimating the running means for the serum spectrum).

This is even more clearly illustrated by a comparison of baselines computed by the BXR algorithm and the LMS algorithm (Figures 6 and 7). For the noise spectrum, the two baselines are extremely close to each other, except for an apparent edge effect at low frequencies for the LMS algorithm. In fact, except near the peak and at the low-frequency edge, the estimates never differ by more than ±5%. However, in the areas of the serum spectrum that contain signal, the LMS estimate is pulled up drastically toward the signal, reaching more than three times the BXR estimate. In the areas with little or no signal (frequencies greater than 100 kHz), the LMS and BXR estimates are still within ±5% of each other. The BXR baseline is mildly inflated in the presence of signal, but by at most 13% relative to the baseline estimated for the noise spectrum. Thus, the BXR algorithm is far less sensitive to the presence of signal than the LMS algorithm.

One obvious question is how many of the results in this article are due to the particular experimental setup used to generate the spectra analyzed here, and how many can be generalized. One encouraging sign is that the coefficients obtained in Section 3 are consistent across replicates: for a set of 56 noise spectra similar to the one displayed in Figure 2, the estimated values of $\mathbb{E}\{(1 - Y_t')_+\}$ had a mean of 0.210011 and a standard deviation of $2.46 \times 10^{-4}$, while the estimated values for the standard deviation of $Y_t'$ had a mean of 0.522584 and a standard deviation of $6.36 \times 10^{-4}$. Thus, it seems justified to use the mean values of the two parameters for analyses of any spectrum generated on the same MALDI FT-ICR machine rather than calculating them individually for each spectrum. Whether these same numbers apply to other MALDI FT-ICR machines is unknown, but at worst, each experimenter could use the techniques described in this article to determine the appropriate numbers for his or her experimental setup and use those.
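As a worked example, substituting the mean estimate $\mathbb{E}\{(1 - Y_t')_+\} \approx 0.210011$ into $A_{2,t} = 1/(2\,\mathbb{E}\{(\mathbb{E}Y_t - Y_t)_+\})$, together with the factorization $\mathbb{E}\{(\mathbb{E}Y_t - Y_t)_+\} = \mathbb{E}Y_t\,\mathbb{E}\{(1 - Y_t')_+\}$, gives a per-point weight that depends only on the estimated mean level. As the paragraph cautions, this constant is only known to be stable for spectra from the same instrument.

$$A_{2,t} = \frac{1}{2\,\mathbb{E}Y_t\,\mathbb{E}\{(1 - Y_t')_+\}} \approx \frac{1}{2(0.210011)\,\mathbb{E}Y_t} = \frac{1}{0.420022\,\mathbb{E}Y_t} \approx \frac{2.3808}{\mathbb{E}Y_t}.$$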

Additionally, we have concentrated on the case in which the estimated quantity is $\mathbb{E}Y_t$, because that is the key quantity in MALDI FT-ICR MS. However, any measure of center …

Several variants of the BXR algorithm could be useful. One possibility is to penalize a derivative of a different order rather than the second. This would involve changing $\mathbf{\Delta}_2$ by changing $\mathbf{M}_2$. For example, to penalize large values of the fourth derivative, we could use $\mathbf{\Delta}_4 = \mathbf{M}_4'\mathbf{A}_1\mathbf{M}_4$, where

$$\mathbf{M}_4 = \begin{bmatrix} 1 & -4 & 6 & -4 & 1 & 0 & \cdots & 0 \\ 0 & 1 & -4 & 6 & -4 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 1 & -4 & 6 & -4 & 1 \end{bmatrix}$$

is $(n-4) \times n$ (and $\mathbf{A}_1$ would then be $(n-4) \times (n-4)$). As in Section 2.2, the resulting Hessian is almost certainly negative definite (unless the baseline is below all but at most three points of the spectrum). However, it appears to be difficult to adequately smooth the spectrum in this way, since using an $A_1$ large enough to get a reasonably smooth estimate causes the Hessian to be computationally singular.
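The matrices $\mathbf{M}_2$ and $\mathbf{M}_4$ are repeated differences of the identity matrix, and the sketch below (NumPy; helper names hypothetical) uses that fact to build $\mathbf{\Delta}_k = \mathbf{M}_k'\mathbf{A}_1\mathbf{M}_k$ for any order $k$. It also computes the condition number of $\mathbf{\Delta}_4 + \mathbf{N}$ for a small example in which every negativity penalty is active with $A_{2,t} = 1$, an assumption made for simplicity. Since $\mathbf{M}_4$ annihilates cubic polynomials, the smallest eigenvalue stays at 1 while the largest grows with $A_1$, so the condition number grows linearly in $A_1$; this is one way to see how a large $A_1$ pushes the Hessian toward numerical singularity.

```python
import numpy as np


def difference_matrix(n, k):
    """(n - k) x n matrix whose rows apply the k-th order finite difference."""
    return np.diff(np.eye(n), n=k, axis=0)


def penalty_matrix(n, k, a1):
    """Delta_k = M_k' A_1 M_k with A_1 diagonal (scalar or length n - k)."""
    m = difference_matrix(n, k)
    weights = np.broadcast_to(np.asarray(a1, dtype=float), (n - k,))
    return m.T @ (weights[:, None] * m)


print(difference_matrix(8, 2)[0])   # first row of M_2
print(difference_matrix(8, 4)[0])   # first row of M_4

# Conditioning of Delta_4 + N when every point is penalized with A_2 = 1.
n = 200
for a1 in (1.0, 1e4, 1e8):
    h = penalty_matrix(n, 4, a1) + np.eye(n)
    print(f"A1 = {a1:g}: condition number {np.linalg.cond(h):.3g}")
```

In a real spectrum only some penalties are active, which can only lower the smallest eigenvalue and make the conditioning worse.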

Another variant would be to allow $A_1$ to depend on $t$. Especially with MALDI FT-ICR spectra (which each show a large spike in baseline and variance near 53.75 kHz), it might be useful to incorporate $A_{1,t}$ into the formula, with an appropriate adjustment to $\mathbf{\Delta}_2$.

A third possibility would be to allow the matrix $\mathbf{N}$ from Equation (2) to be non-diagonal. This is an especially attractive idea in light of the non-trivial autocorrelation of MALDI FT-ICR spectra.

A fourth possibility is to extend the score function to cases where the masses are not equally spaced. Although in MALDI FT-ICR MS we can use the frequencies as equally-spaced data, it is certainly possible that there are (or will be) technologies which will not generate equally-spaced data.

We also observe that in deriving the appropriate values for $A_{1,t}$ and …

Finally, we note that although the BXR algorithm has been developed for one-dimensional data, the same principles should be applicable to higher-dimensional data, such as data generated by liquid chromatography mass spectrometry.

**Funding**

This work was supported by the National Human Genome Research Institute (R01-HG003352); National Institute of Environmental Health Sciences Superfund (P42-ES04699); National Institutes of Health Training Program in Biomolecular Technology (2-T32-GM08799 to DAB); and the Ovarian Cancer Research Fund.

The authors would like to thank Scott Kronewitter and Carlito Lebrilla (University of California at Davis, Department of Chemistry) for providing the MALDI FT-ICR spectra used in Section 3 and Ralph de Vere White (University of California at Davis Cancer Center, Division of Urology) for providing the serum sample used to generate the serum spectrum.

¹ Note that the values $\{x_t\}$ do not appear in …

**Publisher's Disclaimer: **This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

1. Xi Y, Rocke D. Baseline correction for NMR spectroscopic metabolomics data analysis. BMC Bioinformatics. 2008;9(1):324.

2. Coombes KR, Tsavachidis S, Morris J, Baggerly K, Hung MC, Kuerer H. Improved peak detection and quantification of mass spectrometry data acquired from surface-enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform. Proteomics. 2005;5(16):4107–4117.

3. Du P, Sudha R, Prystowsky MB, Angeletti RH. Data reduction of isotope-resolved LC-MS spectra. Bioinformatics. 2007;23(11):1394–1400.

4. Mantini D, Petrucci F, Pieragostino D, Del Boccio P, Di Nicola M, Di Ilio C, Federici G, Sacchetta P, Comani S, Urbani A. LIMPIC: a computational method for the separation of protein MALDI-TOF-MS signals from noise. BMC Bioinformatics. 2007;8(1):101.

5. Karpievitch YV, Hill EG, Smolka AJ, Morris JS, Coombes KR, Baggerly KA, Almeida JS. PrepMS: TOF MS data graphical preprocessing tool. Bioinformatics. 2007;23(2):264–265.

6. Li X, Gentleman R, Lu X, Shi Q, Iglehart JD, Harris L, Miron A. Proteomics spectra. In: Gentleman R, Carey V, Huber W, Irizarry R, Dudoit S, editors. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer; 2005. pp. 91–109.

7. Bellew M, Coram M, Fitzgibbon M, Igra M, Randolph T, Wang P, May D, Eng J, Fang R, Lin C, Chen J, Goodlett D, Whiteaker J, Paulovich A, McIntosh M. A suite of algorithms for the comprehensive analysis of complex protein mixtures using high-resolution LC-MS. Bioinformatics. 2006;22(15):1902–1909.

8. Yasui Y, Pepe M, Thompson ML, Adam B-L, Wright J, George L, Qu Y, Potter JD, Winget M, Thornquist M, Feng Z. A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional proteomic data for cancer detection. Biostatistics. 2003;4(3):449–463.

9. Du P, Kibbe WA, Lin SM. Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Bioinformatics. 2006;22(17):2059–2065.

10. Lange E, Gröpl C, Reinert K, Kohlbacher O, Hildebrandt A. High-accuracy peak picking of proteomics data using wavelet techniques. Pac Symp Biocomput. 2006;11:243–254.

11. Zhang LK, Rempel D, Pramanik BN, Gross ML. Accurate mass measurements by Fourier transform mass spectrometry. Mass Spectrom Rev. 2005;24(2):286–309.

12. Barkauskas DA, Kronewitter SR, Lebrilla CB, Rocke DM. Analysis of MALDI FT-ICR mass spectrometry data: A time series approach. Anal Chim Acta. 2009;648(2):207–214.

13. Herbert CG, Johnstone RAW. Mass Spectrometry Basics. CRC Press; Boca Raton, FL: 2003.

14. Park Y, Lebrilla CB. Application of Fourier transform ion cyclotron resonance mass spectrometry to oligosaccharides. Mass Spectrom Rev. 2005;24(2):232–264.

15. Barkauskas DA, An HJ, Kronewitter SR, de Leoz ML, Chew HK, de Vere White RW, Leiserowitz GS, Miyamoto S, Lebrilla CB, Rocke DM. Detecting glycan cancer biomarkers in serum samples using MALDI FT-ICR mass spectrometry data. Bioinformatics. 2009;25(2):251–257.

16. Yang C, He Z, Yu W. Comparison of public peak detection algorithms for MALDI mass spectrometry data analysis. BMC Bioinformatics. 2009;10(1):4.
