Sci Rep. 2011; 1: 181.

Published online 2011 December 5. doi: 10.1038/srep00181

PMCID: PMC3240955

Received 2011 October 18; Accepted 2011 November 14.

Copyright © 2011, Macmillan Publishers Limited. All rights reserved

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/

Recent science of science research shows that scientific impact measures for journals and individual articles have quantifiable regularities across both time and discipline. However, little is known about the scientific impact distribution at the scale of an individual scientist. We analyze the aggregate production and impact using the rank-citation profile *c _{i}*(

A scientist's career path is subject to a myriad of decisions and unforeseen events, such as Nobel Prize-worthy discoveries^{1}, that can significantly alter an individual's career trajectory. As a result, the career path can be difficult to analyze, since there are potentially many factors (individual, mentor-apprentice, institutional, coauthorship, field)^{2,3,4,5,6,7,8,9} to account for in the statistical analysis of scientific panel data.

The rank-citation profile, *c _{i}*(

We group the 300 scientists that we analyze into three sets of 100, referred to as datasets A, B and C, so that we can analyze and compare the complete publication careers of each individual, as well as across the three groups:

- [A] 100 high-profile scientists with average *h*-index ⟨*h*⟩ = 61 ± 21. These scientists were selected using the citation shares metric^{9} to quantify cumulative career impact in the journal *Physical Review Letters* (PRL).
- [B] 100 additional “control” scientists with average *h*-index ⟨*h*⟩ = 44 ± 15.
- [C] 100 current assistant professors with average *h*-index ⟨*h*⟩ = 14 ± 7. We selected two scientists from each of the top-50 US physics departments (departments ranked according to the magazine *U.S. News*).

In the methods section we describe in detail the selection procedure for datasets A, B, and C and in tables S1-S6 we provide summary statistics for each career.

There are many conceivable ways to quantify the impact of a scientist's *N _{i}* publications. The

To address the shortcomings of the *h*-index, numerous remedies have been proposed in the bibliometric sciences^{15}. For example, Egghe proposed the *g*-index, where the most cited *g* papers cumulate *g*^{2} citations overall^{16}, and Zhang proposed the *e*-index which complements the *h* and *g* indices quantitatively^{17}.
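To make these definitions concrete, the following minimal Python sketch computes the *h*-index (in the standard sense of Hirsch^{14}: the largest *h* such that *h* papers have at least *h* citations each) and the *g*-index as described above. The function names and the sample citation counts are illustrative assumptions and are not taken from the datasets analyzed here; the *g*-index is capped at the number of papers, which is one common convention.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the g most cited papers total at least g**2 citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c  # cumulative citations of the top `rank` papers
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for a single author (illustration only)
example = [10, 8, 5, 4, 3]
print(h_index(example), g_index(example))
```

Both indices compress the ranked citation list into a single number, which is why two authors with very different profiles can share the same value.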

To justify the importance of analyzing the entire profile *c _{i}*(

In Fig. 1 we plot *c _{i}*(

proposed in^{18} and further analyzed in^{19}, with the relation *h _{p}* ≤

The aim of this analysis is not to add another level of scrutiny to the review of scientific careers, but rather, to highlight the regularities across careers and to seed further exploration into the mechanisms that underlie career success. The aim of this brand of quantitative social science is to utilize the vast amount of information available to develop an academic framework that is sustainable, efficient and fruitful. Young scientific careers are like “startup” companies that need appropriate venture funding to support the career trajectory through lows as well as highs^{13}.

For each scientist *i*, we find that *c _{i}*(

has been proposed as a model for rank profiles in the social and natural sciences that exhibit such truncated scaling behavior^{20}^{,21}. The parameters *A _{i}*,

The DGBD proposed in^{20} is an improvement over the Zipf law (also called the generalized power-law or Lotka-law^{22}) model and the stretched exponential model^{14} since it reproduces the varying curvature in *c _{i}*(

The *β _{i}* value determines the relative change in the

In Fig. 2(a) we plot *c _{i}*(

for 10^{0} ≤ *r* ≤ 10^{2}. The scaling value calculated for other rank-size (Zipf) distributions in the social and economic sciences is typically around unity, *β* ≈ 1, for example in studies of word frequency^{23} and city size^{20}^{,21}^{,24}. Here we calculate *β _{i}* for each individual author and observe a distribution which is centered around characteristic values

We calculate each *β _{i}* value using a multilinear least-squares regression of ln
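A multilinear least-squares fit of this kind can be sketched as follows. The sketch assumes the DGBD takes the form c(r) = A r^(−β) (N + 1 − r)^γ given in refs. 20 and 21, so that ln c(r) is linear in ln r and ln(N + 1 − r). The function names, the decision to skip uncited papers (whose logarithm is undefined) and the synthetic test profile are assumptions made for illustration, not the exact procedure used for the results reported here.

```python
import math


def solve_linear_system(matrix, rhs):
    """Solve a small dense linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[k]] for k, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for k in range(n):
            if k != col:
                factor = aug[k][col] / aug[col][col]
                for j in range(col, n + 1):
                    aug[k][j] -= factor * aug[col][j]
    return [aug[k][n] / aug[k][k] for k in range(n)]


def fit_dgbd(citations):
    """Return (A, beta, gamma) from a least-squares fit of ln c(r).

    Assumed model: ln c = ln A - beta * ln r + gamma * ln(N + 1 - r).
    """
    ranked = sorted(citations, reverse=True)
    n_papers = len(ranked)
    rows, targets = [], []
    for rank, c in enumerate(ranked, start=1):
        if c > 0:  # assumption: uncited papers are left out of the log-fit
            rows.append([1.0, math.log(rank), math.log(n_papers + 1 - rank)])
            targets.append(math.log(c))
    # Normal equations (X^T X) p = X^T y with p = (ln A, -beta, gamma)
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(3)] for a in range(3)]
    xty = [sum(r[a] * y for r, y in zip(rows, targets)) for a in range(3)]
    ln_a, neg_beta, gamma = solve_linear_system(xtx, xty)
    return math.exp(ln_a), -neg_beta, gamma


# Synthetic profile generated from known parameters (illustration only)
N = 50
synthetic = [2.0 * r ** -0.8 * (N + 1 - r) ** 0.5 for r in range(1, N + 1)]
print(fit_dgbd(synthetic))
```

Fitting in log space weights all ranks equally on a logarithmic scale, which is convenient for a sketch but gives the sparsely cited low-rank papers as much influence as the highly cited ones.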

In order to demonstrate the common functional form of the DGBD model, we collapse each *c _{i}*(

A main advantage of the *h*-index is the simplicity with which it is calculated, e.g. *ISI Web of Knowledge*^{25} readily provides this quantity online for distinct authors. Another strength of the *h*-index is its stable growth with respect to changes in *c _{i}*(

It is truly remarkable how a single number, *h _{i}*, correlates with other measures of impact. Understandably, being just a single number, the

Instead of choosing an arbitrary *h*_{p} as a productivity-impact indicator, we use the analytic properties of the DGBD to calculate a crossover value . In the methods section, we derive an exact expression for which highlights the distinguished papers of a given author. To calculate , we use the logarithmic derivative

The advantage of is that this characteristic rank value is a comprehensive representation of the stellar papers in the high-rank scaling regime since it depends on the DGBD parameter values *β _{i}*,

To further contrast the values of and the *h*-index, we propose the “peak indicator” ratio , which corrects specifically for the *h*-index penalty on the stellar papers in the peak region of *c _{i}*(

An alternative “single number” indicator is *C _{i}*, an author's total number of citations

which incorporates the entire *c _{i}*(

We test the aggregate properties of *c _{i}*(

where *H _{N′,β}* is the

We use the DGBD model to provide an analytic description of *c _{i}*(

Many studies analyze only the high rank values of generic Zipf ranking profiles *c*(*r*), e.g. computing the scaling regime for *r* < *r*_{c} below some rank cutoff

To measure the upward mobility of a scientist's career, in the SI text we address the question: given that a scientist has index *h*, what is her/his most likely *h*-index value Δ*t* years in the future? In consideration of the bulk of *c _{i}*(

Even though the productivity of scientists can vary substantially^{9}^{,36}^{,37}^{,38}^{,39}, and despite the complexity of success in academia, we find remarkable statistical regularity in the functional form of *c _{i}*(

With little calculation, the *β _{i}* metric developed here, used in conjunction with the

We use disambiguated “distinct author” data from *ISI Web of Knowledge*. This online database hosts comprehensive data that are well suited for developing testable models for scientific impact^{9,32,40} and career progress^{11}. In order to approximately control for discipline-specific publication and citation factors, we analyze 300 scientists from the field of physics.

We aggregate all authors who published in *Physical Review Letters* (PRL) over the 50-year period 1958–2008 into a common dataset. From this dataset, we rank the scientists using the citation shares metric defined in^{9}. This citation shares metric divides the total number of citations a paper receives equally among the *n* coauthors, and also normalizes the total number of citations by a time-dependent factor to account for citation variations across time and discipline.

Hence, for each scientist in the PRL database, we calculate the cumulative number of citation shares received from their PRL publications only. This tally serves as a proxy for his/her scientific impact in all journals. The top 100 scientists according to this citation shares metric comprise dataset [A]. As a control, we also choose 100 other scientists, approximately randomly, from our ranked PRL list to form dataset [B]. The selection criterion for the control dataset [B] is that an author must have published between 10 and 50 papers in PRL. This likely ensures that the total publication history, in all journals, is on the order of 100 articles for each author selected. We compare the tenured scientists in datasets A and B with 100 relatively young assistant professors in dataset [C]. To select dataset [C] scientists, we chose two assistant professors from each of the top 50 U.S. physics and astronomy departments (ranked according to the magazine *U.S. News*).
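The citation shares tally and the resulting ranking can be illustrated with a short Python sketch. The record layout (`authors`, `citations`, `norm_factor`), the function names and the sample values are hypothetical; in particular, the time-dependent normalization factor is defined in ref. 9 and is treated here as a precomputed input rather than derived.

```python
from collections import defaultdict


def citation_shares(papers):
    """Sum each author's equal share of the normalized citations of their papers."""
    shares = defaultdict(float)
    for paper in papers:
        # Normalize raw citations by a precomputed, time-dependent factor
        normalized = paper["citations"] / paper["norm_factor"]
        # Divide the normalized citations equally among the n coauthors
        per_author = normalized / len(paper["authors"])
        for author in paper["authors"]:
            shares[author] += per_author
    return dict(shares)


def top_authors(papers, k):
    """Return the k authors with the largest cumulative citation shares."""
    shares = citation_shares(papers)
    return sorted(shares, key=shares.get, reverse=True)[:k]


# Hypothetical records (illustration only)
sample = [
    {"authors": ["A", "B"], "citations": 100, "norm_factor": 2.0},
    {"authors": ["A"], "citations": 30, "norm_factor": 1.0},
    {"authors": ["C"], "citations": 40, "norm_factor": 1.0},
]
print(citation_shares(sample))
print(top_authors(sample, 2))
```

With `k = 100` applied to the full PRL dataset, a ranking of this kind would correspond to the selection of dataset [A] described above.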

For privacy reasons, we provide in the SI tables only the abbreviated initials for each scientist's name (last name initial, first and middle name initial, e.g. L, FM). Upon request we can provide full names.

We downloaded datasets A and B from ISI Web of Science in Jan. 2010 and dataset C from ISI Web of Science in Oct. 2010. We used the “Distinct Author Sets” function provided by ISI in order to increase the likelihood that only papers published by each given author are analyzed. On a case by case basis, we performed further author disambiguation for each author.

We test the statistical significance of the DGBD model fit using the *χ*^{2} test between the 3-parameter best-fit DGBD *c _{m}*(

The significant number of *c _{i}*(

Here we use the analytic properties of the DGBD defined in Eq. [3] to calculate the special *r* values from the parameters *β*, *γ* and *N* which locate the two tail regimes of *c*(*z*), and in particular, the distinguished paper regime. The scaling features of the DGBD do not readily convey any characteristic scales which distinguish the two scaling regimes. Instead, we use the properties of ln *c _{i}*(

We begin by considering *c _{i}*(

in the domain *z* ∈ [−(*z*_{0} − 1), (*z*_{0} − 1)]. The logarithmic derivative of *c*(*z*) expresses the relative change in *c*(*z*),

where *x* = *z*/*z*_{0}, , and . The extreme values of for are given by

and the average value is calculated by,

The function *χ*(*z*) takes on the value of twice at the values corresponding to the solutions to the quadratic equation,

which has the solution

for . Converting back to rank, then

and so the value is the special rank value which distinguishes the set of excellent papers of each given author. The *c*-star value *c _{i}*(

Furthermore, the crossover *z*_{x} between the *β* scaling regime and the *γ* scaling regime is calculated from the inflection points of ln *c*(*z*),

which has 2 solutions , where . only is a physical solution. Transforming back to rank values, we find . We illustrate these special *z* values in Fig. 5.
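The inflection-point calculation can be checked numerically with the sketch below. It is a re-derivation under stated assumptions rather than a transcription of the expressions above: the DGBD is taken as c(r) = A r^(−β) (N + 1 − r)^γ, and the centered coordinate is taken as z = r − z_0 with z_0 = (N + 1)/2, which is consistent with the domain [−(z_0 − 1), (z_0 − 1)] quoted earlier. Setting the second derivative of ln c(z) to zero then gives β/(z_0 + z)^2 = γ/(z_0 − z)^2, whose only root inside the domain is z_x = z_0 (1 − s)/(1 + s) with s = sqrt(γ/β). Function names and parameter values are illustrative.

```python
import math


def second_log_derivative(z, beta, gamma, z0):
    """d^2/dz^2 of ln c(z) for ln c = ln A - beta*ln(z0 + z) + gamma*ln(z0 - z)."""
    return beta / (z0 + z) ** 2 - gamma / (z0 - z) ** 2


def crossover_rank(beta, gamma, n_papers):
    """Rank at which ln c(z) has its inflection point inside the domain."""
    z0 = (n_papers + 1) / 2.0
    s = math.sqrt(gamma / beta)
    z_x = z0 * (1.0 - s) / (1.0 + s)  # the other root lies outside |z| < z0
    return z_x + z0  # convert the centered coordinate back to rank


# Hypothetical parameter values (illustration only)
beta, gamma, n_papers = 0.8, 0.2, 99
r_x = crossover_rank(beta, gamma, n_papers)
z0 = (n_papers + 1) / 2.0
print(r_x, second_log_derivative(r_x - z0, beta, gamma, z0))
```

Under these assumptions the crossover rank simplifies to (N + 1)/(1 + sqrt(γ/β)), so a larger β relative to γ pushes the crossover toward the low-citation end of the profile.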

A. M. P., H. E. S., & S. S. designed research, performed research, wrote, reviewed and approved the manuscript. A. M. P. performed the numerical and statistical analysis of the data.

We thank J. E. Hirsch and J. Tenenbaum for helpful suggestions.

- Mazloumian A., Eom Y.-H., Helbing D., Lozano S., Fortunato S. How citation boosts promote scientific paradigm shifts and Nobel prizes. PLoS ONE 6(5), e18975 (2011).
- Merton R. K. The Matthew effect in science. Science 159, 56–63 (1968).
- Merton R. K. The Matthew effect in science, II: Cumulative advantage and the symbolism of intellectual property. ISIS 79, 606–623 (1988).
- Cole J. R. Social Stratification in Science (Chicago, Illinois, The University of Chicago Press, 1981).
- Guimera R., Uzzi B., Spiro J., Amaral L. A. N. Team assembly mechanisms determine collaboration network structure and team performance. Science 308, 697–702 (2005).
- Malmgren R. D., Ottino J. M., Amaral L. A. N. The role of mentorship in protégé performance. Nature 463, 622–626 (2010).
- Azoulay P., Zivin J. S. G., & Wang J. Superstar Extinction. Q. J. of Econ. 125 (2), 549–589 (2010).
- Radicchi F., Fortunato S. & Castellano C. Universality of citation distributions: Toward an objective measure of scientific impact. Proc. Natl. Acad. Sci. USA 105, 17268–17272 (2008).
- Petersen A. M., Wang F., Stanley H. E. Methods for measuring the citations and productivity of scientists across time and discipline. Phys. Rev. E 81, 036114 (2010).
- Simonton D. K. Creative productivity: A predictive and explanatory model of career trajectories and landmarks. Psychol. Rev. 104, 66–89 (1997).
- Petersen A. M., Jung W.–S., Yang J.–S. & Stanley H. E. Quantitative and empirical demonstration of the Matthew effect in a study of career longevity. Proc. Natl. Acad. Sci. USA 108, 18–23 (2011).
- Wu J., Lozano S., Helbing D. Empirical study of the growth dynamics in real career h-index sequences. J. Informetrics 5, 489–497 (2011). (In press)
- Petersen A. M., Riccaboni M., Stanley H. E., Pammolli F. Persistency and Uncertainty in the Academic Career. (2011). In preparation.
- Hirsch J. E. An index to quantify an individual's scientific research output. Proc. Natl. Acad. Sci. USA 102, 16569–16572 (2005).
- Bornmann L., Mutz R., Daniel H.–J. Are there better indices for evaluation purposes than the h Index? A comparison of nine different variants of the h Index using data from biomedicine. JASIST 59, 001–008 (2008).
- Egghe L. Theory and practise of the g-index. Scientometrics 69, 131–152 (2006).
- Zhang C–T. Relationship of the h-index, g-index, and e-index. JASIST 62, 625–628 (2010).
- van Eck J. N., Waltman L. Generalizing the h- and g-indices. J. Informetrics 2, 263–271 (2008).
- Wu Q. The w-index: A measure to assess scientific impact focusing on widely cited papers. JASIST 61, 609–614 (2010).
- Naumis G. G., Cocho G. Tail universalities in rank distributions as an algebraic problem: The beta-like function. Physica A 387, 84–96 (2008).
- Martinez-Mekler G., Martinez R. A., del Rio M. B., Mansilla R., Miramontes P., Cocho G. Universality of rank-ordering distributions in the arts and sciences. PLoS ONE 4, e4791 (2009).
- Egghe L., Rousseau R. An informetric model for the Hirsch-index. Scientometrics 69, 121–129 (2006).
- Zipf G. Human Behavior and the principle of least effort (Cambridge, MA, Addison-Wesley, 1949).
- Gabaix X. Zipf's law for cities: An explanation. Q. J. of Econ. 114 (3), 739–767 (1999).
- ISI Web of Knowledge: www.isiknowledge.com/
- Henzinger M., Sunol J., Weber I. The stability of the h-index. Scientometrics 84, 465–479 (2010).
- Hirsch J. E. Does the h index have predictive power? Proc. Natl. Acad. Sci. USA 104, 19193–19198 (2008).
- Batista P. D., Campiteli M. G., Martinez A. S. Is it possible to compare researchers with different scientific interests? Scientometrics 68, 179–189 (2006).
- Iglesias J. E., Pecharromán C. Scaling the h-index for different scientific ISI fields. Scientometrics 73, 303–320 (2007).
- Bornmann L., Daniel H.–J. What do we know about the h index? JASIST 58, 1381–1385 (2007).
- Redner S. On the meaning of the h-index. J. Stat. Mech. 2010, L03005 (2010).
- Radicchi F., Fortunato S., Markines B., Vespignani A. Diffusion of scientific credits and the ranking of scientists. Phys. Rev. E 80, 056103 (2009).
- Egghe L. Dynamic h-Index: the Hirsch index in function of time. JASIST 58, 452–454 (2006).
- Burrell Q. L. Hirsch's h-index: A stochastic model. J. Informetrics 1, 16–25 (2007).
- Guns R., Rousseau R. Simulating growth of the h-index. JASIST 60, 410–417 (2009).
- Shockley W. On the statistics of individual variations of productivity in research laboratories. Proc. of the IRE 45, 279–290 (1957).
- Allison A. D., Stewart J. A. Productivity differences among scientists: Evidence for accumulative advantage. Amer. Soc. Rev. 39(4), 596–606 (1974).
- Huber J. C. Inventive productivity and the statistics of exceedances. Scientometrics 45, 33–53 (1998).
- Peterson G. J., Presse S., Dill K. A. Nonuniversal power law scaling in the probability distribution of scientific citations. Proc. Natl. Acad. Sci. USA 107, 16023–16027 (2010).
- Radicchi F., Castellano C. Rescaling citations of publications in Physics. Phys. Rev. E 83, 046116 (2011).
- Redner S. How popular is your paper? An empirical study of the citation distribution. Eur. Phys J. B 4, 131–134 (1998).
- De Solla Price D. A general theory of bibliometric and other cumulative advantage processes. JASIST 27, 292–306 (1976).
- Petersen A. M., Jung W.-S. & Stanley H. E. On the distribution of career longevity and the evolution of home-run prowess in professional baseball. EPL 83, 50010 (2008).
- Petersen A. M., Penner O. & Stanley H. E. Methods for detrending success metrics to account for inflationary and deflationary factors. Eur. Phys. J. B 79, 67–78 (2011).
- Lazer D., *et al.* Computational social science. Science 323, 721–723 (2009).
- Castellano C., Fortunato S., Loreto V. Statistical physics of social dynamics. Rev. Mod. Phys. 81, 591–646 (2009).
- Redner S. Citation statistics from 110 years of Physical Review. Phys. Today. 58, 49–54 (2005).
