A scientist's career path is subject to a myriad of decisions and unforeseen events, such as Nobel Prize-worthy discoveries [1], that can significantly alter an individual's career trajectory. As a result, the career path can be difficult to analyze since there are potentially many factors (individual, mentor-apprentice, institutional, coauthorship, field) [2,3,4,5,6,7,8,9] to account for in the statistical analysis of scientific panel data.
The rank-citation profile, $c_i(r)$, represents the number of citations received by paper $r$ of individual $i$, with papers ranked in decreasing order of citations, $c_i(1) \geq c_i(2) \geq \dots \geq c_i(N)$, and provides a quantitative synopsis of a given scientist's publication career. Here, we analyze the rank-ordered citation distribution $c_i(r)$ for 300 scientists in order to better understand patterns of success and to characterize scientific production at the individual scale using a common framework. The review of scientific achievement for post-doctoral selection, tenure review, and award and academy selection, at all stages of the career, is becoming largely based on quantitative publication impact measures. Hence, understanding quantitative patterns in production is important for developing a transparent and unbiased review system. Interestingly, we observe statistical regularities in $c_i(r)$ that are remarkably robust despite the idiosyncratic details of scientific achievement and career evolution. Furthermore, empirical regularities in scientific achievement suggest that there are fundamental social forces governing career progress [10,11,12,13].
In the methods section we describe in detail the selection procedure for datasets A, B, and C, and in tables S1-S6 we provide summary statistics for each career.
There are many conceivable ways to quantify the impact of a scientist's $N_i$ publications. The $h$-index [14] is a widely acknowledged single-number measure that serves as a proxy for production and impact simultaneously. The $h$-index $h_i$ of scientist $i$ is defined by a single point on the rank-citation profile $c_i(r)$ satisfying the condition

$$c_i(h_i) = h_i .$$
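As a minimal sketch of how the $h$-index can be read off a rank-citation profile, the Python snippet below sorts a list of citation counts in decreasing order and finds the largest rank whose paper has at least that many citations. The function names and the example counts are hypothetical, and the discrete convention used (largest $r$ with $c(r) \geq r$) is an assumption for illustration rather than a procedure taken from this study.

```python
def rank_citation_profile(citations):
    """Return citation counts sorted in decreasing order (index 0 is rank r = 1)."""
    return sorted(citations, reverse=True)


def h_index(citations):
    """Largest rank h such that the paper at rank h has at least h citations."""
    profile = rank_citation_profile(citations)
    h = 0
    for r, c in enumerate(profile, start=1):
        if c >= r:
            h = r
        else:
            # The profile is non-increasing, so the condition cannot hold again.
            break
    return h


# Hypothetical citation counts for one scientist, in arbitrary order.
example = [3, 25, 8, 0, 5, 1, 12]
print(rank_citation_profile(example))
print(h_index(example))
```

Because only the crossing point is used, very different profiles can share the same value of $h$.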
To address the shortcomings of the $h$-index, numerous remedies have been proposed in the bibliometric sciences [15]. For example, Egghe proposed the $g$-index, where the most cited $g$ papers accumulate $g^2$ citations overall [16], and Zhang proposed the $e$-index, which complements the $h$ and $g$ indices quantitatively [17].
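The $g$-index can be sketched in a few lines of Python. The snippet below assumes the common convention that $g$ is the largest rank for which the $g$ most-cited papers together have at least $g^2$ citations, and that $g$ cannot exceed the number of papers. The function name and the citation lists are hypothetical illustrations, not data from this study.

```python
from itertools import accumulate


def g_index(citations):
    """Largest rank g such that the top g papers total at least g**2 citations."""
    profile = sorted(citations, reverse=True)
    g = 0
    # accumulate() yields the running total of citations over ranks 1..r.
    for r, total in enumerate(accumulate(profile), start=1):
        if total >= r * r:
            g = r
    return g


# Two hypothetical careers that differ mainly in their single most-cited paper.
steady = [10, 8, 5, 4, 3]
one_hit = [30, 8, 5, 4, 3, 0, 0]
print(g_index(steady), g_index(one_hit))
```

Unlike a single crossing point, the running total lets one highly cited paper raise the index.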
To justify the importance of analyzing the entire profile $c_i(r)$, consider a scientist $i = 1$ with rank-citation profile $c_1(r) \equiv [100, 50, 33, 25, 20, 16, 14, 12, 11, 10, 9 \dots]$ and a scientist $i = 2$ with $c_2(r) \equiv [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9 \dots]$. Both scientists have the same $h$-index value $h = 10$, although scientist 1 tallies 2.9 times as many citations as scientist 2 from his/her 10 most-cited papers (291 versus 100). Hence, an additional parameter $\beta_i$ is necessary in order to distinguish these two example careers. Specifically, the $\beta_i$ parameter quantifies the scaling slope in $c_i(r)$ for the high-rank papers corresponding to small $r$ values. In this simple illustration, $\beta_1 \approx 1$, since $c_1(r)$ falls off roughly as $100/r$, while $\beta_2 \approx 0$, since $c_2(r)$ is flat over the first 10 ranks.
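The two example careers can be checked numerically. The sketch below uses the listed profile entries (truncated at the 11 values given), confirms that both have $h = 10$, and estimates the scaling slope with an ordinary least-squares fit of $\log c(r)$ against $\log r$ over the first 10 ranks. The log-log fit is an assumption made here for illustration and is not necessarily the fitting procedure used in this study. The helper names are hypothetical.

```python
import math


def h_index(profile):
    """Largest rank h with c(h) >= h, for a profile sorted in decreasing order."""
    h = 0
    for r, c in enumerate(profile, start=1):
        if c >= r:
            h = r
        else:
            break
    return h


def loglog_slope(profile, r_max):
    """Least-squares slope of log c(r) versus log r over ranks 1..r_max."""
    xs = [math.log(r) for r in range(1, r_max + 1)]
    ys = [math.log(c) for c in profile[:r_max]]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx


c1 = [100, 50, 33, 25, 20, 16, 14, 12, 11, 10, 9]
c2 = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9]

# beta is reported as a positive exponent, so flip the sign of the fitted slope.
beta1 = -loglog_slope(c1, 10)
beta2 = -loglog_slope(c2, 10)

print(h_index(c1), h_index(c2))        # same h-index for both careers
print(sum(c1[:10]) / sum(c2[:10]))     # ratio of citations in the top 10 papers
print(round(beta1, 2), round(beta2, 2))
```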
We plot $c_i(r)$ for 5 extremely high-impact scientists. The individuals EW, ACG, MLC, and PWA are physicists with the largest $h_i$ values in our data set; BV is a prolific molecular biologist whom we include in this graphical illustration in order to demonstrate the generality of the statistical regularity we find, which likely exists across disciplines. However, citation and $h$-index metrics should not be compared across disciplines, since baseline publication and citation rates can vary significantly between research fields [8, 9]. To demonstrate how the single point $c_i(h_i)$ is an arbitrary point along the $c_i(r)$ curve, we also plot the lines $H_p(r) \equiv p\,r$ for 5 values of $p = \{1, 2, 5, 20, 80\}$. The value $p \equiv 1$ recovers the $h$-index $h_1 = h$ proposed by Hirsch. The intersection of any given line $H_p(r)$ with $c_i(r)$ corresponds to the “generalized $h$-index” $h_p$, proposed in [18] and further analyzed in [19], with the relation $h_p \leq h_q$ for $p > q$. Since the value $p \equiv 1$ is chosen somewhat arbitrarily, we take an alternative approach, which is to quantify the entire $c_i(r)$ profile at once (which is also equivalent to knowing the entire $h_p$ spectrum). Surprisingly, because we find regularity in the functional form of $c_i(r)$ for all 300 scientists analyzed, we can relate the relative impact of a scientist's publication career using the small set of parameters that specify the $c_i(r)$ profile for the entire set of papers ranging from rank $r = 1 \dots N_i$. Using a much smaller parameter space than the $h_p$ spectrum, we can begin to analyze the statistical regularities in the career accomplishments of scientists.
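The generalized index can be illustrated with a short Python sketch that finds, for each slope $p$, where the line $p\,r$ crosses a rank-citation profile. It assumes the discrete convention that $h_p$ is the largest rank $r$ with $c(r) \geq p\,r$. The function name is hypothetical, and the profile is the first example career from the text, truncated at its 11 listed entries.

```python
def generalized_h_index(profile, p):
    """Largest rank r with c(r) >= p * r, for a profile sorted in decreasing order."""
    h_p = 0
    for r, c in enumerate(profile, start=1):
        if c >= p * r:
            h_p = r
        else:
            # c(r) is non-increasing while p * r grows, so no later rank can qualify.
            break
    return h_p


c1 = [100, 50, 33, 25, 20, 16, 14, 12, 11, 10, 9]

# The h_p spectrum for the five slopes used in the text; p = 1 gives the h-index.
spectrum = {p: generalized_h_index(c1, p) for p in (1, 2, 5, 20, 80)}
print(spectrum)  # {1: 10, 2: 7, 5: 4, 20: 2, 80: 1}
```

The values decrease as $p$ grows, consistent with the relation $h_p \leq h_q$ for $p > q$.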
The aim of this analysis is not to add another level of scrutiny to the review of scientific careers, but rather to highlight the regularities across careers and to seed further exploration into the mechanisms that underlie career success. The aim of this brand of quantitative social science is to utilize the vast amount of information available to develop an academic framework that is sustainable, efficient and fruitful. Young scientific careers are like “startup” companies that need appropriate venture funding to support the career trajectory through lows as well as highs [13].