The critical assumption of our model is the correspondence between the social dynamics of scholar communities and the evolution of scientific disciplines. To illustrate this intuition, let us look at the coauthorship network for papers published by the American Physical Society (APS). Using journals as proxies for scholarly communities, we can track the changes in community structure over time. The figure below plots the modularity [21] of the partition induced by the journals; higher values indicate a more clustered structure (see Methods). To gauge the significance of the network modularity, we construct a null model by shuffling the edges of the coauthorship network in such a way that the degree sequence of the network is preserved.
Modularity Q of APS journal-induced scholar communities (solid blue line).
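As a minimal sketch of the two ingredients described above, the following computes the standard Newman modularity of a given partition and builds a degree-preserving null model by repeated double-edge swaps. The function names, the toy graph, and the choice of double-edge swaps as the shuffling procedure are assumptions made for illustration; the exact procedure used for the APS data is the one described in Methods.

```python
import random
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> label.
    """
    m = len(edges)
    intra = defaultdict(int)    # edges with both endpoints in community c
    degree = defaultdict(int)   # total degree of the nodes in community c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomize edges by double-edge swaps; every node keeps its degree."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Proposed rewiring: (a, b), (c, d) -> (a, d), (c, b)
        if a == d or c == b:
            continue  # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

# Toy example (hypothetical): two triangles joined by one bridge edge,
# with each triangle playing the role of one journal community.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
journal = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q_real = modularity(edges, journal)
shuffled = degree_preserving_shuffle(edges, n_swaps=100)
q_null = modularity(shuffled, journal)
```

Comparing `q_real` with `q_null` over many shuffles gives a baseline of the kind used above to judge whether the observed modularity is significant.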
We observe noticeable changes in modularity around the introduction of new journals. Some of these changes suggest a scenario in which a new field emerges (e.g., quantum mechanics in the late 1920s), and a new journal captures the corresponding scholar community, leading to an increase in modularity. Interdisciplinary interactions across established areas lead to a decrease in modularity (e.g., prior to the introduction of Physical Review E in the 1990s). Note that the modularity baseline of the shuffled network is significantly lower, and does not display clear spikes near the introduction of new journals. This suggests that the birth of new scholar communities reflects the introduction of new journals and cannot be explained solely by the increased number of nodes and edges. An alternative, suggestive visualization of the emergence of topics can be obtained by tracking communities over time in the citation network [22]. These observations motivate the use of community detection algorithms in a model of discipline evolution.
In the proposed model, which we call SDS (Social Dynamics of Science), we build a social network of collaborations whose nodes represent scholars, linked by coauthored papers as illustrated in Figure 2(a). Each scholar is represented by a list of disciplines indicating the scientific fields they have worked in, and every discipline has a list of papers. Similarly, each link is represented by a list of disciplines with associated papers describing the collaborations between two scholars. The social network starts with one scholar writing one paper in one discipline. The network then evolves as new scholars join, new papers are written, and new disciplines emerge over time.
Figure 2. (a) Illustration of the social network structure. Nodes and edges represent scholars and their collaborations. They are annotated with lists of (co)authored papers grouped by scientific fields. For example, scholar b has five papers including four in ...
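One possible in-memory representation of this annotated network is sketched below. The class name `CollaborationNetwork`, its methods, and the simplification that each paper is filed under a single discipline are assumptions made for illustration, not part of the model specification.

```python
from collections import defaultdict

class CollaborationNetwork:
    """Scholars and links, each annotated with papers grouped by discipline."""

    def __init__(self):
        # scholar -> discipline -> set of paper ids
        self.scholars = defaultdict(lambda: defaultdict(set))
        # frozenset({a, b}) -> discipline -> set of paper ids
        self.links = defaultdict(lambda: defaultdict(set))

    def add_paper(self, paper_id, authors, discipline):
        """File a paper under one discipline for every author and author pair."""
        for a in authors:
            self.scholars[a][discipline].add(paper_id)
        for i, a in enumerate(authors):
            for b in authors[i + 1:]:
                self.links[frozenset((a, b))][discipline].add(paper_id)

    def weight(self, a, b):
        """Number of distinct papers coauthored by scholars a and b."""
        by_discipline = self.links.get(frozenset((a, b)), {})
        if not by_discipline:
            return 0
        return len(set().union(*by_discipline.values()))

# The network starts with one scholar writing one paper in one discipline.
net = CollaborationNetwork()
net.add_paper("p1", ["a"], "physics")
net.add_paper("p2", ["a", "b"], "physics")
net.add_paper("p3", ["a", "b"], "biology")
```

Keeping papers grouped by discipline on both nodes and links makes it straightforward to extract the collaboration network of a single discipline later on.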
At every time step, a new paper is added to the network. Its first author is chosen uniformly at random, so that every scholar has the same chance to publish a paper. In modeling the choice of collaborators, we aim to capture a few basic intuitions: (i) scholars who have collaborated before are likely to do so again; (ii) scholars with common collaborators are likely to collaborate with each other; (iii) it is easier to choose collaborators with similar rather than dissimilar backgrounds; and (iv) scholars with many collaborations have a higher probability of gaining additional ones [23,24]. We model these behaviors through a biased random walk [25], illustrated in the accompanying figure. The random walk traverses the collaboration network starting at the node corresponding to the first author. At each step, the walker decides to stop at the current node $i$ with probability $p_w$, or to move to an adjacent node with probability $1 - p_w$. In the latter case a neighbor $j$ is selected according to the transition probability

$$P_{ij} = \frac{w_{ij}}{\sum_k w_{ik}},$$

where the sum runs over the neighbors $k$ of $i$ and $w_{ij}$ is the weight of the edge connecting scholars $i$ and $j$, that is, the number of papers that $i$ and $j$ have coauthored. Each visited node becomes an additional collaborator. Note that the walk may result in a single author.
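The walk described above can be sketched as follows. The function name `pick_coauthors`, the dictionary layout of the weights, and two edge-case choices (stopping at a node with no neighbors, and listing a revisited scholar only once) are assumptions of this sketch rather than details stated in the text.

```python
import random

def pick_coauthors(weights, first_author, p_w, rng):
    """Select the authors of a paper with a weight-biased random walk.

    weights: dict node -> dict neighbor -> number of coauthored papers.
    p_w: probability of stopping at the current node (must be > 0).
    """
    if not 0 < p_w <= 1:
        raise ValueError("p_w must lie in (0, 1] so that the walk terminates")
    authors = [first_author]
    current = first_author
    while True:
        neighbors = weights.get(current, {})
        # Stop with probability p_w, or when there is nowhere to go (assumption).
        if not neighbors or rng.random() < p_w:
            break
        nodes = list(neighbors)
        # Move to neighbor j with probability proportional to the edge weight.
        current = rng.choices(nodes, weights=[neighbors[n] for n in nodes])[0]
        if current not in authors:  # a revisited scholar is listed once (assumption)
            authors.append(current)
    return authors

# Hypothetical toy network: a and b share three papers, a and c share one.
toy_weights = {"a": {"b": 3, "c": 1}, "b": {"a": 3}, "c": {"a": 1}, "d": {}}
team = pick_coauthors(toy_weights, "a", p_w=0.3, rng=random.Random(1))
```

Because steps follow existing edges in proportion to their weights, the walk favors previous collaborators, collaborators of collaborators, and well-connected scholars, in line with intuitions (i), (ii), and (iv).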
Each paper is characterized by one main topic and possibly additional, secondary topics. The discipline that is shared by the majority of authors is selected as the main topic of the paper. Each coauthor acquires membership in this main topic, to model exposure of scholars to new disciplines through collaboration. Additionally, a paper with authors from multiple disciplines inherits the union of these disciplines as topics. This choice is motivated by a desire to capture highly multidisciplinary efforts that are likely to lead to the emergence of new fields. This mechanism could be modified to reflect a more conservative notion of discipline by adopting a stricter rule for discipline inheritance.
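A minimal sketch of this labeling rule follows. It interprets "majority" as the discipline held by the largest number of authors and breaks ties alphabetically; both choices, like the function name `paper_topics`, are assumptions of this example.

```python
from collections import Counter

def paper_topics(author_disciplines):
    """Return (main_topic, all_topics) for a paper.

    author_disciplines: list of sets, one set of disciplines per author.
    """
    counts = Counter(d for ds in author_disciplines for d in ds)
    # Most widely shared discipline; ties broken alphabetically (assumption).
    main_topic = max(sorted(counts), key=counts.get)
    # The paper inherits the union of its authors' disciplines.
    all_topics = set().union(*author_disciplines)
    return main_topic, all_topics

# Hypothetical team of three authors.
team = [{"physics"}, {"physics", "biology"}, {"chemistry"}]
main_topic, topics = paper_topics(team)

# Every coauthor acquires membership in the main topic.
for disciplines in team:
    disciplines.add(main_topic)
```

A stricter inheritance rule, as mentioned above, could be obtained by replacing the union with, for example, only those disciplines shared by some minimum number of authors.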
At every time step, with probability $p_n$, we also add a new scholar to the network. The parameter $p_n$ regulates the ratio of papers to scholars. The new scholar is the first author of the paper created at that time step. To generate other collaborators, an existing scholar is first selected uniformly at random as the first coauthor. Then the random walk procedure is followed to pick additional collaborators. The new scholar acquires the main topic of the paper.
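A back-of-the-envelope reading of how this parameter sets the ratio of papers to scholars, assuming exactly one paper is added per time step and that the network starts from one scholar and one paper as described earlier. The symbols $T$ (number of time steps), $N_P$ (number of papers), and $N_S$ (number of scholars) are introduced here for illustration only.

```latex
N_P(T) = 1 + T, \qquad
\mathbb{E}\left[N_S(T)\right] = 1 + p_n T, \qquad
\frac{N_P(T)}{\mathbb{E}\left[N_S(T)\right]} = \frac{1 + T}{1 + p_n T}
\;\longrightarrow\; \frac{1}{p_n} \quad \text{as } T \to \infty .
```

For instance, under these assumptions a new-scholar probability of 0.2 would correspond to roughly five papers per scholar in the long run.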
We introduce a novel mechanism to model the evolution of disciplines by splitting and merging communities in the social collaboration network. The idea, motivated by the earlier observations from the APS data, is that the birth or decline of a discipline should correspond to an increase in the modularity of the network. Two such events, a split and a merge, may occur at each time step with probability $p_d$. The process is illustrated in the accompanying figure.
For a split event we select a random discipline with its collaborator network and decide whether a new discipline should emerge from a subset of this community. We partition the collaboration network into two clusters (see Methods). If the modularity of the partition is higher than that of the single discipline, there are more collaborations within each cluster than across the two. We then split off the smaller community as a new discipline. For papers labeled with the discipline corresponding to the smaller community in the split, this discipline label may be updated; all other labels remain unchanged. In particular, the papers whose authors are all in the new community are relabeled to reflect the emergent discipline. Borderline papers with authors in both old and new disciplines are labeled according to the discipline of the majority of authors. As a result, some authors may belong to both the old and the new discipline.
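The split criterion can be sketched as follows. The clustering step itself is described in Methods and is not reproduced here, so the candidate bipartition is taken as an input. The sketch uses the standard weighted Newman modularity, under which the undivided discipline has modularity zero, so the test reduces to checking whether the bipartition scores above zero. Function names and the toy network are hypothetical.

```python
def bipartition_modularity(weighted_edges, cluster_a):
    """Weighted modularity of splitting a network into cluster_a and the rest.

    weighted_edges: dict (u, v) -> number of coauthored papers.
    cluster_a: set of nodes on one side of the candidate split.
    """
    total = sum(weighted_edges.values())
    intra = {True: 0.0, False: 0.0}      # weight inside each side
    strength = {True: 0.0, False: 0.0}   # summed node strength of each side
    for (u, v), w in weighted_edges.items():
        side_u, side_v = u in cluster_a, v in cluster_a
        strength[side_u] += w
        strength[side_v] += w
        if side_u == side_v:
            intra[side_u] += w
    return sum(intra[s] / total - (strength[s] / (2 * total)) ** 2
               for s in (True, False))

def try_split(weighted_edges, cluster_a):
    """Return the smaller cluster if splitting raises modularity, else None."""
    nodes = {n for edge in weighted_edges for n in edge}
    cluster_b = nodes - cluster_a
    # A single undivided community has modularity 0 under this definition.
    if bipartition_modularity(weighted_edges, cluster_a) > 0.0:
        return min(cluster_a, cluster_b, key=len)
    return None

# Hypothetical discipline: a tight trio, a tight pair, and one weak bridge.
discipline_net = {("a", "b"): 2, ("b", "c"): 2, ("a", "c"): 2,
                  ("d", "e"): 3, ("c", "d"): 1}
new_discipline = try_split(discipline_net, {"d", "e"})
```

The returned set is the community that would become the new discipline; the relabeling of papers then follows the rules given above.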
For a merge event we randomly select two disciplines with at least one common author. If the modularity obtained by merging the two groups is higher than that of the partitioned groups, the collaborations across the two communities are stronger than those within each one. The two are then merged into a single new discipline. In this case, all the papers in the two old disciplines are relabeled to replace the old discipline with the new one; other labels of those papers remain unchanged.
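The relabeling step of a merge can be sketched as below, assuming each paper carries a set of discipline labels. The function name, the label strings, and the in-place update are choices made for this illustration; the modularity comparison that triggers the merge is not shown.

```python
def merge_disciplines(paper_labels, old_a, old_b, new):
    """Replace two old discipline labels with a single new one, in place.

    paper_labels: dict paper id -> set of discipline labels.
    Labels other than old_a and old_b are left untouched.
    """
    for labels in paper_labels.values():
        if old_a in labels or old_b in labels:
            labels -= {old_a, old_b}
            labels.add(new)

# Hypothetical papers and labels.
papers = {
    "p1": {"optics"},
    "p2": {"optics", "biology"},
    "p3": {"photonics", "optics"},
    "p4": {"biology"},
}
merge_disciplines(papers, "optics", "photonics", "optics+photonics")
```

Papers that carried either old label end up with exactly one copy of the new label, while unrelated papers such as `p4` are unchanged.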
To evaluate the predictive power of the SDS model we consider a number of stylized facts, i.e., broad empirical observations that describe essential characteristics of the dynamic relationships between disciplines, scholars, and publications. Our model provides an explanation for the evolution of scientific fields if it can reproduce these empirical observations. The complex interactions of a changing group of scientists, their artifacts, and their disciplinary aggregations can be captured by the broad empirical distributions of six quantitative descriptors: the number of authors per paper AP (collaboration size); the number of papers per scholar PA (scholar productivity); the number of scholars per discipline AD (discipline popularity); the number of disciplines per scholar DA (scholar interdisciplinary effort); the number of papers per discipline PD (discipline productivity); and the number of disciplines per paper DP (publication breadth).
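Each of the six descriptors is a count over one of three pairwise relations (paper-author, scholar-discipline, paper-discipline). A minimal sketch of how they might be computed from lists of pairs is given below; the helper names and the toy records are hypothetical and are not drawn from any of the datasets.

```python
from collections import Counter

def counts_per(pairs, position):
    """For (left, right) pairs, count the distinct partners of each item
    found at `position` (0 = left, 1 = right)."""
    return Counter(pair[position] for pair in set(pairs))

def distribution(counts):
    """Histogram: value -> number of items having that value."""
    return Counter(counts.values())

# Hypothetical toy relations.
paper_author = [("p1", "a"), ("p1", "b"), ("p2", "a"), ("p3", "c")]
scholar_discipline = [("a", "physics"), ("a", "biology"),
                      ("b", "physics"), ("c", "biology")]
paper_discipline = [("p1", "physics"), ("p2", "physics"),
                    ("p2", "biology"), ("p3", "biology")]

A_P = counts_per(paper_author, 0)        # authors per paper
P_A = counts_per(paper_author, 1)        # papers per scholar
D_A = counts_per(scholar_discipline, 0)  # disciplines per scholar
A_D = counts_per(scholar_discipline, 1)  # scholars per discipline
D_P = counts_per(paper_discipline, 0)    # disciplines per paper
P_D = counts_per(paper_discipline, 1)    # papers per discipline

collaboration_size_hist = distribution(A_P)
```

The empirical distributions compared with the model are histograms of this kind, one per descriptor.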
To validate the SDS model, one would ideally require a single real-world dataset mapping the three-way relationships between scholars, publications, and disciplines. Unfortunately, no such dataset is available to date. One possibility would be to use a dataset such as those derived from Web of Science or Scopus, and attempt to infer associations between subjects, papers, and authors based on the subject categories of the journals in which the papers are published. However, such an inference approach is necessarily arbitrary. A less biased validation approach is to trade off the single dataset in exchange for multiple ones that capture the desired associations explicitly. We therefore adopt three large datasets that each map a binary projection of the three-way relationships: NanoBank [26] to validate the relationship between scholars and papers, Scholarometer [27] to study the relationship between scholars and disciplines, and Bibsonomy [28]
to analyze the relationship between papers and disciplinary topics. The datasets are described in the Methods section. The parameters $p_n$, $p_w$, and $p_d$ of our model are tuned to fit the quantitative descriptors of each dataset (see Methods).
The figure below presents a compelling fit between the real data and the predictions of our model. SDS reproduces the stylized facts about the relationships between scholars, publications, and disciplines, characterized by these six distributions.
Stylized facts characterizing relationships between scholars, papers, and disciplines.
These results focus on the relationships between disciplines, scholars, and papers, for which there is little prior quantitative analysis. The collaboration network, on the other hand, has been studied extensively in the past [29,30]. As shown in the figure below, the SDS model generates collaboration networks whose long-tailed degree distributions are consistent with the empirical data, as well as with those in the literature.
Degree distribution of the collaboration network generated by the SDS model, compared to the empirical distribution from the Bibsonomy dataset.
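For reference, a degree distribution of this kind can be obtained from a list of papers as sketched below, where the degree of a scholar is taken to be the number of distinct coauthors. The function name and the toy author lists are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def degree_distribution(papers):
    """papers: iterable of author lists.

    Returns a Counter mapping degree -> number of scholars with that degree.
    """
    coauthors = defaultdict(set)
    for authors in papers:
        for a in authors:
            coauthors[a]  # register solo authors, who have degree 0
        for a, b in combinations(set(authors), 2):
            coauthors[a].add(b)
            coauthors[b].add(a)
    return Counter(len(c) for c in coauthors.values())

# Hypothetical toy corpus of three papers.
degree_hist = degree_distribution([["a", "b", "c"], ["a", "d"], ["e"]])
```

Plotting such a histogram on logarithmic axes is the usual way to inspect whether the distribution is long-tailed.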