Average Number of Tweets per Article
A total of 4208 tweetations were identified, which cited a total of 286 distinct JMIR articles, with each article receiving on average 14 tweetations (median 9). However, these averages should be interpreted with care, as JMIR has published articles since 1999 (560 articles in total). Among the 286 articles referenced in tweetations, there were many articles that were published before data collection began or before Twitter even existed. As these older articles receive only sporadic tweetations, the average and median are not reflective of more recent articles.
The 55 articles published in issues 3/2009–2/2010 received an average of 21.2 tweetations within 365 days after article publication (median 12, range 0–149), and 13.9 (median 8, range 0–96) tweetations within 7 days. shows the cumulative number of tweetations within 7 days (tw7) for these articles.
Tweet Dynamics
When, in relation to the date of publication of an article, did the tweetations occur? The figure shows the general distribution of all tweetations (n = 3318) that were sent within 60 days after publication of the article they cite, by day. In this graph, day 0 refers to the day of article publication, day 1 is the following day, and so on; the left y-axis shows how many of the tweetations were sent on that day (tweet rate), as a proportion relative to all tweetations within the 60-day period; and the right y-axis (and red line) shows the cumulative proportion. The majority of tweetations were sent on the day an article was published (1458/3318, 43.9%) or on the following day (528/3318, 15.9%). Only 5.9% (197/3318) of all tweetations were sent on the second day after publication, and the downward trend continues until a small plateau occurs between days 5 and 7 (about 2% of all 60-day tweetations). There is a dip on days 8 and 9, which may be explained by the fact that, while JMIR publishes articles on different days of the week, Friday is slightly more prevalent, so days 8 and 9 would fall on the following weekend. After day 10 (66/3318, 2%) the rate of new tweetations declines rapidly.
The same curve of new tweetations by day, replotted with logarithmic horizontal and vertical axes, reveals a strong regularity: the tweetation distribution during the first 30 days follows a straight line on a log–log plot, which is indicative of a Pareto distribution, also known as Zipf’s law or Bradford distribution, which are said to follow a power law [31]. In our sample, the number of new tweetations per day during the first 30 days after publication can be predicted by the formula ln(tw) = –1.53 * ln(d) + 7.25, where tw is the number of new tweetations on day d, and d is the number of days since publication (publication date = day 1).
This model has an excellent fit (R² = .90). While the intercept of this formula is not important (it depends on the total number of tweetations), the term –1.53 is called alpha, or the exponent of the power law (the slope of the straight line in the log–log diagram).
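The fitted power law can be evaluated directly. The following is a minimal sketch, not the analysis code used in the study: the slope and intercept are the values reported above, while the function names and the least-squares helper are illustrative assumptions. Note that this formula counts the publication date as day 1, whereas the per-day distribution described earlier counts it as day 0.

```python
import math

ALPHA = -1.53      # exponent (slope on the log-log plot), as reported in the text
INTERCEPT = 7.25   # intercept, as reported in the text


def predicted_tweetations(d: int) -> float:
    """Predicted number of new tweetations on day d (publication date = day 1)."""
    if d < 1:
        raise ValueError("d must be >= 1; the formula counts the publication date as day 1")
    return math.exp(ALPHA * math.log(d) + INTERCEPT)


def fit_loglog(days, counts):
    """Ordinary least squares of ln(count) on ln(day); returns (slope, intercept).

    Assumption: a plain OLS fit on log-transformed values; the text does not
    state how the reported line was estimated.
    """
    xs = [math.log(d) for d in days]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    for day in (1, 2, 7, 30):
        print(day, round(predicted_tweetations(day), 1))
```

Feeding `fit_loglog` the values generated by `predicted_tweetations` recovers the slope and intercept, which is a quick check that the two functions are consistent with each other.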
We can divide the pattern into two distinct phases: I call the first 30 days the “network propagation phase,” where the new information is propagated through the Twitter social network. After 30 days, the network propagation phase gives way to what I call the “sporadic tweetation phase,” where only sporadic mentions of older articles and small clusters of localized outbreaks of information propagation occur.
The figure shows the tweetation dynamics for all articles in JMIR issue 1/2010. Note that while the per-day plots described above show the number of new tweetations per day (tweet rate, which declines sharply), this one shows them cumulatively. It illustrates how some articles attract tweets only on the first day, while other articles continue to attract tweetations and are more widely retweeted. Incidentally, these are often articles that turn out to be highly cited, as shown in more detail below.
Other Regularities
There were other strong regularities of tweetations following power laws. Tweetations were sent from 1668 distinct Twitter accounts (tweet authors). The most tweetations (n = 370) were sent by @JMedInternetRes, JMIR’s Twitter account. If we rank the accounts by the number of tweetations they sent and plot the rank against the number of tweetations for each account, a power law distribution emerges. Half of all tweets (2105/4208, 50%) were sent by only 132 distinct tweet authors, that is, 8% of all tweet authors. The top 20% of the tweet authors (those ranked 1–334 by number of tweetations) accounted for 63.4% (2676/4208) of all tweetations. This uneven distribution of work is typical for Pareto distributions, an observation that is sometimes colloquially referred to as the 80/20 rule, where roughly 80% of the effects come from 20% of the causes.
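The concentration figures above (share of tweetations sent by the top 20% of authors, and the number of authors needed to account for half of all tweetations) can be computed from per-account counts. The sketch below is illustrative only: the function names and the toy counts are hypothetical, and the rounding rule for "top 20%" is an assumption.

```python
def top_share(counts, fraction):
    """Share of all tweetations sent by the top `fraction` of authors.

    Assumption: the number of top authors is round(len(counts) * fraction),
    with a minimum of 1.
    """
    ranked = sorted(counts, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


def authors_needed_for_share(counts, share=0.5):
    """Smallest number of top-ranked authors whose tweetations reach `share` of the total."""
    ranked = sorted(counts, reverse=True)
    target = share * sum(ranked)
    running = 0
    for rank, count in enumerate(ranked, start=1):
        running += count
        if running >= target:
            return rank
    return len(ranked)


if __name__ == "__main__":
    # Hypothetical tweetation counts for ten accounts (not data from the study).
    toy_counts = [50, 20, 10, 5, 5, 4, 3, 1, 1, 1]
    print(top_share(toy_counts, 0.2))
    print(authors_needed_for_share(toy_counts, 0.5))
```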
The third power law I looked at was where I expected it most, because this distribution is typically observed for citations and can be demonstrated in a Zipf plot, in which the number of citations of the nth most-cited paper is plotted versus the rank n (left panel). Tweetations follow a strikingly similar distribution (right panel).
Citations
The 55 articles in our tweetations-versus-citations subset had an average of 7 citations on Scopus (median 4) and 13 citations on Google Scholar (median 9). shows the Google Scholar citation counts for all 55 articles included in the tweetation/citation analysis, as of November 2011.
First, the number of citations from Scopus was correlated with the number of citations from Google Scholar to test agreement between the two database sources. There was good agreement, with a Pearson correlation coefficient of .87 (P < .001) for the 55 articles. As Google Scholar’s citation counts were higher and appeared more robust, most results presented here refer to Google Scholar citation counts, unless noted otherwise.
A comparison of a typical citation curve and a typical tweetation curve illustrates the very different dynamics of tweetations compared with citations in scholarly articles. While citations in scholarly articles begin to accumulate only about 1 year after the article is published, tweetations accumulate mainly within the first few days after publication.
Correlation Between Tweetations and Citations
For each journal issue, I separately plotted scatterplots and calculated Pearson correlation coefficients of the raw count, the logs, and Spearman rank correlation coefficients, to establish the degree of correlation between citations and tweetations.
My primary tweetation metric was tw7 (cumulative number of tweetations 7 days after publication of the article, with day 0 being the publication date), a metric I also call twimpact factor or TWIF7 (see below).
The Pearson product moment correlation coefficients (r) for the raw citation versus tw7 tweetation counts were statistically significant at the 5% level for all journal issues, and ranged from .57 to .89. Pearson correlations between the logs of citations and logs of tweets, as well as Spearman rank correlation coefficients, were all statistically significant when articles across issues were combined, except for the rank correlation between Scopus citation counts and tweetations. When stratified by journal issue, the correlations for some issues were statistically significant for some computations, while for others they were not, perhaps due to the small sample size. Generally, the Google Scholar citations showed better correlations with tweetations than did Scopus citations. The Spearman rank correlations (rank by citations versus rank by tw7) were statistically significant for only one issue, with rho = .51, P = .04 for issue 2/2010.
I also conducted analyses with other tweetation metrics (tw0, tw1, tw2, tw3, tw4, tw5, tw6, tw7, tw10, tw12, tw14, tw30, and tw365) and derived various metrics (tw365–tw7, ie, late-stage tweets; tw7–tw0, tw0/tw7 etc), which produced very similar correlation coefficients (data not shown).
Multivariate Analysis
In a linear regression model I tried to predict the log of the number of Google Scholar citations from the log of the number of tweets and time (days since publication of the first article in the sample of 55 articles). The regression equation was log(cit + 1) = 0.467 * log(tw7 + 1) – 0.001 * days + 0.817, where cit is the number of citations, and tw7 is the cumulative number of tweetations at day 7. Both independent variables were significant predictors (P < .001), and the model explained 27% of the variation of citations (R² = .27).
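To make the regression equation concrete, it can be inverted to give a predicted citation count for a given tw7 and time offset. This is a sketch under stated assumptions: the coefficients are those reported above, but the text does not say whether "log" is the natural or the base-10 logarithm, so the base is left as a parameter (natural log by default, which is a guess). The function name is hypothetical. With R² = .27, any such point prediction is rough.

```python
import math


def predicted_citations(tw7: float, days: float, base: float = math.e) -> float:
    """Invert log(cit + 1) = 0.467 * log(tw7 + 1) - 0.001 * days + 0.817.

    `days` is the number of days since publication of the first article in the
    sample. `base` is the logarithm base; the source does not specify it, so
    the default (natural log) is an assumption.
    """
    log_cit_plus_1 = 0.467 * math.log(tw7 + 1, base) - 0.001 * days + 0.817
    return base ** log_cit_plus_1 - 1


if __name__ == "__main__":
    for tw7 in (0, 10, 50):
        print(tw7, round(predicted_citations(tw7, days=0), 2))
```

Whatever base is chosen, the prediction rises with tw7 and falls with days, reflecting the positive tweetation coefficient and the negative time coefficient.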
Binary Analysis
Based on the observation that tweets were sent primarily during the early days after publication, I hypothesized that tw7, the cumulative number of tweetations by day 7 (perhaps as early as day 3), could be used as a diagnostic test to predict highly cited articles. Highly tweeted and highly cited are defined as articles in the 75th–100th percentile of each journal issue; thus, the cut-off points on what constitutes highly tweeted or highly cited varied by issue (tweets: 11, 19, 34.8, 28.5; Google Scholar citations: 15, 9, 22.75, 15, for issues 3/2009, 4/2009, 1/2010, and 2/2010, respectively).
Table 2 is a 2 × 2 table categorizing articles into four groups. Articles that were less frequently tweeted and not in the top-cited quartile are interpreted as true negatives (tn, lower left quadrant). Articles that were highly tweeted and highly cited are true positives (tp, upper right quadrant). Articles that were highly tweeted but not highly cited fall into the upper left quadrant and are referred to as false positives (fp). Finally, articles that were not highly tweeted but highly cited are false negatives (fn).
Table 2. 2 × 2 table using top-tweeted articles as a predictor for top-cited articles
Using tweetation status (highly versus less tweeted) as a predictive test for citation status, this test identified 40 out of the 43 not highly cited articles, which translates to a 93% specificity (true-negative rate, tn/[tn + fp], 40/43). The test was able to correctly identify 9 out of the 12 highly cited papers, which corresponds to a 75% sensitivity (tp/[tp + fn], 9/12). Another way to express these results is to say that the positive predictive value (tp/[tp + fp]) or precision is 75%, meaning that if an article is highly tweeted (tests positive for social media impact), then there is a 75% likelihood that the article ends up in the top quartile of all articles of an issue, ranked by citations. The negative predictive value (tn/[tn + fn]) is 93% (40/43), meaning that if an article was not highly tweeted (tests negative for social media impact), then there is only a 7% (3/43) chance that it will fall into the top 25% of cited articles. Yet another way to express these results is to say that highly tweeted articles are almost 11 times more likely than less tweeted articles to be highly cited (9/12, 75% of the highly tweeted articles are highly cited, while only 3/43, 7% of the less tweeted articles are highly cited; rate ratio 0.75/0.07 = 10.75, 95% confidence interval, 3.4–33.6).
There was a highly statistically significant association between citation status and tweetation status (Fisher exact test, P < .001).
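The test statistics above all follow from the four cell counts implied by the text (tp = 9, fp = 3, fn = 3, tn = 40). The sketch below recomputes them. The function names are illustrative, and two methods are assumptions because the text does not name them: the confidence interval uses the standard log-based (Katz) method for a rate ratio, and the Fisher test is computed two-sided by summing hypergeometric probabilities no larger than the observed one.

```python
import math


def diagnostic_summary(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, predictive values, and rate ratio with 95% CI."""
    rate_high = tp / (tp + fp)   # highly tweeted articles that are highly cited
    rate_low = fn / (fn + tn)    # less tweeted articles that are highly cited
    rr = rate_high / rate_low
    # Assumed method: standard error of ln(RR) (Katz log method).
    se = math.sqrt(1 / tp - 1 / (tp + fp) + 1 / fn - 1 / (fn + tn))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "rate_ratio": rr,
        "ci95": (math.exp(math.log(rr) - 1.96 * se),
                 math.exp(math.log(rr) + 1.96 * se)),
    }


def fisher_exact_two_sided(tp: int, fp: int, fn: int, tn: int) -> float:
    """Two-sided Fisher exact P value for a 2 x 2 table."""
    row1, col1, n = tp + fp, tp + fn, tp + fp + fn + tn

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(tp)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


if __name__ == "__main__":
    print(diagnostic_summary(tp=9, fp=3, fn=3, tn=40))
    print(fisher_exact_two_sided(tp=9, fp=3, fn=3, tn=40))
```

With these cell counts the sketch reproduces the reported sensitivity (75%), specificity (93%), rate ratio (10.75), and an interval that rounds to 3.4–33.6.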
I repeated this analysis for a range of different metrics such as twn (cumulative number of tweetations after n days, with n = 0, 1–10, 12, 14, 30, or 365), and the number of late-response tweetations tw365–tw7. Starting on day 3 (tw3), the heuristic started to identify the same top-tweeted articles as tw7, indicating that the test is predictive as early as 3 days after publication. Choosing later days (letting tweetations accumulate for more than 7 days) or the late-response tweetations did not improve the test results (data not shown).
Proposed Twitter-Based Metrics for Social Impact
The research reported here focuses on articles from one journal. However, I suggest that the metrics introduced here should be useful to measure the impact any article (or collections or sets of articles) has on Twitter, to gauge how much attention users pay to the topic of an article, to measure how the question and/or conclusions resonate with Twitter users, and ultimately to use them as proxies for social impact. Although I use Twitter as an example here, these metrics can be used in other social media (eg, Facebook status updates). The metrics presented here can also be generalized and applied to measure the impact of any issue (not just scholarly articles but, for example, current events and newspaper articles) on a social media user population.
Twimpact Factor (eg, tw7)
Using raw tweetation counts to compare the impact of different articles with each other is problematic, because the number of tweetations is a function of time since publication. Although the data suggest that after an initial period of 30 days tweetations usually occur only sporadically, the raw number of tweets should not be used when comparing articles with each other if they have been published on different dates. An average tweetation count per month since publication is possible to calculate (and is currently displayed on the JMIR Top Articles webpage), but due to the highly skewed power law distribution, this average will always favor articles that have been published recently (within the last month).
I therefore propose to use (and have used in this paper) the twimpact factor twn as a metric for immediate impact in social media, which is defined as the cumulative number of tweetations within n days after publication (eg, tw7 means total number of tweetations after n = 7 days). Tweetations can be replaced by URL mentionings if we apply this metric to other social media (URL being the URL or set of URLs of a specific article).
As a standard twimpact factor metric for an article on Twitter, I suggest (and JMIR will use in the future) tw7—that is, the absolute, cumulative number of tweetations an article receives by day 7 after publication (the day of publication is referred to as day 0). This is also a very practical metric: using a relatively short period of time makes the twimpact factor easier to compute, as the Twitter stream needs to be monitored for only 7 days.
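As an illustration of the definition, the twimpact factor twn can be computed from an article's publication date and the dates of its tweetations. This is a minimal sketch: the function name and the example dates are hypothetical, and counting day n itself (days 0 through n inclusive) is my reading of "by day 7".

```python
from datetime import date


def twimpact_factor(publication_date: date, tweet_dates, n: int = 7) -> int:
    """Cumulative number of tweetations on days 0..n after publication.

    Day 0 is the publication date. Tweetations dated before publication or
    after day n are ignored. Including day n itself is an assumption.
    """
    return sum(0 <= (t - publication_date).days <= n for t in tweet_dates)


if __name__ == "__main__":
    # Hypothetical article and tweetation dates (not data from the study).
    published = date(2010, 1, 1)
    tweets = [date(2010, 1, 1), date(2010, 1, 2), date(2010, 1, 8), date(2010, 1, 9)]
    print(twimpact_factor(published, tweets, n=7))
    print(twimpact_factor(published, tweets, n=0))
```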
I have shown that the number of new tweetations drops off rapidly after publication, even for the most highly cited papers. The immediate social media response is highly correlated with the later social media response; therefore, it is likely that the late response can be ignored. An even shorter period of time (3 days), tw3, was already sufficient in the sample to discriminate between highly cited and less cited articles, but I suggest a standard n of 7, which has the advantage that it always includes a weekend; thus, journal articles published toward the end of the week are less penalized for the weekend effect.
Any article, but also a collection of articles, can have a twimpact factor (eg, on a journal or issue level). JMIR is now monitoring the collective twimpact factor ctwn/m for each journal issue (where n is the number of days after publication tweetations accumulate, and m is the percentile), eg, ctw7/50 is the median (50th percentile) of tw7 for all articles in the set. The ctw7/75 for JMIR issue 2/2010 is 29, meaning that the top 25% most-tweeted articles in issue 2/2010 were tweeted more than 29 times during the first week. We prefer to report the 75th percentile instead of the mean or median (ctw7/50) because of the power law distribution and because it seems a useful cut-off point to predict top-cited articles. At least in our sample, the practical meaning of the collective twimpact factor ctw7/75 is that articles with a tw7 greater than the ctw7/75 of a journal issue have a 75% likelihood of being top-cited (ending up in the top quartile of all articles of an issue, ranked by citations).
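The collective twimpact factor is a percentile of the tw7 values of a set of articles. A minimal sketch follows; the function name and the example values are hypothetical, and the percentile is computed with linear interpolation between order statistics, which is an assumption, since the text does not specify the percentile method.

```python
def collective_twimpact_factor(tw_values, m: float) -> float:
    """m-th percentile of the twimpact factors of a set of articles (ctwn/m).

    Assumption: linear interpolation between the closest ranks.
    """
    ordered = sorted(tw_values)
    if not ordered:
        raise ValueError("at least one article is required")
    if not 0 <= m <= 100:
        raise ValueError("m must be between 0 and 100")
    position = (len(ordered) - 1) * m / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


if __name__ == "__main__":
    # Hypothetical tw7 values for the articles of one issue.
    issue_tw7 = [2, 5, 8, 12, 40]
    print(collective_twimpact_factor(issue_tw7, 50))   # ctw7/50
    print(collective_twimpact_factor(issue_tw7, 75))   # ctw7/75
```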
Note that the twimpact factor is an absolute measure counting tweetations; thus, just like for the journal impact factor, caveats apply. First, it is highly subject specific, so if comparisons are made between journals or even articles from the same journal, they should be made within a narrow subject category. An article on social media is more likely to be picked up by social media than an article about molecular biology. Although within a specific field the twimpact factor may predict citations (predict which article is more likely to be highly cited), it would not be legitimate to compare the twimpact factor of an article on social media with the twimpact factor of an article about molecular biology, and conclude that the social media article is more likely to be cited.
Second, similar to the caveat that journal impact factors should not be compared across different years, as the total number of citations is constantly growing, only articles that are published in a similar timeframe should be compared with each other (perhaps even 1 year is too long; thus, we made comparisons on a quarterly within-issue level). This is because both the number of Twitter users and the number of followers of a journal grow over time.
Tweeted Half-Life
The tweeted half-life (THLn) is defined as the point in time after publication by which half of all tweetations of that article within the first n days occur. As n I have used 30 days; that is, as the denominator I chose the total cumulative number of tweets within a 30-day period following the publication date. The THLn is the day when cumulatively half of these tweetations have occurred.
In our sample, the THLn for the less-cited articles was 0 (53% of the tweets were tweeted on day 0), while the THLn of highly cited articles was 1 (on day 0, 37% of all tweetations occurred, while on day 1, 21% occurred, in total 58% by day 1). illustrates this. It may at first seem surprising that less-cited articles appear to show a quicker and proportionally higher response on the first days, but it should be kept in mind that the absolute counts of tweetations for more highly cited articles are higher than for the less-cited articles. Low-impact articles are tweeted and retweeted mainly on day 0 and day 1. Highly cited articles continue to be retweeted widely, which depresses the relative proportion of tweetations on days 0–3.
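The tweeted half-life can be computed from per-day tweetation counts. In the sketch below, the function name is hypothetical, and treating "half" as "at least 50%" is an assumption for the case where the cumulative count lands exactly on half. The example lists reuse the day-0 and day-1 percentages reported above; the final value in each list is a placeholder that lumps together all remaining days up to day 30.

```python
def tweeted_half_life(daily_counts):
    """Day (0-based, day 0 = publication) by which half of the tweetations occurred.

    `daily_counts[i]` is the number of tweetations on day i, covering days 0..n.
    Returns None if there were no tweetations in the window.
    """
    total = sum(daily_counts)
    if total == 0:
        return None
    running = 0
    for day, count in enumerate(daily_counts):
        running += count
        if 2 * running >= total:   # at least half reached (assumed tie rule)
            return day
    return len(daily_counts) - 1


if __name__ == "__main__":
    # Percentages from the text for days 0 and 1; the last entry is a placeholder
    # for the remainder of the 30-day window.
    less_cited = [53, 47]
    highly_cited = [37, 21, 42]
    print(tweeted_half_life(less_cited))
    print(tweeted_half_life(highly_cited))
```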
Twindex
As a final metric I propose (and JMIR will use) the twindex (tweetation index), which is a metric ranging from 0 to 100 indicating the relative standing of an article compared to other articles. I define the twindex7 of a specific article as the rank percentile of this article when all articles (the specific article and articles from a comparator group) are ranked by the twimpact factor tw7. The comparator articles should be similar articles published in a similar time window (eg, other articles in the same issue, or the 19 articles published previously in the same journal). If an article has the highest twimpact factor tw7 among its comparator articles, it has a twindex of 100. If it has the lowest twimpact factor, it has a twindex of 0. In this study, articles with a twindex > 75 often also turned out to be the most-cited ones.
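One way to turn this definition into a computation is sketched below. The function name and the example values are hypothetical. The text fixes only the endpoints (highest tw7 gives 100, lowest gives 0); the linear scaling in between and the handling of ties (only strictly lower values count) are assumptions.

```python
def twindex(article_tw7: float, comparator_tw7) -> float:
    """Rank percentile (0-100) of an article among itself and its comparators.

    Assumptions: the score is 100 * (number of articles with a strictly lower
    tw7) / (number of articles - 1), so ties are not given any credit.
    """
    all_values = list(comparator_tw7) + [article_tw7]
    if len(all_values) < 2:
        raise ValueError("at least one comparator article is required")
    strictly_lower = sum(v < article_tw7 for v in all_values)
    return 100 * strictly_lower / (len(all_values) - 1)


if __name__ == "__main__":
    # Hypothetical tw7 values for the comparator articles of one issue.
    comparators = [3, 8, 12, 20]
    print(twindex(30, comparators))
    print(twindex(1, comparators))
    print(twindex(10, comparators))
```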