Conceived and designed the experiments: CMC GE. Performed the experiments: CMC GE. Analyzed the data: CMC GE. Contributed reagents/materials/analysis tools: GE. Wrote the paper: CMC GE.
Surveys are popular methods to measure public perceptions in emergencies but can be costly and time consuming. We suggest and evaluate a complementary “infoveillance” approach using Twitter during the 2009 H1N1 pandemic. Our study aimed to: 1) monitor the use of the terms “H1N1” versus “swine flu” over time; 2) conduct a content analysis of “tweets”; and 3) validate Twitter as a real-time content, sentiment, and public attention trend-tracking tool.
Between May 1 and December 31, 2009, we archived over 2 million Twitter posts containing keywords “swine flu,” “swineflu,” and/or “H1N1” using Infovigil, an infoveillance system. Tweets using “H1N1” increased from 8.8% to 40.5% (R²=.788; p<.001), indicating a gradual adoption of World Health Organization-recommended terminology. A total of 5,395 tweets were randomly selected from 9 days, 4 weeks apart, and coded using a tri-axial coding scheme. To track tweet content and to test the feasibility of automated coding, we created database queries for keywords and correlated these results with manual coding. Content analysis indicated resource-related posts were most commonly shared (52.6%). Of the coded tweets, 4.5% were identified as misinformation. News websites were the most popular sources (23.2%), while government and health agencies were linked only 1.5% of the time. Seven of 10 automated queries correlated significantly with manual coding. Several Twitter activity peaks coincided with major news stories. Our results correlated well with H1N1 incidence data.
This study illustrates the potential of using social media to conduct “infodemiology” studies for public health. 2009 H1N1-related tweets were primarily used to disseminate information from credible sources, but were also a source of opinions and experiences. Tweets can be used for real-time content analysis and knowledge translation research, allowing health authorities to respond to public concerns.
“In the era of the 24-hour news cycle, the traditional once-a-day press conference featuring talking heads with a bunch of fancy titles has to be revamped and supplemented with Twitter posts, YouTube videos and the like. The public needs to be engaged in conversations and debate about issues of public health, they don't need to be lectured to.”
-Andre Picard 
Public health agencies do not act in a void, but rather are part of a larger feedback loop that includes both the media and the public. The social amplification of risk framework postulates that psychological, social, cultural, and institutional factors interact with emergency events and thereby intensify or attenuate risk perceptions . Traditionally, print media, TV and radio are the major transmitters of information from public health agencies to the public and play a large role in risk intensification and attenuation. However, during the most recent public health emergency, 2009 H1N1, respondents cited the internet as their most frequently used source of information on the pandemic .
With the rise of the participatory web and social media (“Web 2.0”) and resulting proliferation of user-generated content, the public potentially plays a larger role in all stages of knowledge translation, including information generation, filtering, and amplification. Consequently, for public health professionals, it is increasingly important to establish a feedback loop and monitor online public response and perceptions during emergency situations in order to examine the effectiveness of knowledge translation strategies and tailor future communications and educational campaigns.
Surveys are the traditional methods for public health officials to understand and measure public attitudes and behavioural responses. Several studies have used telephone, internet, and in-person surveys to elicit such information during the H1N1 pandemic (e.g., , ). Rapid-turnaround surveys best capture changes in attitudes and behaviour influenced by specific events and produce the most relevant information for agency intervention . Unfortunately, time is needed to gather resources, funding, and survey instruments for polling .
New “infoveillance” methods such as mining, aggregating, and analysing online textual data in real-time are becoming available , . Twitter (www.twitter.com) is potentially suitable for longitudinal text mining and analysis. The brief (<140 characters) text status updates (“tweets”) users share with “followers” (e.g., thoughts, feelings, activities, opinions) contain a wealth of data. Mining these data provides an instantaneous snapshot of the public's opinions and behavioural responses. Longitudinal tracking allows identification of changes in opinions or responses. In addition to quantitative analysis, the method also permits qualitative exploration of likely reasons why sudden changes have occurred (e.g., a widely read news report) and may indicate what is holding the public's attention .
H1N1 marks the first instance in which a global pandemic has occurred in the age of Web 2.0 and presents a unique opportunity to investigate the potential role of these technologies in public health emergencies. Using an “infoveillance” approach we report on: 1) the use of the terms “H1N1” versus “swine flu” over time on Twitter, to establish the feasibility of creating metrics to measure the penetration of new terms and concepts (knowledge translation), 2) an in-depth qualitative analysis of tweet content, expression, and resources, and 3) the feasibility and validation of using Twitter as a real-time content, sentiment, and public attention trend-tracking tool.
We developed an open-source infoveillance system, Infovigil , which continuously gathered and mined textual information from Twitter via its Application Programming Interface (the original Twitter data is available at http://infovigil.com/). Every few seconds, Infovigil gathered new publicly available tweets containing keywords of interest and stored them in an internal relational database, including metadata such as username and time. Between May 1 and December 31, 2009, we archived over 2 million tweets containing keywords or hashtags (#) “H1N1”, “swine flu”, and “swineflu”. In addition to recording tweets, we archived the cited web pages beginning in September 2009. This database served as the primary dataset for our study. All statistical analyses used SPSS.
To establish a knowledge translation metric to measure the terminology shift from colloquial term “swine flu” to World Health Organization (WHO)-recommended “H1N1” , a linear regression for the proportion of tweets citing “H1N1” over time was performed using English-only tweets from May 1 to December 31, 2009. Tweets utilizing both “swine flu” and “H1N1” were counted toward the overall total but not the proportion of H1N1 or swine flu tweets.
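The counting rule and regression described above can be sketched roughly as follows. This is a minimal illustration and not the authors' SPSS procedure: the function names `term_shares` and `linear_fit` are hypothetical, keyword matching is assumed to be case-insensitive, and an ordinary least squares fit on a numeric day index is assumed.

```python
from statistics import mean


def term_shares(tweets):
    """Return (h1n1_share, swine_flu_share) for a list of tweet texts.

    Tweets using both terms count toward the total but toward neither share,
    following the rule stated in the text.
    """
    total = h1n1_only = swine_only = 0
    for text in tweets:
        lowered = text.lower()  # assumption: matching ignores case
        has_h1n1 = "h1n1" in lowered
        has_swine = "swine flu" in lowered or "swineflu" in lowered
        if not (has_h1n1 or has_swine):
            continue  # outside the archive's keyword set
        total += 1
        if has_h1n1 and not has_swine:
            h1n1_only += 1
        elif has_swine and not has_h1n1:
            swine_only += 1
    if total == 0:
        return 0.0, 0.0
    return h1n1_only / total, swine_only / total


def linear_fit(xs, ys):
    """Ordinary least squares of ys on xs: (slope, intercept, r_squared)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy ** 2) / (sxx * syy) if syy else 1.0
    return slope, intercept, r_squared
```

Here `xs` would be the day index (0 for May 1) and `ys` the daily “H1N1” share returned by `term_shares`.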
Qualitative manual coding of tweets commenced on Monday, May 11, 2009, the first set of complete data available. To look for changing content at systematic intervals, Mondays 4 weeks apart were selected over the remainder of 2009 (9 days in total). Because we were mainly interested in trends, we held the day of the week constant to avoid artificial peaks caused by sampling from different days of the week. Twenty-five randomly selected tweets from every hour of the aforementioned days were coded to avoid time bias associated with posting. Since there are no prior methodologies for sampling tweets, we were unable to perform a formal sample size calculation. Instead, we chose our sample size based on feasibility and determined that 25 tweets per hour (600 tweets per day) would be sufficient to capture a daily “snapshot”. Any re-posted or “re-tweeted” tweets using notation “RT @ username” or “RT@username” were excluded to prevent popular posts or spam from saturating the sample. Non-English tweets were also excluded because translation was not feasible.
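The hourly sampling with retweet exclusion might look like the sketch below. The names `is_retweet` and `sample_per_hour` are hypothetical, the fixed random seed is only there to make the illustration reproducible, and language filtering is omitted.

```python
import random
from collections import defaultdict


def is_retweet(text):
    """True if the tweet uses the retweet notations named in the text."""
    return "RT@" in text or "RT @" in text


def sample_per_hour(tweets, per_hour=25, seed=0):
    """Randomly sample up to per_hour non-retweets from each hour of one day.

    tweets: iterable of (hour, text) pairs, hour in 0..23.
    Returns a dict mapping hour -> list of sampled tweet texts.
    """
    rng = random.Random(seed)
    by_hour = defaultdict(list)
    for hour, text in tweets:
        if not is_retweet(text):
            by_hour[hour].append(text)
    # If an hour has fewer eligible tweets than requested, take all of them.
    return {
        hour: rng.sample(texts, min(per_hour, len(texts)))
        for hour, texts in sorted(by_hour.items())
    }
```

Taking `min(per_hour, len(texts))` means an hour with too few eligible tweets yields a smaller sample instead of an error.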
We created a tri-axial coding scheme using an iterative process to reflect: 1) the tweet's content, 2) how it was expressed, and 3) the type of link posted, if any. Preliminary coding of 1200 tweets provided the initial categories and codebook. Upon review and discussion, infrequently used categories were collapsed into larger concepts and a subset of tweets (125) was coded by two raters to establish coding reliability (kappa). The last iteration of the codebook was finalized when a sufficient kappa level (>0.7) was obtained for each axis of the coding scheme.
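For readers unfamiliar with the reliability statistic, the sketch below computes Cohen's kappa for two raters on one axis of a coding scheme. The text says only “kappa”, so the choice of Cohen's kappa is an assumption, and `cohens_kappa` is a hypothetical helper name.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of tweets")
    n = len(rater_a)
    # Observed agreement: share of tweets given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

Under this sketch, a codebook iteration would be accepted once the value exceeds 0.7 on each of the three axes.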
Where multiple qualifiers were present within a tweet, all applicable qualifiers were used. Neutral or ambiguous statements were not coded. Tweets were categorized as misinformation if the tweet was not categorized as a joke and was unsubstantiated by our reference standards: the Centers for Disease Control (CDC) and Public Health Agency of Canada for scientific claims and a panel of credible online news sources (e.g., CNN, BBC) for news-related claims.
The chi-square test for trend was used to determine if the proportion of content, qualifiers, or links tweeted changed linearly over our analysis timeframe. Prior to testing for linearity, scatterplots were performed on each category to detect any non-linear patterns.
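One common form of the chi-square test for trend is the Cochran-Armitage statistic with 1 degree of freedom, sketched below. Whether SPSS computed exactly this variant for the study is not stated, so the formula choice, the equally spaced scores, and the name `chi_square_trend` are assumptions.

```python
import math


def chi_square_trend(successes, totals, scores=None):
    """Cochran-Armitage chi-square test for linear trend in proportions.

    successes[i]: tweets in the category at time point i
    totals[i]:    all coded tweets at time point i
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    if scores is None:
        scores = list(range(len(totals)))  # assumption: equally spaced time points
    n = sum(totals)
    p = sum(successes) / n
    t = sum(s * (r - m * p) for s, r, m in zip(scores, successes, totals))
    var = p * (1 - p) * (
        sum(m * s * s for s, m in zip(scores, totals))
        - sum(m * s for s, m in zip(scores, totals)) ** 2 / n
    )
    if var == 0:
        raise ValueError("trend is undefined when the pooled proportion is 0 or 1")
    chi2 = t * t / var
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```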
Infovigil was configured for real-time analysis and visualization of the tweets by constructing SQL (Structured Query Language) queries searching for keywords and phrases that matched our content categories. For validation purposes, and to maintain consistency with manual coding, we filtered out retweets, i.e. tweets containing “RT@” or “RT @”. While tweet searches included data from May 1, 2009 to the present day, we only used data from the 9 selected days as comparison points with the manual coding. All qualifiers along with 3 content categories (resources, personal experiences and personal opinions/interest) were transformed into concept queries. Initial search patterns (keywords or phrases) for each concept were derived from the codebook and an ongoing list of common phrases. Common misspellings, emoticons, internet slang, and keyword variants were also included. Keywords were modified to include/exclude specified prefixes. Results from each keyword were audited to estimate its precision. Audits were conducted by viewing the results for each keyword for three randomly selected days. Search patterns were modified or deleted if approximately more than 30% of tweets did not reflect the concept.
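The keyword matching and auditing steps can be illustrated with the sketch below. It is written in Python for illustration and is not the study's SQL: the three “concern” patterns are hypothetical examples, the function names are invented, and the auditor's judgements are passed in as a list of booleans.

```python
import re

# Hypothetical search patterns for a "concern" concept (illustrative only).
CONCERN_PATTERNS = [r"\bworried\b", r"\bscared\b", r"\bafraid\b"]


def is_retweet(text):
    """True if the tweet contains the retweet markers filtered in the text."""
    return "RT@" in text or "RT @" in text


def concept_matches(tweets, patterns):
    """Return non-retweets that match at least one search pattern."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return [
        t for t in tweets
        if not is_retweet(t) and any(rx.search(t) for rx in compiled)
    ]


def audit_pattern(judgements, max_error_rate=0.30):
    """Estimate a pattern's precision from human audit judgements.

    judgements: list of booleans, True if the audited tweet reflects the concept.
    Returns (precision, keep); keep is False when more than max_error_rate
    of the audited tweets miss the concept.
    """
    if not judgements:
        raise ValueError("no audited tweets")
    misses = sum(1 for ok in judgements if not ok)
    precision = 1 - misses / len(judgements)
    keep = misses <= max_error_rate * len(judgements)
    return precision, keep
```

The study describes the 30% cut-off as approximate, so the hard threshold here is a simplification.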
Concept query totals from the 9 selected days were recorded. Pearson's correlations were used to measure the relationship between the proportions of selected categories resulting from the manual coding and the automated analyses. Automated proportions were obtained by taking the number of tweets that were returned by a concept search query (e.g., tweets labelled as “personal experience”) and dividing by the total number of tweets per day. Additionally, chi-square tests for trend were used to determine if changes in automated concepts were trending similarly to the manual coding. To conduct external validation, both the proportion and absolute number of weekly automated tweets sharing personal experiences and concern were compared to weekly US H1N1 incidence rates from WHO's FluNet reporting tool (http://gamapserver.who.int/GlobalAtlas/home.asp) using Pearson's correlations. We expected that these two types of tweets would be positively correlated with incidence rates. US incidence rates were chosen because Americans account for the largest proportion of Twitter users .
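The comparison between automated and manual proportions can be sketched as below, assuming one value per selected day for each series. The names `daily_proportions` and `pearson_r` are hypothetical, and significance testing is left out.

```python
import math


def daily_proportions(concept_counts, total_counts):
    """Share of each day's tweets returned by a concept query."""
    return [c / t for c, t in zip(concept_counts, total_counts)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

The same `pearson_r` helper would apply to the external validation, with weekly tweet counts or proportions as one series and weekly incidence rates as the other.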
Longitudinal results from the automated queries were automatically graphed by Infovigil. These graphs were visually examined for large spikes in tweet volume (a potential indicator of public attention) and tweets on those days were reviewed to see what media stories or external events influenced these peaks. For clarity, the largest peak in each longitudinal graph was scaled to 100 on the y-axis and all other peaks were plotted relative to that peak.
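The rescaling used for the graphs amounts to a simple normalization, sketched here with a hypothetical helper `scale_to_peak`. How Infovigil implemented it internally is not described.

```python
def scale_to_peak(counts):
    """Rescale a series so its largest value is 100 and the rest are relative to it."""
    peak = max(counts)
    if peak == 0:
        return [0.0 for _ in counts]  # nothing to scale in an all-zero series
    return [100 * c / peak for c in counts]
```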
Both our manual and automated analysis excluded retweets (RTs). As RTs may be systematically different from original or non-retweets (nonRTs), we performed a sub-analysis on RTs. 3 RTs from every hour of the 9 selected days (12% of the manual sample) were manually coded using the same methodology described previously. Chi-square tests were used to observe differences between manually coded RTs and nonRTs. Fisher's exact tests were used when cell counts were less than 5. Chi-square tests for trend were used to detect linear trends over time. Trend results for RTs were compared to trends of nonRTs.
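The choice between the chi-square and Fisher's exact tests for a 2×2 table (RT versus nonRT, category present versus absent) can be sketched as below. Three things are assumptions: the chi-square is computed without continuity correction, the Fisher p-value is two-sided (summing all tables no more probable than the observed one), and the switch is made on observed cell counts as worded in the text. The function names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # 1 degree of freedom


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_observed = prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    return sum(
        prob(x) for x in range(low, high + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )


def compare_groups(a, b, c, d):
    """Use Fisher's exact test when any cell count is below 5, else chi-square."""
    if min(a, b, c, d) < 5:
        return "fisher", fisher_exact_2x2(a, b, c, d)
    return "chi-square", chi_square_2x2(a, b, c, d)[1]
```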
To compare RTs and nonRTs from the automated analysis, queries for each of the 10 concepts were modified to include RTs in the search results. Longitudinal graphs of RT and nonRT results for each automated query concept were compared visually. Noticeable differences in graph shape or spike volume were noted.
Between May 1 and December 31, 2009, the relative proportion of tweets using “H1N1” increased from 8.8% to 40.5% in an almost linear fashion (R²=.788; p<.001), indicating a gradual adoption of the WHO-recommended H1N1 terminology as opposed to “swine flu” (Figure 1). “H1N1” use became as prevalent as “swine flu” use on September 1.
Six content categories emerged from the data: resources, direct or indirect personal experiences (e.g., “I have swine flu”), personal reactions to or opinions (e.g., “I'm scared of H1N1”), jokes/parodies, marketing for H1N1-related products, and unrelated posts (Table 1).
Tweets not resource or spam-based were coded with a qualifier, if present. The codebook definitions of the 7 qualifiers took into consideration specific keywords and phrases, common internet expressions (e.g., “lol”), and emoticons (textual expressions representing a face or mood) (Table 2). Tweeted URLs were categorized into one of nine categories (Table 3).
We analyzed 5,395 tweets for our content analysis (Table 4). The total number of tweets was short by 5 because we did not gather enough eligible tweets on September 28 for analysis. The inter-rater reliability (kappa) was estimated as 0.80 for content, 0.74 for qualifiers, and 0.84 for links. H1N1 resources were the most common type of content shared (52.6%), followed by personal experiences (22.5%). 39% of tweets were coded with 1 or more qualifiers. Tweets expressing humour (12.7%), concern (11.7%), and questions (10.3%) were the most common, while 4.5% were classified as possible misinformation. 61.8% of all tweets had links; 23.2% of all posts linked to a news website, while links to government and public health agencies were not commonly shared (1.5%). 90.2% of tweets provided links when a reference was necessary.
The chi-square test for trend showed several linear trends in the data (Table 4). The proportion of tweets containing resources and personal experiences increased over time, while the proportion of jokes and personal opinions/interest decreased. Tweets expressing humour, frustration, and downplayed risk became less common. Mainstream and local news websites were cited significantly less, while references to news blogs/feeds/niches, social networks, and other web pages increased. No significant trends were found for misinformation, but the data exhibited a non-linear pattern (Figure 2).
Table 5 presents examples of search patterns used to develop queries (see Table S1 for full list of detailed queries). Seven of 10 automated queries were found to correlate significantly with the results of manual coding (Table 6), including personal experiences (r=0.91), concern (r=0.87), and personal opinion/interest (r=0.86) (Figure 3). H1N1 incidence rates were correlated with the absolute number of tweets sharing personal experiences (r=.77, p<.001) (Figure 4) and concern (r=.66, p<.001, Figure 5), as well as with the percentage of tweets sharing personal experiences (r=.67, p<.001) and concern (r=0.39, p=.02).
Chi-square tests for trend found that all 3 content concepts and 4 of 7 qualifier concepts displayed significant linear trends over our timeframe (Table 7). The content categories all trended in the same direction as in the manual coding. Humour/sarcasm and downplayed risk trends also had the same downward trends as in the manual analysis. Trends for misinformation and concern were unique to the automated coding. Although a downward trend for frustration was found in the manual coding, no such pattern was observed in the automated analysis.
Sharp increases in absolute H1N1-related tweet volume coincided with major H1N1 news events. For example, a large peak on June 11 (Figure 1) corresponded to the WHO's Pandemic Level 6 announcement . The volume of humorous tweets also decreased on this day (Figure 6) and the number of frustrated tweets increased (Figure 7).
In Figure 8, the October to November peak directly coincides with the second wave of H1N1 in North America . Similarly, when personal experiences were further broken down into sub-concepts, tweet volume of vaccination experiences increased rapidly following the arrival of H1N1 vaccinations in the United States on October 6 .
Tweets expressing concern had one outstanding peak on July 5 (Figure 9), coinciding with a news story that one of the actors from “Harry Potter” was recovering from H1N1 . Humour (Figure 6) and relief (Figure 10) also increased in response to this story.
Misinformation displayed several large peaks in our timeframe (Figure 11). The largest peak appeared from September 18–21 with circulation of a story listing the “ten swine flu lies told by the mainstream media” . Other peaks (August 2, December 25) were not the result of true misinformation or speculation; rather, the popular news stories on those days had keywords associated with the misinformation query , .
Viral dissemination of campaigns on Twitter resulted in several large spikes. One campaign comparing the perceived need for face masks for H1N1 to condoms for AIDS was responsible for two large peaks in “downplayed risk” on July 20 and December 1 (Figure 12). The “#iamthankfulfor” campaign, taking place between November 25–27 (American Thanksgiving), resulted in the largest peak of tweets expressing relief. In this campaign, users posted items they were thankful for, which in our data was related to getting the H1N1 vaccine or not becoming infected. Another notable campaign was the “#oink” movement on August 16 to support the pork industry and farmers by urging the media and public to use “H1N1” instead of “swine flu” . Consequently, the number of tweets using “H1N1” increased and those using “swine flu” decreased. In one case, viral dissemination of new information caused a large activity spike of tweets (Figure 13). On September 8, Twitter was used to report the discovery of the first confirmed H1N1 case at a videogame convention in Seattle and to urge symptomatic attendees to seek medical advice .
The largest volume of questions posted on Twitter coincided with the WHO pandemic level 6 announcement (June 11), the “Harry Potter” actor illness (July 5), and the face mask versus condom campaign (July 20) (Figure 14). An unexplained significant drop in questions occurred on August 5. An upward trend without major peaks was found within tweeted resources (Figure 15).
Manual coding of RTs found that the proportion of tweets sharing personal experiences was significantly lower than among nonRTs (χ²(1)=11.45, p=.001). No other significant differences in aggregated data were found. Chi-square tests for trend found significant downward trends for jokes (χ²(1)=6.83, p=.009) and humour (χ²(1)=6.46, p=.011), matching the nonRT trends for these categories. The only other trend found was an upward trend in links to government or public health websites (χ²(1)=11.77, p=.001). This trend was unique to RTs. Comparisons between longitudinal graphs of RT and nonRT results for automated queries found only minor tweet volume changes in a few concepts. A small number of tweet activity spikes in personal opinions/interest, downplayed risk, and misinformation increased in volume when RTs were included.
The proportion of tweets using the term “H1N1” increased compared to the relative usage of “swine flu”, demonstrating gradual adoption of WHO-recommended terminology by the public and media on Twitter. With some exceptions (#oink campaign, see above) it is likely that the media's and not the public's adoption of “H1N1” was the primary reason for this trend. However, the importance of the media's terminology choice should not be underestimated, as the media hold much influence as major information transmitters and word choice is critical in encouraging or discouraging certain risk behaviours .
In our manual coding we found that news and information were the most commonly tweeted H1N1-related material (52.6%). Our results correspond to a study of Twitter use during Hurricane Gustav and Ike, where roughly half of all hurricane-related tweets contained URLs (web-resources) . Collectively, our findings highlight the role of social networking tools in rapid, widespread communication in emergencies.
The change of tweet content over our timeframe is not unexpected. H1N1 surveys reporting longitudinal results using traditional methods also found that public behaviour and attitudes varied over the course of the epidemic. In these studies, public concern and engagement in protective behaviours increased when the threat of the outbreak increased and decreased when the perceived risk declined , , . Similarly, we found that personal accounts of H1N1 increased over time, while the proportion of humorous comments decreased, possibly due to the increasing perceived seriousness of the situation and/or the declining popularity of the subject.
More minute changes were also observed and were found to be highly influenced by the media and external events. Examples of this included the large spike in tweets that resulted from the WHO pandemic level 6 announcement and the two peaks in personal experiences that coincided with the first and second wave of H1N1 in North America. Similarly, a study on tweets circulated during the 2009 Red River flooding in North America also found that tweet volume related to the emergency increased when the threat was largest , indicating that perceived severity and intense news coverage are likely factors that dictate tweet posting activity. It is possible to qualitatively examine tweet content and see what story has captured the online public's attention and what sentiments those stories evoke. Similar to media stories, both viral dissemination of information and Twitter campaigns had a considerable effect on tweet volume and posting behaviour. The use of these techniques and methods may have potential usefulness in public health and should be studied further.
Our retweet analysis found that the only significant difference was that original tweets contained significantly more personal experiences than retweets. This finding indicates that users are not likely to repost another user's status update en masse and there is potentially little interest or perceived benefit in reposting second-hand personal information. Similarly, other studies have shown that retweets must have either broad appeal or provide specific details of local utility to be widely propagated . Consequently, the tweet spikes that decreased when retweets were removed from the automated data provide a likely indicator of stories that had these qualities.
During the outbreak, a variety of traditional media sources speculated that misinformation was rampant in social media . However, we classified only 4.5% of manually coded tweets as possible misinformation or speculation. Although this amount ranged from 2.2 to 9.2% across our 9 time points, increasing amounts of misinformation did not occur until August, months after initial media reports. Tracking tweeted misinformation and questions is potentially useful for public health agencies to address information needs of the public and direct online and offline health education initiatives and campaigns. Media monitoring has been used by the CDC to inform risk communication strategies in previous emergencies .
It is noteworthy that 90.2% of tweets provided references to information they were providing, allowing others to confirm the trustworthiness of the material. While the majority of these tweets linked to mainstream or local news websites, the proportion of links to secondary news sites (news blogs/feeds/niches, social networks, and other web pages) increased over time, likely due to information supply and demand (more information from major news providers when the topic is popular and vice versa). The lack of critical assessment and evaluation of online health information by consumers is a well-documented problem . Public health and government authorities such as the CDC and WHO were rarely referenced directly by users (1.5% of links). While mentions of governing bodies were higher due to the proliferation of news headlines quoting or referring to them, direct linking to the authority and its resources was infrequent. An analysis of retweets also found this was the case, although there was a significant upward trend in linking to authorities over time. This unique trend may indicate that users began to recognize the utility of official resources over time.
The majority of our automated queries correlated with the results of our manual coding, suggesting the feasibility of monitoring large amounts of textual data over time. Our automated queries for concern and personal experiences were also positively related to H1N1 incidence rates, as expected, indicating that our findings have external validity. Queries that did not perform well had less defined vocabularies than others and were more difficult to associate with particular expressions. A caveat we identified is that spam and popular news articles that contained key phrases can influence search results and create peaks in activity that may not be reflective of the concept. Our queries were limited to keywords found in the manual coding and variants that the authors could anticipate. These issues emphasize the importance of analysing the overall content of the tweet, the intricacies of building a substantial search vocabulary, and the need to employ more advanced natural language processing methods.
Public attitudes, perceptions, and behaviours during the pandemic have been reported by other studies using traditional survey methods (e.g., , , ). However, there may be practical limitations to directly comparing our results to these accounts. The largest limitation to our approach in this respect is the lack of a well-defined study population. While our database allows us to link a user with any given tweet, it was beyond the scope of this study to retrieve every user profile in order to determine the demographics of our sample. However, the service is predominantly used by Americans, who account for 50.8% of all users . Approximately 19% of all online American adults use Twitter or a similar application . It is estimated that in the United States, 55% of Twitter users are female, 45% are aged 18–34, 69% are Caucasian, 49% have less than a college degree, and 58% make over $60K a year . These numbers may give us a sense of population demographics; however, those who tweet about H1N1 may not necessarily be representative of the Twitter population, and the Twitter population is not representative of the general population. In addition, because we are potentially sampling across the globe, it is difficult to narrow the study context and compare results with H1N1 studies that report on a certain geographic region (e.g., , ). This methodological issue is present also in traditional studies that attempt to corroborate their results with papers from different cities or countries . In the future it may be possible to take advantage of geocoding to address this problem and sort tweets based on location. Secondly, certain questions posed to survey respondents may not be completely translatable to a query concept or category, even if numerous search patterns are used. With regard to our sampling, no existing validated sampling method for Twitter has been documented in the literature and the decisions made in our study may not be optimal in all cases.
We recognize that it is likely that not all relevant tweets were represented in our tweet database as some tweets may not have included our keywords and used their own terminology to refer to H1N1.
We did not observe large amounts of misinformation in our data, but this may be a conservative estimate as we did not code humorous or confusing posts as misinformation, nor did we take into account the influence of a tweet based on the number of followers a user had. While our estimates were low, we do not know the effect of any amount of misinformation that exists on the internet, particularly when internet sources are archived and indexed in search engines.
Despite these limitations, there are advantages to using infoveillance. Because our method of data collection is continuous and ongoing, the length of our study time frame likely has no survey-based equivalent. Thus far, the existing H1N1 pandemic studies have collected data over spans ranging from one day  to four months . Those with shorter time frames have reported their results in aggregate, and only a handful have presented longitudinal results of selected questions , , , . Although our manual coding was limited to 9 time points of analysis, Infovigil is continuously collecting and analysing data, creating a significant database that captures both large and small shifts in user posting and puts them into perspective within the overall pandemic picture. This methodology may offer complementary insight to traditional survey methods at a more rapid and less costly rate.
This study illustrates the potential and feasibility of using social media to conduct “infodemiology” studies for public health. H1N1 pandemic-related tweets on Twitter were primarily used to disseminate information from credible sources to the public, but were also a rich source of opinions and experiences. These tweets can be used for near real-time content and sentiment analysis and knowledge translation research, allowing health authorities to become aware of and respond to real or perceived concerns raised by the public. This study included manual classifications and preliminary automated analyses. More advanced semantic processing tools may be used in the future to classify tweets with more precision and accuracy.
SQL Queries for Automated Tweet Coding & Analysis. SQL syntax for search patterns and keywords used by Infovigil for automated tweet coding and analysis.
(0.14 MB PDF)
The authors thank Mrs. Claudia Lai, MSc for her assistance with coding a subset of data.
Competing Interests: The authors have declared that no competing interests exist.
Funding: Mrs. Chew was generously supported by a Canadian Institutes of Health Research Frederick Banting and Charles Best Canada Graduate Scholarship Master's Award. Infovigil.com is a non-commercial project/website, partly funded by the Canadian Institutes of Health Research (CIHR) [Pandemics in the Age of Social Media: Content Analysis of Tweets for Infoveillance and Knowledge Translation Research, PI: Gunther Eysenbach]. Other parts of the project costs may in the future be defrayed by consulting and collaborating with commercial entities. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.