Automatic monitoring of Adverse Drug Reactions (ADRs), defined as adverse patient outcomes caused by medications, is a challenging research problem that is currently receiving significant attention from the medical informatics community. In recent years, user-posted data on social media, primarily due to its sheer volume, has become a useful resource for ADR monitoring. Research using social media data has progressed using various data sources and techniques, making it difficult to compare distinct systems and their performances. In this paper, we perform a methodical review to characterize the different approaches to ADR detection/extraction from social media, and their applicability to pharmacovigilance. In addition, we present a potential systematic pathway to ADR monitoring from social media.
We identified studies describing approaches for ADR detection from social media in the Medline, Embase, Scopus and Web of Science databases, and through the Google Scholar search engine. Studies that met our inclusion criteria were those that attempted to utilize ADR information posted by users on any publicly available social media platform. We categorized the studies along various dimensions such as primary ADR detection approach, size of data, source(s), availability, evaluation criteria, and so on.
Twenty-two studies met our inclusion criteria, with fifteen (68.2%) published within the last two years. The survey revealed a clear trend towards the usage of annotated data with eleven of the fifteen (73.3%) studies published in the last two years relying on expert annotations. However, publicly available annotated data is still scarce, and we found only six (27.3%) studies that made the annotations used publicly available, making system performance comparisons difficult. In terms of algorithms, supervised classification techniques to detect posts containing ADR mentions, and lexicon-based approaches for extraction of ADR mentions from texts have been the most popular.
Our review suggests that interest in the utilization of the vast amounts of available social media data for ADR monitoring is increasing with time. In terms of sources, both health-related and general social media data have been used for ADR detection— while health-related sources tend to contain higher proportions of relevant data, the volume of data from general social media websites is significantly higher. There is still very limited publicly available annotated data available, and, as indicated by the promising results obtained by recent supervised learning approaches, there is a strong need to make such data available to the research community.
Harmful reactions that are caused by the intake of medication are known as Adverse Drug Reactions (ADRs). Early detection of ADRs associated with drugs in their post-approval periods is a crucial challenge for pharmacovigilance research. Pharmacovigilance is defined as “the science and activities relating to the detection, assessment, understanding and prevention of adverse effects or any other drug problem”. The process of pharmacovigilance begins during the pre-approval clinical trials conducted for a drug and continues after the drug is released into the market. Due to the various limitations of pre-approval clinical trials, it is not possible to fully assess the consequences of the use of a particular drug before it is released. Recent studies have shown that adverse reactions caused by drugs following their release into the market are a major public health problem, with deaths and hospitalizations numbering in the millions (up to 5% of hospital admissions, 28% of emergency visits, and 5% of hospital deaths), and associated costs of about seventy-five billion dollars annually [3, 4, 5]. Thus, post-marketing surveillance of drugs is of paramount importance for drug manufacturers, national bodies such as the U.S. Food and Drug Administration (FDA), and international organizations such as the World Health Organization (WHO).
Government agencies, like the FDA or the European Medicines Agency (EMA), have expanded their pharmacovigilance efforts in various ways to reduce the costs associated with ADRs. In the U.S., post-marketing surveillance of drugs occurs actively and passively once they have been approved by the FDA. Methods to accomplish this include Phase IV studies, in addition to voluntary and mandatory reporting through the FDA’s Adverse Event Reporting System (FAERS), MedWatch, and the Institute of Safe Medication Practices’ Medication Error Reporting System (MERP). The MedWatch program, for example, allows the public (patients and providers) to report ADRs which they suspect or observe. While it is mandatory for manufacturers to report adverse events, reporting by healthcare professionals and the public is voluntary. Due to the voluntary nature of these systems, reporting and detection of adverse events may be neither timely nor complete. Recent research has exposed the various inadequacies of spontaneous reporting systems, prompting researchers to explore additional sources for ADR monitoring [9, 2, 10]. These systems, for example, suffer from under-reporting, over-reporting of known ADRs, incomplete data, duplicated reporting, and unspecified causal links. Various additional techniques have been utilized for post-marketing monitoring of ADRs, including retrospective chart analysis, prospective surveillance, and information extraction from electronic health records, clinical narratives and case reports. These approaches have their own associated challenges. For example, electronic health records generally face challenges associated with the pervasiveness of confounding variables, and the definition and ascertainment of exposures and outcomes. Clinical narratives present the problem of limited access, as typically, only researchers affiliated with medical centers can access the data.
The rapid growth of electronically available health related information, and the ability to process large volumes of them automatically, using natural language processing (NLP) and machine learning algorithms, have opened new opportunities for pharmacovigilance that could address some of the above-mentioned limitations.
In terms of monitoring public health, this has included studying smoking cessation patterns on Facebook, identifying user social circles with common medical experiences (like drug abuse), and monitoring malpractice. When different patients that suffer from a common disease, or use a specific medication, share information about their symptoms, treatments or drug outcomes, this volume of health information can provide valuable clinical insights for both patients and health-related industries that are beyond what could be achieved by traditional communication methods. Infectious/viral disease monitoring, specifically, can benefit strongly from utilizing social media. For example, traditional systems may miss new or rare events (like a new viral strain), and lack the real time capabilities and demographics that social media would provide, including data from people that do not access healthcare through formal channels. Although specific information (e.g., age and gender) about a single user may not be usable for privacy reasons, various resources are currently available to perform demographic analysis using social media data. Furthermore, since 1994, a number of social media based surveillance systems have been developed, reviewed, and implemented. These have been implemented locally, nationally, and globally. Recent advances in ADR monitoring have seen significant strides towards the use of automatic NLP techniques for mining drugs and associated reactions from social media. User posts in social media contain information about treatment outcomes or provide early access to reported ADRs that can be beneficial for health and pharmaceutical industries. The type and volume of ADR information that social media makes available to the health industry may not be easily accessible by other means.
This includes the ADRs experienced by those with special conditions, such as patients with rare diseases, pregnant/nursing women, elderly people or patients with comorbidities who are usually excluded from clinical trials .
From the perspective of regulatory authorities, the intent of mining social media is to obtain additional data from the general public that may be used to supplement existing voluntary systems. For example, the Association of the British Pharmaceutical Industry (ABPI) published a set of guidance notes in 2013, which help researchers and stakeholders manage ADR complaints on digital media . Though the document was created for information purposes only, rather than regulatory/legal advice, it offers instructions on how to handle such ADR reports. It clearly defines a minimal information set needed to report the ADR, which includes an identifiable patient, suspect drug, adverse event, and identifiable reporter. The contact details required for the identifiable reporter are compatible with the social media domain, and include emails or screen names. It also states that this information should be collected “if possible,” which leaves room for incomplete data . The U.S. FDA has not published explicit guidelines for social media based pharmacovigilance, but it has issued regulations for publishing promotional material  and risk/benefit information  on social media. Despite the absence of a formal guidance, ADRs from social media can still be reported to the FDA. The minimum data set for an ADR report is the same as that for the ABPI. Moreover, a recent FDA presentation stated that social media ADR reports are reviewed like any other spontaneous reporting systems, while acknowledging variability in the quality of the reports submitted .
In addition to regulatory authorities, signals identified through social media could be used by pharmaceutical manufacturers, the healthcare system, or healthcare researchers to fulfill requirements of mandatory reporting. While the intent of social media mining is to provide early signals, it could potentially be used by the interested parties to validate or reject signals that have arisen in other reporting systems. Pharmaceutical manufacturers, such as AstraZeneca, have considered the use of social media from an industry perspective: focusing on manufacturers’ responsibilities to provide accurate and quality information regarding drugs . Because regulatory authorities and pharmaceutical manufacturers play a role in public safety, both may utilize social media to fulfill the safety mission.
Various pros and cons of using social media for automatic ADR monitoring [30, 31], and more generally, for public health monitoring, have been mentioned in recent literature— a full discussion of which is outside the scope of this paper. In this subsection, we briefly outline the opportunities that social media presents, and the obstacles associated with its use for health-related research.
As mentioned earlier in this section, the size and growth of data on social media is unparalleled. Recent advances in the data processing capabilities of machines, and in machine learning and NLP research, present the possibility of utilizing this massive data source for a variety of purposes, including public health. The fact that it is a direct source of users’ personal experiences makes it a lucrative resource. According to Harpaz et al., social media offers new opportunities for public health monitoring due to the availability of large amounts of data that is internet-based, patient-generated, unsolicited, and up-to-date. The use of social media for health-related and other tasks is, however, not without drawbacks and difficulties. The drawbacks found when utilizing the user generated content of social media may include issues with the credibility, recency, uniqueness, frequency, and salience of the data. Abbasi and Adjeroh demonstrate the potential downside of each of these five points and the importance of selecting the right media channel for social media analytics. One example is the potential salience of Twitter data, given its short text limits. In addition to these general problems related to the data generated within social media, there are difficulties and challenges posed by the processing and extraction of relevant information using NLP techniques. A frequently encountered challenge is that the data is generated by consumers, who tend to use non-medical, descriptive terms to discuss health issues. This reduces a system’s ability to extract the terms automatically from text [17, 34, 35]. Traditional methods that are used on longer texts have proven to be inadequate when applied to short texts, such as those found in Twitter.
From the perspective of pharmacovigilance and NLP specifically, existing literature presents the various obstacles associated with the use of social media. First of all, user posts on social media contain colloquial language and also misspellings. Especially when using lexicon-based approaches, these present problems as the accuracy of direct matches decreases. Furthermore, colloquial and informal language is more difficult to parse and tokenize, and thus recent research tasks have focused on developing NLP tools specifically for data from social media. Secondly, some recent articles (e.g., [38, 39, 40, 41]) have reported the imbalance that exists in data coming from social media. Only a small proportion of drug-associated data collected from social media tends to contain information associated with ADRs. This results in problems associated with annotations, since large volumes of data need to be annotated for the inclusion of sufficient numbers of posts containing ADRs. This data imbalance issue is a major problem for supervised machine learning approaches, particularly because it is the smaller class that is of primary interest for the research. Thirdly, while access to users’ personal experiences with drugs is one of the key advantages of social media, automatic determination of personal and non-personal experiences is extremely challenging. In addition to these, there are also technical, policy, and privacy challenges associated with the use of social media for pharmacovigilance, as pointed out by Edwards and Lindquist.
In this paper, we present a methodological review, which we conducted on studies that attempt to overcome some of the obstacles, and detect/extract ADRs from social media data using NLP-based techniques. The primary intent of this paper is to categorize the studies across various dimensions such as primary aim, technique/algorithm, size of data, availability of data, source and so on. Despite the recent flurry of work, there is no established evaluation framework for ADR detection. Neither is there any resource that contains the common information from various research efforts and attempts to unify them across various aspects. Thus, we believe that a review, such as this, will provide the necessary information to drive the development and evaluation of future approaches. The rest of the paper is organized as follows: the Methods section discusses the data search, selection and abstraction approaches for the survey; the Results section elaborates the various findings of the survey, including summaries of all the studies that met the inclusion criteria; finally, the Discussion section summarizes the main findings from the Results section, and concludes the paper by proposing a possible pipeline for the development and evaluation of social media based ADR monitoring systems using publicly available resources.
Pharmacovigilance using electronic data is a relatively recent research topic, and the use of social media data has only started receiving significant research attention in the last few years. As such, when collecting data, we searched for articles published in the last ten years only. We searched the databases Medline and Embase, and also the citation databases Scopus and Web of Science. We obtained relevant citations from the Medline and Embase databases by using the advanced search options. When searching, besides enforcing the constraint associated with the year of publication, we added several keyword-based constraints. To summarize, we attempted to obtain publications that contain indications of ADR detection or Pharmacovigilance AND social networks, social media, online forums, online health communities or message boards. Figure 1 presents some example search queries that we used for searching Medline (using the PubMed interface). Since ADR detection from social media generally involves the use of natural language processing (NLP), computational linguistics or text mining techniques, we suspected that there might be publications oriented more towards computer science than medical informatics. Thus, the publication venues for such articles may not be indexed in Medline. We, therefore, searched Google Scholar using the same keywords to identify publications that may not have been indexed in the more medical focused databases such as Medline.
For all the search engines, we sorted our search results by relevance. We filtered a total of thirty-nine publications for manual review and obtained their full texts. We added articles to this list if their titles or abstracts suggested that the investigators utilized data from social media for detecting ADRs or for monitoring drug safety in general. Studies that met our inclusion criteria were those that presented original data, utilized any internet-based resource (e.g., forums, message boards, social networks), and indicated the use of automatic algorithms for ADR detection (e.g., NLP techniques, and/or rule-based or machine learning approaches). In our initial shortlist, we included studies for which we could not determine if the data in the internet-based resource consisted of user posts, or if we could not immediately determine whether the detection algorithms and analyses were automatic or manual. Our exclusion criteria included studies that utilized data sources that emerged from the following information systems: laboratory, pharmacy, radiology, or administrative. Studies were also excluded if they focused exclusively on drug-drug interactions, detected ADRs in randomized controlled trials, drug labels, or were not published in English.
For all included studies, we abstracted data on study characteristics including study size, research aim(s), primary ADR identification/extraction approach, data source, availability of data, and the type of evaluation performed. For the study size factor, we focused on two aspects— size of data and number of drugs. We also attempted to categorize if the study focused on a specific sub-domain of drugs (e.g., diabetes) or included a more general set of drugs. Classifying the primary identification/extraction technique was slightly more challenging because some articles describe the whole pipeline— from data collection to ADR detection. For these studies, we focused on the general approach that was employed at the final stage of detection. For example, we found that a number of techniques relied on ADR lexicons, while another set of techniques relied on detecting linguistic patterns for the ADR detection task.
For the data source factor, we categorized approaches based on the social network or type of social media from which the data was extracted. In terms of availability of data, we categorized studies based on whether the data used for the study were publicly available for research purposes or not. Furthermore, we also abstracted studies based on whether they utilized annotated data, which may be utilized for supervised machine learning, and is invariably more useful than unannotated data. Finally, for the evaluation criterion, we categorized articles based on the type of evaluation performed to assess performance. At a high-level, this included determining if the studies presented qualitative or quantitative evaluations. For quantitative evaluations, we further categorized on the specific evaluation approaches used.
In this section, we provide details of our methodical survey of the literature. We first present a summary of the data collection process. Following that, we summarize our review of the selected literature using the criteria mentioned in the previous section. We elaborate on the studies in the Discussion section.
Our data search using the various search engines resulted in more than 1,500 citations, of which thirty-nine articles were retrieved and reviewed in full. The false positives consisted of a variety of topics including research on social media (e.g., trust and security), NLP approaches for social media mining, and pharmacovigilance studies focusing on non-social media data. We excluded articles for the various reasons mentioned in the Data search and selection subsection. Our final set consisted of twenty-two publications, which describe automatic methods for ADR detection from user posted data on social media. This set consists of journal articles, and conference and workshop proceedings. The earliest, pioneering work we identified was from 2010 , which employs a lexicon-based approach and manually annotated data for evaluation. Following this work, this research topic has received more attention with three publications in the years 2011 and 2012, four in 2013, and eleven in 2014.
We now present two tables summarizing some of the key information associated with the studies that we reviewed. In addition, we present some statistics and explanations regarding the contents shown in the table.
Table 1 summarizes crucial characteristics of the studies. In addition to the publication years, it shows the data sources, sub-domains of focus (if any), number of drugs involved in the studies, the sizes of the data used, and annotations and availability of the data. The table illustrates some key information regarding what pharmacovigilance research has covered over the years, and how research has evolved. The study by Leaman et al. utilized data from the health related social network DailyStrength and exploited expert annotated data. The number of drugs studied, however, is only six. Table 1 suggests that DailyStrength is a relatively popular source of health-related user posted data, and it is used by six studies in total. The table also suggests that early, exploratory research generally focused on a small number of drugs for ADR investigation. Prior to 2014, there is only one study that involved more than ten drugs for investigation. Very recent studies tend to go beyond investigating ADRs associated with a small set of drugs, as depicted by the last few studies in the table. Furthermore, while some studies focused on specific domains of drugs (e.g., breast cancer, diabetes, etc.), most studies, particularly very recent ones, tend to concentrate on a range of drugs not specific to a domain.
In terms of data sizes, the studies presented in Table 1 can be divided into two important categories: large data sets without any expert annotations, and relatively small data sets which contain expert annotations. Among the twenty-two publications included in this review, fourteen (63.6%) utilized expert annotated data and eight did not. Among the fifteen papers published since 2013, eleven (73.3%) exploit annotated data. The table suggests that there is an increasing trend towards the use of annotated data for ADR detection. Some of the studies [40, 42, 45, 46] utilize very large volumes of data and derive statistics via unsupervised techniques. In contrast, studies that rely on annotated data can apply supervised approaches and evaluate against gold standards prepared by human experts. The recent efforts at creating annotated corpora are perhaps a consequence of the benefits of using annotated data. However, the public availability of annotated data is still a concern. We only found four data sets that have been made publicly available [50, 58, 38, 35, 41, 65], all of them published since 2013. The data set released by Yates and Goharian contains only 247 posts containing ADRs. The data set released by Segura-Bedmar et al. contains only 400 posts in Spanish. The latter data set, therefore, is unlikely to be suitable for future research tasks in English, but is the first of its kind in languages other than English. Both these data sets contain binary annotations only, and are also quite small, meaning that their use in supervised learning techniques is likely to be minimal. The data set discussed in [38, 35] contains only binary annotations indicating whether a Twitter post contains an ADR or not, and includes over 7,000 instances (70% of the full set used in the study).
While this data set, as published, is not suitable for supervised extraction of ADRs from text, it is suitable for training algorithms to detect ADR assertive text— a task that has already received attention within and outside of social media , and will be crucial to explore within this domain as well. In a more recent publication, span and concept normalization annotations for the same data set, containing over 1,500 instances, have been released to the public , and this data set can be utilized for ADR extraction tasks.
Table 2 provides a brief summary of the ADR detection/extraction approaches proposed by the studies, their primary research aims, and how the evaluations were designed. Note that in this context, ADR refers both to adverse reactions alone and to drug and adverse reaction pairs. The table follows on from the information provided in Table 1, and enables us to achieve an understanding of the success of different classes of approaches for ADR detection/extraction problems. The table illustrates that two of the most frequently addressed problems have been the detection of comments/sentences discussing ADRs, and the extraction of specific ADRs from sentences. This suggests that these two problems are perhaps the most important for systems attempting to propose end-to-end pharmacovigilance solutions. The evaluation approaches, however, vary more between systems. When annotated data is available, standard measures such as Recall, Precision and F-score are generally used. In the absence of annotated data, evaluation approaches and metrics tend to be varied. Later in this section, we discuss some of the evaluation methodologies mentioned in Table 2. For the readers, this will serve as a resource of compiled information for this task.
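To make the standard measures concrete, the following is a minimal Python sketch of how Precision, Recall and F-score could be computed when a system's extracted ADR mentions are compared against expert annotations. The function name, the representation of mentions as (post id, ADR term) pairs, and the example data are all illustrative assumptions and are not taken from any of the reviewed studies.

```python
def precision_recall_f1(predicted, gold):
    """Compute Precision, Recall and balanced F-score for two collections of items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (post_id, ADR mention) pairs; not real annotations.
gold = {(1, "nausea"), (1, "headache"), (2, "insomnia"), (3, "dizziness")}
predicted = {(1, "nausea"), (2, "insomnia"), (2, "rash")}

p, r, f = precision_recall_f1(predicted, gold)
print(f"Precision={p:.3f} Recall={r:.3f} F-score={f:.3f}")
```

In this toy example, two of the three predicted mentions are correct and two of the four annotated mentions are found. Exact matching of pairs is a simplifying assumption; a real evaluation may also use partial or span-based matching.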
Our survey revealed that ADR lexicons and knowledge bases have been the most widely used resource for pharmacovigilance techniques from social media. These resources contain lists of ADR mentions, collected from various sources ranging from drug labels, clinical trials, caregivers, and even user posts on social media. Significant efforts have been made for the creation of new knowledge sources and the combination of existing ones. From the studies that utilized lexicons, we have compiled a list of resources containing ADR mentions, which is as follows:
A number of the studies we reviewed focus on the automatic classification of user posts to determine if ADRs are mentioned in the posts [40, 45, 48, 49, 38, 41]. The motivation for such classification approaches arises from the fact that most drug related posts on social media are not associated with ADRs, and thus, filtering out irrelevant posts is a crucial step in identifying ADRs. Such supervised classification approaches require manually annotated data, and large numbers of annotated posts are required to make reliable evaluations. The recent preparation of large, annotated data sets (e.g., the data set described in [38, 41]), will invariably be crucial future resources.
Research tasks have designed supervised classification efforts using broad categories of drugs as classes, and the comments associated with them for training (e.g., ). Very small training data have also been applied (e.g., ) with common machine learning algorithms such as Naïve Bayes, Support Vector Machines and Maximum Entropy. One important challenge that has been constantly discussed in supervised learning tasks is the data imbalance in social media text [40, 38, 39]. The research by Ginn et al., which presents, to the best of our knowledge, the largest annotated data set from a generic social media website (in this case Twitter), suggests that only a small proportion of drug related posts contain ADRs (approximately 10%). Recent annotation work on health-related social networks suggests that the proportion of ADR associated information in such networks is higher (approximately 20-25%). However, this imbalance is still a challenge from the perspective of machine learning, and this problem has been addressed in detail in very recent research. In the mentioned study, the authors employed a number of strategies including the use of weighted classifiers, incorporation of features from other text classification problems, and the combination of multiple corpora for training.
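As an illustration of how a weighted classifier can compensate for this imbalance, the short Python sketch below derives per-class weights that are inversely proportional to class frequency. This is a common general-purpose heuristic, not necessarily the weighting scheme used in the cited study. The label names and the 10%/90% split are hypothetical, chosen only to mirror the approximate proportion reported above.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Hypothetical corpus: roughly 10% of posts mention an ADR.
labels = ["ADR"] * 10 + ["noADR"] * 90
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, an error on a minority-class (ADR) post costs a weighted learner nine times as much as an error on a majority-class post, which discourages the trivial solution of always predicting the larger class.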
A majority of the papers that met our inclusion criteria focused on identifying specific ADR mentions from user posts and extracting them. Most of the approaches (54.5%) mentioned in Table 2 are lexicon-based, meaning that their primary technique is to identify ADRs using a list of precompiled ADR mentions [17, 42, 44, 46, 54, 57, 58, 35, 61, 62]. Considering the availability of several extensive ADR resources, applying lexicon-based NLP techniques can successfully identify a subset of the ADR mentions in user posts. However, pure lexicon-based approaches do not address some important challenges. Consumers do not always use technical terms found in the existing lexicons. Instead, they use creative phrases, descriptive symptom explanations, and idiomatic expressions. For example, the phrase ‘messed up my sleeping patterns’ was used to report the well-known ADR ‘sleep disturbance’ in the data set made available by [38, 35]. Even when a mention in a user sentence is matched with a lexicon term, it is not necessarily an adverse effect. The terms used to describe ADRs can also be used for indications, beneficial effects, or other mention types. Finally, the various properties of user generated text mentioned earlier (e.g., misspellings, abbreviations, and phrase construction irregularities) limit the performance of lexicon-based approaches.
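The limitations described above can be seen in a minimal Python sketch of exact lexicon matching. The four-entry lexicon, the function names and the example sentences (other than the quoted phrase ‘messed up my sleeping patterns’) are hypothetical; real systems draw on far larger resources and typically add normalization steps that are omitted here.

```python
import re

# Hypothetical mini-lexicon of ADR terms; real lexicons hold thousands of entries.
LEXICON = {"sleep disturbance", "insomnia", "nausea", "weight gain"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def lexicon_matches(text, lexicon=LEXICON, max_len=3):
    """Return lexicon terms that occur as exact token n-grams in the text."""
    tokens = tokenize(text)
    found = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            phrase = " ".join(tokens[start:end])
            if phrase in lexicon:
                found.append(phrase)
    return found


print(lexicon_matches("This drug gave me nausea and weight gain"))
print(lexicon_matches("It messed up my sleeping patterns"))
print(lexicon_matches("I take it for insomnia"))
```

The first sentence is handled correctly. The second yields no match because the consumer phrasing shares no exact term with the lexicon. The third yields a match for ‘insomnia’ even though the term is used as an indication rather than an adverse effect, so a match alone does not establish an ADR.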
In addition to extracting ADR mentions, some studies have focused on identifying the relations between ADRs and drugs. Following the study by Nikfarjam and Gonzalez , a popular approach for the discovery of drug-ADR pairs, in lexicon-based and other techniques, has been the use of association rule mining — a class of techniques by which associations between entities are discovered. In general, following the identification of ADRs and drugs, association rule mining is used to identify if a drug and ADR pair is associated or not. Frequent occurrence of drug-ADR pair mentions in close proximity within user posts are considered to be indications of ADRs associated with the drugs, and these associations are detected by association rule mining in unannotated data.
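The measures that typically underlie association rule mining can be sketched in a few lines of Python. The example below computes support, confidence and lift for a rule of the form drug → ADR over a small set of posts. It is a simplified illustration under stated assumptions: co-occurrence is counted at the level of whole posts rather than by proximity within a post, the drug names (drugA, drugB) and posts are invented, and the reviewed studies may use different measures or thresholds.

```python
def association_scores(posts, drug, adr):
    """Return (support, confidence, lift) for the rule drug -> adr.

    Each post is a pair (set of drug mentions, set of ADR mentions).
    """
    n = len(posts)
    if n == 0:
        return 0.0, 0.0, 0.0
    n_drug = sum(1 for drugs, adrs in posts if drug in drugs)
    n_adr = sum(1 for drugs, adrs in posts if adr in adrs)
    n_both = sum(1 for drugs, adrs in posts if drug in drugs and adr in adrs)
    support = n_both / n                                # P(drug and adr)
    confidence = n_both / n_drug if n_drug else 0.0     # P(adr | drug)
    lift = confidence / (n_adr / n) if n_adr else 0.0   # P(adr | drug) / P(adr)
    return support, confidence, lift


# Hypothetical posts with already-extracted drug and ADR mentions.
posts = [
    ({"drugA"}, {"nausea"}),
    ({"drugA"}, {"nausea", "headache"}),
    ({"drugA"}, set()),
    ({"drugB"}, {"headache"}),
    ({"drugB"}, set()),
    ({"drugA", "drugB"}, {"nausea"}),
    ({"drugB"}, set()),
    ({"drugB"}, {"nausea"}),
]

support, confidence, lift = association_scores(posts, "drugA", "nausea")
print(support, confidence, lift)
```

A lift above 1 indicates that the ADR is mentioned more often in posts about the drug than in the collection overall, which is the kind of signal such approaches look for in unannotated data.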
While most approaches use lexicons for detecting drug and ADR mentions in text, some attempt to discover patterns in texts which are likely to be indicative of ADRs [34, 50]. An advantage of pattern-based approaches over lexicon-based approaches is that they are capable of detecting inexact matches. This is particularly useful for mining social media where users frequently use colloquial terms and the texts contain misspellings. The hypothesis behind using pattern-based approaches for social media mining is that, although users tend to use highly informal language, they tend to use some converging patterns, which can be used to detect ADR mentions. One of the main drawbacks of such approaches, however, is the need for very large amounts of data for the generation of patterns. With the generation of annotated data in recent times, supervised learning approaches are becoming increasingly popular, and they have also shown promising performances in quantitative evaluations [62, 65].
Our review of ADR detection/extraction techniques for social media suggests that, despite the increasing interest in this research area, a common evaluation approach that can be applied across systems is still absent. This is primarily due to the absence of common data sets that can be used for comparative evaluations of systems. As such, research tasks generally design their own evaluation approaches, and either propose new evaluation techniques or use existing evaluation techniques compatible with their proposed approaches. We now briefly discuss some of the evaluation approaches that have been applied. We group them into two broad categories: Qualitative and Quantitative (see footnote 25).
The end goal of ADR detection from social media sources is to identify drugs that are either frequently related to ADRs or associated with serious ADRs. Therefore, some past research has focused on devising strategies for computing scores for drugs with known ADRs, based on various criteria, and performing the final evaluation in a qualitative manner. For example, Chee et al. use an ensemble classification approach to classify drugs into two predetermined categories: watchlist and normal. Following this, a score is computed based on the number of times a drug is classified as watchlist, and the scores are compared to those of drugs already withdrawn from the market for associated ADRs. The final evaluation is qualitative, with a comparison of the scores obtained by withdrawn drugs and some watchlist drugs, which, according to the authors, should be scrutinized. Similar qualitative discussions accompany quantitative analyses in [45, 49, 39]. Patki et al. utilize supervised machine learning to classify comments associated with drugs belonging to two categories: blackbox (see footnote 26) and normal. For evaluation, the authors combine the classification probabilities of comments associated with each drug and suggest that the combined probabilities may act as indicators for the detection of drugs containing important ADRs. The evaluation, however, is not fully quantitative, and primarily compares and discusses the reasons for misclassified drugs.
As already mentioned, comparing all the different systems that have been reviewed in this paper is not possible, since most research tasks utilized in-house data that have not been made available to the research community. Most research tasks have been designed such that evaluations could be performed using existing metrics such as Recall, Precision, F-score and Accuracy [17, 34, 42, 43, 45, 46, 50, 57, 58, 38, 39, 41, 62, 65]. In the absence of manually annotated data, these metrics have been computed using various gold standards, such as known adverse reactions from FDA product labels or databases [44, 57]. We found ten studies that used manually annotated data for the evaluation of the drug-ADR extraction task. Table 3 presents the results, and illustrates the difficulty of comparing the various systems because of the use of different data sets of varying sizes. Considering the small amount of annotated data on which most of these systems were built, it is likely that overall performance will improve as more annotated data become available. Other metrics for quantitative evaluations have also been used, though less frequently. They include lift, leverage, and proportional reporting ratio, as well as matching rate.
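For reference, the sketch below shows how Precision, Recall, and F-score are typically computed when system output is compared against a gold standard. It assumes exact matching of (post, ADR) items; the reviewed studies differ in their matching criteria, and the function name and example items are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted and gold sets of (post_id, adr) items by exact match."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # true positives: items present in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical annotations and system output.
gold = {(1, "nausea"), (1, "headache"), (2, "insomnia")}
predicted = {(1, "nausea"), (2, "insomnia"), (2, "rash")}
p, r, f1 = precision_recall_f1(predicted, gold)
```

Because the scores depend entirely on the gold standard used, values computed on different data sets are not directly comparable, which is the difficulty noted above.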
Our survey covers research efforts for automatic pharmacovigilance techniques from social media data. The review includes carefully selected articles, published over the last four years, starting with the pioneering work of Leaman et al. The studies included in the review show the growing attention that the utilization of social media data is receiving. Moreover, while early research tasks have been mostly exploratory, recent approaches have illustrated the need and interest for structured standardized approaches and annotated data. All but six studies in our sample used data that is publicly unavailable for system development and evaluation. As such, at this point, performing a direct comparison of existing detection/extraction approaches is impossible. At the same time, evaluations of systems have also progressed in various directions, without the development of any standard evaluation criteria. A transition in research methodologies is, however, clearly visible, as large annotated data sets are gradually becoming available.
Most extraction approaches relied on lexicons for identifying/extracting ADR mentions in text, while pattern-matching-based approaches have also been applied. Lexicon-based approaches face specific obstacles when applied to social media data, whereas pattern-based methods require large amounts of data for system development. Only recently has there been a trend toward supervised learning approaches that attempt to utilize annotated data, and it is likely that comprehensive supervised classification approaches will be used more frequently in the near future.
Building on this review, we propose a possible framework for future ADR detection efforts from social media. Considering the recent developments of annotated data and large-scale annotation efforts, much of future research will invariably attempt to utilize supervised learning approaches. In the proposed framework, we only referred to data that is publicly available for performing ADR detection from social media. Figure 2 presents a high level illustration of the framework.
The first step in working with social media data is the collection of the data. All the papers discussed in this review perform data collection from various sources. For health-related social networks, such as Dailystrength, the collection of relevant data is generally easy since the data is categorized according to various criteria (e.g., drug name). For generic social networks, such as Twitter, the collection problem is harder. It is possible to collect posts by using drug names as search keywords, but drug names are often misspelt by users. To address this problem, recent research [38, 35] has utilized phonetic spelling filters to generate common misspellings for drug names. These recent advances in NLP will aid future data collection processes.
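The idea of generating likely misspellings and filtering them phonetically can be sketched as follows. This is not the method used in the cited work: the phonetic key below is a deliberately crude, hypothetical stand-in for a real phonetic algorithm, and the candidate set is limited to strings one edit away from the drug name.

```python
import string

def phonetic_key(word):
    """Crude phonetic key (an assumption, not a standard algorithm):
    keep the first letter, drop later vowels, collapse repeated letters."""
    word = word.lower()
    key = word[:1]
    for ch in word[1:]:
        if ch in "aeiouy":
            continue
        if ch != key[-1]:
            key += ch
    return key

def edit_distance_one(word):
    """All strings one deletion, transposition, substitution, or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    return (deletes | swaps | replaces | inserts) - {word}

def misspelling_variants(drug_name):
    """Keep only the variants that share the original's phonetic key."""
    name = drug_name.lower()
    target = phonetic_key(name)
    return {v for v in edit_distance_one(name) if phonetic_key(v) == target}

# Variants such as 'prozec' are kept; 'prozak' changes the key and is dropped.
variants = misspelling_variants("prozac")
```

The resulting variants could be added to the search keywords used for collection. A production filter would use a proper phonetic algorithm and would rank variants by how often they actually occur in the data.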
Following data collection, the challenge is to filter data. As explained earlier, data imbalance is an important problem in ADR mining from social media text, which has resulted in various research tasks on classification of ADR assertive text [49, 48, 38, 39]. With the creation of recent publicly available corpora (e.g., [50, 38, 35, 58]), learning algorithms can be trained and optimized to detect ADR assertive instances with high accuracy. Most classification research, however, has only used very basic linguistic features for classification (e.g., bag of words), and only very recent research has focused on exploring deep linguistic and semantic features and advanced machine learning techniques.
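To make the bag-of-words setting concrete, the sketch below trains a small multinomial Naive Bayes classifier with add-one smoothing on whitespace tokens. The choice of classifier, the class name, and the four training posts are illustrative assumptions; the reviewed studies use a variety of learners and much larger annotated corpora.

```python
import math
from collections import Counter

def tokenize(text):
    """Bag-of-words tokens: lowercase, split on whitespace."""
    return text.lower().split()

class NaiveBayesADR:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)  # class prior
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[c][tok] + 1
                    score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = c, score
        return best

# Hypothetical toy training set; real corpora contain thousands of posts.
texts = [
    "this drug gave me terrible headaches",
    "felt dizzy and sick after taking it",
    "picked up my prescription today",
    "does anyone know the price of this drug",
]
labels = ["adr", "adr", "no_adr", "no_adr"]
clf = NaiveBayesADR().fit(texts, labels)
```

For example, `clf.predict("it gave me headaches")` returns `"adr"` on this toy data. With realistic data the classes are heavily imbalanced, so accuracy alone is a poor indicator and the per-class F-score is usually reported.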
Effective filtering/classification techniques are likely to aid the process of ADR mention extraction by removing the majority of irrelevant information. We have discussed various ADR extraction approaches in the paper, the most popular being lexicon-based ones. Lexicon-based approaches have benefited from recent expansions and merging of existing lexicons, and the incorporation of colloquial terms. The recent release of publicly available annotated data will inevitably popularize supervised learning approaches for this task.
The last step in the pipeline is to perform statistical analysis on extracted drug-ADR pairs to identify potentially harmful drugs. This step has received hardly any research attention to date, and we only identified some exploratory research attempting to perform this task on social media data [40, 39]. Progress in ADR extraction and classification research is likely to increase the research focus on the analysis of drug-ADR signals generated from social media data. Considering the rapid growth of social media data, this source of information is likely to have a massive impact on pharmacovigilance research.
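One statistic commonly used for this kind of signal analysis is the proportional reporting ratio (PRR), mentioned earlier among the less frequent metrics. The sketch below computes it from a 2x2 contingency table of extracted mentions. The counts are invented for illustration, and applying PRR to social media mentions in this way is an assumption rather than a procedure taken from the reviewed studies.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR from a 2x2 table of mention counts:
        a: drug of interest, ADR of interest
        b: drug of interest, any other ADR
        c: all other drugs,  ADR of interest
        d: all other drugs,  any other ADR
    Returns None when the ratio is undefined."""
    if a + b == 0 or c == 0:
        return None
    rate_for_drug = a / (a + b)     # share of the drug's mentions that are this ADR
    rate_for_others = c / (c + d)   # same share across all other drugs
    return rate_for_drug / rate_for_others

# Hypothetical counts: the ADR makes up 20% of mentions for the drug
# but only 5% of mentions for all other drugs.
prr = proportional_reporting_ratio(a=20, b=80, c=10, d=190)
```

A PRR well above 1, as in this toy example, suggests that the ADR is reported disproportionately often for the drug. Signal detection in practice also applies minimum-count and significance criteria before flagging a pair.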
We thank the NIH/NLM support for this project (Award: 1R01LM011176-01).
2. http://www.statisticbrain.com/twitter-statistics/. Accessed on: 26th August, 2014.
3. Some resources for demographic information analysis: http://textalytics.com/core/userdemographics-info, http://www.demographicspro.com/, http://smallbiztrends.com/2013/04/research-twitter-followers-demographics.html.
4. http://www.ncbi.nlm.nih.gov/pubmed. Accessed: 6/10/2014.
5. The following social media sites were involved: breastcancer.org, komen.org, csn.cancer.org, bcsupport.org, healthboards.com, cancercompass.com, webmd.com, dailystrength.org, revolutionhealth.com, ehealthforum.com, oprah.com
6. This is the number of threads included, not the number of comments.
7. Only 200 comments are annotated for evaluation.
8. Only 10% of this data is annotated.
9. Study also includes data from AERS and Medline.
10. Includes 6 drugs from Leaman et al.
11. This is the data that is obtained from the three sources mentioned. The study utilized additional non-social media data.
12. Not unique drugs. The number of unique drugs is not mentioned.
13. Study also includes a corpus from outside social media.
14. ir.cs.georgetown.edu/data/adr/. Accessed on 06/12/2014.
15. http://labda.inf.uc3m.es/doku.php?id=en:labda_spanishadrcorpus. Accessed on 04/12/2014.
16. diego.asu.edu/downloads/. Accessed on 06/12/2014.
19. http://bioportal.bioontology.org/ontologies/COSTART. Accessed: 06/13/2014.
20. http://www.consumerhealthvocab.org/. Accessed 06/16/2014.
21. http://www.hc-sc.gc.ca/dhp-mps/medeff/index-eng.php. Accessed on 06/13/2014.
22. http://www.nlm.nih.gov/research/umls/. Accessed 06/16/2014.
23. http://www.meddra.org/. Accessed 06/16/2014.
24. http://sideeffects.embl.de/. Accessed: 06/13/2014.
25. For evaluation approaches applying a mixture of quantitative and qualitative evaluations, we categorize them into one of the categories based on the primary evaluation ideology.
26. Drugs containing boxed warnings regarding ADRs.
Abeed Sarker, Department of Biomedical Informatics, Arizona State University, Scottsdale, Arizona.
Rachel Ginn, Department of Biomedical Informatics, Arizona State University, Scottsdale, Arizona.
Azadeh Nikfarjam, Department of Biomedical Informatics, Arizona State University, Scottsdale, Arizona.
Karen O’Connor, Department of Biomedical Informatics, Arizona State University, Scottsdale, Arizona.
Karen Smith, Rueckert-Hartman College for Health Professions, Regis University, Denver, Colorado.
Swetha Jayaraman, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, Arizona.
Tejaswi Upadhaya, School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, Arizona.
Graciela Gonzalez, Department of Biomedical Informatics, Arizona State University, Scottsdale, Arizona.