As shown in the participant characteristics table, the participants were from eleven countries in six geographical regions. Greater geographical diversity was achieved among the Cochrane author sample than the non-Cochrane sample. Of the 32 respondents, 66% (21) had M.D. degrees, and six of those held a joint medical and graduate research degree; 44% (14) were women.
Participant characteristics (n = 32)
Our study uncovered several major themes. First, study respondents had different understandings of what was meant by unpublished data. Second, contacting study authors was the most common method used to obtain unpublished data and the value of regulatory agencies as a data source was underappreciated. Third, using the data obtained was time consuming and labor intensive, and respondents described a variety of methods to organize, manage, and use the data in their reviews. Fourth, respondents had a shared belief that data should be accessible, but some had concerns about sharing their own data. Fifth, respondents believed that obtaining unpublished data for reviews has important public health implications. Lastly, there was widespread support for government intervention to ensure open access to trial data.
Definition of unpublished data
Most respondents distinguished between entire unpublished studies and data that were missing from published studies and considered both to be unpublished data. One respondent explained, ‘the main problem is that we can’t get access to unpublished studies. We don’t even know that they exist and for studies we know about, data are often missing.’ Another respondent noted, ‘they’re [trials] never published, they’re never made available, they’re not out there to inform practice.’
Some respondents felt that unpublished data are any data not included in a peer-reviewed journal article. Others believed that unpublished meant any data that were not publicly available. Therefore, abstracts, conference proceedings, and other ‘grey literature’ were considered unpublished. Others disagreed and believed that abstracts were published data. For these participants, unpublished meant data that could not be publicly obtained in print. One respondent discussed why it is important to look outside of peer-reviewed journals: ‘we also want to be aware of what trials are out there and may be presented at meetings because, you know, journals may have long publication times, but if you’ve got a meeting abstract and you can work a little bit with the author, often you can include some of that data.’
Even when relevant studies were identified, respondents reported that they were missing key pieces of data to complete their reviews. In this instance, unpublished data were described as the specific outcomes that were not included in any publicly available published article of the drug trial.
Standard deviations were mentioned frequently as necessary pieces of data that were often missing from published reports. One respondent described unpublished data as being more than outcome data, and included ‘details of the methodology, data that would go in a Cochrane risk of bias table, and any results not mentioned in the report.’
Sources of unpublished data
The respondents’ choices of data sources are shown in the table of sources of unpublished data. Our analysis of the interview data suggests that these choices were influenced by how respondents defined unpublished data and whether they believed a source had the data and would be willing to share them. The most frequently mentioned source of unpublished data was study authors. All respondents contacted study authors, yet many did not go further because they did not know that the data might be available elsewhere or they did not believe they would be able to gain access to the data.
Advantages and disadvantages of sources of unpublished data described by interview respondents
Some respondents turned to trial registries to help identify entire unpublished trials. The benefit of trial registries is that they can help a researcher identify trials for which no outcomes have been reported. Their shortcomings are that they do not provide any data and do not list a contact person.
Other sources that were less frequently mentioned were grey literature, which, depending on the respondent, might include non-peer-reviewed publications, abstracts, conference proceedings, or dissertations. One respondent mentioned law firms involved in class action lawsuits as a data source, but noted that while a class action lawsuit can reveal all of the trial data, such lawsuits do not occur very often and cannot be considered a source for many drugs. The media, press releases, and grant organizations were also mentioned as potential sources of unpublished data.
Strategies for obtaining the data
Contacting the study author
Respondents used similar processes to approach study authors about obtaining unpublished data and believed two methods were most successful. The first was keeping the initial e-mail to the study author descriptive, friendly, and concise. Making it easy on the study authors was important in receiving the data because, as one respondent stated, ‘it’s just a matter of understanding that people, even if they want to collaborate, they don’t have a lot of time to do that. So, if you make their life more complicated, they’ll say, “Okay, why do I need this problem?”’ The second method that increased response rates was to try to establish a personal connection with the author. Additionally, respondents noted that more senior authors were more likely to be successful in obtaining data. Respondents believed they would also have more success if their request came from a larger organization with which the researcher was affiliated (for example, a Cochrane Center or the World Health Organization) rather than contacting the author as ‘a lone wolf.’ Generally, respondents would only contact a study author twice if they received no response. However, others believed that persistence was important. One respondent contacted unresponsive study authors ‘every six months for two to three years, until the review was completed.’
Unresponsiveness or refusal to share data
Respondents were given three main reasons why the study authors they contacted would not or did not provide the requested data. The first was that the study author had moved from one institution to another and no longer had access to the data. The second reason was that the study author did not have the resources to search for or gather the data. Two respondents discussed the need to obtain extra funding to get unpublished data. One paid study authors directly for the data, which he stated increased the response rate considerably.
The third reason noted by our respondents was that authors wanted to maintain control over the data for their own publications and professional advancement. One respondent described it this way, ‘clinical trialists are quite willing to share their data if they had published their own results.’ Although respondents did not agree with this practice, as academicians, they shared similar concerns. One respondent stated, ‘I could understand if you put a lot of energy and resources into gathering data that you don’t want to just turn it over to someone else to publish all of your papers with.’
Researchers who were unable to obtain the data from the study author believed that the potentially most fruitful next step would be to contact the pharmaceutical company that sponsored the trial. However, due to past negative experiences or perception, many respondents did not believe they would be granted access to the data and concluded that the attempt would not be worth the effort. While Cochrane authors were less likely to contact drug companies, there was a general distrust of the pharmaceutical industry by all respondents. One respondent stated, ‘an overture to a pharmaceutical company to get data from them is not likely to meet with any kind of success. You would have to – I would assume that anything I got back from them was not likely to be valid or that you couldn’t actually depend on it.’ Drug companies often responded with reasons as to why the data could not be released. One respondent stated that she was told by multiple company representatives that the ‘quality of the studies is just too poor to do meaningful analysis and based on that argument we did not get the data.’ Even when companies were required through legal action to make data available, most respondents believed it would be difficult to make the analysis free of industry influence since some settlements required that company employees still be involved as consultants.
One of the most informative and comprehensive documents that a reviewer can obtain from the pharmaceutical company is the clinical study report. The ‘clinical study reports contain hundreds to several thousand pages and at an aggregate level contain everything.’ However, the immense size of the reports makes the process of finding the needed data very laborious and time consuming. The size also allows unflattering data to be hidden among the thousands of pages. One respondent discussed how he had ‘no doubt’ that the large size of the clinical study report was a ‘deliberate tactic that drug companies use that they drown us with data. They submit so many thousands of pages on their clinical trials that no one in the whole world will ever read all this. And there are examples that they have hidden, even deaths, deeply inside a report. And there is not any big chance that anybody will ever find it.’
Despite the size of clinical study reports, one respondent described them as ‘formulaic’ and said that if one understands the formula, then one can easily navigate through the documents. For example, a respondent stated, ‘once you’ve got your head around the structure of one clinical trial report from a company, and a drug, and topic then by in large you know your way around most of the others…sometime it’ll be as simple as table twenty one is always telling you about death… almost all of them have got extensive indices at the front and/or back… almost all of them as well have very extensive sections dealing with the methodology, how missing data were dealt with and so on. And when it comes to significant and severe adverse events they’ll usually have individual patient narratives tucked away in the appendices and one can look and search for those as well.’
Another respondent conceded that these reports are helpful, and ‘your means of verification with the clinical study reports are far higher’ than in a peer-reviewed journal article. He did, however, feel that ‘the clinical study report is a commercial document…it sells the drug to the regulators, so you shouldn’t be trusting it.’
Some respondents believed that if you didn’t work with companies you would be cut off from obtaining data. Respondents who had success with drug companies had usually been involved in a review supported by a drug company that agreed to make data available. But even those who had been involved in a review with a cooperating drug company may not have success obtaining data for another review.
While most of our respondents worked in academia, one who currently works for a pharmaceutical company had a different perspective: ‘the pharmaceutical companies in general do allow access to the data, but not for anyone. So, typically what would happen is there would be a protocol, it would be reviewed by an internal committee, and there would be an assessment of the scientific rigor of the work, and the capability of the investigator. To provide someone with data requires a fair bit of work. And, so there would be an assessment of is there a good trade off for the amount of work required to provide the data and the expected utility of the data once completed.’
Respondents were generally unaware of data from regulatory agencies. The Food and Drug Administration (FDA) and the European Medicines Agency (EMA) were the two regulatory agencies mentioned by respondents. Reviewers saw the FDA as a particularly valuable resource because the FDA has data and analyses that do not get published in peer-reviewed journals. One respondent outside of the United States had high praise for the FDA, stating, ‘the FDA is brilliant in the last few years because of what it does and the way it treats data… What happens now is that the FDA will perform its own analyses on the company’s data, they won’t just rely anymore on the company’s analyses of data and that analysis includes an analysis of the individual patient level and it includes an analysis, which we think is more appropriate where they use a variety of different techniques for dealing with data when patients drop out of studies.’
Since the regulatory agency has the documentation necessary for approval, one respondent believed that ‘having access to regulatory agency submissions is really the kind of gold standard of information on results because it’s the full report of all the analyses that have been done and it’s the full protocol that’s available through those submissions.’ However, respondents noted a number of major limitations of the data available from the FDA website: ‘a lot of it is redacted for proprietary reasons, and it’s not necessarily posted in the most timely fashion or in the most convenient format.’ In addition, the FDA website does not include raw data, complete data on harms, or all post-approval studies. Individual case report forms are not readily available, although summaries of these case reports are provided by the FDA.
The FDA’s site is not user friendly. One respondent found ‘it very hard when I go to the FDA site to find reports. I really have to seemingly dig around a long time, and I suspect a lot of people don’t even bother.’ One respondent felt that the site was so difficult to navigate, and the format of the data so unwieldy, that she preferred not to use it as a source of data. One reviewer who acknowledged that searching the FDA site is ‘a daunting task’ noted that experience made it easier and that reviewers could be trained to use the site efficiently. One respondent was optimistic about the direction the regulatory agencies were heading with regard to making data more available. He said, ‘they have become more accepting of the fact that it’s really illogical and unfair that they should have access to all the information and then to approve the drug or device, and then those of us who actually use or prescribe those interventions then only have access to a biased subset of that information.’
Using the data in reviews
Respondents discussed the processes they used to organize and manage the data that they obtained. One respondent discussed the collaboration required after receiving data from a drug company that were a ‘complete mess. I had no idea what was going on. Fortunately, it being [Academic institution], a few very clever people around, one of whom was an Associate who was an absolute whiz kid at spread sheets and artificial intelligence and he had mechanisms which allowed us to make sense of the data.’ Another respondent described the process of cleaning the data for use in the review as ‘a huge project. I have had two full-time researchers working on this for a full year now. So we are trying to develop the methods whereby we can digest these thousands of pages without as I told you having to read every word in them.’
The process for extracting usable data can span months or years. One respondent described the process with his research team whereby three of them ‘sat in my room with, I think, three or four computers, all big screens running simultaneously, looking through the data - checking and double checking, triple checking each other and entering data into our Pro forma sheets at the same time. So we just got it done while we were all concentrating on it.’ Another respondent, working under a deadline, had received all of the data, and then checked ‘data for 53 studies, 14,000 patients, then every study was checked by a statistician, I mean the complete IPD (individual patient data) data…we were working night and day, night and day because that was a special project.’
Sometimes the amount of data obtained was simply too much. One respondent was told he could receive 70 meters of paper from a regulatory agency and ‘gave up.’ Another respondent who had to manually go through paperwork felt that it was ultimately ‘not worth doing that. It was a good three days work for a table of data.’ The respondents felt that it was important to be discerning about time spent gathering certain data and weigh that against how critical those data are for that review.
Beliefs about unpublished data
The respondents expressed a variety of views on collaboration and willingness to share data with other researchers, language barriers, the role of systematic reviews and their impact on public health, and ideas about how the accessibility of data can be increased.
Most of the respondents would share data with other researchers when asked and believed that data should be available. However, some respondents who were also trialists were unwilling to share their own data or collaborate with other trialists. One respondent thought the main barrier to data sharing was a lack of collaboration among researchers, and he found it ‘singularly unhelpful to work with people outside [of his institution].’ Others felt ‘uncomfortable’ about sharing data because data could be misrepresented or inappropriately analyzed post hoc. While some believed that data should be available, they felt that the availability should be limited to qualified researchers who could demonstrate how the data would be used. Others felt that unpublished data should be widely available to the public with one suggesting that The Cochrane Collaboration establish a database of all unpublished data collected for Cochrane Reviews.
Language was viewed as a barrier. One respondent said that ‘for better or for worse, English probably is the common currency in terms of scientific publication, and, so most of the best scientists want their work published in English.’ Therefore, non-English language speakers are at a disadvantage when it comes to obtaining data from English-language publications.
There were differing opinions as to how the inclusion of non-English publications would affect the outcome of reviews. Some respondents did not think it was worth the effort to obtain data published in languages other than English, while others felt that it was important not to exclude those studies.
All of the respondents believed that their efforts to obtain unpublished data were important. Many respondents agreed that drug trial data should be more easily accessible to the public. One respondent stated that, ‘we should have the totality of the information available to us and if there are studies which have been done and they are reasonable studies, which could help make decisions either about the effectiveness or harms of medicines then they damn well ought to be in the public domain ‘cause it’s the public that are taking the tablets.’ Another respondent also believed these data should be public, and said ‘This idea that they contain material that the company should be able to keep private is absurd. Human lives were involved in these trials and I think that information should be public.’ Another respondent noted that ‘patients would be shocked’ to know that data were unavailable to researchers.
Others stated that without access to unpublished data, the true harms and benefits of many drugs would never be known. Respondents noted that conclusions of drug reviews that do not include unpublished data are ‘dangerous,’ with one noting, ‘peer review journal articles represent a cherry-picked subset of what went on. And I think clinicians and researchers need to know the whole story and not just this sanitized cherry-picked version of the truth.’ One respondent felt that The Cochrane Collaboration was impeding progress by publishing reviews that do not have the full data on a drug, noting ‘we don’t expect Cochrane reviews to be biased towards a certain product, but they are.’ He suggested that Cochrane reviews should ‘come to the conclusion that they can’t get the data’ rather than a conclusion about the efficacy of a drug.
Some of the respondents described how their reviews changed prescribing practices after they exposed harms by including unpublished data. One respondent’s review caused the drug company to remove the drug from the market and cease all ongoing clinical trials. Some respondents felt that difficulty in obtaining data caused unacceptable delays of public health significance. One researcher who was finally able to obtain data from the FDA after three years of repeated Freedom of Information Act (FOIA) requests stated simply, ‘I think my review is ten years too late.’ The results of his review demonstrated harm at the high dose of the drug. Another respondent who has been a vocal critic of industry felt that his work had negatively impacted public health and believed, ‘I’ve probably increased the sales of drugs. Companies can use critics to increase sales of their drugs. I don’t think I’ve made any difference or whatever, maybe made things worse.’
Ways to increase data access
All of the respondents believed that trial data should be made available and that access should be increased, but there were differing opinions as to the best way to make this happen. There was widespread support for government intervention and regulation. One respondent described how government intervention could help: ‘it’s very easy if you want to, to demand as a condition for having approval of doing the trial, you need to provide all the data within a certain amount of time… And if you don’t provide the totality of the data, then there could be sanctions like if it is a drug company, the sanction could be that they are not allowed to do anymore trials.’ Another respondent concurred that it ‘would be a great thing for the government to manage and make accessible.’ One US respondent felt that political will could facilitate access to data and found it necessary to ‘get a Senator or Congressman to get it for you.’
There were differing opinions, though, as to who could access the data. Some felt that it should be publicly available to anyone without any barriers. Others felt that the process should be more discerning, with data available to scientists and researchers who could understand and make sense of the data as well as demonstrate how they would use the data, which could involve submitting a protocol for the review.