The practice of research is full of ethical challenges, many of which might be addressed through the teaching of responsible conduct of research (RCR). Although such training is increasingly required, there is no clear consensus about either the goals or content of an RCR curriculum. The present study was designed to assess community standards in three domains of research practice: authorship, collaboration, and data management.
A survey, developed through advice from content matter experts, focus groups, and interviews, was distributed in November 2010 to U.S. faculty from 50 graduate programs for each of four different disciplines: microbiology, neuroscience, nursing, and psychology. The survey addressed practices and perceived standards, as well as perceptions about teaching and learning. Over 1,300 responses (response rate of 21%) yielded statistically significant differences in responses to nearly all questions. However, the magnitude of these differences was typically small, leaving little reason to argue for community consensus on standards. For nearly all questions asked, the clear finding was that there was nothing approaching consensus. These results may be useful not so much to teach what the standards are, but to increase student awareness of the diversity of those standards in reported practice.
Over the past 20 years, education in responsible conduct of research (RCR) has become increasingly commonplace in American universities. This can largely be attributed to increasing requirements for such education first with the National Institutes of Health (NIH 1989) and more recently with the National Science Foundation (NSF 2010). Despite these efforts, it is clear that there is little or no agreement about what the goals of such programs are or should be (Kalichman & Plemmons 2007). It is not surprising that evidence for "effectiveness" of RCR education is mixed at best (Kalichman & Friedman 1991; Anderson et al. 2007; Antes et al. 2009).
What content should we expect to be covered in an RCR curriculum? At various times, the NIH or Public Health Service (PHS) has prescribed lists of topics (NIH 1989; PHS 2000; NIH 2009), many of which have been incorporated into RCR curricula (Mastroianni & Kahn 1999; Heitman & Bulger 2005; Steneck & Bulger 2007). However, merely having a list of topics such as authorship, collaboration, or data management raises the question: What exactly should be taught about those topics? In practice, that decision is left to the individual instructor.
For some topics, such as human subjects, animal subjects, and conflicts of interest, much of the curriculum is typically defined by historical anecdotes, existing guidelines, and regulations. However, for most of the recommended topics, regulations are rarely relevant if they exist at all, and guidelines are typically unwritten. The absence of regulatory guidance is not necessarily accidental. Many questions, such as "who should be the first author on a manuscript?" can reasonably have more than one answer. Different researchers may take very different approaches. This doesn't mean choices have been made between right and wrong (e.g., someone choosing to commit research misconduct), but it does leave unanswered the question about what should be taught about these topics. Are there clear, widely accepted standards?
To provide a baseline for discussing responsible practices, we designed a study in which faculty were queried about a wide range of standards and practices in each of three domains: authorship, collaboration, and data management. In addition, we asked these faculty for their perceptions of how researchers learn those standards. In choosing this comprehensive approach, it was understood that a large number of questions would decrease the likelihood of a high response rate among the faculty respondents, but would facilitate a broad, first look at the extent of agreement about standards of conduct.
The proposed survey study was reviewed and approved by the UC San Diego Institutional Review Board (Protocol #101447SX). The final survey consisted of 132 distinct questions divided among 5 sections: authorship, collaboration, data management, teaching and learning, and demographics (Appendix). The domains of authorship, collaboration, and data management were selected as the three commonly taught topics that are most central to the practice of research, but which are also not so intertwined with regulations, as is the case for animal subjects, human subjects, and conflicts of interest. Four disciplines were chosen for study based in part on the areas of expertise of some of the individuals recruited for the expert panel, but primarily to represent different types of biomedical research. Microbiology and neuroscience are examples of disciplines with professional societies that are among the largest in the world, and largely defined by what is commonly considered to be "bench research." Nursing was selected as a discipline that would be most likely to have a clinical dimension. Psychology was of interest because so much of the discipline is likely to fit into the category of social and behavioral research. Despite this broad outline of differences among the four selected disciplines, it was recognized that there are many instances of overlap in research questions, methods, and outcomes. For example, psychology certainly has a prominent clinical component, and clinical research is a part of both microbiology and neuroscience.
We assessed community consensus (agreement) with a series of χ² analyses to determine if response category frequencies significantly differed from one another. Because the goal for this initial analysis was to assess the possibility of consensus, we simplified the 5-point Likert items to a 3-point (Disagree-Neither-Agree) scale. Non-significant results indicated varying views on standards and therefore a lack of consensus. Significant differences, however, indicated only the possibility of consensus, in that significance meant only that not all response categories had similar frequencies. This was not necessarily synonymous with a finding that one response category contained the vast majority of responses. For instance, response category frequencies of 40% disagree, 14% neither, and 46% agree would be statistically different (p<0.001) but not an indication of clear community consensus. Therefore, questions with significant differences were further examined to determine to what degree a single response category was selected by respondents. Moderate consensus was defined as 70% or more of responses within a single category, and high consensus was defined as 90% or more of responses within a single category.
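The two-step logic described above can be illustrated with a minimal sketch. It assumes that the χ² test compares the three collapsed categories against equal expected frequencies, and it uses a hypothetical sample of 1,000 responses to one question; neither detail is specified in the text, and the function names are illustrative only. With three categories there are 2 degrees of freedom, for which the χ² upper-tail probability has the closed form exp(-x/2).

```python
import math

def chi_square_uniform(counts):
    """Chi-square statistic against equal expected counts (assumed null)."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

def p_value_df2(statistic):
    """Upper-tail p-value for 2 degrees of freedom (three categories only)."""
    return math.exp(-statistic / 2)

def consensus_level(counts):
    """Classify by the share of the largest category: 90%+ high, 70%+ moderate."""
    top, total = max(counts), sum(counts)
    if top * 10 >= total * 9:
        return "high"
    if top * 10 >= total * 7:
        return "moderate"
    return "none"

# Hypothetical counts matching the 40% / 14% / 46% example (N = 1,000 assumed).
counts = [400, 140, 460]  # disagree, neither, agree
statistic = chi_square_uniform(counts)
print(f"chi-square = {statistic:.1f}, p < 0.001: {p_value_df2(statistic) < 0.001}")
print("consensus:", consensus_level(counts))
```

Under these assumptions the example distribution is highly significant yet is classified as showing no consensus, which is the distinction the second step of the analysis is meant to capture.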
With 1,396 responses, the overall response rate was just over 21%.
Respondents were representative of a range of faculty positions including assistant, associate, and full professor (N=351, 394, and 596, respectively). This group was quite experienced based on self-reported medians of 10 years as Principal Investigators, 15 years as Faculty members, 35 published papers, 20 first or senior author papers, and responsibility for mentoring 2-8 undergraduate, graduate, or postdoctoral trainees. Nearly all identified themselves as having had little or no significant research training outside the U.S. (1,358 reported receiving training in North America, followed by 161 in Europe, 49 in Asia, 13 in Central/South America, and 9 in Africa). Respondents predominantly self-identified as white (N=1,275) and not Hispanic or Latino, and the majority were female (N=728).
Overall statistical significance and consensus are summarized in Table 1. Response frequencies for all statements were significantly different (p<0.05), and in fact 98% were highly statistically significant (p<0.001). However, if the definition of community consensus required clustering of >70% of the responses (either in agreement or disagreement), then community consensus was found for only 41% of all questions. And if the bar was set as high as 90%, then community consensus dropped to just 16% of questions. Taken together, it is possible to suggest a common opinion for only a handful of questions, summarized in Table 2.
The statements for which consensus was greatest (i.e., >90%, Table 2), and the three statements for each topic resulting in the lowest levels of consensus (i.e., the greatest degree of disagreement among respondents) are summarized in Table 3. While no common theme ties together those statements eliciting high levels of agreement, it is noteworthy that 93-98% of respondents endorsed mentoring and personal experience as the ways in which standards are learned. Conversely, the roles for institutional guidelines, requirements, and formal training in teaching standards resulted in some of the lowest levels of consensus for any questions.
Advances in science depend on research, and fostering the integrity of that research is the basis for calls for an increased focus on responsible conduct of research (RCR) from the Institute of Medicine (IOM 1989), the Association of American Medical Colleges (AAMC 1982; AAMC 2006; AAMC 2008), and the National Institutes of Health (NIH 1989; PHS 2000; NIH 2009). However, stating the need to teach RCR is not the same as being clear about what should be taught. Assuming that the integrity of research depends on more than just following regulations about the use of animal and human subjects, it is important to be clear about just what can be said about some of the most fundamental aspects of the conduct of research: how credit is allocated (authorship), how researchers work with one another (collaboration), and how research records are created and maintained (data management). Gaining additional insight into these questions was the rationale for conducting this study.
The focus of this project was to assess possible consensus among faculty researchers working in each of four different disciplines. However, it is important to underline that the goal here was not to define the "right" answer. First, it is possible that a high percentage, or even most faculty, might share a perspective on a standard of conduct that is arguably a wrong view. Second, and more importantly, the nature of the questions being addressed is uniformly not about issues for which one can assume a priori that there is a "right" answer. This approach is to be contrasted with work of others seeking consensus, for example, among teachers of research ethics who might have a view about what is most important to be taught (DuBois & Dueker 2009).
In this project, community consensus was most strictly defined by looking for substantial (>90%) agreement among all respondents. By this standard, an argument for consensus could only be made for 16% of the questions asked. As is clear from Table 2, these areas of agreement were restricted to relatively few statements about order of authorship, criteria for authorship credit, criteria for successful collaborations, and a few select issues about data management. On the other hand, substantial agreement was found for 7 (more than a quarter) of the statements about roles of teaching and learning of standards of conduct. Even when the standard was dropped to require just 70% agreement, this occurred for only 41% of all questions. In the latter case, this means consensus was absent for a significant majority of questions, and even where consensus occurred, it still typically left approximately one-third of respondents in disagreement.
This study was not designed to determine why standards might vary, but it is possible to speculate based on anecdotal experience of the authors and other teachers of research ethics. In the classroom, we routinely find differences in experience and standards in discussions involving students and postdocs from diverse disciplines and research groups. These differences are rarely due to clear ethical or scientific failures (e.g., the willingness to falsify the research record); it is more often the case that conventions simply vary (e.g., what are the criteria for authorship?). The data from this project are consistent with these anecdotal findings.
Because responses about standards of practice varied widely for nearly all questions, it is worth considering the possibility that some of the resulting answers were inconsistent with existing guidelines. However, because the topics of authorship, collaboration, and data management were explicitly selected as areas in which clear guidelines were less likely to be found, relatively little can be said with confidence. The one prominent exception is for authorship criteria, which are addressed in detail in guidelines of the International Committee of Medical Journal Editors (ICMJE, 2013). Given that all four disciplines surveyed were biomedical, these guidelines are likely relevant to most if not all survey participants. However, one caution is important before discussing the extent of agreement between survey respondents and the ICMJE guidelines. Based on anecdotal experience of the authors, faculty are rarely aware of guidelines or regulations governing their research except in the most general terms (e.g., some know of the ICMJE guidelines, but almost none could reliably summarize those guidelines). This anecdotal impression was soundly verified by the focus groups and interviews in which it was rare that the faculty participants were aware of any guidelines other than in a vague sense that some sort of standards are laid out when they publish. Few knew of the ICMJE guidelines for authorship.
Whether or not faculty are aware of the ICMJE guidelines, to what extent did respondents offer perceptions inconsistent with those guidelines? At least 21 of the questions about authorship queried issues that are arguably covered under the ICMJE guidelines. The percent of respondents agreeing or strongly agreeing with each of those statements is summarized in Table 4. In answer to questions about perceptions of common practice, respondents agreed at rates of 41-55% (median=46%) with statements suggesting that authorship credit might be allocated for criteria arguably inconsistent with the ICMJE guidelines. When asked about whether these perceived practices should be acceptable, agreement rates ranged from 22-65% (median=24%). Finally, 11-91% (median=34%) of respondents registered opinions in agreement with statements about allocation of authorship for criteria that would be insufficient under ICMJE guidelines. Clearly a high percentage of respondents espoused views inconsistent with ICMJE guidelines. A simple conclusion might be that the respondents were staking out unethical positions. However, a case can also be made that such a conclusion is premature: not only because many (most?) of these people aren't even aware of the content of the ICMJE guidelines, but because there may in fact be defensible historical, social, and cultural reasons for different approaches in different disciplines or research groups. To answer this question will require further study to investigate the rationales for choosing particular framings for what it means to be an author.
One other authorship item deserves comment. The statement "In my opinion, each person listed as an author should be capable of taking public responsibility for the project (explaining what was done, why it was done, and what it might mean)" resulted in a very high rate of agreement (86%). This is noteworthy because earlier this year, the ICMJE (2013) added a new element to criteria for authorship that is similar to this view: "Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved."
At first glance, the typical finding of non-consensus on such a high percentage of questions might be seen as problematic for teaching of responsible conduct of research. If there are no accepted standards, then how can standards be taught? However, there is another, and perhaps more useful, interpretation of these findings. The fact that standards vary widely is in itself an important lesson to be taught. With research becoming increasingly collaborative, the chance for misunderstandings and disputes only increases if the diversity of approaches is not recognized and explicitly discussed by collaborators. By this argument, these data provide a teaching opportunity to illustrate, recognize, and embrace diversity and even ambiguity rather than uniformity. It is possible that the resulting discussions will reveal that some of those diverse answers are significant violations of robust ethical principles, but it is more likely that differences can be attributed to complex questions that simply have more than one right answer.
Finding diverse answers to so many questions raises the specter that those answers are diverse because a substantial percentage of respondents have adopted approaches that are simply wrong and clearly unethical. While that is possible, it is our impression, particularly from the focus groups and interviews leading up to the survey, that these are not typically questions for which there is a priori a right and wrong answer. The standards for defining authorship, working with collaborators, and handling data are not self-evident, nor is there necessarily one right choice about what those standards should be. It is likely that different research disciplines, research groups, or individuals may have developed standards that are simply different from those adopted by others. While there is always a risk in assuming that the way things are is the way things should be, it may be equally problematic to conclude that finding different standards means that some people are ethical, and others are not.
Differences in standards might be taken as evidence of having more than one "right" answer. Historical reasons based on culture, the nature of a research discipline, or the ways in which individual researchers operate may have resulted in those different approaches (e.g., if a journal lists no more than 4 authors of multi-author publications, then the practice might develop that the head of the research group lists her or his name as second rather than last author on a manuscript with 5 or more authors). However, the possibility of more than one right answer does not mean that all answers are equal. One independently developed approach might be better and more sustainable than others. Such a possibility would be best addressed by having people talk to one another about their assumptions, their standards, and their expectations. And those conversations ideally need to be not just within a given research group, and not just among research groups in the same discipline, but across diverse research disciplines. This is arguably one of the greatest potential benefits of multi-disciplinary programs for research ethics education.
One other aspect of this study, the response rate, deserves consideration. Although a response rate of 21% for faculty to answer a survey of over 130 questions is in some senses remarkable, it is nonetheless clear that nearly 80% did not respond. Interestingly, despite the reasonable presumption that higher response rates might be better, an argument has been made that while non-response bias is plausible, it is not typically found. Eliciting higher response rates at best has been reported to add little additional accuracy (Groves 2006; Groves & Peytcheva 2008) and has sometimes been found to result in even less accuracy (Visser et al. 1996; Keeter et al. 2000; Keeter et al. 2006).
Nonetheless, it remains worth considering that faculty who view some of these standards as highly acceptable, or highly unacceptable, were less likely to respond. Whether or not such non-response bias can be completely discounted, it is noteworthy that the findings are not consistent with a failure to sample any particular viewpoints (i.e., nearly all Likert questions resulted in substantial percentages of responses both in the agree and in the disagree categories). However, even if some perspectives were undersampled or oversampled, the strength of these results is that a sufficient number of responses were received to show that there is a high degree of variation within the research community for fundamental questions about standards of research conduct. Finally, it should be noted that the percentage of non-respondents was not proportionately different across disciplines (microbiology, neuroscience, nursing, and psychology), institution type (public, private), or program size (smallest to largest). That said, while this survey may be a useful benchmark to highlight the diversity of responses, it should not be taken as a definitive description of precise rates of differing views.
Assuming that research misconduct occurs where standards are neglected, it is noteworthy that some of the central topics of responsible conduct of research courses (i.e., authorship, collaboration, and data management) do not lend themselves to simple stories of commonly accepted standards. While it is possible that the standards of some respondents to this survey were in some sense "wrong," it is certainly also plausible, and perhaps more so, that questions about how best to handle authorship, collaboration, and data management depend on institutional culture, disciplinary differences, and group or individual experience. Under these circumstances, it seems there is much to be gained by creating opportunities for meaningful, community-wide discussions and reflection on the topic of RCR. This is of course the purpose of research ethics education programs.
Many individuals provided invaluable perspectives at each stage of this study, but the authors particularly want to thank the following for their expertise and guidance: Daniel Cabrera (Northern Illinois University), Paul Friedman (UC San Diego), Elizabeth Heitman (Vanderbilt University), Francis Macrina (Virginia Commonwealth University), Joan Sieber (California State University East Bay), Connie Ulrich (University of Pennsylvania), David Urban (Virginia Commonwealth University), and Daniel Vasgird (West Virginia University). The authors also particularly thank Paul Friedman for his thoughtful and useful editing of the manuscript, and thank Tiffany Lagare and Kelli Wing for their assistance in collecting names and e-mail addresses for faculty surveyed in this study. This project was supported by NIH NR009962, UL1RR031980, and UL1TR000100.