The amount of biological data is increasing rapidly, and will continue to increase as new rapid technologies are developed. Professionals in every area of bioscience will have data management needs that require publicly available bioinformatics resources. Not all scientists desire a formal bioinformatics education, but many would benefit from more informal sources of learning. Effective bioinformatics education will address a broad range of scientific needs, will be aimed at a variety of user skill levels, and will be delivered in a number of different formats to address different learning styles. Effective informal sources of bioinformatics education are available, and are explored in this review.
Bioinformatics has been defined as ‘…a field that solves problems in the biological sciences using computer science concepts and methods’. A similarly broad definition of bioinformatics comes from an interview with Francis Collins in which he describes computational biologists as ‘…jointly trained in understanding biology in all of its complexities, but they’re also very capable at computational analysis of huge data sets….’ [2, 3]. In many ways these are excellent descriptions of bioinformatics today, because of their inclusivity. Computer science, software development, statistics, data management and a wide range of biological sciences (including computational, experimental, systems and theoretical biology) are all contained in these definitions.
Interest in bioinformatics education followed quickly behind the establishment of bioinformatics as a field of study. If one searches PubMed for the term ‘bioinformatics’, over 70,000 articles are retrieved. The earliest of these are from the mid-1980s, but as discussions, and then expectations, about the completion of the Human Genome Project increased, so too did the development of bioinformatics as a field [5–7]. The initial formats for bioinformatics education were often partnerships between forward-thinking individuals: molecular biologists (able to write programming code) collaborating with computer scientists to determine best practices; a biology professor and an excited undergraduate student with experience in programming; a computer scientist with an interest in biology and a biologist with an aptitude for computers [6; J.M.Williams et al., personal experience]. By the mid-1990s formal bioinformatics classes were being designed and implemented at many universities and colleges. Today bioinformatics curricula are available from universities and colleges, many community colleges and even forward-thinking high schools, either online or on site [8–11]. Therefore, to obtain a degree and credentials in the field of bioinformatics, one only needs to search for, select, and enroll in a program that best suits one’s personal goals.
We will not be focusing on such ‘formal’ bioinformatics education in this review. Instead we will focus on less formal sources of bioinformatics education, which are available to biomedical scientists interested in managing and analyzing any manner of biological data. The review will be presented from the perspective of OpenHelix, a company focused on assisting a wide range of interested persons to better utilize public bioinformatics resources. Through years of experience, the members of OpenHelix have gained a unique understanding of the needs of bench biologists as well as the value of bioinformatics databases, tools and other resources. We appreciate that in the life cycle of biologists, the need for up-to-date training spans the student, the scholar, the staff, the faculty and the administrators of programs. Lifelong learning is crucial for career development, and staff who are new to projects continuously require training. Ongoing learning opportunities outside of formal settings are required. Our efforts allow us to interact with these resources, their biocurators, and others in the arena of informal bioinformatics education. In this review we hope to share our knowledge of a variety of these informal bioinformatics educational opportunities so that a broad range of bioscientists may utilize these resources more easily.
In its infancy, the field of bioinformatics was largely synonymous with the need to generate computer code in order to analyze biological information [12, 13]. As a result, bioinformatics education implied a mechanism by which one could learn computer languages and the principles or best practices for creating code that would generate statistically significant, rigorous analyses of biological data. This need to generate code can alienate some biologists who do not envision themselves as being ‘computer programmers’. At live trainings and conferences, bench scientists still tell members of OpenHelix that they hesitate to venture into the realm of bioinformatics, and that the level of bioinformatics and programming support varies enormously between their institutions. As the field of bioinformatics has developed and matured, databases, tools and other bioinformatics resources have become available that allow scientists to manipulate data without writing code themselves. Researchers in scientific disciplines as diverse as population genetics, epidemiology, disease research, structural biology and environmental biology are finding that they need an understanding of basic bioinformatics techniques and specific bioinformatics applications in order to analyze the vast amounts of data being generated in their field. As new technologies enable the generation of larger amounts of data, it is likely all scientists will need to utilize bioinformatics in some form or another. It will not be enough to be current in the literature in one’s field because ‘big data’ research projects already overwhelm the traditional publication methods [14–16]. Only a fraction of the data from research consortia can be touched upon in their publications. Access to the rest of the data requires an understanding of the software resources, features and queries that make the data useful to biomedical researchers.
For many bioscientists their bioinformatics requirements have shifted from a need to generate code to a need to understand the availability and functionality of public bioinformatics software, including databases, tools, algorithms and other resources, in order to mine, extract and effectively use the data within the resources [17–19] (Figure 1).
A new, less formal type of bioinformatics education is being developed for this type of need. It can be termed ‘outreach’, which is defined as ‘…the act of extending services, benefits, etc., to a wider section of the population, as in community work…’. An important mandate of bioinformatics education has become to extend the services and benefits from the field of bioinformatics to a broader audience of bioscientists who predominantly are not making full use of the available tools [21, 22; J.M.Williams et al., personal communications].
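To accomplish its outreach mandate, bioinformatics education needs to do a minimum of four things: raise awareness of the available resources, enable researchers to select the best resource(s) for their needs, lower the barrier between the awareness and the use of a resource, and support the continuing educational needs of regular resource users.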
Ideally bioinformatics education will also be able to channel user feedback and suggestions back to the resource developers. This feedback will ultimately guide both the development of new features in existing tools and the creation of new tools that possess features required by biomedical researchers. In this article we will discuss each of these aspects of informal bioinformatics education and review practical sources of each.
As discussed in the ‘Introduction’, there are thousands of publicly available bioinformatics resources, with additional resources being developed continuously. These resources include algorithms for aligning sequences or finding promoter motifs, comparative genomics browsers, pedigree analysis tools, animal colony management resources, databases containing genomic variations important to human disease, protein structure modeling prediction resources, and more. The data analysis needs of researchers today have changed drastically in just a few decades. Funding agencies are placing increasing emphasis on cross-disciplinary approaches to solving today’s most pressing research questions both in the field of medicine and basic research areas. As translational medicine continues to bridge the divide between basic research and clinical medicine, physicians and other health care providers will need quick sources for learning about genomics and bioinformatics resources. Scientists need to maintain an awareness of what is happening in a broad range of disciplines, including which ready-made tools and resources are at the disposal of the projects that they are involved in. Librarians and other support staff have told members of OpenHelix that one of their new functions is to be aware of a broad range of applications and sources in order to provide their users with the best assistance in their research [23–25; J.M.Williams et al., personal communications]. Professors, physicians and principal investigators who traditionally did not consider that they had bioinformatics needs are suddenly realizing that they do: to analyze the latest genomic information in their field, to advise and direct their lab members, to answer their patients’ questions, to teach the bioscience courses that they are involved in, and to stay current with their graduate students.
There are now lists of available resources and entire journals devoted to making bioinformatics software available to anyone, whether they are a researcher, a student or an interested person from the general public [22, 26–29]. These lists and journals are an invaluable source of information. The journals, such as Bioinformatics, Briefings in Bioinformatics, Current Protocols, Database, Nucleic Acids Research, Nature Methods, as well as others, provide a venue by which resource developers and providers can publicize their efforts (Table 1).
The lists of databases, including the Online Bioinformatics Resources Collection (OBRC) at the Health Sciences Library System at the University of Pittsburgh, the annual Nucleic Acids Research online Database Collection, the Bioinformatics Links Directory, the Japanese National BioResource Project, the OpenHelix resource database, as well as others, provide users with collections of many available resources (Table 2).
The journals provide individual articles on specific resources, written by the resource provider, or review articles which may describe or compare a large number of resources within an area of science. The lists often organize resources by a set of disciplines or other categories, and usually provide at least a brief description of each resource.
Annually the journal Nucleic Acids Research (NAR) provides a unique combination of journal articles and a resource list. Each January there is an issue devoted to articles describing resources that the editors feel have high utility and interest to biomedical scientists. At the same time NAR releases its updated list of resources. NAR also publishes an annual July issue dedicated to web-based software resources. Both resource providers and users find these annual services valuable bioinformatics educational sources, as evidenced by the number of articles submitted to the journal as well as the number of literature citations, downloads and web links from outside sources to the NAR resource issues [22, 30].
Another source for finding a broad range of resources is the large resource providers themselves. A few examples are the National Center for Biotechnology Information (NCBI) in the United States, the European Bioinformatics Institute (EBI) in Europe and the National Institute of Genetics in Japan [31–33]. Each of these large institutions is a national resource dedicated to creating and maintaining new information technologies that promote research and understanding of all areas of bioscience, including genetics, genomics and personalized medicine. Towards this goal they research data storage mechanisms for all types of information and create new bioinformatics resources for the handling of that data. Thus these resources often will have applications designed for a wide range of subject areas.
Being aware of the vast array of bioinformatics resources available is not synonymous with knowing which of these tools is applicable to a specific research need. Researchers must be able to judge both the scope and functionality of potential resources in order to evaluate their applicability for specific research needs. For example, there are many databases developed to provide sequence variation information. Each provides unique, though possibly overlapping, data content and formats, analysis tools and overall features. One resource may provide desired data in a format easily incorporated into existing workflows while an alternative resource might provide the desired information, but in a format that mandates further manipulation, or additional analysis steps to be added to the existing project workflow [34–41]. Additionally some of the thousands of resources were developed years ago and have grown and progressed since then. Others have been unsupported since their creators have moved on to other projects. Some resources were developed for old technologies, others for technologies during their infancy, before standards had been created.
The many different responsibilities and demands on researchers leave little time for in-depth searches for resources. Filtering lists of resources or browsing through journal indexes to find resources with a specific applicability can be a slow, labor-intensive process. Some surveys have indicated that many researchers use Google to search for resources that they assume exist, with methods that are not especially efficient or effective. Although Google search results can indicate the popularity of a web site, they do not provide any other significant means of evaluating the site; searchers must open the site and explore it in order to evaluate it. Tools that allow such resource evaluations to occur more easily and quickly would likely help scientists to be more efficient in their research.
Many of the ‘awareness’ resources mentioned above also have search mechanisms associated with them. The content of any journal is searchable, as is the content of most resource lists. However, journal searches are aimed at finding content in the journal, rather than finding a tool applicable to a specific research project. Resource lists aim to be unbiased and comprehensive and often associate keywords with resource listings to help searchability. However, lists are hard to maintain: even when new content is added, old entries are not always updated or deleted over time. Additionally, lists do not always provide the user with enough information about a resource to fully evaluate it. Each of the national resource providers above provides not only bioinformatics resources, but also search interfaces for locating the proper resource. NCBI’s Entrez system is an elegant single search interface that allows researchers to perform either simple or advanced searches, and to receive hits in NCBI’s many databases, tools and educational resources. However, search mechanisms offered by resource providers tend to provide access only to the limited set of resources associated with that provider.
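For readers who are comfortable with a little scripting, an Entrez search can also be issued programmatically through NCBI's E-utilities web service. The Python sketch below only builds an ESearch request URL and reads the hit count out of a reply; it is an illustration rather than a recommended workflow. The function names are hypothetical, and the sample reply is a made-up stand-in that follows the ESearch XML layout, not real output.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base address of the ESearch utility (part of NCBI's E-utilities).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database, term, max_ids=0):
    """Return an ESearch URL that queries one Entrez database for `term`.

    `max_ids=0` asks only for the hit count, not for record identifiers.
    """
    params = {"db": database, "term": term, "retmax": max_ids}
    return EUTILS_ESEARCH + "?" + urlencode(params)


def parse_hit_count(xml_text):
    """Extract the total number of hits from an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))


# Made-up reply in the ESearch layout; the count is NOT a real result.
SAMPLE_REPLY = "<eSearchResult><Count>123</Count><RetMax>0</RetMax></eSearchResult>"

if __name__ == "__main__":
    print(build_esearch_url("pubmed", "bioinformatics"))
    print(parse_hit_count(SAMPLE_REPLY))
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out so that the sketch runs without network access; anyone sending real requests should follow NCBI's usage guidelines.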
To address the need for a better mechanism for finding relevant resources to meet individual research needs, the National Human Genome Research Institute (NHGRI) at the National Institutes of Health (NIH) funded the development of a bioinformatics and genomics search portal by OpenHelix, LLC [29, 43, 44]. This free public search portal allows users to search hundreds of resource pages, tutorial scripts and bioinformatics blog posts with a single query (Figure 2).
Search results are organized by information source (resource page, tutorial script or blog post) and each hit is displayed in the context of the sentence it was found in. Searching hundreds of specific web pages within the bioinformatics resource provides the user with a view of what is contained within the resource. The tutorial scripts that are queried are the full scripts for hour-long introductory tutorials in which background information and terminology important to using a resource is explained, as is the use of all main functions of the resource. Searching the contents of hundreds of bioinformatics blog posts written by OpenHelix, including usage tips, new resource announcements and more, provides further resource information to searchers. Seeing search results in the context of the sentences where they were found, in each of these information sources, provides the user with extensive information with which to evaluate bioinformatics resource applicability and utility. This search portal was specifically designed to guide researchers to the best public resource for their research needs. Development of the portal was funded by a phase II SBIR grant, and the portal is freely available for use.
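The idea of showing each hit in the context of its sentence, grouped by information source, can be illustrated with a few lines of Python. This is a minimal sketch under simplifying assumptions and not a description of how the OpenHelix portal is implemented: the function names, the naive sentence splitter and the sample texts are all hypothetical.

```python
import re


def sentences_with_term(text, term):
    """Return every sentence of `text` that contains `term` (case-insensitive)."""
    # Naive splitter: break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [sentence for sentence in sentences if pattern.search(sentence)]


def search_sources(sources, term):
    """Group matching sentences by source type, skipping sources with no hits."""
    results = {}
    for source_type, text in sources.items():
        hits = sentences_with_term(text, term)
        if hits:
            results[source_type] = hits
    return results


if __name__ == "__main__":
    # Invented example texts standing in for indexed content.
    sources = {
        "resource page": "This browser displays SNP tracks. It also hosts gene models.",
        "tutorial script": "First we open the table view. Then we filter by SNP class.",
        "blog post": "A new release is out.",
    }
    for source_type, hits in search_sources(sources, "snp").items():
        print(source_type, "->", hits)
```

Returning whole sentences rather than bare page titles is what lets a searcher judge, without opening the page, whether a resource actually covers the topic of interest.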
Another advantage of online search portals such as those provided by OpenHelix, the NAR database, OBRC, etc., is that researchers world-wide can utilize them.
Investigators at universities, corporations, or other large institutions may have additional sources of guidance through their institution. It is advisable for researchers to investigate the availability of institution-specific resources, such as digital and other specialized librarians, bioinformatics or designated service departments. Often local support staff are excellent sources of information because they know what software is locally available and what their researchers need, and they may be able to provide institutional access to external support, such as that offered by OpenHelix [45–51].
With regard to resource searches, it is important to note that the more granular a researcher’s idea is of their research needs, the more effectively these education sources can offer guidance. Often many excellent, partially overlapping resources exist. All might be generally applicable to the same research need, but each resource has its own specific specialties and features. It is therefore good to have at least some detailed concept of the types of information, operations or displays that are desirable. This information can then be used by whichever bioinformatics educator is chosen, in order to select the best tool for the needs at hand.
In order for tools to be used, they must be identified, evaluated and determined to have the required utility by a potential user. Once this occurs, a user must then learn how to utilize the chosen system in order to accomplish the desired research tasks. That learning process must not be so costly that it outweighs the value of what the resource provides. If it takes hours of trial and error, or frustrating attempts to understand documentation that doesn’t directly reflect protocols of use, even the best resource will gain but a few users. After a researcher finds and initially learns a resource, they must then either use the resource often enough not to forget how to use it, or relearn it with each use. Additionally, resources rarely stay the same, with updates occurring on a regular basis, and new features and implementations being added. All of these changes must be understood and learned for a user to continue being able to utilize a resource.
Bioinformatics education materials must therefore be both available to a wide audience of users, and be engaging, clear, accurate and up-to-date in order to quickly and effectively teach potential database users how to obtain the information that they desire from a particular resource. These materials also need to bridge the theoretical framework for the tool with practical applications of its use.
Funding for the development of bioinformatics resources regularly includes mandates to the developers to document resource usage. To enable this usage it is common for developers to provide some level of documentation with the tool. However, the quality, format and currency of the documentation can vary greatly across different resources [52–56]. Even with the best documentation it can take time to gain an understanding of how information is being presented [J.M.Williams et al., personal experience and communications]. Expert users or resource developers who write documentation may overlook rudimentary usage details that new users require. Additionally new users often send usage questions to resource help desks, rather than obtaining answers to these basic questions from the provided documentation. This may be because of users’ unwillingness to read documentation, poor or lacking documentation, or other reasons, but the frequency and repetitive nature of these basic questions can consume valuable resources, such as staff hours, and can also be a source of frustration to staff [57–59].
Many resource providers have adopted short screencasts to demonstrate single actions to users; some are housed locally at a resource, and some are stored in centralized warehouses, such as SciVee [60, 61]. These can be effective, especially to inform experienced users of newly developed functions of the tool, but are not always in-depth enough to provide a full understanding of the resource to new users. Another format for providing users with quick instructions, or reminders of resource functions, is the one-page fact sheet or quick reference card, designed to be kept near one’s computer and referred to during a user session at a resource [62–65].
Some resources are able to offer training workshops to their users, and these often obtain full registration and excellent attendance. However, these workshops are only able to serve those researchers who are able to travel to, and attend the workshop, which limits the total number of users that can be served [66–68]. Large resource providers such as EBI and NCBI have significant outreach efforts to teach researchers how to use their tools. For smaller resources user education is often part of the mandate for the curators who populate the database with information. Some resources choose to contract out some of their training and outreach functions to professional training providers, such as OpenHelix, who assure the quality and freshness of their training [69–72].
Many of the training caveats mentioned above, such as travel and size limitations, can be addressed through eLearning mechanisms. OpenHelix received Phase I SBIR funding to conduct research on effective eLearning mechanisms. The study evaluated the effectiveness of training biomedical scientists on the use of three publicly available genome browsers using three different training methods: on-site live lecture-type tutorials, live online web conference tutorials and pre-recorded self-directed online tutorials. Effectiveness was evaluated through recipient surveys administered prior to, immediately after and two or more months after the training. The majority of the recipients selected either the live on-site seminars or online training formats, with few selecting software training via web conferences. According to the trainee self-surveys, live on-site seminars and online training formats are both popular and effective mechanisms of learning how to use public genome browsers. OpenHelix has continued to develop its online training collection, which now contains training suites on over 90 different bioscience software resources.
Many other organizations also provide eLearning resources. A notable example of such a resource is the MIT OpenCourseWare effort, which was initiated in 2001 and publishes content for virtually all courses of the Massachusetts Institute of Technology. Other such efforts are provided by many other organizations, such as the OpenCourseWare Consortium, as demonstrated by a web search for the phrase ‘bioinformatics elearning’ [74, 75].
Foundations, funding agencies, universities and other large institutions often provide bioinformatics education resources for their members, as mentioned previously. Not only can researchers be guided to the proper tools, but they can also be taught their basic use as well. Even at the largest of institutions, the resources and cost required to create and update training on a large range of tools can be prohibitive. An alternative to the internal creation of training materials is to subscribe to a training service provider. These training centers provide up-to-date materials on a wide variety of resources, all in a standard, easy-to-absorb, online format which makes learning multiple resources quick and easy. In fact, such subscriptions are how some institutional education sources provide the widest range of training to their customers. In the experience of OpenHelix, a further advantage of receiving training from a service is that researchers often feel more comfortable asking questions to trainers, rather than someone that they feel more closely connected to as a colleague. Users might also be more likely to provide suggestions to trainers who are independent of the resource because they hesitate to appear too critical to the resource providers themselves [J.M.Williams et al., personal communication, personal experience].
Theoretical use of a tool is often different from practical use, and the needs of a beginning or basic user are different from those of experienced, or power, users. Once a resource customer becomes comfortable with the basic use of a resource, they often desire tips for using the tool more cleverly, or for more sophisticated searches. At this point, basic listings or descriptive journal articles may not meet their needs. This type of user may also not benefit from deep theoretical explanations of the resource algorithms, but instead needs clear answers to specific questions. Training exercises are designed to work, and to demonstrate the utility of the resource, but may not always reflect real-world usage situations. Personal data is not always formatted correctly for easy use. Even when a researcher can acquire most of the data that they need from a resource, there may be one piece that they are still missing. Perhaps the researcher has been successfully using the resource for years but is now in need of a new function or analysis.
Many times the educators we have already mentioned (librarians, bioinformatics staff, educators from third-party training providers) will be knowledgeable enough about a resource to answer a large number of questions from advanced resource users. Resource providers are themselves often the educators that power users rely on, and many resources offer discussion mailing lists for the user community. Alternatives to educators and resource providers are user groups in which users network to collaboratively solve one another’s usage issues. A newer type of resource that assists the advanced use of bioinformatics tools is the workflow application, such as Galaxy and Taverna, which incorporates more social-networking style aspects of users helping each other [69, 76]. Such resources allow users to retrieve data from one database, reformat or otherwise alter it, and then either analyze it within the workflow software or import the newly formatted data into additional resources for further analysis. Re-use of analysis pipelines for frequent users’ tasks is very helpful, and constructing such pipelines for less-adept users can benefit service providers too. Users of workflow applications do need a basic understanding of how the underlying resources function, however, in order to make full, effective use of the networking and pipeline flows that these types of resources offer.
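The retrieve, reformat and analyze pattern that workflow applications automate can be pictured with a toy pipeline. The Python sketch below is only an illustration of chaining such steps and does not represent how Galaxy or Taverna work internally. The function names, the two-column tab-delimited input and the choice of GC content as the analysis are all assumptions made for the example.

```python
def parse_tab_delimited(text):
    """Step 1 (retrieve): read 'name<TAB>sequence' lines into (name, sequence) pairs."""
    records = []
    for line in text.strip().splitlines():
        name, sequence = line.split("\t")
        records.append((name, sequence.upper()))
    return records


def to_fasta(records):
    """Step 2 (reformat): write the records in FASTA format."""
    return "".join(f">{name}\n{sequence}\n" for name, sequence in records)


def gc_fraction(sequence):
    """Step 3 (analyze): fraction of bases in the sequence that are G or C."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence) / len(sequence)


def run_pipeline(text):
    """Chain the three steps; return the FASTA text and per-record GC fractions."""
    records = parse_tab_delimited(text)
    fasta = to_fasta(records)
    gc_by_name = {name: gc_fraction(sequence) for name, sequence in records}
    return fasta, gc_by_name


if __name__ == "__main__":
    # Invented input standing in for data exported from a database.
    fasta, gc_by_name = run_pipeline("seqA\tatgc\nseqB\tGGCC\n")
    print(fasta)
    print(gc_by_name)
```

Keeping each step as a separate function mirrors the point made above: a saved pipeline can be re-run on new data, but its user still needs to understand what each underlying step does to interpret the result.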
As Francis Collins recently discussed, it is crucial to have biologists who are ‘very capable at computational analysis of huge data sets.’ Collins indicated that they are in a special place today, and ‘They’re going to be the breakthrough artists’ of the future [2, 3]. In this review we have described many examples of the need for informal bioinformatics education that can raise the computational competency of biologists, and we have described a variety of sources from which researchers can obtain this training. Bioinformatics education is needed to raise awareness of the available resources, enable researchers to select the best resource(s) for their needs, lower the barrier between the awareness and the use of a resource, and support the continuing educational needs of regular resource users. Outlets for such education include training directly from resource providers in the format of documentation, screencasts, seminars and publications. They also include sources not associated directly with a resource, but instead associated with an institution, professional training provider or user group. Effective education should meet the needs of a variety of users (beginner through advanced) and their learning styles, and can be provided via a variety of formats, including journal articles, live seminars and workshops, online tutorial movies and other eLearning venues, and more. These many methods, which are outside of the box of formal coursework, are required for the ongoing bioinformatics educational needs of bench scientists today.
In the opinion of OpenHelix, getting the most out of the information will require formal and informal educational strategies to come together to bridge the big data world in which we find ourselves and the traditional domain expertise of specialists in bench work. Ideally each of the educational resources that we have mentioned in this review will collaborate for the benefit of the end user, and ultimately the advancement of science. Resource providers who find it too difficult or time-consuming to be involved deeply in the bioinformatics education of their users will rely on educational specialists such as training services, librarians and other departmental trainers. Educators with hands-on contact with users will collect user comments and requests and provide them to resource developers. It would benefit both the resource providers and the resource users to have more flow of communication between the two groups. Providers will have access to great ideas for software improvements that can be taken to grant agencies, and users will have tools that more effectively meet their access and analysis needs.
This work was supported by National Institutes of Health [SBIR Phase I 1R43GM7315-01 and Phase II 5R44HG004531]. Additionally OpenHelix's outreach efforts on some public resources are funded by sponsorships from the resources, including the UCSC Genome Browser, the VISTA resources at Genomics Division of Lawrence Berkeley National Laboratory, the Galaxy resource at the Center for Comparative Genomics and Bioinformatics at Pennsylvania State University, the Gramene resource at both Cornell University and Cold Spring Harbor, the PSI-Nature Structural Genomics Knowledgebase (SGKB), the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB-PDB), and the GeneMANIA resource from the Donnelly Centre for Cellular and Biomolecular Research at the University of Toronto.
The authors would like to thank the developers of many public databases for access to their resources and their permission to develop training on them. The authors would also like to thank all of the users of our materials who have provided us with invaluable input and support of our efforts.
Jennifer M. Williams received a BS in Molecular Genetics from the Ohio State University and her PhD from the University of Kentucky in Molecular and Cellular Biology. She worked many years as a curator for commercial and public databases, and now focuses her efforts on promoting the use of public bioscience resources through her position at OpenHelix LLC.
Mary E. Mangan obtained a BS in Microbiology, MA in Plant Cell Biology, and PhD in Cell, Molecular, and Developmental Biology. As a post-doc at The Jackson Laboratory, Mary transitioned to bioinformatics. Stints in the pharmaceutical and biotech software industry made her aware of the extensive need for training on biomedical computational resources, and she became a co-founder of OpenHelix to fill this niche.
Cynthia Perrault-Micale obtained her PhD from Brandeis University in Molecular and Cellular Biology. Most of her career has been focused on studying the properties of muscle proteins using a variety of biophysical and molecular techniques. Currently, she is devoted to helping OpenHelix LLC achieve their goals of providing the most comprehensive and current bioinformatics training materials.
Scott Lathe obtained a bachelor’s degree in Journalism from the University of Washington, and uses those communication skills in the context of project management and development and implementation of outreach strategies at OpenHelix. Scott is Chief Executive Officer of OpenHelix, and has experience in managing strategic marketing, marketing communications and worldwide distribution channels.
Neeraj Sirohi obtained a Bachelor’s Degree from Kumaon University, India. Neeraj has used her multi-media production skills to edit and produce OpenHelix training materials, and manages the online distribution of tutorial suite materials for OpenHelix.
Warren C. Lathe received a BS in Zoology, BA in Art History and an MS and PhD in Molecular Biology and Evolution. As a post-doc and researcher at the European Molecular Biology Laboratory (EMBL), Warren conducted genomics research into the evolution of bacterial gene order and the effect of genomic variation on protein structure. He has had extensive teaching experience at the University of Rochester, City College of San Francisco and University of Heidelberg. He is currently co-founder and chief scientific officer of OpenHelix.