|Home | About | Journals | Submit | Contact Us | Français|
Conceived and designed the experiments: AD MAW MAH. Performed the experiments: AD MAH. Analyzed the data: AD MAH. Contributed reagents/materials/analysis tools: AD MAW MAH. Wrote the paper: AD MAH.
When reading bioscience journal articles, many researchers focus attention on the figures and their captions. This observation led to the development of the BioText literature search engine , a freely available Web-based application that allows biologists to search over the contents of Open Access Journals, and see figures from the articles displayed directly in the search results. This article presents a qualitative assessment of this system in the form of a usability study with 20 biologist participants using and commenting on the system. 19 out of 20 participants expressed a desire to use a bioscience literature search engine that displays articles' figures alongside the full text search results. 15 out of 20 participants said they would use a caption search and figure display interface either frequently or sometimes, while 4 said rarely and 1 said undecided. 10 out of 20 participants said they would use a tool for searching the text of tables and their captions either frequently or sometimes, while 7 said they would use it rarely if at all, 2 said they would never use it, and 1 was undecided. This study found evidence, supporting results of an earlier study, that bioscience literature search systems such as PubMed should show figures from articles alongside search results. It also found evidence that full text and captions should be searched along with the article title, metadata, and abstract. Finally, for a subset of users and information needs, allowing for explicit search within captions for figures and tables is a useful function, but it is not entirely clear how to cleanly integrate this within a more general literature search interface. Such a facility supports Open Access publishing efforts, as it requires access to full text of documents and the lifting of restrictions in order to show figures in the search interface.
The PubMed system from the National Library of Medicine (http://www.ncbi.nlm.nih.gov/pubmed/) is the primary tool used by biologists to search the literature. PubMed's interface has a number of useful features and popular innovations, most notably its facility for recommending articles related to a given article , , but currently search in PubMed is restricted to the title, abstract, and several kinds of metadata about the document.
On the Web, searching within the full text of documents has been standard for more than a decade, and much progress has been made on how to do this well. However, until recently, full text search of bio-science journal articles was not possible due to constraints on online availability and intellectual property restrictions. Recent developments in the opening up of the content of journal articles (including requirements for open access publishing by national funding agencies in both the U.S. and the U.K.) allow for improvements in the design of such interfaces.
It should be noted that freely accessible articles are not necessarily articles that can be crawled, stored, and indexed by researchers. Only the small subset found in the PubMedCentral Open Access collection of journals provides an unrestricted resource for scientists to experiment with for providing full text search (the license terms for PubMedCentral can be found at: http://www.pubmedcentral.gov/about/openftlist.html).
Full text availability allows for a re-thinking of how search should be done on bioscience journal articles. For instance, many researchers are using full text biology articles for information extraction (text mining), as seen in the BioCreative competition , . The results of text extraction can then be exposed in search interfaces, as done in systems like iHOP  and ChiliBot  (although both of these search only over abstracts).
Another question is how to adjust search ranking algorithms when using full text journal articles. For example, there is evidence that bioscience literature ranking should consider which section of an article the query terms are found in, and assign different weights to different sections for different query types , as seen in the TREC 2006 Genomics Track .
Another way to innovate with full text article search is to specialize the interface to correspond to the particular needs of a particular field or collection. For the last several years, Google Scholar (http://scholar.google.com) has provided search over the full text of journal articles from a wide range of fields, but with no special consideration for the needs of bioscience researchers. Google Scholar's distinguishing characteristic is its ability to show the papers that cite a given article, and rank papers by this citation count. This is an excellent feature for journal article search, and all such systems should use citation count as a metric. Unfortunately, citation count requires access to the entire collection of articles; something that is currently only available to a search system that has entered into contracts with all of the journal publishers.
This article focuses on another way to improve bioscience literature search: provide the user with the ability to search over full text, including figure captions, and display the associated figures alongside search results. This idea is based on the common observation that researchers, when reading bioscience articles, tend to start by looking at the title, abstract, figures, and captions. Figure captions can be especially useful for locating information about experimental results .
Our research group has developed a freely available search interface called BioText (Figure 1 and http://biosearch.berkeley.edu) for searching the literature and showing figures alongside search results . We conducted a pilot study exploring user reaction to searching caption text and incorporating figure display into a bioscience literature search interface . The participants in that study had strong positive reactions to the idea: 7 out of 8 said they would use a search system with this kind of feature if all of the literature were available in the collection. That study also found that participants were interested in searching the full text, not just the captions, and that some were interested in searching table text as well.
The study presented here is a follow-up to the pilot study, using a larger number of participants and incorporating a number of improvements and extensions to the interface, including a facility to search the full text (as opposed to searching only the abstract and caption text). This study was qualitative; 20 participants were asked to use the different views of the interface and response to it in terms of how likely they would be to use it, which aspects they did and did not like, and which missing features would they like to see added.
Several research projects have examined the automated analysis of text from captions. Srihari ,  did early work on linking information between photographs and their captions, to determine, for example, which person's face in a newspaper photograph corresponded to which name in the caption. Shatkay et al.  combined information from images as well as captions to enhance a text categorization algorithm.
Cohen, Murphy, Qian et al. have explored several different aspects of biological text caption analysis , including algorithms for parsing the structure of image captions, and techniques for extracting information relating to subcellular localization by automatically analyzing fluorescence microscope images of cells , .
Liu et al.  collected a set of figures and classified them according to whether or not they depicted schematic representations of protein interactions. They then allowed users to search for a gene name within the figure caption, returning only those figures that fit within the one class (protein interaction schematics) and contained the gene name.
Yu et al.  created a bioscience image taxonomy (consisting of Gel-Image, Graph, Image-of-Thing, Mix, Model, and Table) and used Support Vector Machines to classify the figures, using properties of both the textual captions and the images. Yu and Lee  developed algorithms to link sentences from an abstract to the figure caption content. They also developed and assessed a user interface called BioEx that shows a set of very small image thumbnails beneath each abstract, but the system described did not allow for searching over text corresponding to the figure caption and did not focus how to design a full text and caption search system in general.
More recently, Yu et al.  asked human judges to determine how much of the important information about a figure was present in the figure caption vs. the abstract and full text of the article. They found that having access to caption, title, and abstract alone led to less complete comprehension than having the full text of the article available.
Several approaches have processed the images themselves to extract meaningful information to use in search results. Christiansen et al.  used automatically-computed properties about the content of raster images within an article in order to improve relevance feedback techniques for the literature review task. That work also examined how to associate the caption text with the appropriate image when processing PDF documents. Deserno et al.  examined the potential impact of performing content analysis on the images in the figures in order to improve literature retrieval. In the Yale Image Finder, Xu et al.  developed algorithms to extract text strings from figures and provided a figure searching tool that queries over the extracted figure text as well as caption text. In addition to the figure containing the query term hits, the other articles from the same paper are shown as well as figures from other papers with similar image content.
On the Web, for information-centric queries, it has been shown that richer results listings can be more useful and preferred over short snippets , . However, efforts to use thumbnails of web pages have not been particularly successful at improving search results using standard metrics. A study by Czerwinski et al.  showed that after a brief learning period, blank squares were just as effective for search results as thumbnails, although the sub jective ratings for thumbnails were high. A subsequent study by Dziadosz et al.  found that thumbnails alone were much more error-prone than the other two conditions; also, the number of errors in text alone versus text plus thumbnails was nearly identical. Additionally, showing thumbnails alongside the text made the participants much more likely to assume the document was relevant (whether in fact it was or not). On the other hand, in some studies, the thumbnails may have been too small to be effective. Kaasten et al.  systematically varied the sizes of web page thumbnails shown, and found participants were able to more accurately recognize web pages when larger thumbnails were shown in combination with titles, than with titles alone. When thumbnails were smaller, participants relied on color and layout to recognize the page, and could only make out text at larger image sizes. Kaasten et al.  also found that in their study, 61% of the time, thumbnails were seen as very good or good representations of the underlying web page, and 86% were very good, good, or satisfactory.
One problem with using thumbnails is that they create an image from an entire page, which can end up showing only miniaturized text. By contrast, BioText uses figures extracted from articles. Although some figures from bioscience articles are not particularly distinguishing, in many cases the general information visible in the figures distinguishes the kind of information contained in the article. For instance, the figures associated with articles that are retrieved in response to a query on “lung” range from x-rays to histology images to schematic diagrams to flow charts to line graphs, and this kind of information can be highly indicative of whether or not the article is of interest to the scientist.
The BioText search engine indexes all Open Access articles available at PubMedCentral. To date, this collection contains more than 300 journals, 129,000 articles, 247,000 figures, and 104,000 tables. A script is run on a daily basis that checks the Open Access database for updates and adds new documents to the collection.
The main components of the interface are:
The main innovation of the BioText search interface that distinguishes it from PubMed and other bioscience literature search interfaces is its emphasis on showing the figures associated with the articles within the search results, and showing the full text context in which the query terms fall within the article, including the captions. (The version of the interface presented in  searched only over caption text, and the version in  searched only over abstracts and captions.)
Considerable effort went into the design of the display of the sections, to enhance legibility and reduce impressions of clutter. These design decisions surrounded choices in the relative font sizes, the font types, the heaviness of the border lines in the boxes surrounding each section, and the spacing between the text boxes and the figure displays. For instance, the labels for each text area (e.g., “Full-Text Excerpts”) is shown as small-caps in a more saturated color than the area background. The design was modified and evaluated among the authors over several iterations.
The current search interface uses what are known as search “verticals” for showing different views of the search results. Results from search engine literature suggest that most users do not switch away from the default view into verticals, and the major search engines are moving to “blended” or “universal” search in which hits from different parts of the vertical space are interwoven with one another and act as an entree into the more specific type of search.
For instance, a search on baseball at Google yields standard links to web pages but also a set of links to image results, with a hyperlink labeled “Image results for baseball”. This link moves the user into the image search vertical and raises their awareness of this option that is available from the site . Blended results might be a good way to introduce users to alternative views of search results, especially for the Figure Grid Display view. But for experimenting with the different views explicitly as done in this study, it is useful to retain the differentiation between the views.
Open access journal collections such as Highwire Press and PubMedCentral allow for search over the full text, but the search results listings only show where the hits fall within the title and abstract. PubMedCentral furthermore provide a facility, under its “Limits” option, to choose “search over figure/table caption” from a long list of options. However, the search results display does not show the caption text, nor do they show the corresponding figures or tables.
As discussed above, Google Scholar searches the full text of documents but only displays a very brief snippet of content for the search hit. BioText emphasizes the display of a richer document surrogate, thus facilitating the assessment of the relevance of a document directly within the retrieval results. This richer information is not necessary for all search tasks (for example, when searching for the home page of a website, a shorter surrogate is better), but has been shown to be useful for information-intensive tasks . Thus for a researcher who is scanning the contents of a journal to see what's new, the richer information may not be helpful, but for a researcher who wants to understand how an experimental method is used, this extra information can be quite useful, as seen in the results below.
The Lucene search engine (http://lucene.apache.org) is used to index, retrieve, and rank the documents, using Lucene's default statistical ranking functionality.
The articles are downloaded in XML format, and a script is run to separate out the different parts of the document. The figures are stored locally, and at different scales, in order to be able to present thumbnails quickly. The tables are also stored locally. Lucene's indexing facility is used to index several fields separately, including title, authors, abstract, caption text, full text, and table text. Lucene allows the user to assign different weights to these fields for use in the statistical ranking algorithm. For the Full text index (used in the Full Text Image Display View), the text in the title and abstract is assigned higher weight than the text in the rest of the article. This means that articles in which the query terms appear in the title and abstract are ranked higher than those in which the query terms appear only in the full text. Specifically, the relative weights for the Full Text with Figure Display index are:
These weights were arrived at by trial and error; users disliked a ranking in which articles without hits in the title and abstract were ranked higher than those with such hits. Ideally, a weighting would learned via machine learning over a large dataset of user behavior, as is currently done for commercial web search engines . (Comparing different section weighting algorithms would be a topic for a separate paper.)
For the Caption Text indexes (both Vertical List view and Grid view), all fields are weighted equally:
For the Table Text and Caption index, the weight on terms from the table caption is increased, text from author names is not included in the ranking:
Additionally, for the Full Text with Figure Display view, the Lucene Highlights package is used to extract excerpts from the full text of the retrieved documents, to extract up to six passages that contain query term hits, and to highlight the query term hits within the excerpts.
As mentioned above, based on the conclusions of our pilot study , we made several modifications to the interface. Those conclusions also led to the following hypotheses, which we investigated in the study reported here:
Hypotheses H1 and H2 are supported, as shown in Table 1. Participants were asked to state, at the end of the study session, how often they would be likely to use each interface view, assuming all of the bioscience literature were in the collection. 19 out of 20 participants said they would use the full text search with figure display interface either frequently or sometimes.
However, participants were not asked to explicitly state if the reason for this preference is the showing of the figures or the search over full text, or both. Given that, further support for H1 is that, when asked to comment in more detail about which aspects they did and did not like about each view, 11 participants volunteered that they liked to see the figure thumbnails, and no participants stated that they disliked seeing the figures, although 5 people said the default view should be only titles and metadata. Further support for H2 is that 5 participants explicitly stated that they liked being able to see the text excerpts, 2 stated they liked seeing the variety of information within the search results listing, and no participants stated that they disliked the full text search (again, keeping in mind that 5 said the default view should be only showing title and metadata information). Thus, as seen in “blended” results in web search, many participants in this study liked the combined output view.
Additional comments about the full text figure display view, along with the number of participants who mentioned these points, are shown in Table 2. Notable and frequent among the favorable comments are mentions of the intuitiveness, clarity, and compactness of the design and the layout, including numerous mentions of the use of color and query term highlighting. Unfavorable aspects included a subset of people who preferred less information by default and some issues about size of the screen and the figures.
Moving on to H3, 15 out of 20 participants said they would use one of the two caption search and figure display interfaces frequently or sometimes. Table 3 shows some of the detailed likes and dislikes as well as feature requests for these views. As seen in our pilot study, there are differences of opinion as to which of the caption view interfaces (vertical list or grid) is better. This difference seems to hinge on how much caption information is visible within each view. A suggested feature was to support the grouping of all figures and captions from one paper together in the figure views (ignoring the relevance ranking of the caption for the query).
As for H4, table search view was seen as less generally useful, with only 10 out of 20 saying they would use it frequently or sometimes, but with another 7 saying they would use it rarely, suggesting that it is anticipated to be useful only for specialized information needs. Table 4 shows other comments made specifically about the table view. Some participants liked that the table view gave them the opportunity to look for information that can be found in tables but nowhere else in the paper, especially for experimental results.
Participants were also asked to estimate how often they would use the different views for different types of tasks. Table 5 shows how often participants responded “frequently” or “sometimes” for each view type although it may be more difficult for people to accurately estimate usage at this fine-grained a level. (Because not every participant is interested in every type of task, only some answered the question for given task types.) When broken down by task type, table caption search and display were seen as potentially useful for certain tasks, such as finding new developments and learning about experimental methods.
Across the views, participants requested features that are outside the goals of this study but are often present in literature search engines. Two participants felt strongly enough about their way of searching the literature to describe their methods in detail. Two use email alerts to save time on looking up for updates in their areas of interest. Seven participants mentioned scanning titles as a fast and effective way to detect interesting literature. Author and journal names were also used during the scanning process. Display of accurate date was also important to one participant, to be sure of what is the latest news in their field (the interface currently displays and sorts by year, not by month and day).
Regarding the BioText interface, 11 participants mentioned, in a variety of ways, that they liked the convenience of going directly to the figures, and to the full text of the papers, without having to go through PubMed or an intermediating journal site. Participants also liked having the larger versions of the figures, along with their captions, shown in a new window, but two mentioned that the title and other citation information should be shown in addition.
Two participants said that they liked the table view because tables often summarize results in papers, and appreciated that each table was shown with the corresponding caption and title. However, two participants felt that the context was not enough when a table is isolated from the full paper. As with figures, some participants asked that tables from the same paper be grouped together, independent of the relevancy ranking scores. Two participants requested that the system find a way to display abbreviated versions of the tables along with the figure thumbnails in the full text search results screen, although it is unclear how to useful a table “thumbnail” would be.
Generally, participants were not aware of many of PubMed's features (including its support for Boolean queries). This finding is not surprising, because few people are trained in its use and the advanced features are hidden behind drop-down menus which are less likely to be experimented with. We told all the participants that the collection was small because it contained only Open Access articles. Two participants went out of their way to ask why our collection was not larger, and were surprised to hear about the difference between free and Open Access articles.
Usability studies serve as a valuable evaluation measure for search engines and influence the design of all commercial search systems. As biology is becoming an information science, numerous tools with search facilities have been made available to bioscientists, but unfortunately not many usability studies have been performed/published that investigate the idiosyncrasies of search and design needs in biology.
The BioText system offers a great platform for investigating how to display the different components of full biomedical journal papers to users according to their information needs. More such studies need to be published to get a better understanding of the bioscientists search preferences. There is a number of standard methods used to evaluate search interfaces , depending the aim of each study. There are informal usability studies that help determine the design of a search interface, formal studies (control experiments) that help determine how specific design elements work for certain tasks, longitudinal studies that allow to understand users behavior over time and other large scale and log analyses that allow studying how current users react to variations of the interface. Such findings would benefit the whole text mining community, providing better insight on how to design systems that aim to end users.
Since most participants named PubMed as their literature main search system, the results presented in this paper suggest that the study participants would prefer the kind of functionality provided by BioText to be present in PubMed, but it is not entirely clear how to cleanly integrate BioText's features within a more general literature search interface. In order to determine for certain if this kind of interface is superior to standard search, thousands of users need to use the site. An analysis of the server logs of the search engine found that starting from launch in September 2007, and analyzing up through November 2008, there were nearly 24,000 queries and views of results pages, originating from approximately 11,000 unique IP addresses. However, indexing only a subset of the literature is unacceptable for widespread use. In this study we had to ask participants to pretend that the system indexed everything in MEDLINE, but in order to make a truly competing interface, all articles must be present, as users blame the interface if they cannot find the documents they expect, and many of the high impact journals are missing from the Open Access collection. (A fully competitive interface also needs to provide other services such as citation linking, which also requires access to the full text.) Thus the results of this research cannot be fully tested until full text of all (recent) research articles is made freely available for search interfaces.
This study found strong support for the hypothesis that bioscience literature search systems such as PubMed should show figures from the articles alongside the search results. Additionally, it found evidence that full text and captions should be searched along with the article title, metadata, and abstract. Finally, for a subset of users, allowing for explicit search within captions for figures and tables is a useful function. We hope that with further support by the scientific community of the Open Access publishing model, more biomedical literature systems will take advantage of full text, figures, tables, and their captions, in their searches.
The goal of the BioText interface is to improve the literature search experience by better aligning that process with how the literature is read by biologists, and the goal of the study was to determine if this approach provides a better search experience over the standard. Thus, the method used was a qualitative study, and the main measure is self-reported likelihood to use the interface again. (Usage intention has been found in a number of studies to be a good proxy for actual usage behavior , .) A secondary goal is to determine which details of the interface design are satisfactory and which require improvement.
In this study we choose to evaluate subjective responses rather than standard information retrieval objective measures such as precision and recall or time to find a relevant article. Precision and recall can be valuable measures for comparing ranking algorithms, and may be useful to evaluate after a new interface design has been shown to be acceptable to users. However, experience with the design of novel search interfaces suggests that the most important measure is whether or not people will choose the system over the current standard . Most experimental systems do not pass this test, as evidenced by the fact that the most popular search interfaces have remained relatively unchanged over time (both for Web search and for bioscience literature search). An example of a case in which a qualitatively different style of search interface became the dominant one is found in the tremendous popularity of faceted navigation on e-commerce and digital library web sites . This paradigm was shown to be strongly preferred by study participants , and subsequently became accepted and preferred by both designers and users of such sites.
Another popular quantitative measure that can be used for evaluating search interfaces is that of time, in terms of time taken to find relevant documents. That measure is problematic in that it does not account for what is learned about the documents from the search results view. Interfaces that provide more information in the document surrogate tend to require more time per item viewed for users to peruse the richer information . However, as mentioned above, for information-centric queries, it has been shown that richer results listings can be more useful and preferred over short snippets , . Nonetheless, a follow up study to this one could measure if the figures helped participants distinguish among relevant documents, or if they were able to save time in the reading of relevant documents by using the summary view.
To conduct the study session, one of the experimenters had a one-on-one session with the participant, lasting approximately 45 minutes (see Table 6). First, the participant read and signed a consent form. Next, the participant was asked general information about their scientific background and their literature searching habits. Next, the experimenter asked the participant for a query, and the experimenter showed the results of running that query in each of the available views. The participant was then instructed to try every view, using the same query or additional queries of their own choice. As they used the system, participants were encouraged to “think aloud,” while the experimenter recorded their actions, reactions and comments. (The “think aloud” protocol is a standard procedure in the evaluation of user interfaces .) In the final step, participants were asked to answer a pre-defined set of questions aimed at evaluating each view, and were asked an open-ended question about any additional thoughts or reactions they might have. Participants responses were recorded on paper by the experimenter and then transferred to a spreadsheet.
The participants were recruited via flyers placed on campus as well as via personal contacts by the researchers and their associates. The final set of participants consisted of 20 bioscientists (6 graduate students, 6 postdocs, 1 faculty, 7 other). An effort was made to recruit primarily biologists who were not focused on bioinformatics, as people who study information processing tools may have different attitudes towards them than those who focus on the sciences. Most of the participants work in cell or molecular biology, genetics or genomics, biochemistry, evolutionary biology or bioinformatics.
Table 7 shows participants' responses to questions about their fields of interest, how often they search the research literature and which tools they prefer to use. Only one participant had prior experience with a version of the experimental interface. 7 participants said they search the literature daily; 10 said they search one to several times a week and 3 said they search monthly or rarely. All participants use PubMed, and for 16 participants, it is their primary literature searching source. 6 participants also use Google web search, 3 use Google Scholar and another 2 use the Web of Science. Email alerts from journals and myNCBI were also mentioned as well as searching in journal sites, Wikipedia and other specialized sources. Table 8 summarizes the types of queries issued and their frequencies. As found in other studies , queries on gene and protein names are the most common. Also common in this collection were queries on author names, species, cell compartments, and methods and analysis techniques. The latter might be more common in our query collection than in other studies due to the ability to search caption text.
We are grateful to the study participants for their invaluable contribution to this study.
Competing Interests: The authors have declared that no competing interests exist.
Funding: This research was supported by National Science Foundation (NSF) grant DBI-0317510. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.