As of July 2011, TAIR has established collaborations with the following 10 journals: Plant Physiology, Plant Journal, Plant Cell, Journal of Integrative Plant Biology, Journal of Experimental Botany, Plant Science, Environmental and Experimental Botany, Plant Physiology and Biochemistry, Plant, Cell and Environment, and Molecular Plant. The journals belong to a variety of publishing houses: the American Society of Plant Biologists, Elsevier, Wiley-Blackwell and Oxford University Press. All 10 journals have incorporated language into their manuscript submission process that refers to the effort with TAIR to collect functional information about Arabidopsis genes from authors at the time of manuscript acceptance.
In this study, community submissions scored an average of 81% for completeness, 97.2% for experimental support and 93% for appropriate level of term specificity when compared with annotations that trained biocurators would have made from the same publications. This is an encouraging result given the need to make literature curation cost-effective and scalable, and it supports the idea that curation could be accomplished by distributing the large task of literature curation over the broader Arabidopsis and plant biology community. Additionally, although the present study was not designed to assess how often curators miss annotations that an author or another researcher with deep knowledge of the research area would have made, it is possible that the loss of some curator annotations would be offset by additional community annotations.
The differences we found between annotations submitted by researchers and those made by curators involve both term selection and completeness of the annotations. We speculate that the term selection differences will diminish as the community becomes more familiar with controlled vocabularies such as GO and PO, especially through exposure to these vocabularies via tools like TOAST. With respect to completeness, researchers tended to submit annotations only for genes considered the primary focus of the article, whereas a curator was likely to annotate ‘secondary’ genes as well. Differences in term choice and completeness may also reflect differences in formal training in the use of controlled vocabularies for capturing gene-related information. Curators are trained to assign GO/PO terms from each category (function, process, component, plant structure, plant growth and developmental stages) whenever the article being curated provides experimental support for such information. Curators also have the added benefit of knowing the ontologies well and how to browse them when searching for the most appropriate term. Community members sometimes chose more general but still correct terms even when more specific terms that accurately described the result were available in the ontology. It should be noted that annotations of the same paper also vary between two trained curators, depending on each curator's familiarity with the subject matter of the article at hand. TOAST could be modified to make term definitions, as well as the structure of the GO, more accessible. Instructions also need to state clearly that only results from experiments presented in the article itself should be used to make annotations.
An area still in need of improvement is the degree of author participation in the submission process. For the period spanning September 2010 to May 2011, 75 Plant Physiology articles were tagged by their authors as containing gene-related information and were confirmed by TAIR curators to contain information that could be integrated into TAIR. TAIR received annotations for 12 of these articles (16%). After email reminders were sent to the corresponding author of each article, the total number of articles with community annotations rose to 40 (53%). For the dataset in our study, 74% of the submitters provided data spontaneously, whereas 26% had to be reminded at least once. These results suggest that there is still ample room for improvement in author awareness and participation in the community curation scheme. We need to pinpoint the source of non-participation: (i) competing priorities and time pressure that limit researchers' availability; (ii) difficulty learning how to use the submission tool; or (iii) a lack of understanding about the type of data that can be submitted. It may also be necessary to find better ‘carrots’ or bigger ‘sticks’ to spur participation.
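The participation rates above follow directly from the reported counts; a minimal sketch that reproduces them (the counts are taken from the text, the function name is ours):

```python
def participation_rate(annotated: int, eligible: int) -> int:
    """Return the percentage of eligible articles with community annotations,
    rounded to the nearest whole percent as reported in the text."""
    return round(100 * annotated / eligible)

# Counts reported for Plant Physiology, September 2010 to May 2011.
ELIGIBLE = 75          # articles confirmed to contain TAIR-relevant data
SPONTANEOUS = 12       # articles annotated before any reminder
AFTER_REMINDERS = 40   # articles annotated after email reminders

print(participation_rate(SPONTANEOUS, ELIGIBLE))      # 16
print(participation_rate(AFTER_REMINDERS, ELIGIBLE))  # 53
```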
A robust level of community participation is critical to the success of the journal-database collaborative model of annotation presented here. The benefits of community participation are clear: (i) the data in the community database are kept up to date and relevant for the research community; (ii) the workload of capturing annotations from articles is spread over more people, making it possible to cover a larger portion of the research literature; and (iii) the community becomes familiar with the ontologies and can use them more effectively as research tools.
Review of community annotations by curators before acceptance is an integral part of the community submission process at TAIR. Although the reviewing curator does not read the article or search for the types of accuracy, completeness or specificity errors examined in this study, curator review is helpful for three reasons: (i) the TOAST submission form allows submitters to enter free text in the term field if no appropriate term is found, so a curator must find an appropriate existing ontology term or request a new one; (ii) typographical and formatting errors can be caught by a quick review; and (iii) obvious out-of-scope annotations are detected. Until submission software advances to the point where automated error checks can identify and address formatting or ontology term usage errors, and the community is fully able to find correct ontology terms or request new ones as needed, we believe that a trained biocurator will need to review all submissions before they are integrated into the database. As submission error checking improves and the community gains more experience with ontologies, it may be possible to shorten the review process or eliminate it altogether.
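The kinds of automated checks envisioned above could, in principle, be sketched as a simple pre-review filter that flags submissions needing curator attention. Everything in this sketch is hypothetical (the field names, the rules, the function name); it is not TOAST's actual schema or implementation:

```python
# Hypothetical pre-review filter for a community annotation submission.
# Field names ("term", "gene") and the check list are illustrative only.

def flag_submission(annotation: dict, known_terms: set) -> list:
    """Return a list of reasons a curator must review this annotation;
    an empty list means no automated flag was raised."""
    flags = []
    term = annotation.get("term", "").strip()
    if not term:
        flags.append("missing term")
    elif term not in known_terms:
        # Free text entered instead of a recognized ontology term:
        # a curator must map it to an existing term or request a new one.
        flags.append("free-text term (not in ontology)")
    gene = annotation.get("gene", "")
    if gene != gene.strip():
        flags.append("formatting: stray whitespace in gene field")
    return flags

ontology = {"GO:0009409", "GO:0006950"}  # toy subset of GO identifiers
print(flag_submission({"gene": "AT5G52310 ", "term": "cold response"}, ontology))
```

A clean submission with a recognized term would pass through with no flags, while the example above is routed to a curator on two counts, mirroring reasons (i) and (ii) in the text.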