Many shoulder and elbow abstracts presented at the American Academy of Orthopaedic Surgeons (AAOS) annual meeting are cited in the orthopaedic literature or are used to guide orthopaedic practice, but not all of these abstracts are submitted, survive peer review, or eventually are published. Presuming unpublished works have not been scientifically confirmed, one could question whether it is academically responsible to cite abstracts presented at the AAOS before they are peer-reviewed and published. To partly address this issue, we determined the peer-reviewed publication rate for 558 abstracts (233 papers and 325 posters) presented at the shoulder and elbow sessions of the AAOS from 1999 to 2004. In April 2007, we searched the computerized databases MEDLINE® and PubMed® for published articles based on these abstracts. We examined the published articles to assess publication rate, time to publication, change in contents, change in authors, and change in conclusions of abstracts. The overall publication rate in peer-reviewed journals was 58% (321 of 558), similar to other orthopaedic meetings and medical disciplines. We believe it is unacceptable to cite shoulder and elbow abstracts submitted to the AAOS because only slightly more than half (58%) of them are authenticated scientifically.
Searching PubMed for citations related to a specific cancer center or group of authors can be labor-intensive. We have created a tool, PubMed QUEST, to aid in the rapid searching of PubMed for publications of interest. It was designed by taking into account the needs of entire cancer centers as well as individual investigators. The experience of using the tool by our institution’s cancer center administration and investigators has been favorable and we believe it could easily be adapted to other institutions. Use of the tool has identified limitations of automated searches for publications based on an author’s name, especially for common names. These limitations could likely be solved if the PubMed database assigned a unique identifier to each author.
PubMed; Databases, Bibliographic; Medical Informatics Applications; Information Storage and Retrieval
Research results are primarily published in scientific literature and curation efforts cannot keep up with the rapid growth of published literature. The plethora of knowledge remains hidden in large text repositories like MEDLINE. Consequently, life scientists have to spend a great amount of time searching for specific information. The enormous ambiguity among most names of biomedical objects such as genes, chemicals and diseases often produces too large and unspecific search results. We present GeneView, a semantic search engine for biomedical knowledge. GeneView is built upon a comprehensively annotated version of PubMed abstracts and openly available PubMed Central full texts. This semi-structured representation of biomedical texts enables a number of features extending classical search engines. For instance, users may search for entities using unique database identifiers or they may rank documents by the number of specific mentions they contain. Annotation is performed by a multitude of state-of-the-art text-mining tools for recognizing mentions from 10 entity classes and for identifying protein–protein interactions. GeneView currently contains annotations for >194 million entities from 10 classes for ∼21 million citations with 271 000 full text bodies. GeneView can be searched at http://bc3.informatik.hu-berlin.de/.
A thesis is an important part of specialisation and doctorate education and requires intense work. The aim of this study was to investigate the publication rates of Turkish Public Health Doctorate Theses (PHDT) and Public Health Specialization Theses (PHST) in international and Turkish national peer-reviewed journals and to analyze the distribution of research areas.
A list of all theses up to 30 September 2009 was retrieved from the theses database of the Council of Higher Education of the Republic of Turkey. The publication rates of these theses were determined by searching PubMed, Science Citation Index-Expanded (SCI-E), Turkish Academic Network and Information Center (ULAKBIM) Turkish Medical Database, and Turkish Medline databases for the names of the thesis author and mentor. Theses published in journals indexed in either PubMed or SCI-E were considered international publications.
Our search yielded a total of 538 theses (243 PHDT, 295 PHST). It was found that the overall publication rate in Turkish national journals was 18%. The overall publication rate in international journals was 11.9%. Overall the most common research area was occupational health.
Publication rates of Turkish PHDT and PHST are low. A better understanding of the factors affecting this publication rate is important for public health issues where national data are vital for designing better intervention programs and developing better public health policies.
Bibliometrics; Mentor; Publishing; Research; Scientometrics; Turkey
Publication databases in biomedicine (e.g., PubMed, MEDLINE) are growing rapidly in size every year, as are public databases of experimental biological data and annotations derived from the data. Publications often contain evidence that confirms or disproves annotations, such as putative protein functions. However, it is increasingly difficult for biologists to identify and process published evidence due to the volume of papers and the lack of a systematic approach to associate published evidence with experimental data and annotations. Natural Language Processing (NLP) tools can help address the growing divide by providing automatic high-throughput detection of simple terms in publication text. However, NLP tools are not mature enough to identify complex terms, relationships, or events.
In this paper we present and extend BioDEAL, a community evidence annotation system that introduces a feedback loop into the database-publication cycle to allow scientists to connect data-driven biological concepts to publications.
BioDEAL may change the way biologists relate published evidence with experimental data. Instead of biologists or research groups searching and managing evidence independently, the community can collectively build and share this knowledge.
Summary: Effective access to the vast biomedical knowledge present in the scientific literature is challenging. Semantic relations are increasingly used in knowledge management applications supporting biomedical research to help address this challenge. We describe SemMedDB, a repository of semantic predications (subject–predicate–object triples) extracted from the entire set of PubMed citations. We propose the repository as a knowledge resource that can assist in hypothesis generation and literature-based discovery in biomedicine as well as in clinical decision-making support.
Availability and implementation: The SemMedDB repository is available as a MySQL database for non-commercial use at http://skr3.nlm.nih.gov/SemMedDB. A UMLS Metathesaurus license is required.
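As a rough illustration of how subject–predicate–object predications can support literature-based discovery, the sketch below holds a few hand-written triples in memory and looks for indirect two-step links from a starting concept. The `Predication` type, the `indirect_links` helper and the example triples are all hypothetical; they are not taken from SemMedDB and do not reflect its actual MySQL schema.

```python
from collections import defaultdict
from typing import NamedTuple


class Predication(NamedTuple):
    """One subject-predicate-object triple (hypothetical in-memory form)."""
    subject: str
    predicate: str
    obj: str


# Toy, hand-written triples for illustration only (not SemMedDB data).
TRIPLES = [
    Predication("drug_A", "INHIBITS", "protein_B"),
    Predication("protein_B", "CAUSES", "disease_C"),
    Predication("drug_A", "TREATS", "disease_D"),
]


def indirect_links(triples, start):
    """Return (intermediate, target) pairs reachable from `start` in two hops,
    skipping targets that are already linked directly to `start`."""
    by_subject = defaultdict(list)
    for t in triples:
        by_subject[t.subject].append(t)
    direct = {t.obj for t in by_subject[start]}
    found = []
    for first in by_subject[start]:
        for second in by_subject[first.obj]:
            if second.obj not in direct and second.obj != start:
                found.append((first.obj, second.obj))
    return found


print(indirect_links(TRIPLES, "drug_A"))
```

With these toy triples the only candidate link is from drug_A to disease_C via protein_B, the kind of indirect association a hypothesis-generation tool might surface for review.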
This study was undertaken to investigate the trends of orthopedic publications during the last decade, and to document the country of origin, journal, funding source, and language of contribution using PubMed.
Orthopedic articles published between 2000 and 2009 were retrieved from PubMed using the following search terms: "orthopaedic[Affiliation] AND ("2000/1/1"[PDAT]: "2009/12/31"[PDAT])" and "orthopedic[Affiliation] AND ("2000/1/1"[PDAT]: "2009/12/31"[PDAT])." The articles were downloaded in XML file format, which contained the following information: article title, author names, journal names, publication dates, article types, languages, authors' affiliations and funding sources. This information was extracted, sorted, and rearranged using database management software. We investigated the annual number of published orthopedic articles worldwide and the annual rate of increase. Furthermore, the country of publication origin, journal, funding source, and language of contribution were also investigated.
A total of 46,322 orthopedic articles were published and registered in PubMed in the last 10 years. The worldwide number of published orthopedic articles increased from 2,889 in 2000 to 6,909 in 2009, showing an annual increase of 384.6 articles, or an annualized compound rate of 10.2%. The United States ranked highest in the number of published orthopedic articles, followed by Japan, the United Kingdom, Germany, and the Republic of Korea. Among the orthopedic articles published worldwide during the last 10 years, 37.9% pertained to studies performed in the United States. A total of 57.3% of articles were published in journals established in the United States. Among the published orthopaedic articles, 4,747 articles (10.2%) disclosed financial support by research funds, of which 4,688 (98.8%) articles utilized research funds from the United States. Most articles were published in English (97.2%, 45,030 articles).
The number of published orthopedic articles has been increasing over the last decade. Article counts, publishing journals, and funding sources were dominated by the United States, while the share and growth of Asian countries including Japan, the Republic of Korea, and China were notable.
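The annualized compound rate can be checked with a short calculation. The sketch below assumes the rate was derived from the 2000 and 2009 article counts over nine yearly intervals; the abstract does not state the formula, so the function name and that assumption are illustrative only.

```python
def compound_annual_growth_rate(first, last, periods):
    """Constant yearly growth rate that turns `first` into `last`
    after `periods` yearly steps."""
    return (last / first) ** (1 / periods) - 1


# Counts reported in the abstract: 2,889 articles in 2000 and 6,909 in 2009.
# Assumption: nine yearly intervals separate the two counts.
rate = compound_annual_growth_rate(2889, 6909, 9)
print(f"{rate:.1%}")  # prints 10.2%
```

Under that assumption the result agrees with the reported 10.2%.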
Bibliometrics; Orthopedics; Research trend; Periodicals as topics
Evidence is emerging that obesity-associated cardiovascular disorders (CVD) show variations across regions and ethnicities. However, it is unclear if there are distinctive patterns of abdominal obesity contributing to an increased CVD risk in South Asians. Also, potential underlying mechanistic pathways of such unique patterns are not comprehensively reported in South Asians. This review sets out to examine both. A comprehensive search of PubMed, Embase and the Cochrane Library was undertaken, applying specific search terms to identify potentially relevant published literature in the English language. Grey literature, including scientific meeting abstracts, expert consultations, text books and government/non-government publications, was also retrieved. South Asians have 3-5% higher body fat than whites, at any given body mass index. Additional distinctive features, such as South Asian phenotype, low adipokine production, lower lean body mass, ethno-specific socio-cultural and economic factors, were considered as potential contributors to an early age-onset of obesity-linked CVD risk in South Asians. Proven cost-effective anti-obesity strategies, including the development of ethno-specific clinical risk assessment tools, should be adopted early in the life-course to prevent premature CVD deaths and morbidity in South Asians.
Abdominal obesity; cardiovascular risk; cardiovascular diseases; diabetes mellitus; South Asians
Although many interactions between HIV-1 and human proteins have been reported in the scientific literature, no publicly accessible source for efficiently reviewing this information was available. Therefore, a project was initiated in an attempt to catalogue all published interactions between HIV-1 and human proteins. HIV-related articles in PubMed were used to develop a database containing names, Entrez GeneIDs, and RefSeq protein accession numbers of interacting proteins. Furthermore, brief descriptions of the interactions, PubMed identification numbers of articles describing the interactions, and keywords for searching the interactions were incorporated. Over 100,000 articles were reviewed, resulting in the identification of 1448 human proteins that interact with HIV-1 comprising 2589 unique HIV-1-to-human protein interactions. Preliminary analysis of the extracted data indicates 32% were direct physical interactions (e.g., binding) and 68% were indirect interactions (e.g., up-regulation through activation of signaling pathways). Interestingly, 37% of human proteins in the database were found to interact with more than one HIV-1 protein. For example, the signaling protein mitogen-activated protein kinase 1 has a surprising range of interactions with 10 different HIV-1 proteins. Moreover, large numbers of interactions were published for the HIV-1 regulatory protein Tat and envelope proteins: 30% and 33% of total interactions identified, respectively. The database is accessible at http://www.ncbi.nlm.nih.gov/RefSeq/HIVInteractions/ and is cross-linked to other National Center for Biotechnology Information databases and programs via Entrez Gene. This database represents a unique and continuously updated scientific resource for understanding HIV-1 replication and pathogenesis to assist in accelerating the development of effective therapeutic and vaccine interventions.
Despite their obligation to do so, tobacco companies often failed to conduct product safety research or, when research was conducted, failed to disseminate the results to the medical community and to the public. The tobacco company lawyers' role in these actions was investigated with a focus on their involvement in company scientific research, claims of attorney‐client privilege and work‐product cover, document concealment, and litigation tactics.
Previously secret internal tobacco industry documents located at Tobacco Documents Online were searched. Additional searches included court transcripts, legal cases and articles obtained through Westlaw, PubMed, and the internet.
Tobacco company lawyers have been involved in activities having little or nothing to do with the practice of law, including gauging and attempting to influence company scientists' beliefs, vetting in‐house scientific research, and instructing in‐house scientists not to publish potentially damaging results. Additionally, company lawyers have taken steps to manufacture attorney‐client privilege and work‐product cover to assist their clients in protecting sensitive documents from disclosure, have been involved in the concealment of such documents, and have employed litigation tactics that have largely prevented successful lawsuits against their client companies.
Tobacco related diseases have proliferated partly because of tobacco company lawyers. Their tactics have impeded the flow of information about the dangers of smoking to the public and the medical community. Additionally, their extravagantly aggressive litigation tactics have pushed many plaintiffs into dropping their cases before trial, thus reducing the opportunities for changes to be made to company policy in favour of public health. Stricter professional oversight is needed to ensure that this trend does not continue.
attorney‐client privilege; disease vector; lawyers; tobacco companies
PubMed contains a bibliography of articles published in around 4800 journals. It combines MEDLINE and OLDMEDLINE (articles from 1960 back to the 1940s). PubMed is updated daily to include both published and ahead-of-print references. As a radiologist, one can use PubMed to track several journals, track topics, search for specific topics, verify incomplete or incorrect references, store one's own publications, and save selected references; one can also create filters tailored to one's own search needs for regularly searched topics. This article provides some key background knowledge on searching PubMed and also describes some features that are often left unexplored. The PubMed site has undergone many changes in the last few years and this article will update users on the current features.
Information retrieval; literature search; PubMed
The increase in the size of the scientific community created an explosion in scientific production. We have analyzed the dynamics of biomedical scientific output during 1957–2007 by applying a bibliometric analysis of the PubMed database using different keywords representing specific biomedical topics. With the assumption that increased scientific interest will result in increased scientific output, we compared the output of specific topics to that of all scientific output. This analysis resulted in three broad categories of topics; those that follow the general trend of all scientific output, those that show highly variable output, and attractive topics which are new and grow explosively. The analysis of the citation impact of the scientific output resulted in a typical longtail distribution: the majority of journals and articles are of very low impact. This distribution has remained unchanged since 1957, although the interests of scientists must have shifted in this period. We therefore analyzed the distribution of articles in top journals and lower impact journals over time for the attractive topics. Novelty is rewarded by publication in top journals. Over time more articles are published in low impact journals progressively creating the longtail distribution, signifying acceptance of the topic by the community. There can be a gap of years between novelty and acceptance. Within topics temporary novelty is created with new subtopics. In conclusion, the longtail distribution is the foundation of the scientific output of the scientific community and can be used to examine different aspects of science practice.
This editorial announces Algorithms for Molecular Biology, a new online open access journal published by BioMed Central. By launching the first open access journal on algorithmic bioinformatics, we provide a forum for fast publication of high-quality research articles in this rapidly evolving field. Our journal will publish thoroughly peer-reviewed papers without length limitations covering all aspects of algorithmic data analysis in computational biology. Publications in Algorithms for Molecular Biology are easy to find, highly visible and tracked by organisations such as PubMed. An established online submission system makes a fast reviewing procedure possible and enables us to publish accepted papers without delay. All articles published in our journal are permanently archived by PubMed Central and other scientific archives. We are looking forward to receiving your contributions.
The 2009 influenza A(H1N1) pandemic has generated thousands of articles and news items. However, finding relevant scientific articles in such rapidly developing health crises is a major challenge which, in turn, can affect decision-makers' ability to utilise up-to-date findings and ultimately shape public health interventions. This study set out to show the impact that the inconsistent naming of the pandemic can have on retrieving relevant scientific articles in PubMed/MEDLINE.
We first formulated a PubMed search algorithm covering different names of the influenza pandemic and simulated the results that it would have retrieved from weekly searches for relevant new records during the first 10 weeks of the pandemic. To assess the impact of failing to include every term in this search, we then conducted the same searches but omitted in turn “h1n1,” “swine,” “influenza” and “flu” from the search string, and compared the results to those for the full string.
On average, our core search string identified 44.3 potentially relevant new records at the end of each week. Of these, we determined that an average of 27.8 records were relevant. When we excluded one term from the string, the percentage of records missed out of the total number of relevant records averaged 18.7% for omitting “h1n1,” 13.6% for “swine,” 17.5% for “influenza,” and 20.6% for “flu.”
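The leave-one-term-out comparison described above can be sketched as follows. The abstract does not give the exact search string, so joining the four terms with a plain OR is an assumption, and `build_query`, `leave_one_out_queries` and `percent_missed` are hypothetical helper names; record IDs passed to `percent_missed` would come from real weekly search results.

```python
TERMS = ["h1n1", "swine", "influenza", "flu"]


def build_query(terms):
    """Join search terms into a single OR query string (assumed form)."""
    return " OR ".join(terms)


def leave_one_out_queries(terms):
    """Map each omitted term to the query built from the remaining terms."""
    return {
        omitted: build_query([t for t in terms if t != omitted])
        for omitted in terms
    }


def percent_missed(relevant_full, relevant_reduced):
    """Percentage of relevant record IDs found by the full query
    that the reduced query fails to retrieve."""
    missed = set(relevant_full) - set(relevant_reduced)
    return 100 * len(missed) / len(relevant_full)


for omitted, query in leave_one_out_queries(TERMS).items():
    print(f"without {omitted!r}: {query}")
```

Averaging `percent_missed` over the weekly result sets for each omitted term would give per-term figures of the kind reported in the results.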
Due to inconsistent naming, while searching for scientific material about rapidly evolving situations such as the influenza A(H1N1) pandemic, there is a risk that one will miss relevant articles. To address this problem, the international scientific community should agree on nomenclature and the specific name to be used earlier, and the National Library of Medicine in the US could index potentially relevant materials faster and allow publishers to add alert tags to such materials.
Mineral trioxide aggregate (MTA) has been suggested for root-end filling, vital pulp therapy, apical plug, perforation repair, and root canal filling. Since the introduction of MTA in 1993, many studies about this material have been published. The aim of this survey was to present statistical information about published articles in PubMed-indexed journals vis-à-vis the various aspects of this biomaterial.
Material and Methods
A PubMed search was performed to retrieve the relevant articles from 1993 to August 2012. The data for each article, including publication year, journal name, number of authors, first author name, affiliations and study design, were recorded. Citation counts for each article up to 2009 were obtained from the Scopus and Google Scholar databases. Data were analyzed to determine the related scientometric indicators.
In total, 1027 articles were found in PubMed-indexed journals, showing a considerable increase from 2 papers in 1993 to 139 in 2011. While ~62% of articles had no level of evidence, only ~5% could be classified as having the highest level of evidence (LOE1); however, the majority of LOE1 articles originated from Iran (~1%: n=10). Journal of Endodontics, as the top-ranked journal, published 31.7% of MTA-related articles. The most common number of authors per article was four (19.6%). The leading countries of origin were the USA (21.9%), Brazil (18.5%) and Iran (8.76%). The average number of citations for the top ten articles from Scopus was 231.
This data demonstrates that during the past two decades, research on this novel endodontic biomaterial had a rapid positive trend especially during the last 5 years. Further high-level evidence articles for the various clinical applications of MTA would result in superior clinical decision making and stronger scientific-based endodontic practice.
Biomaterial; Endodontic; Mineral trioxide aggregate; MTA; PubMed; Scientometric
The purpose of writing this review is to showcase the Molecular Imaging and Contrast Agent Database (MICAD; www.micad.nlm.nih.gov) to students, researchers and clinical investigators interested in the different aspects of molecular imaging. This database provides freely accessible, current, online scientific information regarding molecular imaging (MI) probes and contrast agents (CA) used for positron emission tomography, single-photon emission computed tomography, magnetic resonance imaging, x-ray/computed tomography, optical imaging and ultrasound imaging. Detailed information on >1000 agents in MICAD is provided in a chapter format and can be accessed through PubMed. Lists containing >4250 unique MI probes and CAs published in peer-reviewed journals and agents approved by the United States Food and Drug Administration (FDA) as well as a CSV file summarizing all chapters in the database can be downloaded from the MICAD homepage. Users can search for agents in MICAD on the basis of imaging modality, source of signal/contrast, agent or target category, preclinical or clinical studies, and text words. Chapters in MICAD describe the chemical characteristics (structures linked to PubChem), the in vitro and in vivo activities and other relevant information regarding an imaging agent. All references in the chapters have links to PubMed. A Supplemental Information Section in each chapter is available to share unpublished information regarding an agent. A Guest Author Program is available to facilitate rapid expansion of the database. Members of the imaging community registered with MICAD periodically receive an e-mail announcement (eAnnouncement) that lists new chapters uploaded to the database. Users of MICAD are encouraged to provide feedback, comments or suggestions for further improvement of the database by writing to the editors at: email@example.com
Molecular imaging probes; Contrast agents; Database; positron emission tomography (PET); single-photon emission computed tomography (SPECT); magnetic resonance imaging (MRI); x-ray/computed tomography (x-ray/CT); optical imaging (OI); ultrasound imaging
The assumption that a name uniquely identifies an entity introduces two types of errors: splitting treats one entity as two or more (because of name variants); lumping treats multiple entities as if they were one (because of shared names). Here we investigate the extent to which splitting and lumping affect commonly-used measures of large-scale named-entity networks within two disambiguated bibliographic datasets: one for co-author names in biomedicine (PubMed, 2003–2007); the other for co-inventor names in U.S. patents (USPTO, 2003–2007). In both cases, we find that splitting has relatively little effect, whereas lumping has a dramatic effect on network measures. For example, in the biomedical co-authorship network, lumping (based on last name and both initials) drives several measures down: the global clustering coefficient by a factor of 4 (from 0.265 to 0.066); degree assortativity by a factor of ∼13 (from 0.763 to 0.06); and average shortest path by a factor of 1.3 (from 5.9 to 4.5). These results can be explained in part by the fact that lumping artificially creates many intransitive relationships and high-degree vertices. This effect of lumping is much less dramatic but persists with measures that give less weight to high-degree vertices, such as the mean local clustering coefficient and log-based degree assortativity. Furthermore, the log-log distribution of collaborator counts follows a much straighter line (power law) with splitting and lumping errors than without, particularly at the low and the high counts. This suggests that part of the power law often observed for collaborator counts in science and technology reflects an artifact: name ambiguity.
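A toy example can show why lumping lowers the global clustering coefficient. The sketch below uses invented authors and papers, with two distinct people who share the name "Lee J", and measures clustering as transitivity (closed connected triples divided by all connected triples). The abstract does not spell out its exact definitions, so the data and both helper functions are illustrative assumptions.

```python
from itertools import combinations

# Each author is (name, person_id); the two "Lee J" entries are different people.
# Invented toy data, not drawn from PubMed or USPTO.
PAPERS = [
    [("Lee J", 1), ("Kim S", 1), ("Park H", 1)],
    [("Lee J", 2), ("Wang Y", 1), ("Chen L", 1)],
]


def coauthor_graph(papers, label=lambda author: author):
    """Build an undirected co-authorship graph as {node: set(neighbours)}.
    `label` decides which authors count as the same node."""
    graph = {}
    for authors in papers:
        nodes = {label(a) for a in authors}
        for node in nodes:
            graph.setdefault(node, set())
        for u, v in combinations(sorted(nodes), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def global_clustering(graph):
    """Transitivity: closed connected triples over all connected triples."""
    closed = 0
    total = 0
    for neighbours in graph.values():
        for u, v in combinations(sorted(neighbours), 2):
            total += 1
            if v in graph[u]:
                closed += 1
    return closed / total if total else 0.0


disambiguated = global_clustering(coauthor_graph(PAPERS))
lumped = global_clustering(coauthor_graph(PAPERS, label=lambda a: a[0]))
print(disambiguated, lumped)
```

Merging the two "Lee J" nodes joins two separate triangles at a single high-degree vertex whose co-authors from different papers never collaborated, which adds open triples and pulls the coefficient down.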
Background and aims
This survey was conducted to provide statistical data regarding publications in PubMed-indexed journals from Tabriz University of Medical Sciences Faculty of Dentistry.
Materials and methods
The database used for this study was PubMed. The search was conducted using key words including the names of the heads of the departments. Papers published between January 1, 2005 and April 31, 2012 were considered. The retrieved abstracts were reviewed and unrelated articles were excluded. Data were transferred to Microsoft Excel software for descriptive statistical analyses.
A total of 158 papers matched the inclusion criteria, with the majority from the Department of Endodontics (49 articles). The highest proportion (48.3%) of papers was related to in vitro studies, followed by clinical trials, in vivo studies, and case reports. The number of publications showed a considerable increase over the studied period.
PubMed-indexed publications from different departments have increased steadily, suggesting that research has become an essential component in the evaluated institute.
Dental; faculty; medical; scientific publication; university
The CleanEx expression database (http://www.cleanex.isb-sib.ch) provides access to public gene expression data via unique gene names as well as via the biomedical characteristics of experiments. To achieve this, a dual annotation of both sequences and experiments has been generated. First, the system links official gene symbols to any kind of sequences used for gene expression measurements (cDNA, Affymetrix, oligonucleotide arrays, SAGE or MPSS tags, Expressed Sequence Tags or other mRNA sequences, etc.). For the biomedical annotation, we re-annotate each experiment from the CleanEx database with the MeSH (Medical Subject Headings) terms, primarily used by NLM (National Library of Medicine) for indexing articles for the MEDLINE/PubMed database. This annotation allows a fast and easy retrieval of expression data with common biological or medical features. The numerical data can then be exported as matrix-like tab-delimited text files. Data can be extracted from either one dataset or from heterogeneous datasets.
The scientific production of institutions of higher education, as well as the dissemination and use of this published work by peer institutions, can be assessed by means of quantitative and qualitative measurements. This type of analysis can also serve as the basis of further academic actions. Variables such as the type of evaluation, the number of faculty members and the decision to include or exclude researchers who are not professors are difficult to measure when comparing different schools and institutions.
The purpose of this study was to assess the scientific production of tenured faculty from the Universidade de São Paulo, Faculdade de Medicina performed from 2001 to 2006.
The Medline/PubMed database was used, and the impact factors (IFs; Journal Citation Report, 2006) and the number of generated citations (Web of Science/ISI Thomson) were also evaluated.
The analysis of the scientific production of 66 full professors (level MS-6) revealed 1,960 scientific articles published in 630 scientific journals, of which 31.3% were Brazilian and 68.7% were from international sources. Among these, 47% of the articles were published in 62.9% of the journals with IFs above 10, although 16.4% of the journals did not have assigned IF values. We verified that 45% of the published articles received 9,335 citations (average of 11 ± 17), with the majority of these (8,968 citations) appearing in international scientific journals.
Our results indicate that it is possible to analyze the scientific production of a learning institution by the number of papers published by full professors, taking into account not only their academic position and influence, but also the fact that publication is an opportunity to stimulate joint projects with other members of the same institution.
Scientific publication indicators; Research personnel/Statistics and numerical data; Medline/utilization; Impact factor; Bibliometric indicators
BioLit is a web server which provides metadata describing the semantic content of all open access, peer-reviewed articles which describe research from the major life sciences literature archive, PubMed Central. Specifically, these metadata include database identifiers and ontology terms found within the full text of the article. BioLit delivers these metadata in the form of XML-based article files and as a custom web-based article viewer that provides context-specific functionality to the metadata. This resource aims to integrate the traditional scientific publication directly into existing biological databases, thus obviating the need for a user to search in multiple locations for information relating to a specific item of interest, for example published experimental results associated with a particular biological database entry. As an example of a possible use of BioLit, we also present an instance of the Protein Data Bank fully integrated with BioLit data. We expect that the community of life scientists in general will be the primary end-users of the web-based viewer, while biocurators will make use of the metadata-containing XML files and the BioLit database of article data. BioLit is available at http://biolit.ucsd.edu.
Summary: Scientists, educators and the general public often need to know times of divergence between species. But they rarely can locate that information because it is buried in the scientific literature, usually in a format that is inaccessible to text search engines. We have developed a public knowledgebase that enables data-driven access to the collection of peer-reviewed publications in molecular evolution and phylogenetics that have reported estimates of time of divergence between species. Users can query the TimeTree resource by providing two names of organisms (common or scientific) that can correspond to species or groups of species. The current TimeTree web resource (TimeTree2) contains timetrees reported from molecular clock analyses in 910 published studies and 17 341 species that span the diversity of life. TimeTree2 interprets complex and hierarchical data from these studies for each user query, which can be launched using an iPhone application, in addition to the website. Published time estimates are now readily accessible to the scientific community, K–12 and college educators, and the general public, without requiring knowledge of evolutionary nomenclature.
Availability: TimeTree2 is accessible from the URL http://www.timetree.org, with an iPhone app available from iTunes (http://itunes.apple.com/us/app/timetree/id372842500?mt=8) and a YouTube tutorial (http://www.youtube.com/watch?v=CxmshZQciwo).
Supplementary information: Supplementary data are available at Bioinformatics online.
Many scientists now manage the bulk of their bibliographic information electronically, thereby organizing their publications and citation material from digital libraries. However, a library has been described as “thought in cold storage,” and unfortunately many digital libraries can be cold, impersonal, isolated, and inaccessible places. In this Review, we discuss the current chilly state of digital libraries for the computational biologist, including PubMed, IEEE Xplore, the ACM digital library, ISI Web of Knowledge, Scopus, Citeseer, arXiv, DBLP, and Google Scholar. We illustrate the current process of using these libraries with a typical workflow, and highlight problems with managing data and metadata using URIs. We then examine a range of new applications such as Zotero, Mendeley, Mekentosj Papers, MyNCBI, CiteULike, Connotea, and HubMed that exploit the Web to make these digital libraries more personal, sociable, integrated, and accessible places. We conclude with how these applications may begin to help achieve a digital defrost, and discuss some of the issues that will help or hinder this in terms of making libraries on the Web warmer places in the future, becoming resources that are considerably more useful to both humans and machines.
Event extraction following the GENIA Event corpus and BioNLP shared task models has been a considerable focus of recent work in biomedical information extraction. This work includes efforts applying event extraction methods to the entire PubMed literature database, far beyond the narrow subdomains of biomedicine for which annotated resources for extraction method development are available.
In the present study, our aim is to estimate the coverage of all statements of gene/protein associations in PubMed that existing resources for event extraction can provide. We base our analysis on a recently released corpus automatically annotated for gene/protein entities and syntactic analyses covering the entire PubMed, and use named entity co-occurrence, shortest dependency paths and an unlexicalized classifier to identify likely statements of gene/protein associations. A set of high-frequency/high-likelihood association statements is then manually analyzed with reference to the GENIA ontology.
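To make the shortest-dependency-path idea concrete, the following minimal sketch finds the shortest path between two entity mentions in a dependency parse by breadth-first search over the undirected parse graph. It is an illustration only, not the pipeline used in the study: the function name, the edge-list representation, and the toy sentence are assumptions introduced here.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return the tokens on a shortest path from source to target, or None.

    `edges` is a list of (head, dependent) pairs; direction is ignored,
    so the parse is treated as an undirected graph.
    """
    graph = {}
    for head, dep in edges:
        graph.setdefault(head, set()).add(dep)
        graph.setdefault(dep, set()).add(head)
    if source not in graph or target not in graph:
        return None

    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical parse of "GENE_A strongly activates GENE_B in neurons".
toy_edges = [
    ("activates", "GENE_A"),
    ("activates", "strongly"),
    ("activates", "GENE_B"),
    ("activates", "neurons"),
    ("neurons", "in"),
]
path = shortest_dependency_path(toy_edges, "GENE_A", "GENE_B")
print(path)
```

In this toy parse the path keeps only the verb linking the two entities and drops the modifiers, which is why such paths are a common compact representation of a candidate association statement.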
We present a first estimate of the overall coverage of gene/protein associations provided by existing resources for event extraction. Our results suggest that for event-type associations this coverage may be over 90%. We also identify several biologically significant associations of genes and proteins that are not addressed by these resources, suggesting directions for further extension of extraction coverage.
Current scientific research takes place in highly specialized contexts, with poor communication between disciplines as a likely consequence. Knowledge from one discipline may be useful to another without researchers knowing it. As scientific publications are a condensation of this knowledge, literature-based discovery tools may help the individual scientist to explore new useful domains. We report on the development of the DAD-system, a concept-based Natural Language Processing system for PubMed citations that provides the biomedical researcher with such a tool. We describe the general architecture and illustrate its operation by a simulation of a well-known text-based discovery: the favorable effects of fish oil on patients suffering from Raynaud's disease.
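One common way to frame literature-based discovery is as a two-step co-occurrence search: start from a concept A, collect the concepts B that appear with it, then propose concepts C that appear with some B but never directly with A. The sketch below illustrates that general scheme only. The abstract does not describe the DAD-system's algorithm, so this should not be read as its implementation; the function name and the five toy "citations" are invented for illustration.

```python
from collections import defaultdict


def open_discovery(citations, start):
    """Map each candidate concept C to the intermediate concepts B linking it to `start`.

    `citations` is a list of sets of concepts, one set per citation.
    """
    cooccurs = defaultdict(set)
    for concepts in citations:
        for concept in concepts:
            cooccurs[concept].update(concepts - {concept})

    intermediates = set(cooccurs[start])
    candidates = defaultdict(set)
    for b in intermediates:
        for c in cooccurs[b]:
            # Keep only concepts never seen directly with the start concept.
            if c != start and c not in intermediates:
                candidates[c].add(b)
    return dict(candidates)


# Invented toy data (not real PubMed citations), one concept set per citation.
toy_citations = [
    {"raynaud", "blood viscosity"},
    {"raynaud", "platelet aggregation"},
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"aspirin", "platelet aggregation"},
]
links = open_discovery(toy_citations, "raynaud")
for concept, via in sorted(links.items()):
    print(concept, "via", sorted(via))
```

On the toy data, "fish oil" is reached through two intermediate concepts and "aspirin" through one. A real system would rank such candidates, for example by the number or strength of linking concepts, rather than list them unfiltered.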