The document flow for this project is outlined in . Independent CTD-specific queries were made at PubMed to triage and retrieve 14,904 preliminary articles for the seven heavy metals. These articles were then processed by CTD's text-mining algorithm and assigned a DRS, which ranged from 2 to 398 (with higher numbers indicative of presumed increased relevancy). From this preliminary corpus, 1,020 of the articles were discovered to have been previously reviewed by CTD biocurators at an earlier time for different projects. These previously reviewed documents provided a test set to help validate the assigned DRS. We compared the DRS for the 1,020 articles against whether or not the article had been flagged as “curated” or “rejected” in CTD's Curation Tool by previous biocurators (). Of the 1,020 articles, 86% with a DRS ≥100 were found to be curatable for CTD relevant interactions, while only 10% of articles with a DRS ≤20 could be curated. There is a notably progressive decrease in the percentage of curated articles with a DRS between 21–99, indicating that articles with a DRS ≥100 are likely to be more relevant for curation in CTD while documents with a DRS ≤20 are more likely to be irrelevant. Subsequently, we arbitrarily refer to articles as being in one of three categories based upon their assigned DRS: high (≥100), medium (21–99), or low (1–20).
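The three DRS categories described above amount to a simple threshold lookup. The following Python sketch is illustrative only: the function name `drs_category` is hypothetical and is not part of CTD's text-mining pipeline, and it assumes the DRS is a positive integer, as the reported ranges suggest.

```python
def drs_category(drs: int) -> str:
    """Map an assigned DRS to the category label used in this study.

    Assumption: the DRS is a positive integer (the reported scores
    ranged from 2 to 398).
    """
    if drs < 1:
        raise ValueError("DRS is expected to be a positive integer")
    if drs >= 100:
        return "high"      # DRS >= 100
    if drs >= 21:
        return "medium"    # DRS 21-99
    return "low"           # DRS 1-20


if __name__ == "__main__":
    for score in (2, 20, 21, 99, 100, 398):
        print(score, drs_category(score))
```

The thresholds are the arbitrary cut-offs stated in the text; a different corpus might warrant different boundaries.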
Test set of previously reviewed articles validates assigned DRS.
To test the effectiveness of DRS assignment, we next constructed a subset corpus representative of the 14,904 articles that could be feasibly reviewed by CTD biocurators. From this preliminary corpus, we selected all virgin articles (i.e., articles that had never been examined by CTD biocurators) with a high DRS ≥100 and all virgin articles with a low DRS ≤20 (, step 4), representing the two ends of the spectrum to allow for the best comparison. To this, we also added all the articles from the mercury subset with a DRS between 21–99 to give a good representation of the medium range; the mercury subset was chosen because it contained a reasonable number of articles that helped to balance the bins in the final assigned corpus. In total, this representative text-mined heavy metal corpus assigned to CTD biocurators for review contained 3,583 articles representing all three DRS categories: 1,981 with a DRS ≥100 (55% corpus), 879 with a DRS between 21–99 (25% corpus), and 723 with a DRS ≤20 (20% corpus). The distribution of articles representing different metals (and their DRS ranges) for this corpus was 28% mercury (DRS: 4–266), 27% copper (DRS: 2–360), 14% manganese (DRS: 4–318), 13% cadmium (DRS: 5–324), 8% cobalt (DRS: 4–284), 6% nickel (DRS: 3–282), and 3% lead (DRS: 8–310).
In eight weeks, five CTD biocurators reviewed 3,583 text-mined articles. During review, biocurators (blind to the DRS) decided if an article contained information relevant to CTD, defined as data that describes a chemical-gene, chemical-disease, or gene-disease interaction according to our established paradigm. If the document contained relevant information, it was curated following standard CTD procedures into our web-based Curation Tool. If the article did not contain any chemical-gene-disease interactions relevant to CTD, it was rejected and flagged as “not curatable” in the Curation Tool. CTD biocurators read and curated the significant points emphasized by the authors in the title and abstract. However, it was sometimes necessary for the biocurator to refer to the full text in order to resolve ambiguities found in the abstract, such as the correct species or gene identity. Once in the full text, the biocurator captured additional essential data not found in the abstract, including relevant information from supplementary tables (e.g., microarray tables). Biocurators captured all relevant data for all referenced and resolvable chemicals, genes, and diseases. Curation was therefore not restricted to the chemical for which the corpus was originally triaged, and in this project interactions were curated for chemicals beyond the seven heavy metals. While entering interactions in the Curation Tool, biocurators designated the source of the interaction as being derived from either the “abstract” or the “full text”.
Of the 3,583 examined articles, 2,202 (61%) were curated and 1,381 (39%) were rejected (). We compared the DRS for these articles against their relevancy (i.e., “curated” or “rejected”) (). Similar to the test set of 1,020 documents (), there was also a dramatic progressive decrease in the percentage of curated articles with a DRS between 21–99 for these 3,583 articles (). Of the 1,981 articles with a high DRS, 1,685 of them (85%) were curated and only 15% rejected (). Alternatively, of the 723 articles with a low DRS, only 111 of them (15%) were curated and the other 85% rejected, an exact inverse of the high DRS articles. Of the 879 articles that have a medium DRS, 406 (46%) were curated and 473 (54%) were rejected. These metrics reflect the same pattern seen in the test corpus of 1,020 articles (), and they validate the DRS as a good indicator of an article's relevance for curation at CTD.
CTD manual curation metrics.
Curation of heavy metal corpus validates assigned DRS.
Rejected articles consumed only 6% of the biocurators' time (2,277 minutes out of 38,619 total minutes) and averaged 1.6 minutes per rejected article (). The bulk of a biocurator's time (94%) was spent curating articles, with an average curation rate of 16.5 minutes per curated article (). The curation rates correlated with the assigned DRS. Low and medium level DRS papers averaged 7.0–7.1 minutes per article, but documents with a high DRS averaged 19.4 minutes per article (). This progressive rate increase reflects the amount of data extracted from articles. In total, 41,208 interactions were manually curated from this corpus. Of those, 39,128 (95%) were curated exclusively from high DRS articles, and only 4% and a mere 0.8% were extracted from medium and low DRS articles, respectively ().
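The time metrics above follow directly from the reported totals. As a check on the arithmetic, a minimal Python sketch is given here; the variable names are ours, and the article counts (1,381 rejected, 2,202 curated) are the ones reported for this corpus.

```python
# Totals reported for the 3,583-article corpus.
total_minutes = 38_619
rejected_minutes = 2_277
rejected_articles = 1_381
curated_articles = 2_202

# Time not spent on rejected articles was spent curating.
curated_minutes = total_minutes - rejected_minutes

rejected_time_share = rejected_minutes / total_minutes
minutes_per_rejected = rejected_minutes / rejected_articles
minutes_per_curated = curated_minutes / curated_articles

print(f"Share of time on rejected articles: {rejected_time_share:.0%}")
print(f"Minutes per rejected article: {minutes_per_rejected:.1f}")
print(f"Minutes per curated article: {minutes_per_curated:.1f}")
```

Rounded as in the text, these values reproduce the reported 6%, 1.6 minutes, and 16.5 minutes.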
The number of interactions extracted per curated article also trends with the DRS, demonstrating that documents with a higher DRS have a greater density of curatable information than articles with a lower DRS (). Along these same lines, it is interesting to note that it took biocurators almost twice as long to reject a high DRS article compared to a low DRS article, as seen in the rejection rates of 2.7 vs. 1.5 minutes per rejected article, respectively (). We hypothesize that this ~2-fold difference may be due to the increased density of chemical, gene, and disease actors found in the high DRS documents, which requires a biocurator to spend additional time sifting through all the information before deciding that the article should be rejected.
DRS reflects the number of interactions per curated article.
Of the 2,202 curated articles, CTD biocurators composed interactions for the relevant information from just the abstracts for 1,381 (63%), from both the abstract and full-text for 670 (30%), and from solely the full text for 151 (7%) articles.
DRS is a better indicator than PMID for ranking the literature
Before text mining, PubMed abstracts slated for curatorial review at CTD were ranked solely by descending PMID, which generally reflects the publication date from newest to oldest paper. While this process works fine for small corpora (wherein all the articles will eventually be reviewed by a biocurator), it has major disadvantages for large corpora, since all the articles cannot possibly be reviewed due to limited time and resources; consequently, relevant papers may be missed simply because they were published earlier and therefore have a lower numerical PMID. Here, with the corpus of 3,583 articles completely vetted by biocurators, we can now retroactively compare the metrics and data content when viewed by either the DRS or PMID ranking methods for a variety of parameters, including (1) article relevance, (2) novel data content, (3) interaction yield rate, and (4) mean average precision (MAP).
For analysis and presentation, the 3,583 articles were first grouped into progressive quartiles (Q1–Q4), each containing 896 documents (except for Q4 which contained 895 articles), based upon either their descending PMID or their descending DRS. Thus, for PMID ranking, articles in Q1 have the highest numerical value (and generally represent the most recently published papers) while articles in Q4 have the lowest numerical PMID value. Similarly, for DRS ranking, documents in Q1 have the highest DRS, which in turn progressively decreases into Q4.
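The quartile grouping described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used at CTD: the function name `split_into_quartiles` is hypothetical, and the handling of tied scores (here, ties keep their input order because `sorted` is stable) is an assumption, since the text does not say how ties were broken.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_quartiles(
    articles: Sequence[T], score: Callable[[T], float]
) -> List[List[T]]:
    """Rank articles by descending score and cut the ranking into Q1-Q4.

    Each quartile holds ceil(n / 4) articles, so the last quartile
    absorbs any shortfall when n is not divisible by four.
    """
    ranked = sorted(articles, key=score, reverse=True)
    size = math.ceil(len(ranked) / 4)
    return [ranked[i * size:(i + 1) * size] for i in range(4)]


if __name__ == "__main__":
    # Stand-in corpus: 3,583 integers playing the role of PMIDs or DRS values.
    quartiles = split_into_quartiles(list(range(3583)), score=lambda x: x)
    print([len(q) for q in quartiles])
```

With 3,583 items this yields three quartiles of 896 and a final quartile of 895, matching the grouping described in the text. Passing a different `score` function (PMID or DRS) gives the two rankings being compared.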
(1) Article relevance
Of the 3,583 articles reviewed, 2,202 (61%) were curated and 1,381 (39%) rejected (). When an article's relevance (i.e., “curated” vs. “rejected”) is viewed by both DRS and PMID ranking, the text-mining tool more effectively scored and ranked the relevant papers via DRS into Q1–Q2, compared to the less informed criteria of PMID, which instead distributed the papers equally across all quartiles ().
DRS effectively ranks articles for relevance.
(2) Novel data content
Of the 41,208 total interactions manually curated in this project, 38,118 of them (93%) were novel interactions not yet represented in CTD. The remaining 3,090 interactions (7%) repeated information and provided additional supporting evidence for data that had already been captured from other articles. Biocurators extract three types of information: chemical-gene, chemical-disease, and gene-disease interactions. Since we are interested in discovering new information to be included in CTD, we compared the distribution of the novel content for each type of interaction by both DRS and PMID ranking (). For all three types of interactions, the DRS more effectively identified and ranked the articles from which novel interactions were ultimately curated for chemicals, genes, and diseases. Of the 35,385 novel chemical-gene interactions, 23,411 of them (66%) were ranked into Q1 by the DRS method, compared to only 10,617 (30%) by PMID (). For chemical-disease interactions, of the 1,549 novel interactions in total, the DRS ranked 1,007 (65%) into Q1 while PMID ranked only 349 (23%) of them (). Finally, of the 1,184 novel gene-disease interactions, DRS ranked 31% into Q1 while PMID ranked 22% of them (). The somewhat less pronounced differences between quartiles Q1, Q2, and Q3 for novel gene-disease interactions may be due to the more chemical-centric nature of the CTD ranking algorithm itself (). In sum, if curation had been restricted (due to limited resources) to only the first 896 documents (i.e., Q1), then ranking the corpus by DRS would have resulted in the collection of 24,780 novel interactions (65% of the possible 38,118 found in the complete corpus), while ranking by PMID would have only generated 11,232 (29%), demonstrating that simply ranking the corpus via text mining resulted in more than a 2-fold increase (65% vs. 29%) in novel data content for this project.
DRS effectively ranks articles for data content.
(3) Interaction yield rate
To help evaluate productivity at CTD, we calculate the interaction yield rate, defined as the number of interactions curated per unit of time. The number of all interactions (i.e., novel plus repeated interactions; ) is divided by the total time spent extracting them () to calculate the average interaction yield rate () for each quartile. Productivity for the first 896 documents in Q1 averaged 1.4 interactions per minute (when ranked by DRS) vs. 1.1 interactions per minute (when ranked by PMID), demonstrating that simply ranking articles by DRS over PMID boosts productivity by 27% for Q1, resulting in more interactions being curated per unit of time.
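The interaction yield rate and the relative gain quoted above are simple ratios. The Python sketch below shows the calculation; the function names are hypothetical, and only the two Q1 rates (1.4 and 1.1 interactions per minute) are taken from the text.

```python
def interaction_yield_rate(interactions: int, minutes: float) -> float:
    """Interactions curated per minute of curation time."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return interactions / minutes


def relative_gain(new_rate: float, baseline_rate: float) -> float:
    """Fractional improvement of new_rate over baseline_rate."""
    return (new_rate - baseline_rate) / baseline_rate


if __name__ == "__main__":
    # Q1 yield rates reported in the text: DRS ranking vs. PMID ranking.
    gain = relative_gain(1.4, 1.1)
    print(f"Productivity gain for Q1: {gain:.0%}")
```

Applied to the Q1 rates, the relative gain rounds to the reported 27%.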
DRS effectively ranks articles for productivity.
(4) Mean average precision (MAP)
The MAP quantifies the ability of a ranking system to rank relevant documents more highly than non-relevant documents. For this study, the MAP for articles ranked by PMID was 62%, but increased to 85% when articles were instead ranked by the DRS method.
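The text does not give the formula used for the MAP, so the following Python sketch assumes the standard information-retrieval definition: average precision is the mean of the precision values measured at the rank of each relevant document, and MAP is the mean of average precision over one or more rankings. The function name `average_precision` is ours, and how the study aggregated rankings is not stated.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision for one ranked list.

    `relevance[i]` is True if the article at rank i + 1 was curated
    (relevant) and False if it was rejected.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


if __name__ == "__main__":
    # Toy rankings (not study data): relevant articles near the top score higher.
    print(average_precision([True, True, False, False]))
    print(average_precision([False, False, True, True]))
```

A ranking that places curated articles ahead of rejected ones scores closer to 1, which is the behaviour the DRS vs. PMID comparison measures.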
Other information-retrieval metrics were also calculated for this study. Gene/protein recall was 71%, chemical recall was 79%, and disease recall was 44% using macro-averaging. Recall scores were calculated by dividing the number of distinct curated gene, chemical, and disease actors identified by the text-mining tools (either by a synonym to the term or by the term itself) by the total number of distinct curated actors. It is important to note that precision (another standard information-retrieval metric) is not appropriate for calculation here. CTD is composed of curated (rather than cited) genes/proteins, diseases, and chemicals within each abstract. There are many instances where valid, cited actors are not actually involved in curatable interactions, and other instances where curated actors reside only in the full text of the article. Consequently, the complete universe of valid, cited actors specifically resident within each abstract is unknown, preventing the calculation of an accurate precision metric.
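As an illustration of the recall calculation, the Python sketch below computes a macro-averaged recall. It rests on two assumptions that the text does not confirm: that the averaging unit is the individual article, and that synonym matching has already been resolved so that actors can be compared as plain identifiers. The function name `macro_recall` and the toy data are hypothetical.

```python
from typing import Iterable, Set, Tuple


def macro_recall(articles: Iterable[Tuple[Set[str], Set[str]]]) -> float:
    """Mean of per-article recall.

    Each item is (curated_actors, text_mined_actors) for one article.
    Per-article recall = curated actors also found by text mining,
    divided by all curated actors for that article.
    """
    recalls = []
    for curated, mined in articles:
        if not curated:
            continue  # recall is undefined when nothing was curated
        recalls.append(len(curated & mined) / len(curated))
    return sum(recalls) / len(recalls) if recalls else 0.0


if __name__ == "__main__":
    # Toy example (not study data): two articles with gene actors.
    toy = [
        ({"TNF", "JUN"}, {"TNF", "CAT"}),  # 1 of 2 curated genes found
        ({"MT1A"}, {"MT1A"}),              # 1 of 1 curated genes found
    ]
    print(macro_recall(toy))
```

Note that text-mined actors that were never curated (such as `CAT` in the first toy article) do not affect recall; they would only matter for precision, which the text explains cannot be calculated reliably here.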
Biological and toxicological interpretability of curated corpus
Although the performance of CTD's text-mining pipeline against the heavy metal corpus is the primary focus of analysis, interpretation of the biological and toxicological aspects of the resulting curation is worthy of note as well. The number of genes (and species from which they were curated) for each heavy metal was vast, including 3,707 genes from 48 organisms for cadmium, 3,251 genes from 14 organisms for cobalt, 8,004 genes from 47 organisms for copper, 1,078 genes from 16 organisms for lead, 261 genes from 10 organisms for manganese, 462 genes from 45 organisms for mercury, and 1,171 genes from 9 organisms for nickel. The most common species for all seven heavy metals were Homo sapiens, Rattus norvegicus, and Mus musculus, but also prevalent for certain metals were Danio rerio (copper), Macaca fascicularis (manganese), and Daphnia magna (nickel), indicating that a wide range of taxa are used to study heavy metal toxicity and this breadth of research was captured in our pipeline.
CTD biocurators composed 441 unique types of chemical-gene interaction statements, the most prevalent (63%) describing how a heavy metal influenced the mRNA or protein expression of an interacting gene. The remaining 37% of chemical-gene statements described heavy metal-gene/protein interactions involving methylation, binding, phosphorylation, activity, localization, secretion, splicing, stability, folding, import, export, cleavage, ubiquitination, chemical sensitivity/resistance, and numerous types of metabolic processing, amongst others. In total, 33 of the possible 55 action terms available in CTD's curation paradigm were used to compose interactions for the seven heavy metals, evincing a broad coverage of possible mechanisms of toxicity from the literature.
We next reviewed the gene sets associated with each heavy metal to gauge the biology and putative toxicity derived from this corpus. Gene lists for each metal were compared using CTD's “MyVenn” tool (http://ctdbase.org/tools/myVenn.go) to look for shared and unique genes. Sixteen genes were common to all seven heavy metals: CASP3, CAT, GAPDH, HMOX1, IFNG, IGF1, JUN, MAPK1, MAPK3, MT1A, NFE2L2, NFKBIA, NOS2, PTGS2, TGFB1, and TNF. These genes were analyzed using CTD's “Gene Set Enricher” tool (http://ctdbase.org/tools/enricher.go) to find enriched Gene Ontology (GO) biological processes. The most statistically significant enriched biological process was cellular response to chemical stimulus (GO:0070887), supporting the toxicological relevance of this curated corpus. However, these 16 genes were also enriched for a wide array of other important biological processes, including gene expression (GO:0010467; 14 genes), apoptotic process (GO:0006915; 13 genes), regulation of immune system process (GO:0002682; 11 genes), cell cycle (GO:0007049; 9 genes), neurological system process (GO:0050877; 8 genes), response to oxidative stress (GO:0006979; 6 genes), and the cell signaling pathways MAPK cascade (GO:0000165; 6 genes), Toll-like receptor signaling (GO:0034130; 4 genes), and JAK-STAT cascade (GO:0007259; 4 genes). This diversity suggests there are myriad ways for putative mechanisms of toxicity to be induced by heavy metals.
We also identified genes unique to each of the seven heavy metals curated in this corpus to look for putative metal-specific gene signatures. Of the 3,707 total genes associated with cadmium, 1,708 of them were unique to cadmium when compared against the gene sets for the other six heavy metals for this corpus. The unique gene sets for the other metals were 861 genes (out of 3,251) for cobalt, 4,512 genes (out of 8,004) for copper, 330 genes (out of 1,078) for lead, 30 genes (out of 261) for manganese, 99 genes (out of 462) for mercury, and 240 genes (out of 1,171) for nickel. Many of these refined unique lists were still too large or diverse to deduce any granular metal-specific GO biological processes, though the mercury gene set showed enrichment for cholinergic synaptic transmission (GO:0007271; 4 genes), while nickel indicated enrichment for cytokine-mediated signaling pathways (GO:0019221; 14 genes), suggesting putative mechanisms of neurotoxicity for the former and immunotoxicity for the latter.
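The shared and metal-specific gene lists described here reduce to set intersection and set difference. The Python sketch below illustrates that logic on toy data; it is not the implementation of CTD's MyVenn tool, the function name `shared_and_unique` is hypothetical, and the placeholder identifiers `GENE_A`, `GENE_B`, and `GENE_C` are invented for the example.

```python
from typing import Dict, Set, Tuple


def shared_and_unique(
    gene_sets: Dict[str, Set[str]]
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return genes common to every metal and genes unique to each metal."""
    shared = set.intersection(*gene_sets.values()) if gene_sets else set()
    unique: Dict[str, Set[str]] = {}
    for metal, genes in gene_sets.items():
        # Union of the gene sets of all other metals.
        others = set().union(
            *(g for m, g in gene_sets.items() if m != metal)
        )
        unique[metal] = genes - others
    return shared, unique


if __name__ == "__main__":
    # Toy gene sets (not the curated data).
    toy = {
        "cadmium": {"TNF", "MT1A", "GENE_A"},
        "nickel": {"TNF", "MT1A", "GENE_B"},
        "mercury": {"TNF", "GENE_C"},
    }
    shared, unique = shared_and_unique(toy)
    print(sorted(shared))
    print({metal: sorted(genes) for metal, genes in unique.items()})
```

In the toy data, `MT1A` is neither shared by all three metals nor unique to any one of them, which mirrors why the unique gene counts above are smaller than the per-metal totals.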
Lastly, the disorders associated with each of the seven heavy metals included 41 diseases for cadmium, 19 diseases for cobalt, 70 diseases for copper, 28 diseases for lead, 24 diseases for manganese, 72 diseases for mercury, and 25 diseases for nickel. There were no specific diseases common to all seven heavy metals. However, to better visualize this landscape, and to look for shared types of diseases, we mapped these specific diseases to 21 generic disease categories using CTD's MEDIC-Slim disease vocabulary to look for common and unique disease classes amongst the metals (). Copper, lead, manganese, mercury, and cadmium showed a penchant for nervous system diseases, implying a shared toxicity end-point for many heavy metals. Other prevalent disease classes included digestive system disorders (cadmium, cobalt, and copper), urogenital disorders (cadmium, cobalt, and mercury), cardiovascular diseases (cobalt, copper, and manganese), and cancer (nickel, copper, cadmium, and cobalt). Nickel showed the most distinct distribution from the other six metals, with tendencies towards respiratory tract diseases and immune system disorders.
Disease category distribution for the seven heavy metals.