The third edition of the BioNLP Shared Task was held with the grand theme "knowledge base (KB) construction". The Genia Event (GE) task was re-designed and implemented in light of this theme. For its final report, the participating systems were evaluated from a perspective of annotation. To further explore the grand theme, we extended the evaluation from a perspective of KB construction. Also, the Gene Regulation Ontology (GRO) task was newly introduced in the third edition. The final evaluation of the systems participating in the GRO task showed relatively low performance, which was attributed to the large size and complex semantic representation of the ontology. To investigate potential benefits of resource exchange between the presumably similar tasks, we measured the overlap between the datasets of the two tasks, and tested whether the dataset for one task can be used to enhance performance on the other.
We report an extended evaluation of all the participating systems in the GE task, incorporating a KB perspective. For the evaluation, the final submission of each participant was converted to RDF statements and evaluated using 8 queries formulated in SPARQL. The results suggest that the two perspectives, annotation vs. KB, may lead to different conclusions. We also provide a comparison of the GE and GRO tasks by converting their datasets into each other's format. More than 90% of the GE data could be converted into the GRO task format, while only half of the GRO data could be mapped to the GE task format. The imbalance in conversion indicates that the GRO is a comprehensive extension of the GE task ontology. We further used the converted GRO data as additional training data for the GE task, which helped improve the performance of GE task participant systems. However, the converted GE data did not help GRO task participants, due to overfitting and the ontology gap.
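The idea of scoring query answers rather than individual annotations can be illustrated with a minimal sketch. It is not the task's evaluation software: plain Python tuples stand in for RDF statements, a wildcard pattern stands in for a SPARQL query, and all event identifiers, predicates, and protein names are hypothetical.

```python
# Toy triples standing in for RDF statements (all identifiers are hypothetical).
gold_kb = {
    ("E1", "type", "Phosphorylation"), ("E1", "theme", "STAT3"),
    ("E2", "type", "Binding"), ("E2", "theme", "TRAF2"),
}
system_kb = {
    ("E1", "type", "Phosphorylation"), ("E1", "theme", "STAT3"),
    ("E2", "type", "Binding"), ("E2", "theme", "TRAF6"),
}

def match(kb, pattern):
    """Return the triples that fit a pattern; None is a wildcard, like a query variable."""
    return {t for t in kb if all(p is None or p == v for p, v in zip(pattern, t))}

def precision_recall_f1(predicted, gold):
    """Compare two answer sets and return (precision, recall, F-score)."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# "Which entities are the theme of some event?" asked of both knowledge bases.
query = (None, "theme", None)
gold_answers = {obj for _, _, obj in match(gold_kb, query)}
system_answers = {obj for _, _, obj in match(system_kb, query)}

p, r, f = precision_recall_f1(system_answers, gold_answers)
print(p, r, f)
```

In this toy case the system gets one of two answers right, so the query-level precision, recall, and F-score are all 0.5, regardless of how many individual annotations were correct.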
bionlp; shared task; evaluation; information extraction; text mining; knowledge base; semantic web; resource description framework
AIM: To investigate the efficacy and safety of transarterial chemoembolization (TACE)-based multimodal treatment in patients with large hepatocellular carcinoma (HCC).
METHODS: A total of 146 consecutive patients were included in the analysis, and their medical records and radiological data were reviewed retrospectively.
RESULTS: In total, 119 patients received TACE-based multi-modal treatments, and the remaining 27 received conservative management. Overall survival (P < 0.001) and objective tumor response (P = 0.003) were significantly better in the treatment group than in the conservative group. After subgroup analysis, survival benefits were observed not only in the multi-modal treatment group compared with the TACE-only group (P = 0.002) but also in the surgical treatment group compared with the loco-regional treatment-only group (P < 0.001). Multivariate analysis identified tumor stage (P < 0.001) and tumor type (P = 0.009) as two independent pre-treatment factors for survival. After adjusting for significant pre-treatment prognostic factors, objective response (P < 0.001), surgical treatment (P = 0.009), and multi-modal treatment (P = 0.002) were identified as independent post-treatment prognostic factors.
CONCLUSION: TACE-based multi-modal treatments were safe and more beneficial than conservative management. Salvage surgery after successful downstaging resulted in long-term survival in patients with large, unresectable HCC.
Hepatocellular carcinoma; Multimodal treatment; Transarterial chemoembolization; Salvage surgery
Transcatheter arterial radioembolization (TARE) with Yttrium-90 (90Y)-labeled microspheres has an emerging role in the treatment of patients with unresectable hepatocellular carcinoma. Although complications of TARE can be minimized by aggressive pre-evaluation angiography and preventive coiling of aberrant vessels, radioembolization-induced gastroduodenal ulcers can be irreversible and life-threatening. Treatment of radioembolization-induced gastric ulcers is challenging because there are few reported cases and no consensus on management. We report a case of severe gastric ulceration with bleeding that eventually required surgery due to aberrant deposition of microspheres after TARE.
Gastrectomy; Gastric ulcer; Hepatocellular carcinoma; Radioembolization; Yttrium-90
AIM: To evaluate the significance of computed tomography (CT) findings in relation to liver chemistry and the clinical course of acute hepatitis.
METHODS: Four hundred and twelve patients with acute hepatitis who underwent enhanced CT scanning were enrolled retrospectively. Imaging findings were analyzed for the following variables: gallbladder wall thickness (GWT), arterial heterogeneity, periportal tracking, number and maximum size of lymph nodes, presence of ascites, and size of spleen. The serum levels of alanine aminotransferase, alkaline phosphatase, bilirubin, albumin, and prothrombin time were measured on the day of admission and CT scan, and laboratory data were evaluated every 2-4 d for all subjects during hospitalization.
RESULTS: The mean age of patients was 34.4 years, and the most common cause of hepatitis was hepatitis A virus (77.4%). The mean GWT was 5.2 mm. The number of patients who had findings of arterial heterogeneity, periportal tracking, lymph node enlargement > 7 mm, and ascites was 294 (80.1%), 348 (84.7%), 346 (84.5%), and 56 (13.6%), respectively. On multivariate logistic regression, male gender [odds ratio (OR) = 2.569, 95%CI: 1.477-4.469, P = 0.001], toxic hepatitis (OR = 3.531, 95%CI: 1.444-8.635, P = 0.006), level of albumin (OR = 2.154, 95%CI: 1.279-3.629, P = 0.004), and GWT (OR = 1.061, 95%CI: 1.015-1.110, P = 0.009) were independent predictive factors for severe hepatitis. The level of bilirubin (OR = 1.628, 95%CI: 1.331-1.991, P < 0.001) and GWT (OR = 1.172, 95%CI: 1.024-1.342, P = 0.021) were independent factors for prolonged cholestasis in multivariate analysis.
CONCLUSION: In patients with acute hepatitis, GWT on CT scan was an independent predictor of severe hepatitis and prolonged cholestasis.
Acute hepatitis; Cholestasis; Computed tomography; Prognosis; Gallbladder
AIM: To investigate retrospectively the long-term efficacy of various treatment strategies using adefovir dipivoxil (adefovir) in patients with lamivudine-resistant chronic hepatitis B.
METHODS: We included 154 consecutive patients in two treatment groups: the “add-on” group (n = 79), in which adefovir was added to ongoing lamivudine treatment due to lamivudine resistance, and the “switch/combination” group (n = 75), in which lamivudine was first switched to adefovir and then re-added later as needed. The “switch/combination” group was then divided into two subgroups depending on whether participants followed (group A, n = 30) or violated (group B, n = 45) a proposed treatment strategy that determined whether to add lamivudine based on the serum hepatitis B virus (HBV) DNA levels (< 60 IU/mL or not) after 6 mo of treatment (roadmap concept).
RESULTS: The cumulative probability of virologic response (HBV DNA < 60 IU/mL) was higher in group A than in the “add-on” group and in group B (P < 0.001). In contrast, the cumulative probability of virologic breakthrough was lower in the “add-on” group than in group B (P = 0.002). Furthermore, the risk of virologic breakthrough in the multivariate analysis was significantly lower in the “add-on” group than in group A (hazard ratio = 0.096; 95%CI, 0.015-0.629; P = 0.015).
CONCLUSION: The selective combination of adefovir with lamivudine based upon early treatment responses increased the odds of virologic breakthrough relative to the use of uniform combination therapy from the beginning of treatment.
Chronic hepatitis B; Lamivudine-resistant; Adefovir; Combination therapy; Roadmap concept
The Genia task, when it was introduced in 2009, was the first community-wide effort to address fine-grained, structural information extraction from biomedical literature. Arranged for the second time as one of the main tasks of BioNLP Shared Task 2011, it aimed to measure the progress of the community since 2009, and to evaluate generalization of the technology to full text papers. The Protein Coreference task was arranged as one of the supporting tasks, motivated by one of the lessons of the 2009 task: that the abundance of coreference structures in natural language text hinders further improvement on the Genia task.
The Genia task received final submissions from 15 teams. The results show that the community has made significant progress, with the best F-score reaching 74% in extracting bio-molecular events of simple structure, e.g., gene expressions, and 45% to 48% in extracting those of complex structure, e.g., regulations. The Protein Coreference task received 6 final submissions. The results show that coreference resolution performance in the biomedical domain is lagging behind that in the newswire domain, cf. 50% vs. 66% in MUC score. In particular, in terms of protein coreference resolution, the best system achieved 34% in F-score.
Detailed analysis performed on the results improves our insight into the problem and suggests the directions for further improvements.
AIM: To evaluate the relationship between a positive family history of primary liver cancer and hepatocellular carcinoma (HCC) development in Korean HCC patients.
METHODS: We studied a total of 2242 patients diagnosed with HCC between January 1990 and July 2008, whose family history of primary liver cancer was clearly described in the medical records.
RESULTS: Of the 2242 patients, 165 (7.4%) had a positive family history of HCC and 2077 (92.6%) did not. The male to female ratio was 3.6:1, and the major causes of HCC were chronic hepatitis B virus (HBV) infection in 75.1%, chronic hepatitis C virus infection in 13.2% and alcohol in 3.1%. The median ages at diagnosis in the positive- and negative-history groups were 52 years (range: 29-79 years) and 57 years (range: 18-89 years), respectively (P < 0.0001). Furthermore, among 1713 HCC patients with HBV infection, the number of patients under 45 years of age out of 136 patients with positive family history was 26 (19.1%), whereas those out of 1577 patients with negative family history was 197 (12.5%), suggesting that a positive family history may be associated with earlier development of HCC in the Korean population (P = 0.0028).
CONCLUSION: More intensive surveillance may be recommended for those with a positive family history of HCC for earlier diagnosis and proper management, especially when HBV infection is present.
Liver cancer; Hepatocellular carcinoma; Family history; Epidemiology
Hepatocellular carcinoma (HCC) in the caudate lobe remains one of the most intricate locations where various treatments tend to pose problems with regard to the optimal approach. Surgical resection has been regarded as the most effective treatment; however, isolated resection of the caudate lobe is strenuous and associated with a high rate of early recurrence. Percutaneous ablation might be technically difficult or impossible to perform due to the deep location of tumors and adjacent large vessels. Treatment with drug-eluting beads (DEB) can potentially enhance the therapeutic efficacy for patients with unresectable HCC by drawing on the slower, more consistent drug delivery process. We describe a case of a 62-year-old man with HCC in the caudate lobe who was successfully treated by DEB.
Carcinoma, Hepatocellular; Chemoembolization; Drug-eluting beads; Caudate lobe
Ischemic colitis is an uncommon complication in patients with systemic lupus erythematosus (SLE). In previously reported cases of colitis caused by SLE, intestinal vasculitis is implicated as the causative process, but is rarely confirmed histologically. We describe a case of a 32-year-old man with increased activity of SLE, who presented with hematochezia and abdominal pain due to ischemic colitis with small vessel vasculitis, which was proven by sigmoidoscopic biopsy. The patient's clinical course improved after steroid and conservative management.
Systemic lupus erythematosus; Ischemic colitis; Vasculitis
The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.
BioHackathon; Bioinformatics; Semantic Web; Web services; Ontology; Visualization; Knowledge representation; Databases; Semantic interoperability; Data models; Data sharing; Data integration
Glycoscience is a research field focusing on complex carbohydrates (otherwise known as glycans), which can, for example, serve as “switches” that toggle between different functions of a glycoprotein or glycolipid. Due to the advancement of glycomics technologies that are used to characterize glycan structures, many glycomics databases are now publicly available and provide useful information for glycoscience research. However, these databases have almost no link to other life science databases.
In order to implement support for the Semantic Web most efficiently for glycomics research, the developers of major glycomics databases agreed on a minimal standard for representing glycan structure and annotation information using RDF (Resource Description Framework). Moreover, all of the participants implemented this standard prototype and generated preliminary RDF versions of their data. To test the utility of the converted data, all of the data sets were uploaded into a Virtuoso triple store, and several SPARQL queries were tested as “proofs-of-concept” to illustrate the utility of the Semantic Web in querying across databases which were originally difficult to implement.
We were able to successfully retrieve information by linking UniCarbKB, GlycomeDB and JCGGDB in a single SPARQL query to obtain our target information. We also tested queries linking UniProt with GlycoEpitope as well as lectin data with GlycomeDB through PDB. As a result, we have been able to link proteomics data with glycomics data through the implementation of Semantic Web technologies, allowing for more flexible queries across these domains.
BioHackathon; Carbohydrate; Data integration; Glycan; Glycoconjugate; SPARQL; RDF standard; Carbohydrate structure database
While gastric variceal bleeding (GVB) is not as prevalent as esophageal variceal bleeding, it is reportedly more serious, with high failure rates of the initial hemostasis (>30%), and has a worse prognosis than esophageal variceal bleeding. However, there is limited information regarding hemostasis and the prognosis for GVB. The aim of this study was to determine retrospectively the clinical outcomes of GVB in a multicenter study in Korea.
The data of 1,308 episodes of GVB (males:females=1062:246, age=55.0±11.0 years, mean±SD) were collected from 24 referral hospital centers in South Korea between March 2003 and December 2008. The rates of initial hemostasis failure, rebleeding, and mortality within 5 days and 6 weeks of the index bleed were evaluated.
The initial hemostasis failed in 6.1% of the patients, and this was associated with the Child-Pugh score [odds ratio (OR)=1.619; P<0.001] and the treatment modality: endoscopic variceal ligation, endoscopic variceal obturation, and balloon-occluded retrograde transvenous obliteration vs. endoscopic sclerotherapy, transjugular intrahepatic portosystemic shunt, and balloon tamponade (OR=0.221, P<0.001). Rebleeding developed in 11.5% of the patients, and was significantly associated with Child-Pugh score (OR=1.159, P<0.001) and treatment modality (OR=0.619, P=0.026). The GVB-associated mortality was 10.3%; mortality in these cases was associated with Child-Pugh score (OR=1.795, P<0.001) and the treatment modality for the initial hemostasis (OR=0.467, P=0.001).
The clinical outcome for GVB was better for the present cohort than in previous reports. Initial hemostasis failure, rebleeding, and mortality due to GVB were universally associated with the severity of liver cirrhosis.
Gastric variceal bleeding; Rebleeding; Mortality; Cirrhosis
Current research has shown that major difficulties in event extraction for the biomedical domain are traceable to coreference. Therefore, coreference resolution is believed to be useful for improving event extraction. To address coreference resolution in molecular biology literature, the Protein Coreference (COREF) task was arranged in the BioNLP Shared Task (BioNLP-ST, hereafter) 2011, as a supporting task. However, the shared task results indicated that transferring coreference resolution methods developed for other domains to the biological domain was not straightforward, due to the domain differences in the coreference phenomena.
We analyzed the contribution of domain-specific information, including the information that indicates the protein type, in a rule-based protein coreference resolution system. In particular, the domain-specific information is encoded into semantic classification modules for which the output is used in different components of the coreference resolution. We compared our system with the top four systems in the BioNLP-ST 2011; surprisingly, we found that the minimal configuration had outperformed the best system in the BioNLP-ST 2011. Analysis of the experimental results revealed that semantic classification, using protein information, has contributed to an increase in performance by 2.3% on the test data, and 4.0% on the development data, in F-score.
The use of domain-specific information in semantic classification is important for effective coreference resolution. Since it is difficult to transfer domain-specific information across different domains, we need to continue to seek methods to utilize such information in coreference resolution.
Transcatheter arterial chemoembolization (TACE) has been used widely to treat patients with unresectable hepatocellular carcinoma. However, this method can induce various adverse events caused by necrosis of the tumor itself or damage to nontumor tissues. In particular, neurologic side effects such as cerebral infarction and paraplegia, although rare, may cause severe sequelae and permanent disability. Detailed information regarding the treatment process and prognosis associated with this procedure is not yet available. We experienced a case of paraplegia that occurred after conducting TACE through the intercostal artery to treat hepatocellular carcinoma that had metastasized to the rib. In this case, TACE was attempted to relieve severe bone pain, which had persisted even after palliative radiotherapy. A sudden impairment of sensory and motor functions after TACE developed in the trunk below the level of the sternum and in both lower extremities. The patient subsequently received steroid pulse therapy along with supportive care and continuous rehabilitation. At the time of discharge the patient had recovered sufficiently to enable him to walk by himself, although some paresthesia and spasticity remained.
Hepatocellular carcinoma; TACE; Costal metastasis; Paraplegia
Term clustering, by measuring the string similarities between terms, is known within the natural language processing community to be an effective method for improving the quality of texts and dictionaries. However, we have observed that chemical names are difficult to cluster using string similarity measures. In order to clearly demonstrate this difficulty, we compared the string similarities determined using the edit distance, the Monge-Elkan score, SoftTFIDF, and the bigram Dice coefficient for chemical names with those for non-chemical names.
Our experimental results revealed the following: (1) The edit distance had the best performance in the matching of full forms, whereas Cohen et al. reported that SoftTFIDF with the Jaro-Winkler distance would yield the best measure for matching pairs of terms for their experiments. (2) For each of the string similarity measures above, the best threshold for term matching differs for chemical names and for non-chemical names; the difference is especially large for the edit distance. (3) Although the matching results obtained for chemical names using the edit distance, Monge-Elkan scores, or the bigram Dice coefficients are better than the result obtained for non-chemical names, the results were contrary when using SoftTFIDF. (4) A suitable weight for chemical names varies substantially from one for non-chemical names. In particular, a weight vector that has been optimized for non-chemical names is not suitable for chemical names. (5) The matching results using the edit distances improve further by dividing a set of full forms into two subsets, according to whether a full form is a chemical name or not. These results show that our hypothesis is acceptable, and that we can significantly improve the performance of abbreviation-full form clustering by computing chemical names and non-chemical names separately.
In conclusion, the discriminative application of string similarity methods to chemical and non-chemical names may be a simple yet effective way to improve the performance of term clustering.
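Two of the measures compared above, and the idea of applying them separately to chemical and non-chemical names, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the normalization of the edit distance, the helper names, and both threshold values are assumptions chosen only to make the example concrete.

```python
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def edit_similarity(a, b):
    """Edit distance rescaled to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest

def bigram_dice(a, b):
    """Dice coefficient over character bigrams (counted as multisets)."""
    ba = Counter(a[i:i + 2] for i in range(len(a) - 1))
    bb = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        return 1.0 if a == b else 0.0
    return 2 * sum((ba & bb).values()) / total

# Hypothetical thresholds, one per name class; the values are illustrative only.
THRESHOLDS = {"chemical": 0.95, "non_chemical": 0.80}

def is_match(a, b, kind):
    """Decide whether two names cluster together, using the class-specific threshold."""
    return edit_similarity(a, b) >= THRESHOLDS[kind]

print(is_match("1,2-dichloroethane", "1,1-dichloroethane", "chemical"))
print(is_match("tumour", "tumor", "non_chemical"))
```

With these illustrative thresholds, the two chemical names, which differ by a single character but denote different compounds, are kept apart, while the spelling variants "tumour" and "tumor" are clustered together.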
Gastric epithelial dysplasia is considered a precancerous lesion with a variable clinical course. There is disagreement, however, regarding histology-based diagnoses, which has led to confusion in choosing a therapeutic plan. New objective markers are needed to determine which lesions progress to true malignancy. We measured LINE-1 methylation levels, which have been reported to strongly correlate with the global methylation level in gastric epithelial dysplasia and intramucosal cancer.
A total of 145 tissue samples were analyzed by two histopathologists. All tissues were excised by therapeutic endoscopic mucosal resection and paired with adjacent normal tissue samples. A modified long interspersed nucleotide elements-combined bisulfite restriction analysis (COBRA-LINE-1) method was used.
Gastric epithelial dysplasia and intramucosal cancer tissues had significantly lower levels of LINE-1 methylation than adjacent normal gastric tissues. High-grade dysplasia and intramucosal cancer were distinguishable from low-grade dysplasia based on LINE-1 methylation levels. Furthermore, the distinction could be determined with high sensitivity and specificity, as shown by the receiver operating characteristic (ROC) curve (AUC, 0.82; 95% confidence interval, 0.74 to 0.88).
LINE-1 methylation levels may provide a diagnostic tool for identifying high-grade dysplasia and intramucosal cancer.
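For readers unfamiliar with the AUC reported above, the following minimal sketch shows how such a value can be computed from two groups of measurements. The methylation percentages are hypothetical and are not data from this study; because lower LINE-1 methylation points toward the high-grade group, the values are negated before scoring.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive scores above a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical methylation levels (%), for illustration only.
high_grade = [40.0, 45.0, 50.0]  # high-grade dysplasia or intramucosal cancer
low_grade = [48.0, 55.0, 60.0]   # low-grade dysplasia

# Negate so that a lower methylation level yields a higher score.
auc = roc_auc([-m for m in high_grade], [-m for m in low_grade])
print(round(auc, 3))
```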
LINE-1 methylation; Gastric epithelial dysplasia; Intramucosal cancer
Summary: Often, the most informative genes have to be selected from different gene sets, and several computer gene ranking algorithms have been developed to cope with the problem. To help researchers decide which algorithm to use, we developed the analysis of gene ranking algorithms (AGRA) system that offers a novel technique for comparing ranked lists of genes. The most important feature of AGRA is that no previous knowledge of gene ranking algorithms is needed for their comparison. Using the text mining system finding-associated concepts with text analysis, AGRA defines what we call biomedical concept space (BCS) for each gene list and offers a comparison of the gene lists in six different BCS categories. The uploaded gene lists can be compared using two different methods. In the first method, the overlap between the BCSs of each pair of gene lists is calculated. The second method offers a text field where a specific biomedical concept can be entered. AGRA searches for this concept in each gene list's BCS, highlights the rank of the concept and offers a visual representation of concepts ranked above and below it.
Availability and Implementation: Available at http://agra.fzv.uni-mb.si/, implemented in Java and running on the Glassfish server.
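The two comparison methods described in the summary can be sketched in a few lines. This is only an illustration under stated assumptions, not AGRA's code: the summary does not say how overlap is measured, so the Jaccard index is assumed here, and the function names, algorithm labels, and concepts are all hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two concept collections as |A and B| / |A or B| (an assumed measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_overlap(concept_spaces):
    """Method 1: overlap for every pair of gene lists' concept spaces."""
    names = sorted(concept_spaces)
    return {(x, y): jaccard(concept_spaces[x], concept_spaces[y])
            for x, y in combinations(names, 2)}

def concept_context(ranked_concepts, concept, window=1):
    """Method 2: 1-based rank of a concept plus the concepts ranked just above and below."""
    if concept not in ranked_concepts:
        return None
    i = ranked_concepts.index(concept)
    return i + 1, ranked_concepts[max(0, i - window):i], ranked_concepts[i + 1:i + 1 + window]

# Hypothetical ranked concept spaces derived from two gene ranking algorithms.
bcs = {
    "algoA": ["apoptosis", "inflammation", "hypoxia"],
    "algoB": ["inflammation", "hypoxia", "angiogenesis"],
}

print(pairwise_overlap(bcs))
print(concept_context(bcs["algoA"], "inflammation"))
```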
Transarterial chemoembolization (TACE) has long been used as a palliative therapy for unresectable hepatocellular carcinoma (HCC). High-dose hepatic arterial infusion chemotherapy (HAIC) has shown favorable outcomes in patients with intractable, advanced HCC. The aim of this study was to compare the effectiveness and safety of high-dose HAIC and conventional TACE using doxorubicin for advanced HCC.
The high-dose HAIC group comprised 36 patients who were enrolled prospectively from six institutions. The enrollment criteria were good liver function, main portal vein invasion (including vascular shunt), infiltrative type, bilobar involvement, and/or refractory to prior conventional treatment (TACE, radiofrequency ablation, or percutaneous ethanol injection), and documented progressive disease. Patients received 5-fluorouracil (500 mg/m2 on days 1~3) and cisplatin (60 mg/m2 on day 2 every 4 weeks) via an implantable port system. In the TACE group, 31 patients with characteristics similar to those in the high-dose HAIC group were recruited retrospectively from a single center. Patients underwent a transarterial infusion of doxorubicin every 4~8 weeks.
Overall, 6 patients (8.9%) achieved a partial response and 20 patients (29.8%) had stable disease. The objective response rate (complete response+partial response) was significantly better in the high-dose HAIC group than in the TACE group (16.7% vs. 0%, P=0.030). Overall survival was longer in the high-dose HAIC group than in the TACE group (median survival, 193 vs. 119 days; P=0.026). There were no serious adverse effects in the high-dose HAIC group, while hepatic complications occurred more often in the TACE group.
High-dose HAIC appears to improve the tumor response and survival outcome compared to conventional TACE using doxorubicin in patients with intractable, advanced HCC.
Carcinoma; Hepatocellular; Hepatic arterial infusion chemotherapy; Transarterial chemoembolization; Doxorubicin
A 53-yr-old man presented with a two-day history of odynophagia and a foreign body sensation. Two days before admission, the patient began to experience odynophagia and a foreign body sensation in the chest after swallowing several extremely hot pieces of solid food (prawn) in haste. Endoscopy revealed a huge longitudinal ulcer, typical of friable hyperemic mucosa with necrotic debris along the full length of the esophagus in the posterolateral region. Here we present the clinical course of serial endoscopy of an acute thermal injury of the esophagus caused by solid food.
Acute Thermal Injury; Esophagus; Solid Food
The number of corpora, collections of structured texts, has been increasing, as a result of the growing interest in the application of natural language processing methods to biological texts. Many named entity recognition (NER) systems have been developed based on these corpora. However, in the biomedical community, there is yet no general consensus regarding named entity annotation; thus, the resources are largely incompatible, and it is difficult to compare the performance of systems developed on resources that were divergently annotated. On the other hand, from a practical application perspective, it is desirable to utilize as many existing annotated resources as possible, because annotation is costly. Thus, it becomes a task of interest to integrate the heterogeneous annotations in these resources.
We explore the potential sources of incompatibility among gene and protein annotations that were made for three common corpora: GENIA, GENETAG and AIMed. To show the inconsistency in the corpora annotations, we first tackle the incompatibility problem caused by corpus integration, and we quantitatively measure the effect of this incompatibility on protein mention recognition. We find that the F-score performance declines tremendously when training with integrated data, instead of training with pure data; in some cases, the performance drops nearly 12%. This degradation may be caused by the newly added heterogeneous annotations, and cannot be fixed without an understanding of the heterogeneities that exist among the corpora. Motivated by the result of this preliminary experiment, we further qualitatively analyze a number of possible sources for these differences, and investigate the factors that would explain the inconsistencies, by performing a series of well-designed experiments. Our analyses indicate that incompatibilities in the gene/protein annotations exist mainly in the following four areas: the boundary annotation conventions, the scope of the entities of interest, the distribution of annotated entities, and the ratio of overlap between annotated entities. We further suggest that almost all of the incompatibilities can be prevented by properly considering the four aspects aforementioned.
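The effect of differing boundary annotation conventions on the F-score can be illustrated with a minimal sketch. The sentence, the character spans, and the two matching criteria below are illustrative assumptions and do not reproduce the experiments reported here.

```python
def exact(pred, gold):
    """Strict criterion: both boundaries must agree."""
    return pred == gold

def overlaps(pred, gold):
    """Relaxed criterion: half-open spans [start, end) share at least one character."""
    return pred[0] < gold[1] and gold[0] < pred[1]

def f_score(predicted, gold, criterion):
    """F-score of predicted spans against gold spans under a matching criterion."""
    hit_pred = sum(any(criterion(p, g) for g in gold) for p in predicted)
    hit_gold = sum(any(criterion(p, g) for p in predicted) for g in gold)
    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

text = "human IL-2 gene activates STAT3"  # hypothetical sentence

# Convention A (gold): annotate only the name, "IL-2".
gold_spans = [(6, 10), (26, 31)]
# Convention B (prediction): include the head noun, "IL-2 gene".
pred_spans = [(6, 15), (26, 31)]

strict_f = f_score(pred_spans, gold_spans, exact)
relaxed_f = f_score(pred_spans, gold_spans, overlaps)
print(strict_f, relaxed_f)
```

Here the recognizer finds both mentions, yet under strict matching it is credited with only one of them because of the boundary convention, which halves the F-score relative to relaxed matching.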
Our analysis covers the key similarities and dissimilarities that exist among the diverse gene/protein corpora. This paper serves to improve our understanding of the differences in the three studied corpora, which can then lead to a better understanding of the performance of protein recognizers that are based on the corpora.
Associating literature with pathways poses new challenges to the Text Mining (TM) community. There are three main challenges to this task: (1) the identification of the mapping position of a specific entity or reaction in a given pathway, (2) the recognition of the causal relationships among multiple reactions, and (3) the formulation and implementation of required inferences based on biological domain knowledge.
To address these challenges, we constructed new resources to link the text with a model pathway; they are: the GENIA pathway corpus with event annotation and NF-kB pathway. Through their detailed analysis, we address the untapped resource, ‘bio-inference,’ as well as the differences between text and pathway representation. Here, we show the precise comparisons of their representations and the nine classes of ‘bio-inference’ schemes observed in the pathway corpus.
We believe that the creation of such rich resources and their detailed analysis is a significant first step toward accelerating research on the automatic construction of pathways from text.
Advanced Text Mining (TM) such as semantic enrichment of papers, event or relation extraction, and intelligent Question Answering have increasingly attracted attention in the bio-medical domain. For such attempts to succeed, text annotation from the biological point of view is indispensable. However, due to the complexity of the task, semantic annotation has never been tried on a large scale, apart from relatively simple term annotation.
We have completed a new type of semantic annotation, event annotation, which is an addition to the existing annotations in the GENIA corpus. The corpus has already been annotated with POS (Parts of Speech), syntactic trees, terms, etc. The new annotation was made on half of the GENIA corpus, consisting of 1,000 Medline abstracts. It contains 9,372 sentences in which 36,114 events are identified. The major challenges during event annotation were (1) to design a scheme of annotation which meets specific requirements of text annotation, (2) to achieve biology-oriented annotation which reflects biologists' interpretation of text, and (3) to ensure the homogeneity of annotation quality across annotators. To meet these challenges, we introduced new concepts such as Single-facet Annotation and Semantic Typing, which have collectively contributed to the successful completion of a large-scale annotation.
The resulting event-annotated corpus is the largest and one of the best in quality among similar annotation efforts. We expect it to become a valuable resource for NLP (Natural Language Processing)-based TM in the bio-medical domain.
Combined hepatocellular-cholangiocarcinoma is a rare form of primary liver cancer showing features of both hepatocellular and biliary epithelial differentiation. We report here on a case of collision tumor, which apparently was the coincidental occurrence of both hepatocellular carcinoma and cholangiocarcinoma underlying schistosomiasis. A 39-year-old Philippine female was transferred to our hospital for evaluation of a liver mass that was found on ultrasonography at a local hospital. HBsAg and Anti-HCV were negative and the serum alpha-fetoprotein (AFP) level was normal. The tumor mass was histologically diagnosed as adenocarcinoma by sono-guided biopsy before the operation. Partial lobectomy was performed and we histologically identified the concurrent occurrence of hepatocellular carcinoma and cholangiocarcinoma (a "collision type carcinoma").
Carcinoma; Hepatocellular; Cholangiocarcinoma; Schistosomiasis