Biomedical literature mining is concerned with transforming free text into a structured, machine-readable format in order to improve tasks such as information retrieval and extraction. Recent work indicates considerable interest in also considering image information when mining research articles, as images often depict the results of experiments and sum up a paper’s key findings. There are several obstacles to mining image information. First, there are many different types of images, such as graphs, gel electrophoresis and microscopy images, diagrams, and heat maps. Second, there exists no image publication standard: images are stored at different resolutions and in a variety of file formats, such as JPEG and TIFF. Third, there are no explicit image design guidelines, even though authors seem to follow some universally accepted norms when creating figures such as box plots, heat maps, or gel electrophoresis images.
A unifying element across all biomedical images is image text, i.e. text characters that are embedded in images. Text in images serves several purposes, such as labeling a graph, or representing genes in a heat map or proteins in a pathway diagram. We have previously shown that extracting image text, and making it available to image search, improves biomedical image retrieval [1]. In this work, we are concerned with optimizing the performance of a critical step in image text extraction: locating text regions in images, which is known as text detection in studies on image processing and Optical Character Recognition (OCR).
Generally speaking, text detection is a crucial step in processing textual information in biomedical images. For example, finding the text regions correctly is the first stage of a standard OCR pipeline for extracting image text. Determining the location of text is also important for high-level understanding of image content, because the location of a text element indicates its meaning, for example whether a label belongs to the x-axis or the y-axis of a graph. Practical applications aside, in this paper we are exclusively concerned with optimizing the performance of text detection, which is a fundamental research problem in image text processing.
In this paper, we introduce a new text detection algorithm suited for biomedical images. We also discuss the methodological details of creating a gold standard biomedical image text detection corpus, and the use of the corpus for evaluating the performance of our algorithm. During the development of the corpus, we laid down clear guidelines on what exactly constitutes an image text region (or element) and how to manually mark the image region linked to the string. We then compared our algorithm against three existing state-of-the-art text detection methods. The evaluation results suggest that our algorithm has advantages for detecting text regions in biomedical images.
1.2. Related Work
1.2.1. Image Text Detection Algorithms
We first briefly review prior work on image processing algorithms for image text detection, which is concerned with separating image text elements from other elements in an image. [2] presented an algorithm for text detection from scene images. In their work, they first detect character components according to gray-level differences and then match the results to standard character patterns captured in a database. Their method is very robust to variation in the font, size and intensity of the image text, but is not able to deal with color and orientation changes. To address the text detection problem for color images, [3] introduced a connected component-based method for locating text in a complex color image. Their method analyzes the color histogram of the RGB space to detect text regions. [4] introduced a neural network-based approach for identifying text in color images. To attack the detection problem for text with different orientations and other distortions, [5] described the use of low-level image features such as density and contrast to detect image text, with the ability to deal with skew in the image text. [6] also proposed a morphological approach for image text detection, which is robust to the presence of noise, text orientation, skew and curvature.
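To make the notion of a connected component concrete, the following is a minimal sketch of how candidate character components can be extracted from an already binarized image. It is not the method of any of the works cited above; the function name `component_boxes`, the `min_pixels` noise threshold, and the assumption that the input is a list of rows of 0/1 values are all illustrative choices.

```python
from collections import deque


def component_boxes(binary, min_pixels=2):
    """Return bounding boxes (top, left, bottom, right) of 4-connected
    foreground components in a binary image given as a list of rows.

    Components with fewer than min_pixels pixels are discarded as noise.
    Both the input format and min_pixels are assumptions for this sketch.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right, count = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if count >= min_pixels:
                boxes.append((top, left, bottom, right))
    return boxes
```

In a full pipeline, boxes such as these would typically be filtered further, for example by size or aspect ratio, before being treated as text candidates; those later steps are outside the scope of this sketch.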
There is a body of work using advanced texture and graph segmentation methods to detect text in images. For example, [7] introduced a method for learning texture discrimination masks for image text detection. [8] used a learning-based approach to detect image text through image texture analysis. [9] introduced a system for image text detection and recognition, which adopts a multi-scale texture segmentation scheme. In their method, a collection of second-order Gaussian derivatives is used to detect candidate text regions, followed by a K-means clustering process and a multi-resolution stroke generation, filtering and aggregation process to further refine the detected text regions. [10] proposed a graph-based image segmentation algorithm for efficiently separating textual elements from graphical elements in an image. Their algorithm can automatically adapt itself to variation in the image structure. [11] proposed a novel method for text detection and segmentation that uses stroke filters to assess text polarity when analyzing features in local image regions.
There also exists a growing body of work on text detection from videos or motion images, which is closely related to the image text detection problem studied in this paper. For example, [12] used a hybrid approach based on a neural network and projection profile analysis to detect and track text regions in a video. [13] applied a variety of text detection methods and then fused the individual results to achieve robust text detection for videos. [14] introduced a support vector machine-based approach for text detection in videos. [15] proposed a coarse-to-fine localization scheme for detecting text in multilingual videos. Recently, [16] proposed a method based on discrete cosine transform (DCT) coefficients for text detection in compressed videos. Despite the many commonalities between the video text and image text detection problems, one of the main differences is that the frames of a video exhibit temporal coherence, which offers much useful information for text detection. Such clues are not present in still images, which makes the image text detection problem more challenging than its counterpart in videos.
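As an illustration of the projection profile analysis mentioned above, the following minimal sketch sums the foreground pixels in each row of a binarized image and reports runs of consecutive rows whose sum reaches a threshold as candidate text line bands. It is a generic textbook formulation, not the procedure of [12]; the function names, the `min_count` threshold, and the list-of-rows input format are assumptions made for this example.

```python
def horizontal_profile(binary):
    """Number of foreground (value 1) pixels in each row of a binary image."""
    return [sum(row) for row in binary]


def text_line_bands(binary, min_count=1):
    """Return (start_row, end_row) pairs, inclusive, for runs of consecutive
    rows whose foreground pixel count is at least min_count.

    min_count is an illustrative threshold chosen for this sketch.
    """
    bands = []
    start = None
    for r, count in enumerate(horizontal_profile(binary)):
        if count >= min_count and start is None:
            start = r  # a new band begins at this row
        elif count < min_count and start is not None:
            bands.append((start, r - 1))  # the band ended on the previous row
            start = None
    if start is not None:
        # The last band runs to the bottom edge of the image.
        bands.append((start, len(binary) - 1))
    return bands
```

The same idea applied to column sums within each band gives candidate word or character boundaries. In video, such profiles could also be compared across neighboring frames, which is one way the temporal coherence discussed above can be exploited; still images offer no such comparison.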
1.2.2. Biomedical Image Processing Algorithms and Systems
Our study is related to other projects in biomedical image processing. For example, [17] used image features for text categorization. [18] studied the use of natural language processing to index and retrieve molecular images. [19] described an algorithmic system for accessing fluorescence microscopy images via image classification and segmentation.
In our own prior work [1], we discussed a novel approach for biomedical image search based on OCR. We showed that the approach offers advantages over searching image captions alone, notably the retrieval of additional relevant images. The current study is closely linked to that project and discusses the algorithmic details of detecting image text regions.