Title :
Robust segmentation of biomedical figures for image-based document retrieval
Author :
Lopez, Luis D. ; Yu, Jingyi ; Tudor, Catalina O. ; Arighi, Cecilia N. ; Huang, Hongzhan ; Vijay-Shanker, K. ; Wu, Cathy H.
Author_Institution :
Dept. of Comput. & Inf. Sci., Univ. of Delaware, Newark, DE, USA
Abstract :
Figures play an important role in illustrating concepts, methodology and results in biomedical literature. However, figures in biomedical literature are often composed of multiple subfigures (panels), which may illustrate diverse methodologies or results. Robust and accurate panel partitioning is crucial to support article categorization based on methods or experimental results and to provide the evidence source for derived assertions, but it is a challenging task. In this paper, we present a comprehensive framework for harvesting multimodal panels in biomedical literature, and demonstrate its application to protein-protein interaction (PPI)-related literature as a use case. A unique feature of our solution is that we combine pixel-level representations of images with figure captions. Our approach first analyzes figure captions to identify the label style used to mark panels. We then use pixel-level representations to partition a figure into a set of bounding boxes of connected components. We also perform a lexical analysis on the text within the figure to locate panel labels that match the caption analysis results. Finally, we estimate the optimal panel layout and use the layout to partition the figure. We tested our system on a dataset provided by the Molecular INTeraction database (MINT), and show that our approach surpasses pure caption-based and pure image-based approaches, achieving a precision of 96.64%.
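The first step described in the abstract, analyzing a caption to identify the panel label style, can be illustrated with a minimal sketch. This is not the authors' implementation: the style names, the regular expressions in `LABEL_PATTERNS`, and the function `detect_label_style` are all hypothetical, chosen only to show the idea of picking the label convention that best matches the caption text.

```python
import re

# Hypothetical label styles; the abstract does not list the patterns actually used.
LABEL_PATTERNS = {
    "paren_upper": re.compile(r"\(([A-Z])\)"),                   # (A) (B) (C)
    "paren_lower": re.compile(r"\(([a-z])\)"),                   # (a) (b) (c)
    "bare_upper": re.compile(r"(?<![A-Za-z(])([A-Z])[,:]\s"),    # A, ...; B: ...
}


def detect_label_style(caption):
    """Return (style_name, sorted_labels) for the style with the most distinct labels.

    Returns (None, []) when no pattern matches, i.e. the caption gives no
    evidence that the figure has labeled panels.
    """
    best_style, best_labels = None, []
    for style, pattern in LABEL_PATTERNS.items():
        labels = sorted(set(pattern.findall(caption)))
        # Keep the style that explains the largest number of distinct labels.
        if len(labels) > len(best_labels):
            best_style, best_labels = style, labels
    return best_style, best_labels


if __name__ == "__main__":
    caption = "(A) Western blot of lysates. (B) Co-immunoprecipitation. (C) Quantification."
    print(detect_label_style(caption))
```

In a pipeline like the one summarized above, the detected labels would then serve as the expected set to look for in the text recognized inside the figure; that matching step is not shown here.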
Keywords :
image segmentation; medical image processing; molecular biophysics; proteins; MINT; Molecular INTeraction database; PPI related literature; article categorization; biomedical figure segmentation; biomedical literature; image based document retrieval; panel partitioning; pixel level representation; protein-protein interaction; Databases; Image segmentation; Layout; Optical character recognition software; Protein engineering; Proteins; Robustness; Literature mining; biomedical image analysis; database curation; figure panels; image processing; image segmentation; protein-protein interaction
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2012 IEEE International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
978-1-4673-2559-2
Electronic_ISBN :
978-1-4673-2558-5
DOI :
10.1109/BIBM.2012.6392706