DocumentCode :
3487761
Title :
Graphical Figure Classification Using Data Fusion for Integrating Text and Image Features
Author :
Beibei Cheng ; Stanley, R. Joe ; Antani, Sameer ; Thoma, George R.
Author_Institution :
Dept. of Electr. & Comput. Eng., Missouri Univ. of Sci. & Technol., Rolla, MO, USA
fYear :
2013
fDate :
25-28 Aug. 2013
Firstpage :
693
Lastpage :
697
Abstract :
This paper describes a multimodal (image + text) learning approach for automatically identifying three graphical figure types commonly found in biomedical literature, namely, diagrams, statistical figures and flow charts. The goal is to improve retrieval of figures from biomedical journal articles. In this article, we describe a data fusion approach to combine information from both text and image sources, believed to contain complementary information. Text information about the image is extracted from the figure caption. The data fusion process includes a hybrid of an evolutionary algorithm (EA) and Binary Particle Swarm Optimization (BPSO), applied to find an optimal subset of extracted image features. The chi-square statistic and the information gain metric are used to select the optimal subset of extracted text features, which, along with the image features, are input to Multi-Layer Perceptron Neural Network classifiers, whose outputs are characterized as fuzzy sets to determine the final classification result. Evaluation performed on 1707 figure images extracted from a test subset of BioMedCentral® journals from the U.S. National Library of Medicine's PubMed Central® repository yielded classification accuracy as high as 96.1%.
Keywords :
electronic publishing; evolutionary computation; feature extraction; fuzzy set theory; image classification; image fusion; image retrieval; learning (artificial intelligence); medical information systems; multilayer perceptrons; particle swarm optimisation; statistical analysis; text analysis; BPSO; BioMedCentral® journals; Chi-square statistic; EA; US National Library of Medicine's PubMed Central® repository; automatic graphical figure type identification; binary particle swarm optimization; biomedical journal articles; biomedical literature; complementary information; data fusion process; diagrams; evolutionary algorithm; figure retrieval; flow charts; fuzzy sets; graphical figure classification; hybrid method; image feature extraction; image features; information gain metric; multilayer perceptron neural network classifier; multimodal learning approach; optimal subset; optimal subset selection; statistical figures; text feature extraction; text information extraction; Accuracy; Biomedical imaging; Data integration; Data mining; Feature extraction; Fuzzy logic; Graphics; Binary Particle Swarm Optimization (BPSO); Data Fusion; Evolutionary Algorithm (EA); Feature Selection; Fuzzy Set Intersection; Fuzzy Set Union; Image Processing; Multi-Layer Perceptron Neural Network (MLP-NN)
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
ISSN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2013.142
Filename :
6628707