Title :
Utilizing image-based features in biomedical document classification
Author :
Kaidi Ma;Hogyeong Jeong;M V Rohith;Gowri Somanath;Ryan Tarpine;Kyle Schutter;Dorothea Blostein;Sorin Istrail;Chandra Kambhamettu;Hagit Shatkay
Author_Institution :
Computational Biomedicine
Abstract :
Images form a rich information source that remains underutilized in biomedical document classification. We present work that uses both image- and text-based features to identify articles of interest, in this case articles pertaining to cis-regulatory modules in the context of gene networks. Extending our recently introduced idea of using OCR-based features to identify DNA content in images, we combine image- and text-based classifiers to categorize documents as relevant or irrelevant to cis-regulatory modules. Using a set of hundreds of articles, marked by experts as relevant or irrelevant to cis-regulatory modules, we train and test image-based classifiers, text-based classifiers, and classifiers integrating both. Our results indicate that the integrated classifiers perform best, with Recall, F-measure and Utility all above 0.9, demonstrating the significance of incorporating image data, and specifically OCR-based features, into the document categorization process. Moreover, the use of character distribution properties to represent images is directly relevant to other biomedical images containing text (e.g. RNA, proteins). Diagrams and other images containing text are also prevalent outside the biomedical domain, so the approach is likely to be applicable and beneficial in other application areas.
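As an illustration of what "character distribution properties" could look like in practice, the following is a minimal sketch that turns OCR output from an image into letter-frequency features. It is not the authors' feature set: the abstract does not specify the exact features, and the function name `char_distribution_features`, the feature names, and the choice of the four DNA letters are all assumptions made here for illustration. The OCR step itself is assumed to have already produced a plain string.

```python
from collections import Counter

# Assumption: DNA content is signalled by the four nucleotide letters.
DNA_LETTERS = "ACGT"


def char_distribution_features(ocr_text: str) -> dict:
    """Return the relative frequency of each DNA letter among the
    alphabetic characters of an OCR'd string (hypothetical feature set)."""
    # Keep alphabetic characters only, case-folded to upper case.
    letters = [c for c in ocr_text.upper() if c.isalpha()]
    counts = Counter(letters)
    total = len(letters)

    # Fraction of alphabetic characters that are each DNA letter;
    # 0.0 when the OCR output contains no letters at all.
    feats = {
        f"frac_{base}": (counts[base] / total if total else 0.0)
        for base in DNA_LETTERS
    }
    # Combined share of DNA letters, a crude indicator of sequence content.
    feats["frac_dna"] = sum(feats[f"frac_{base}"] for base in DNA_LETTERS)
    return feats
```

Under these assumptions, an image whose OCR text is a nucleotide sequence yields a `frac_dna` close to 1, while ordinary English labels yield a much lower value; such a vector could then be fed, alongside text features, to any standard classifier.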
Keywords :
"DNA","Biomedical imaging","Decision trees","Optical character recognition software","Vegetation","Context","Proteins"
Conference_Title :
Image Processing (ICIP), 2015 IEEE International Conference on
DOI :
10.1109/ICIP.2015.7351648