Title :
FigSearch: using maximum entropy classifier to categorize biological figures
Author :
Liu, Fang ; Jenssen, Tor-Kristian ; Nygaard, Vegard ; Sack, John ; Hovig, Eivind
Author_Institution :
Norwegian Radium Hosp., Oslo, Norway
Abstract :
Figures in scientific papers present knowledge in an intuitive and concise way. With growing attention being paid to full-text mining in bioinformatics, we initiated an effort to study figures in full-text articles. FigSearch is a prototype figure legend indexing and classification system that uses both text mining and supervised machine learning. We defined schematic representations of protein interactions and signaling events as the figure type of interest. A maximum entropy classifier was used to categorize each figure as relevant or non-relevant according to our definition, assigning it an estimated likelihood. One advantage of the maximum entropy principle is that it provides a probability for each decision instead of a binary assignment. In our pilot study, FigSearch showed satisfactory performance in a preliminary validation by domain experts. Such a system can be useful for a publisher's website, in constructing bio-picture galleries, or as an aid to other complex text-mining projects.
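The probabilistic output described in the abstract can be illustrated with a minimal sketch. This is not the FigSearch implementation: it assumes bag-of-words features over legend text, trains a binary maximum entropy model (equivalent to logistic regression) by stochastic gradient ascent, and uses invented toy legends. All function names, hyperparameters, and example legends here are hypothetical.

```python
import math
from collections import Counter


def tokenize(legend):
    """Lowercase a legend and split it into alphabetic word tokens."""
    cleaned = "".join(c if c.isalpha() else " " for c in legend.lower())
    return cleaned.split()


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, legend):
    """Return the estimated probability that a legend is relevant."""
    feats = Counter(tokenize(legend))
    score = bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())
    return sigmoid(score)


def train_maxent(legends, labels, epochs=300, lr=0.1, l2=0.01):
    """Fit a binary maximum entropy model by stochastic gradient ascent.

    Hyperparameters are illustrative assumptions, not values from the paper.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for legend, y in zip(legends, labels):
            err = y - predict_proba(weights, bias, legend)
            bias += lr * err
            for w, c in Counter(tokenize(legend)).items():
                old = weights.get(w, 0.0)
                # Log-likelihood gradient with a small L2 penalty.
                weights[w] = old + lr * (err * c - l2 * old)
    return weights, bias


# Hypothetical toy legends: 1 = schematic of interactions/signaling, 0 = other.
train_legends = [
    "Schematic model of the signaling pathway showing protein interactions",
    "Schematic diagram of protein interactions in the signaling cascade",
    "Western blot analysis of protein expression in treated cells",
    "Bar graph of cell viability measured after treatment",
]
train_labels = [1, 1, 0, 0]

weights, bias = train_maxent(train_legends, train_labels)
p_relevant = predict_proba(
    weights, bias, "Schematic model of protein interactions in a signaling pathway"
)
p_other = predict_proba(weights, bias, "Western blot analysis of treated cells")
print(f"relevant-like legend: {p_relevant:.3f}")
print(f"other legend:         {p_other:.3f}")
```

Because the model returns a probability and not a hard label, figures could be ranked by score or filtered at a chosen threshold, which is the advantage the abstract points to.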
Keywords :
biology computing; classification; data mining; entropy; indexing; learning (artificial intelligence); molecular biophysics; proteins; FigSearch; bioinformatics; biological figures; biopicture gallery constructions; classification system; figure legend indexing system; full-text mining; knowledge presentation; likelihood estimation; maximum entropy classifier; protein interactions; protein signaling events; publisher website; scientific papers; supervised machine learning; Bioinformatics; Cancer; Entropy; Hospitals; Indexing; Machine learning; Milling machines; Neoplasms; Proteins; Prototypes;
Conference_Title :
Computational Systems Bioinformatics Conference, 2004. CSB 2004. Proceedings. 2004 IEEE
Print_ISBN :
0-7695-2194-0
DOI :
10.1109/CSB.2004.1332465