Title :
Image-based automated chemical database annotation with ensemble of machine-vision classifiers
Author :
Park, Jungkap ; Saitou, Kazuhiro ; Rosania, Gus
Author_Institution :
Dept. of Mech. Eng., Univ. of Michigan, Ann Arbor, MI, USA
Abstract :
This paper presents an image-based strategy for the automated annotation of chemical databases. The strategy combines three components: a machine-vision classifier that extracts 2D chemical structure diagrams from research articles and converts them into standard chemical file formats; a virtual “Chemical Expert” system that screens the converted structures by their estimated conversion accuracy; and a fragment-based measure for calculating intermolecular similarity. In particular, to overcome the limited accuracy of individual machine-vision classifiers, the strategy draws on ensemble methods in machine learning and uses an ensemble of machine-vision classifiers. For annotation, the chemical similarity calculated between the converted structures and entries in a virtual small-molecule database is used to establish the links. An annotation test linking 121 journal articles to entries in the PubChem database demonstrates that the ensemble approach increases annotation coverage while keeping annotation quality (e.g., recall and precision rates) comparable to that of a single machine-vision classifier.
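The abstract describes three steps: conversion by several classifiers, screening by estimated conversion accuracy, and similarity-based linking. The Python sketch below shows one way these steps could fit together. It makes several assumptions. Structures are represented as sets of fragment labels. The Tanimoto coefficient over those sets stands in for the paper's fragment-based measure. The outputs of the ensemble are combined by taking the union of the links that each accepted structure produces. The function names (`tanimoto`, `annotate_article`), the fragment labels, the database IDs, and both thresholds are hypothetical. None of them come from the paper.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical encoding: a structure is a set of fragment labels.
Fragments = FrozenSet[str]


def tanimoto(a: Fragments, b: Fragments) -> float:
    """Fragment-based Tanimoto similarity |A & B| / |A | B| (a stand-in measure)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def annotate_article(
    ensemble_outputs: List[Tuple[Fragments, float]],
    database: Dict[str, Fragments],
    min_confidence: float = 0.5,   # hypothetical screening threshold
    min_similarity: float = 0.8,   # hypothetical linking threshold
) -> Set[str]:
    """Link one article to database entries using an ensemble of classifier outputs.

    Each element of ensemble_outputs is (converted structure, estimated
    conversion accuracy), one per classifier; the accuracy plays the role of
    the "Chemical Expert" screening score. Combining outputs by union of
    links is an assumption, not the paper's stated rule.
    """
    links: Set[str] = set()
    for structure, confidence in ensemble_outputs:
        # Screening: skip conversions estimated to be unreliable.
        if confidence < min_confidence:
            continue
        # Linking: compare the accepted structure with every database entry.
        for entry_id, entry_fragments in database.items():
            if tanimoto(structure, entry_fragments) >= min_similarity:
                links.add(entry_id)
    return links


# Toy data (hypothetical IDs and fragment labels, not PubChem records).
db = {
    "CID_A": frozenset({"benzene_ring", "carboxyl", "amine"}),
    "CID_B": frozenset({"pyridine_ring", "chloro"}),
}
outputs = [
    (frozenset({"benzene_ring", "carboxyl", "amine"}), 0.9),  # classifier 1
    (frozenset({"pyridine_ring", "chloro"}), 0.7),            # classifier 2
    (frozenset({"benzene_ring", "chloro"}), 0.2),             # classifier 3, screened out
]
print(sorted(annotate_article(outputs, db)))  # ['CID_A', 'CID_B']
```

In this toy run, classifier 1 alone would link only `CID_A`. Adding classifier 2 also recovers `CID_B`. This mirrors the coverage gain the abstract reports for the ensemble. The low-confidence output from classifier 3 is dropped before it can produce a spurious link.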
Keywords :
chemistry computing; computer vision; document image processing; expert systems; image classification; learning (artificial intelligence); virtual reality; visual databases; 2D chemical structure diagram; PubChem database; annotation quality; annotation test; automated annotation; calculation intermolecular similarity; chemical databases; chemical similarity; conversion accuracy; ensemble methods; fragment-based measure; image-based annotation strategy; image-based automated chemical database annotation; machine learning; machine-vision classifiers; research articles; single machine-vision classifier; standard chemical file formats; virtual chemical expert system; virtual small molecule database; Accuracy; Chemicals; Classification algorithms; Databases; Expert systems; Optical character recognition software; Periodic structures;
Conference_Title :
Automation Science and Engineering (CASE), 2010 IEEE Conference on
Conference_Location :
Toronto, ON
Print_ISBN :
978-1-4244-5447-1
DOI :
10.1109/COASE.2010.5584695