Title : 
Information extraction from nanotoxicity related publications
         
        
            Author : 
Lemin Xiao ; Kaizhi Tang ; Xiong Liu ; Hui Yang ; Zheng Chen ; Ruimin Xu
         
        
            Author_Institution : 
Intell. Autom. Inc., Rockville, MD, USA
         
        
        
        
        
        
            Abstract : 
High-quality experimental data are important when developing predictive models for studying nanomaterial environmental impact (NEI). Because raw data from experimental laboratories and manufacturing workplaces are usually proprietary and small in scale, extracting information from publications is an attractive alternative for collecting data. We developed an information extraction system that extracts useful information from full-text nanotoxicity-related publications. The system consists of five components: transformation of raw data into a machine-readable format, data preprocessing, ontology-based named entity recognition, rule-based numerical attribute extraction from both tables and unstructured text, and relation extraction among entities and attributes. The system was applied to a dataset of 94 publications and achieved acceptable accuracy. Storing the extracted data in a table according to the relations among them yields a dataset that can be used to predict nanomaterial environmental impact. Such a system is unique in the current nanomaterial community and can help nanomaterial scientists and practitioners quickly locate the information they need without spending a lot of time reading articles.
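The last three components named in the abstract (entity recognition, rule-based numerical attribute extraction, and relation extraction) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the term list `NANOMATERIAL_TERMS` is a hypothetical stand-in for an ontology, the unit list in `UNIT_PATTERN` is an assumed rule, and the same-sentence pairing in `extract_records` is a deliberately naive relation heuristic chosen only for illustration.

```python
import re

# Hypothetical mini-lexicon standing in for an ontology; not taken from the paper.
NANOMATERIAL_TERMS = ["TiO2", "ZnO", "silver nanoparticles", "carbon nanotubes"]

# Assumed rule: a number followed by a unit from a small fixed list.
# The negative lookahead stops "h" from matching the start of a longer word.
UNIT_PATTERN = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>nm|mg/L|ug/mL|h)(?![A-Za-z])"
)


def find_entities(text):
    """Return lexicon terms that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return [term for term in NANOMATERIAL_TERMS if term.lower() in lowered]


def find_attributes(text):
    """Return (value, unit) pairs matched by the unit rule."""
    return [
        (float(m.group("value")), m.group("unit"))
        for m in UNIT_PATTERN.finditer(text)
    ]


def extract_records(sentence):
    """Naively relate every entity to every attribute in the same sentence."""
    attributes = find_attributes(sentence)
    return [
        {"entity": entity, "attributes": attributes}
        for entity in find_entities(sentence)
    ]


sentence = "ZnO particles of 20 nm were dosed at 1.5 mg/L for 24 h."
records = extract_records(sentence)
print(records)
```

Each record pairs one recognized entity with the numeric attributes found alongside it, which mirrors the idea of storing extracted data in a table according to the relations among them. A real system would need a proper ontology, table parsing, and a more careful relation step than sentence co-occurrence.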
         
        
            Keywords : 
data mining; medical computing; nanomedicine; numerical analysis; toxicology; data preprocessing; full-text nanotoxicity; information extraction system; machine readable format; nanomaterial community; nanomaterial environmental impact; nanotoxicity related publications; ontology-based named entity recognition; predictive models; raw data transformation; rule-based numerical attribute extraction; Information retrieval; Nanoparticles; Ontologies; Pattern matching; Shape; XML; Nanoinformatics; information extraction; named entity recognition; nanotoxicity; relation extraction
         
        
        
        
            Conference_Title : 
Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on
         
        
            Conference_Location : 
Shanghai
         
        
        
            DOI : 
10.1109/BIBM.2013.6732723