DocumentCode :
3337118
Title :
Multi-Inductive Learning approach for Information Extraction
Author :
Muludi, K. ; Widyantoro, Dwi H. ; Kuspriyanto, K. ; Santoso, O. Setiono
Author_Institution :
Comput. Center, Univ. of Lampung, Bandar, Indonesia
fYear :
2011
fDate :
17-19 July 2011
Firstpage :
1
Lastpage :
6
Abstract :
The vast amount of information on the Internet is not easy to find and use. Information Extraction technology is one alternative that can solve this problem. The conventional Natural Language Processing approach is hampered by limited portability, scalability and adaptability. Introducing Machine Learning into Information Extraction is one solution, since Inductive Learning needs only annotated training examples. The problem is that no algorithm performs consistently across information domains. Automatic, smart selection of a classifier from among various machine learning algorithms is one of the best ways to handle this problem. The goal of this paper is to propose a method for an Information Extraction system, based on Inductive Learning and Meta Learning, that performs well. Multi-Inductive Learning is developed to meet that goal. It consists of several Inductive Learning algorithms whose mechanisms differ significantly, which ensures variation in bias within the method. Through k-fold cross-validation on the training documents, the Multi-Inductive Learning algorithm chooses the best classifier for each slot in a given domain. These best classifiers are then employed to perform full extraction on the testing documents. The experiments show that Multi-Inductive Learning performs better than Information Extraction systems based on a single Inductive Learning algorithm. On the Reuters Corporate Acquisition dataset, Multi-Inductive Learning scores 46.3% and has the best performance among state-of-the-art information systems. Of the nine slots to be extracted, it gives the best performance on six. Multi-Inductive Learning also performs better on the Job Posting dataset: its average performance is 82.1%, the best among state-of-the-art Information Extraction systems. Of the 17 slots tested, nine are extracted with the best performance.
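The per-slot selection step described in the abstract (k-fold cross-validation on training data, then keeping the best classifier for each slot) might be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the inductive learners, the features, or the scoring metric, so the two toy learners (`majority_learner`, `nearest_neighbour_learner`), the slot names, the one-dimensional features, and the use of plain accuracy are all hypothetical stand-ins rather than the authors' method.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Features = Tuple[float, ...]
Example = Tuple[Features, int]              # (feature vector, label)
Model = Callable[[Features], int]           # maps features to a predicted label
Learner = Callable[[List[Example]], Model]  # trains a model from examples


def majority_learner(train: List[Example]) -> Model:
    """Hypothetical baseline: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbour_learner(train: List[Example]) -> Model:
    """Hypothetical 1-NN learner using squared Euclidean distance."""
    def predict(x: Features) -> int:
        nearest = min(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
        return nearest[1]
    return predict


def cross_val_score(learner: Learner, examples: List[Example], k: int) -> float:
    """Mean accuracy over k folds; fold i holds out every k-th example from i."""
    scores = []
    for i in range(k):
        test = examples[i::k]
        train = [ex for j, ex in enumerate(examples) if j % k != i]
        model = learner(train)
        scores.append(sum(model(x) == y for x, y in test) / len(test))
    return sum(scores) / k


def select_best_per_slot(slot_data: Dict[str, List[Example]],
                         learners: Dict[str, Learner], k: int = 5) -> Dict[str, str]:
    """For each slot, return the name of the learner with the best CV score."""
    best = {}
    for slot, examples in slot_data.items():
        best[slot] = max(learners, key=lambda name: cross_val_score(learners[name], examples, k))
    return best


# Toy data (assumed, not from the paper): slot_a has a clean threshold on the
# feature, slot_b is mostly label 0 with a few isolated 1s.
slot_data = {
    "slot_a": [((float(i),), int(i >= 10)) for i in range(20)],
    "slot_b": [((float(i),), int(i in {3, 8, 12, 17})) for i in range(20)],
}
learners = {"majority": majority_learner, "nearest_neighbour": nearest_neighbour_learner}

best = select_best_per_slot(slot_data, learners, k=5)
print(best)
```

In this sketch the winner can differ from slot to slot, which is the point of selecting per slot rather than fixing one learner for the whole domain; the chosen learner for each slot would then be retrained on all training documents before extraction on the test documents.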
Keywords :
Internet; information retrieval; learning (artificial intelligence); pattern classification; Internet; Reuters corporate acquisition; bias variance; classifier selection; information extraction; job posting dataset; k-fold cross validation; machine learning; meta learning; multiinductive learning approach; Classification algorithms; Data mining; Feature extraction; Heuristic algorithms; Learning systems; Testing; Training; Information Extraction; inductive learning; meta learning; multi inductive learning;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Electrical Engineering and Informatics (ICEEI), 2011 International Conference on
Conference_Location :
Bandung
ISSN :
2155-6822
Print_ISBN :
978-1-4577-0753-7
Type :
conf
DOI :
10.1109/ICEEI.2011.6021680
Filename :
6021680