DocumentCode
519722
Title
Useful attributes identification for Unsupervised Information Extraction result set based on REAdaBoost Naïve Bayes
Author
Yin, Wenke ; Zhu, Ming
Author_Institution
Dept. of Autom., Univ. of Sci. & Technol. of China, Hefei, China
Volume
1
fYear
2010
fDate
21-24 May 2010
Abstract
Unsupervised Information Extraction has attracted considerable attention in the literature. However, its result set inevitably includes useless noise. Moreover, useful attributes and noise occur in greatly imbalanced proportions in the result set, and the two types of data differ in importance. How to identify the useful attributes effectively therefore remains an open question. To address this problem, this paper proposes a revised AdaBoost algorithm, REAdaBoost. The weight coefficient of REAdaBoost depends not only on the precision for useful attributes but also on the recall for rare attributes. We use Naïve Bayes as the base classifier and boost it separately with AdaBoost and with REAdaBoost. The experimental results show that, without increasing the overall error rate, REAdaBoost outperforms both AdaBoost and Naïve Bayes in predicting the useful attributes and the rare attributes.
Keywords
Bayes methods; data mining; pattern classification; AdaBoost algorithm; REAdaBoost naive Bayes; attributes identification; unsupervised information extraction; weight coefficient; 1/f noise; Automation; Background noise; Data mining; Error analysis; Explosives; Internet; Large-scale systems; Web pages; Web sites; Classification; Imbalanced Class Distributions; Information Extraction; REAdaBoost
fLanguage
English
Publisher
IEEE
Conference_Titel
Future Computer and Communication (ICFCC), 2010 2nd International Conference on
Conference_Location
Wuhan
Print_ISBN
978-1-4244-5821-9
Type
conf
DOI
10.1109/ICFCC.2010.5497739
Filename
5497739