Title :
FAQ Extracting and Domain Filtering Based on Improved Bayes
Author :
Yu, Zhengtao ; Zong, Huanyun ; Xu, Yangbo ; Guo, Jianyi ; Mao, Yu ; Meng, Xiangyan
Author_Institution :
Sch. of Inf. Eng. & Autom., Kunming Univ. of Sci. & Technol., Kunming, China
Abstract :
Frequently asked questions (FAQs) are the basis of question answering (QA) systems built on FAQ databases. Because FAQs are difficult to collect and organize, this paper proposes a method for automatically acquiring domain FAQs based on an improved Bayes classifier. HTML pages are parsed into DOM trees. Node information and structural characteristics of the DOM tree, combined with a restricted-domain knowledge base, are extracted as classification features, and an improved Bayesian learning algorithm is used to construct the classification model. The model acquires FAQs from HTML pages automatically and selects those that belong to the target domain. Experimental results indicate that the method is effective.
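The pipeline the abstract outlines (parse the page, take structural and textual cues from each node, classify the node) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class and function names, the feature set, and the toy training pages are hypothetical, and the classifier is a plain multinomial naive Bayes with Laplace smoothing, not the paper's improved Bayes, whose details the abstract does not give. Domain filtering against a knowledge base is not covered.

```python
import math
from collections import Counter, defaultdict
from html.parser import HTMLParser


class NodeFeatureExtractor(HTMLParser):
    """Collect (parent tag, depth, text) for each text-bearing node of a page."""

    VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []  # open tags from the root down to the current node
        self.nodes = []  # (parent tag, depth, text) tuples

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.nodes.append((self.stack[-1], len(self.stack), text))


def extract_nodes(html):
    parser = NodeFeatureExtractor()
    parser.feed(html)
    parser.close()
    return parser.nodes


def features(tag, depth, text):
    """Hypothetical feature set: structural cues plus lowercase word tokens."""
    feats = [f"tag={tag}", f"depth={min(depth, 6)}"]
    feats.append("ends_with_qmark" if text.endswith("?") else "no_qmark")
    feats.extend(f"w={w}" for w in text.lower().split())
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples):
        for feats, label in samples:
            self.class_counts[label] += 1
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)

    def predict(self, feats):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            for f in feats:
                score += math.log((self.feat_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy data, invented for this sketch; labels come from a trivial tag rule.
TRAIN_HTML = (
    "<html><body><dl>"
    "<dt>How do I reset my password?</dt>"
    "<dd>Open the settings page and choose reset.</dd>"
    "<dt>What is the refund policy?</dt>"
    "<dd>Refunds are issued within ten days.</dd>"
    "</dl></body></html>"
)
TEST_HTML = (
    "<html><body><dl>"
    "<dt>Where can I download the manual?</dt>"
    "<dd>The manual is on the support page.</dd>"
    "</dl></body></html>"
)

model = NaiveBayes()
model.fit(
    (features(tag, depth, text), "question" if tag == "dt" else "answer")
    for tag, depth, text in extract_nodes(TRAIN_HTML)
)
predictions = [
    (text, model.predict(features(tag, depth, text)))
    for tag, depth, text in extract_nodes(TEST_HTML)
]
```

A real system would need hand-labelled pages and a richer feature set; the tag-based labelling here only keeps the example short.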
Keywords :
Bayes methods; database management systems; information filtering; learning (artificial intelligence); automatic acquisition method; domain knowledge base; frequently asked questions database; improved Bayesian classified learning algorithm; node information; question answering system; structural characteristics; Classification tree analysis; Data engineering; Data mining; Databases; HTML; Information filtering; Information filters; Information systems; Internet; Space technology; FAQ Domain Filtering; FAQ Extracting; Improved Bayes; Question Answering System; Restricted domain
Conference_Title :
Web Information Systems and Mining, 2009. WISM 2009. International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3817-4
DOI :
10.1109/WISM.2009.30