Title :
An improved Naive Bayesian algorithm for Web page text classification
Author :
He Youquan ; Xie Jianfang ; Xu Cheng
Author_Institution :
Inf. Sci. & Eng. Dept., Chongqing Jiaotong Univ., Chongqing, China
Abstract :
This paper studies the process and methods of text classification. Building on the Naive Bayesian algorithm and the semi-structured nature of Web page information, it proposes an improved algorithm for Web page text information classification that uses HTML tag information during classification. Experiments show that the algorithm is feasible and effective and can be applied to information extraction in topic-specific search engines, where it can improve the topical relevance of search results and, in turn, search efficiency.
Keywords :
Bayes methods; Web sites; information retrieval; pattern classification; search engines; text analysis; HTML tag information; Naive Bayesian algorithm; Web page text information classification; information extraction; search engine; semistructured feature; Accuracy; Algorithm design and analysis; Bayesian methods; Classification algorithms; Text categorization; Web pages; Naive Bayesian; Text classification; Web page
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-61284-180-9
DOI :
10.1109/FSKD.2011.6019801