Title :
Novel Method for Improving Web Text Classifiers Performance Through Machine Learning
Author :
Moradi, Parham ; Abdollahzadeh, Ahmad ; Shiri, Mohammad Ibrahim
Author_Institution :
Dept. of Comput. Sci., Amir Kabir Univ. of Technol., Tehran
Abstract :
Automatic text classification is the task of assigning text documents to categories automatically. Web documents are a kind of text document, but they differ from plain text in two ways. First, Web documents are structured. Second, Web documents are related to each other through hyperlinks. In this article we propose a novel method for Web text classification that enhances classifier performance in two steps. First, we use Web graph information to create a virtual page for each target Web page and train classifiers on these virtual pages instead of the target pages themselves. Second, we train different classifiers, such as naive Bayes, decision tree, the RIPPER rule learner, and SVM, on different virtual pages, and then use a meta classifier to collect the results of all classifiers and combine them with voting methods. Our experiments show that the meta classifier improves classification performance.
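The two steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say how a virtual page is built or which voting scheme is used, so the sketch below makes two assumptions: a virtual page is the target page's text concatenated with the text of pages that link to it or are linked from it, and the meta classifier uses simple majority voting. The function names `build_virtual_page` and `majority_vote` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def build_virtual_page(target, pages, links):
    """Return the target page's text followed by the text of its neighbors.

    pages: dict mapping URL -> page text
    links: iterable of (source_url, destination_url) hyperlink pairs
    Assumption: neighbors are pages directly linked to or from the target.
    """
    neighbors = []
    for src, dst in links:
        if src == target and dst != target and dst in pages:
            neighbors.append(dst)  # out-link neighbor
        elif dst == target and src != target and src in pages:
            neighbors.append(src)  # in-link neighbor
    # dict.fromkeys removes duplicates while keeping first-seen order
    ordered = list(dict.fromkeys(neighbors))
    return " ".join([pages[target]] + [pages[n] for n in ordered])


def majority_vote(predictions):
    """Combine per-classifier labels by majority vote.

    Assumption: ties are broken in favor of the earliest classifier
    in the list whose label reaches the top count.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


if __name__ == "__main__":
    pages = {"a": "football match", "b": "league scores", "c": "stock prices"}
    links = [("a", "b"), ("c", "a"), ("a", "b")]
    print(build_virtual_page("a", pages, links))
    # Labels as they might come from naive Bayes, decision tree, RIPPER, SVM
    print(majority_vote(["sports", "finance", "sports", "sports"]))
```

In a full pipeline, each base classifier would be trained on virtual pages and the voting function would then be applied to their per-document predictions.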
Keywords :
Web sites; classification; learning (artificial intelligence); text analysis; Web graph; Web mining; Web text classification; data mining; machine learning; Classification tree analysis; Computer science; Decision trees; Machine learning; Support vector machine classification; Support vector machines; Testing; Text categorization; Voting; Web pages; Data Mining; Machine Learning; Meta Classifier; Virtual Pages; Web Mining; Web Text Classification; Web Documents;
Conference_Title :
Information and Communication Technologies, 2006. ICTTA '06. 2nd
Conference_Location :
Damascus
Print_ISBN :
0-7803-9521-2
DOI :
10.1109/ICTTA.2006.1684427