Title :
Classification of Web documents using a graph model
Author :
Schenker, Adam ; Last, Mark ; Bunke, Horst ; Kandel, Abraham
Author_Institution :
Dept. of Comput. Sci. & Eng., South Florida Univ., Tampa, FL, USA
Abstract :
In this paper we describe work on the classification of Web documents using a graph-based model, instead of the traditional vector-based model, for document representation. We compare the classification accuracy of the vector model approach using the k-nearest neighbor (k-NN) algorithm to a novel approach that allows graphs to be used for document representation in the k-NN algorithm. The proposed method is evaluated on three different Web document collections using the leave-one-out approach for measuring classification accuracy. The results show that the graph-based k-NN approach can outperform traditional vector-based k-NN methods in terms of both accuracy and execution time.
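The abstract does not say how documents become graphs or how graphs are compared inside k-NN, so the sketch below fills those gaps with stated assumptions rather than the paper's exact method. It assumes a simplified representation in which each distinct term is a node and a directed edge links each term to the term that immediately follows it. It also assumes a maximum-common-subgraph distance, d(G1, G2) = 1 - |mcs(G1, G2)| / max(|G1|, |G2|), with graph size taken as nodes plus edges. Because every node label is unique, the common subgraph reduces to set intersection, which keeps the comparison cheap. The helper names (`build_graph`, `mcs_distance`, `knn_classify`, `leave_one_out_accuracy`) and the toy corpus are hypothetical illustrations.

```python
from collections import Counter

def build_graph(words):
    """Hypothetical simplified document graph (an assumption, not the paper's
    exact scheme): each distinct term is a node, and a directed edge (a, b)
    is added whenever term b immediately follows term a."""
    nodes = set(words)
    edges = {(a, b) for a, b in zip(words, words[1:])}
    return nodes, edges

def graph_size(g):
    # Assumed size measure: number of nodes plus number of edges.
    nodes, edges = g
    return len(nodes) + len(edges)

def mcs_distance(g1, g2):
    """Distance 1 - |mcs| / max(|g1|, |g2|). With unique node labels the
    maximum common subgraph is the intersection of node and edge sets."""
    n1, e1 = g1
    n2, e2 = g2
    common = len(n1 & n2) + len(e1 & e2)
    denom = max(graph_size(g1), graph_size(g2))
    return 1.0 - common / denom if denom else 0.0

def knn_classify(query, training, k):
    """training is a list of (graph, label) pairs. Majority vote among the k
    nearest graphs; ties in the vote go to the label seen first, i.e. the
    label of the closer neighbor (Counter preserves insertion order)."""
    neighbors = sorted(training, key=lambda item: mcs_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(dataset, k):
    """Classify each document using all the others as training data."""
    correct = 0
    for i, (graph, label) in enumerate(dataset):
        rest = dataset[:i] + dataset[i + 1:]
        if knn_classify(graph, rest, k) == label:
            correct += 1
    return correct / len(dataset)

# Toy corpus (invented for illustration only, not from the paper).
docs = [
    ("cheap flights to paris", "travel"),
    ("cheap hotels in paris", "travel"),
    ("flights and hotels deals", "travel"),
    ("python list comprehension tutorial", "programming"),
    ("python dictionary tutorial", "programming"),
    ("java list tutorial", "programming"),
]
dataset = [(build_graph(text.split()), label) for text, label in docs]
print(leave_one_out_accuracy(dataset, k=1))  # prints 1.0
```

Swapping in a vector-space baseline only means replacing `mcs_distance` with, for example, a cosine distance over term-frequency vectors; the k-NN and leave-one-out code stays unchanged, which is what makes the two representations directly comparable.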
Keywords :
Web sites; classification; document handling; graph theory; Web document classification; document representation; graph model; k-nearest neighbor algorithm; leave-one-out approach; vector model; Classification algorithms; Computer science; Context modeling; Costs; Electronic mail; Frequency; Information systems; Natural languages; Organizing; Systems engineering and theory;
Conference_Title :
Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
Print_ISBN :
0-7695-1960-1
DOI :
10.1109/ICDAR.2003.1227666