DocumentCode
1993483
Title
Classification of Web documents using a graph model
Author
Schenker, Adam ; Last, Mark ; Bunke, Horst ; Kandel, Abraham
Author_Institution
Dept. of Comput. Sci. & Eng., Univ. of South Florida, Tampa, FL, USA
fYear
2003
fDate
3-6 Aug. 2003
Firstpage
240
Abstract
In this paper we describe work relating to classification of Web documents using a graph-based model instead of the traditional vector-based model for document representation. We compare the classification accuracy of the vector model approach using the k-nearest neighbor (k-NN) algorithm to a novel approach which allows the use of graphs for document representation in the k-NN algorithm. The proposed method is evaluated on three different Web document collections using the leave-one-out approach for measuring classification accuracy. The results show that the graph-based k-NN approach can outperform traditional vector-based k-NN methods in terms of both accuracy and execution time.
Keywords
Web sites; classification; document handling; graph theory; Web document classification; document representation; graph model; k-nearest neighbor algorithm; leave-one-out approach; vector model; Classification algorithms; Computer science; Context modeling; Costs; Electronic mail; Frequency; Information systems; Natural languages; Organizing; Systems engineering and theory
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
Print_ISBN
0-7695-1960-1
Type
conf
DOI
10.1109/ICDAR.2003.1227666
Filename
1227666