• DocumentCode
    1993483
  • Title
    Classification of Web documents using a graph model
  • Author
    Schenker, Adam; Last, Mark; Bunke, Horst; Kandel, Abraham
  • Author_Institution
    Dept. of Comput. Sci. & Eng., South Florida Univ., Tampa, FL, USA
  • fYear
    2003
  • fDate
    3-6 Aug. 2003
  • Firstpage
    240
  • Abstract
    In this paper we describe work relating to classification of Web documents using a graph-based model instead of the traditional vector-based model for document representation. We compare the classification accuracy of the vector model approach using the k-nearest neighbor (k-NN) algorithm to a novel approach which allows the use of graphs for document representation in the k-NN algorithm. The proposed method is evaluated on three different Web document collections using the leave-one-out approach for measuring classification accuracy. The results show that the graph-based k-NN approach can outperform traditional vector-based k-NN methods in terms of both accuracy and execution time.
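The graph-based k-NN scheme summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Several details are assumptions and do not appear in this record:
- the graph construction, which uses one node per unique lowercase word and a directed edge between adjacent words;
- the distance d(G1, G2) = 1 - |mcs(G1, G2)| / max(|G1|, |G2|), where graph size counts nodes plus edges;
- the helper names `doc_to_graph`, `mcs_distance`, `knn_predict` and `leave_one_out_accuracy`, which are hypothetical.

Because every node label is unique, the maximum common subgraph reduces to intersecting the node and edge sets.

```python
from collections import Counter


def doc_to_graph(text):
    """Assumed representation: unique words as nodes, adjacency as directed edges."""
    words = text.lower().split()
    nodes = set(words)
    edges = {(a, b) for a, b in zip(words, words[1:]) if a != b}
    return nodes, edges


def graph_size(graph):
    nodes, edges = graph
    return len(nodes) + len(edges)


def mcs_distance(g1, g2):
    """Assumed MCS-based distance; unique labels make the MCS a set intersection."""
    (n1, e1), (n2, e2) = g1, g2
    common = len(n1 & n2) + len(e1 & e2)
    denom = max(graph_size(g1), graph_size(g2))
    return 1.0 - common / denom if denom else 0.0


def knn_predict(query, train, k):
    """Majority vote among the k training graphs closest to the query."""
    ranked = sorted(train, key=lambda item: mcs_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(docs, k):
    """Classify each document using all the others as the training set."""
    graphs = [(doc_to_graph(text), label) for text, label in docs]
    correct = 0
    for i, (graph, label) in enumerate(graphs):
        rest = graphs[:i] + graphs[i + 1:]
        if knn_predict(graph, rest, k) == label:
            correct += 1
    return correct / len(graphs)


if __name__ == "__main__":
    docs = [
        ("cats chase mice", "pets"),
        ("dogs chase cats", "pets"),
        ("stocks rise today", "finance"),
        ("markets rise today", "finance"),
    ]
    print(leave_one_out_accuracy(docs, k=1))
```

On these four toy documents, leave-one-out with k = 1 classifies every document correctly. The toy run only exercises the mechanics. It says nothing about the accuracy or timing results reported in the paper.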
  • Keywords
    Web sites; classification; document handling; graph theory; Web document classification; document representation; graph model; k-nearest neighbor algorithm; leave-one-out approach; vector model; Classification algorithms; Computer science; Context modeling; Costs; Electronic mail; Frequency; Information systems; Natural languages; Organizing; Systems engineering and theory;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
  • Print_ISBN
    0-7695-1960-1
  • Type
    conf
  • DOI
    10.1109/ICDAR.2003.1227666
  • Filename
    1227666