• DocumentCode
    3436280
  • Title
    An Approach for Text Categorization in Digital Library
  • Author
    Wang, Tao ; Desai, Bipin C.
  • Author_Institution
    Concordia Univ., Montreal
  • fYear
    2007
  • fDate
    6-8 Sept. 2007
  • Firstpage
    21
  • Lastpage
    27
  • Abstract
    Text categorization is a very effective way to organize an enormous number of documents in digital libraries. Accurate classification of documents not only enhances document search precision but also facilitates browsing-by-topic functionality. It is, nonetheless, difficult to obtain categorization accuracy comparable to that achieved by professional catalogers. This is due largely to the complexity of the large-scale pre-defined category hierarchies, which makes it difficult for learning algorithms to distinguish among categories. This paper describes a top-down document classification approach that takes advantage of the hierarchical structure in two ways: to identify the number of independent local classifiers and to guide the top-down classification procedure. We evaluate the approach within the CINDI Digital Library, using the ACM Classification System as the target hierarchy. Experimental results show the promise of this approach.
  • Keywords
    digital libraries; text analysis; browsing-by-topic functionality; digital library; document search precision; documents classification; large-scaled category hierarchies; learning algorithms; text categorization; Chromium; Classification tree analysis; Computer science; Learning systems; Neural networks; Probabilistic logic; Software libraries; Support vector machine classification; Support vector machines; Text categorization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database Engineering and Applications Symposium, 2007. IDEAS 2007. 11th International
  • Conference_Location
    Banff, Alta.
  • ISSN
    1098-8068
  • Print_ISBN
    978-0-7695-2947-9
  • Type
    conf
  • DOI
    10.1109/IDEAS.2007.4318085
  • Filename
    4318085