• DocumentCode
    2370400
  • Title

    Ontologies improve text document clustering

  • Author

    Hotho, Andreas ; Staab, Steffen ; Stumme, Gerd

  • Author_Institution
    Inst. für Angewandte Inf. und Formale Beschreibungsverfahren, Karlsruhe Univ., Germany
  • fYear
    2003
  • fDate
    19-22 Nov. 2003
  • Firstpage
    541
  • Lastpage
    544
  • Abstract
    Text document clustering plays an important role in providing intuitive navigation and browsing mechanisms by organizing large sets of documents into a small number of meaningful clusters. The bag-of-words representation used for these clustering methods is often unsatisfactory as it ignores relationships between important terms that do not co-occur literally. In order to deal with the problem, we integrate core ontologies as background knowledge into the process of clustering text documents. Our experimental evaluations compare clustering techniques based on pre-categorizations of texts from Reuters newsfeeds and on a smaller domain of an eLearning course about Java. In the experiments, improvements of results by background knowledge compared to a baseline without background knowledge can be shown in many interesting combinations.
  • Keywords
    data mining; distance learning; document handling; pattern clustering; Java elearning course; Reuters newsfeeds; information browsing; information navigation; ontology; text document clustering; Clustering algorithms; Clustering methods; Electronic learning; Information retrieval; Java; Knowledge management; Navigation; Ontologies; Organizing; Web sites
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2003. ICDM 2003. Third IEEE International Conference on
  • Print_ISBN
    0-7695-1978-4
  • Type

    conf

  • DOI
    10.1109/ICDM.2003.1250972
  • Filename
    1250972