  • DocumentCode
    3228734
  • Title
    Analyzing the Effect of Document Representation on Machine Learning Approaches in Multi-Class e-Mail Filtering
  • Author
    Berger, Helmut ; Dittenbach, Michael ; Merkl, Dieter
  • Author_Institution
    iSpaces Res. Group, E-Commerce Competence Center, Wien
  • fYear
    2006
  • fDate
    18-22 Dec. 2006
  • Firstpage
    297
  • Lastpage
    300
  • Abstract
    This paper reports on experiments in multi-class document categorization with supervised machine learning techniques. The document collection consists of a set of personal e-mail messages. Two distinct document representation formalisms are employed to characterize these messages, namely a standard word-based approach and a character n-gram document representation. Based on these document representations, the categorization performance of five machine learning approaches is assessed and a comparison is given. In principle, both document representations yielded comparable results with the various classifiers. However, the results for the n-gram-based document representation were definitely better in the case of an aggressive feature selection strategy.
  • Keywords
    electronic mail; learning (artificial intelligence); text analysis; aggressive feature selection strategy; character n-gram document representation; document representation formalism; multiclass document categorization; multiclass e-mail filtering; personal e-mail message; supervised machine learning approach; word-based approach; Automation; Cities and towns; Classification algorithms; Electronic mail; Filtering; Filters; Machine learning; Research and development; Sorting; Text categorization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    0-7695-2747-7
  • Type
    conf
  • DOI
    10.1109/WI.2006.41
  • Filename
    4061380