• DocumentCode
    245044
  • Title
    Senders, Receivers and Authors in Document Classification
  • Author
    Drummond, Anna; Jermaine, Christopher
  • Author_Institution
    Rice Univ., Houston, TX, USA
  • fYear
    2014
  • fDate
    14-17 Dec. 2014
  • Firstpage
    791
  • Lastpage
    796
  • Abstract
    In many document classification problems, sets of people will be associated with the document. These sets might include document authors, or people who have read the document, or the sender of an electronic message, or the recipients of the message, or those carbon copied, or those blind carbon copied. It is obvious that these sets of people can constitute important information that can help to classify the document. In this paper, we propose a simple method for mapping the set of people in a sender or receiver category to a single, low dimensional vector in a latent space. There are many ways that this vector can be used to help with the document classification task, and in the paper we consider three distinct possibilities in detail. We find that mapping a set of senders or receivers to a latent space in this way and incorporating this mapping into a classifier can greatly boost classification accuracy on several real electronic discovery tasks.
  • Keywords
    document handling; electronic messaging; pattern classification; classification accuracy; document author; document classification; electronic discovery task; electronic message sender; message recipient; Bayes methods; Carbon; Electronic mail; Encoding; Receivers; Support vector machines; Vectors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2014 IEEE International Conference on
  • Conference_Location
    Shenzhen
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4799-4303-6
  • Type
    conf
  • DOI
    10.1109/ICDM.2014.149
  • Filename
    7023402