DocumentCode
245044
Title
Senders, Receivers and Authors in Document Classification
Author
Drummond, Anna ; Jermaine, Christopher
Author_Institution
Rice Univ., Houston, TX, USA
fYear
2014
fDate
14-17 Dec. 2014
Firstpage
791
Lastpage
796
Abstract
In many document classification problems, sets of people will be associated with the document. These sets might include document authors, or people who have read the document, or the sender of an electronic message, or the recipients of the message, or those carbon copied, or those blind carbon copied. It is obvious that these sets of people can constitute important information that can help to classify the document. In this paper, we propose a simple method for mapping the set of people in a sender or receiver category to a single, low-dimensional vector in a latent space. There are many ways that this vector can be used to help with the document classification task, and in the paper we consider three distinct possibilities in detail. We find that mapping a set of senders or receivers to a latent space in this way and incorporating this mapping into a classifier can greatly boost classification accuracy on several real electronic discovery tasks.
Keywords
document handling; electronic messaging; pattern classification; classification accuracy; document author; document classification; electronic discovery task; electronic message sender; message recipient; Bayes methods; Carbon; Electronic mail; Encoding; Receivers; Support vector machines; Vectors
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining (ICDM), 2014 IEEE International Conference on
Conference_Location
Shenzhen
ISSN
1550-4786
Print_ISBN
978-1-4799-4303-6
Type
conf
DOI
10.1109/ICDM.2014.149
Filename
7023402