DocumentCode :
245044
Title :
Senders, Receivers and Authors in Document Classification
Author :
Drummond, Anna ; Jermaine, Christopher
Author_Institution :
Rice Univ., Houston, TX, USA
fYear :
2014
fDate :
14-17 Dec. 2014
Firstpage :
791
Lastpage :
796
Abstract :
In many document classification problems, sets of people will be associated with the document. These sets might include document authors, or people who have read the document, or the sender of an electronic message, or the recipients of the message, or those carbon copied, or those blind carbon copied. It is obvious that these sets of people can constitute important information that can help to classify the document. In this paper, we propose a simple method for mapping the set of people in a sender or receiver category to a single, low dimensional vector in a latent space. There are many ways that this vector can be used to help with the document classification task, and in the paper we consider three distinct possibilities in detail. We find that mapping a set of senders or receivers to a latent space in this way and incorporating this mapping into a classifier can greatly boost classification accuracy on several real electronic discovery tasks.
Keywords :
document handling; electronic messaging; pattern classification; classification accuracy; document author; document classification; electronic discovery task; electronic message sender; message recipient; Bayes methods; Carbon; Electronic mail; Encoding; Receivers; Support vector machines; Vectors
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining (ICDM), 2014 IEEE International Conference on
Conference_Location :
Shenzhen
ISSN :
1550-4786
Print_ISBN :
978-1-4799-4303-6
Type :
conf
DOI :
10.1109/ICDM.2014.149
Filename :
7023402