Title :
Effective topic modeling for email
Author :
Hiep Hong ; Teng-Sheng Moh
Author_Institution :
Dept. of Comput. Sci., San Jose State Univ., San Jose, CA, USA
Abstract :
Email has become increasingly popular and is now an indispensable tool for communication and document exchange. Because of its convenience, people use email every day at work, at school, and for personal matters. Consequently, the number of emails people receive daily keeps increasing, and they spend more and more time organizing them. People often need to classify emails and move them into folders so that they can go back and read them later. Most email client tools available today let users filter and organize emails by defining rules for handling incoming messages. However, this manual process requires users to know their expected emails very well, and to use these tools effectively they must understand how filtering rules work and how to apply them correctly. In reality, most users do not know what their incoming emails will be. The work described in this paper aims to take the burden of organizing emails away from users by using Latent Dirichlet Allocation (LDA) [10] to automatically extract topics from emails and group the emails into folders of common topics. Experiments have shown that the proposed method groups emails into the appropriate topics with 77% accuracy.
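The abstract does not spell out how topics are inferred or how emails are assigned to folders. As a minimal sketch, assuming collapsed Gibbs sampling for LDA and a rule that files each email under its single most probable topic, the idea could look like the Python below. The function names (`tokenize`, `lda_gibbs`, `group_into_folders`), the hyperparameters `alpha` and `beta`, and the simple whitespace tokenizer are hypothetical choices for illustration, not the authors' implementation.

```python
import random
from collections import defaultdict


def tokenize(text, stopwords):
    # Hypothetical preprocessing: lowercase, split on whitespace, strip
    # punctuation, and drop stopwords and very short tokens.
    words = []
    for raw in text.lower().split():
        w = raw.strip(".,!?;:()\"'")
        if len(w) > 2 and w not in stopwords:
            words.append(w)
    return words


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA over token lists.

    Returns theta: one topic-proportion vector per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]      # tokens of topic k in document d
    n_kw = [[0] * V for _ in range(K)]  # count of word v assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k
    z = []                              # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | rest), up to a constant.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta)
                           / (n_k[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Posterior-mean estimate of each document's topic proportions.
    return [[(n_dk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
            for d, doc in enumerate(docs)]


def group_into_folders(emails, num_topics, stopwords=frozenset(), **kwargs):
    # Assumed grouping rule: each email goes to its most probable topic.
    docs = [tokenize(e, stopwords) for e in emails]
    theta = lda_gibbs(docs, num_topics, **kwargs)
    folders = defaultdict(list)
    for idx, dist in enumerate(theta):
        best = max(range(num_topics), key=lambda t: dist[t])
        folders[best].append(idx)
    return dict(folders)


emails = [
    "budget meeting finance quarterly report budget",
    "quarterly finance budget numbers report",
    "soccer game tonight team score",
    "team won soccer match score goal",
]
folders = group_into_folders(emails, num_topics=2, iterations=50, seed=1)
```

The result maps a topic index to the indices of the emails filed under it. Filing every email under its single most probable topic is the simplest rule. A threshold on the top topic's proportion could instead leave ambiguous emails unsorted.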
Keywords :
e-mail filters; LDA; automatic topic extraction; e-mail client tools; e-mail filter; e-mail grouping; e-mail handling; e-mail organization; latent Dirichlet allocation; topic modeling; Context; Electronic mail; Filtering; Manuals; Resource management; Training; computer applications; data mining; database systems; information filtering; natural language processing; probability; random variables; text analysis; text processing;
Conference_Title :
High Performance Computing & Simulation (HPCS), 2015 International Conference on
Conference_Location :
Amsterdam
Print_ISBN :
978-1-4673-7812-3
DOI :
10.1109/HPCSim.2015.7237060