DocumentCode
3576311
Title
A Novel Approach for Email Clustering Based on Semantics
Author
Bin He ; Zefeng Li ; Nan Yang
Author_Institution
Sch. of Inf., Renmin Univ. of China, Beijing, China
fYear
2014
Firstpage
269
Lastpage
272
Abstract
Clustering short documents has recently attracted increasing interest. Short documents don't contain enough text to compute similarities accurately with the most widely used technique, the Vector Space Model (VSM). Adding semantics to short-document clustering is one efficient way to solve this problem. However, real-life collections are often composed of both very short and long documents. For example, the length of email messages for each email user follows a power-law distribution. Both long and short emails appear in an email corpus. Therefore, neither state-of-the-art short-document nor long-document clustering approaches can achieve high cluster quality or high efficiency when clustering a mix of short and long documents. To solve this problem, we propose a novel approach for email clustering based on semantics. Empirical validation shows that our method can obtain high cluster quality and high efficiency on real-world email datasets.
Keywords
document handling; electronic mail; natural language processing; pattern clustering; statistical distributions; vectors; document clustering; email clustering; power-law distribution; semantics; vector space model; Algorithm design and analysis; Clustering algorithms; Clustering methods; Data mining; Electronic mail; Semantics; Vectors; conditional similarity; directed graph transformation; email clustering; semantics vector
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Information System and Application Conference (WISA), 2014 11th
Print_ISBN
978-1-4799-5726-2
Type
conf
DOI
10.1109/WISA.2014.56
Filename
7058025
Link To Document