  • DocumentCode
    3576311
  • Title
    A Novel Approach for Email Clustering Based on Semantics
  • Author
    Bin He; Zefeng Li; Nan Yang
  • Author_Institution
    Sch. of Inf., Renmin Univ. of China, Beijing, China
  • fYear
    2014
  • Firstpage
    269
  • Lastpage
    272
  • Abstract
    Interest in clustering short documents has grown recently. Short documents do not contain enough text for similarities to be computed accurately with the most widely used technique, the Vector Space Model (VSM). Adding semantics to short-document clustering is one efficient way to solve this problem. However, real-life collections often contain both very short and long documents. For example, the length of each email user's messages follows a power-law distribution, so long and short emails both appear in an email corpus. Therefore, neither state-of-the-art short-document nor long-document clustering approaches can achieve both high cluster quality and high efficiency when clustering a mix of short and long documents. To solve this problem, we propose a novel approach for email clustering based on semantics. Empirical validation on real-world email datasets shows that our method can obtain high cluster quality and high efficiency.
  • Keywords
    document handling; electronic mail; natural language processing; pattern clustering; statistical distributions; vectors; document clustering; email clustering; power-law distribution; semantics; vector space model; Algorithm design and analysis; Clustering algorithms; Clustering methods; Data mining; Electronic mail; Semantics; Vectors; conditional similarity; directed graph transformation; email clustering; semantics vector;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Web Information System and Application Conference (WISA), 2014 11th
  • Print_ISBN
    978-1-4799-5726-2
  • Type
    conf
  • DOI
    10.1109/WISA.2014.56
  • Filename
    7058025