• DocumentCode
    2301361
  • Title

    Automatically Detecting Personal Topics by Clustering Emails

  • Author

    Yang, Huijie ; Luo, Junyong ; Yin, Meijuan ; Liu, Yan

  • Author_Institution
    Inf. Sci. & Technol. Inst., Zhengzhou, China
  • Volume
    3
  • fYear
    2010
  • fDate
    6-7 March 2010
  • Firstpage
    91
  • Lastpage
    94
  • Abstract
    Email plays an important role in daily life. Clustering emails into meaningful groups is recognized as a way to greatly reduce the cognitive load of processing them. Mailbox users are increasingly concerned with how to organize and manage their emails and how to mine meaningful data from them conveniently and effectively. This paper proposes a novel approach to detecting personal topics using a clustering algorithm. First, the emails are preprocessed and an improved email VSM (vector space model) is constructed that represents each email by combining its body and subject in a new way. Then an advanced k-means algorithm, together with a kernel-selected algorithm based on the lowest similarity, is used to cluster the emails. Afterwards, appropriate keywords are extracted to label the topic of each cluster. Finally, experiments on the 20 Newsgroups email dataset show the validity of the approach, and the experimental results match the human-labeled clustering well.
  • Keywords
    electronic mail; pattern clustering; automatic personal topic detection; clustering emails; email management; email organization; k-means algorithm; kernel-selected algorithm; vector space model; Algorithm design and analysis; Clustering algorithms; Computer science; Computer science education; Data mining; Educational technology; Humans; Information science; Natural languages; Speech recognition; Email VSM; email clustering; kernel-selected; topic detection;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Education Technology and Computer Science (ETCS), 2010 Second International Workshop on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4244-6388-6
  • Electronic_ISBN
    978-1-4244-6389-3
  • Type

    conf

  • DOI
    10.1109/ETCS.2010.238
  • Filename
    5459924
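
The pipeline summarized in the abstract (email VSM from subject and body, seed selection by lowest similarity, k-means, keyword labels) can be illustrated with a minimal sketch. It is not the authors' implementation: the record gives no formulas, so the TF-IDF weighting, the `subject_weight` parameter, the farthest-first reading of "lowest similarity", and every function name here are assumptions made for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would also drop stop words.
    return [w for w in text.lower().split() if w.isalpha()]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v.values()))
    return {w: x / norm for w, x in v.items()} if norm else dict(v)


def build_vectors(emails, subject_weight=2.0):
    # Assumption: subject terms simply count subject_weight times as much as body terms.
    counts = []
    for subject, body in emails:
        c = Counter()
        for w in tokenize(subject):
            c[w] += subject_weight
        for w in tokenize(body):
            c[w] += 1.0
        counts.append(c)
    n = len(counts)
    df = Counter(w for c in counts for w in c)  # document frequency per term
    return [
        normalize({w: tf * (math.log((1 + n) / (1 + df[w])) + 1.0) for w, tf in c.items()})
        for c in counts
    ]


def cosine(a, b):
    # Both arguments are unit vectors, so the dot product is the cosine.
    return sum(x * b.get(w, 0.0) for w, x in a.items())


def pick_seeds(vectors, k):
    # Assumed reading of "lowest similarity": start from the first email, then
    # repeatedly add the email least similar to the seeds chosen so far.
    seeds = [0]
    while len(seeds) < k:
        rest = [i for i in range(len(vectors)) if i not in seeds]
        seeds.append(min(rest, key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds)))
    return seeds


def kmeans(vectors, k, max_iter=20):
    centers = [vectors[i] for i in pick_seeds(vectors, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: cosine(v, centers[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                total = Counter()
                for v in members:
                    total.update(v)
                centers[j] = normalize(total)
    return labels, centers


def top_terms(center, n=3):
    # Highest-weight centroid terms serve as the cluster's topic label.
    return [w for w, _ in sorted(center.items(), key=lambda item: (-item[1], item[0]))[:n]]


# Toy (subject, body) pairs, invented for this sketch.
emails = [
    ("project meeting", "agenda for the project meeting on monday"),
    ("meeting notes", "notes from the project meeting"),
    ("football match", "tickets for the football match on saturday"),
    ("match result", "the football team won the match"),
]
labels, centers = kmeans(build_vectors(emails), k=2)
for j, center in enumerate(centers):
    print(j, top_terms(center))
```

Choosing seeds that are mutually dissimilar is one common way to make k-means less sensitive to initialization; whether it matches the paper's kernel-selected algorithm cannot be confirmed from this record alone.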