Title :
Automatically Detecting Personal Topics by Clustering Emails
Author :
Yang, Huijie ; Luo, Junyong ; Yin, Meijuan ; Liu, Yan
Author_Institution :
Inf. Sci. & Technol. Inst., Zhengzhou, China
Abstract :
Email plays an important role in daily life. Clustering emails into meaningful groups is recognized as a way to greatly reduce the cognitive load of processing them. Mailbox users are increasingly concerned with how to organize and manage their emails and how to mine meaningful data from them conveniently and effectively. This paper proposes a novel approach to detecting personal topics using a clustering algorithm. First, the emails are preprocessed and an improved email VSM (vector space model) is constructed that represents each email by combining its body and subject in a new way. Then an advanced k-means algorithm, with a kernel-selection algorithm designed around the lowest similarity, is used to cluster the emails. Afterwards, appropriate keywords are extracted to label the topic of each cluster. Finally, experiments on the 20 Newsgroups email dataset show the validity of the approach, and the experimental results closely match the human-labeled clustering result.
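The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the weighting scheme, the exact kernel-selection rule, or the keyword-extraction method, so everything below is an assumption made for illustration. In particular, the subject weight (`SUBJECT_WEIGHT`), the stop-word list (`STOP_WORDS`), the farthest-first seeding in `pick_seeds`, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

SUBJECT_WEIGHT = 2.0  # assumed: subject terms count more than body terms
STOP_WORDS = {"the", "for", "a", "of", "to"}  # assumed toy stop-word list

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words (toy preprocessing)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def email_vector(subject, body, subject_weight=SUBJECT_WEIGHT):
    """Term-weight vector combining body counts with boosted subject terms."""
    vec = Counter(tokenize(body))
    for term in tokenize(subject):
        vec[term] += subject_weight
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def pick_seeds(vectors, k):
    """Assumed seeding rule: start from the first email, then repeatedly add
    the email whose highest similarity to the chosen seeds is lowest."""
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(vectors)) if i not in seeds]
        seeds.append(min(
            candidates,
            key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds),
        ))
    return seeds

def centroid(members):
    """Mean vector of a non-empty list of term-weight vectors."""
    total = Counter()
    for vec in members:
        total.update(vec)
    return Counter({t: w / len(members) for t, w in total.items()})

def kmeans(vectors, k, iterations=10):
    """Plain k-means with cosine similarity and the seeding rule above."""
    centers = [vectors[i] for i in pick_seeds(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: cosine(v, centers[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = centroid(members)
    return labels, centers

def top_keywords(center, n=3):
    """Label a cluster with its n highest-weighted centroid terms."""
    return [term for term, _ in center.most_common(n)]

if __name__ == "__main__":
    emails = [  # (subject, body) pairs, invented for the demo
        ("hockey game tonight", "the hockey team won the game"),
        ("hockey playoffs", "playoffs game schedule for hockey"),
        ("graphics card driver", "install the driver for the graphics card"),
        ("graphics driver bug", "the card driver crashes graphics software"),
    ]
    vectors = [email_vector(subject, body) for subject, body in emails]
    labels, centers = kmeans(vectors, k=2)
    for c, center in enumerate(centers):
        print(c, top_keywords(center))
```

Seeding from mutually dissimilar emails is one plausible reading of "kernel-selected algorithm based on the lowest similarity"; it spreads the initial centers across distinct topics rather than relying on random initialization.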
Keywords :
electronic mail; pattern clustering; automatic personal topic detection; clustering emails; email management; email organization; k-means algorithm; kernel-selected algorithm; vector space model; Algorithm design and analysis; Clustering algorithms; Computer science; Computer science education; Data mining; Educational technology; Humans; Information science; Natural languages; Speech recognition; Email VSM; email clustering; kernel-selected; topic detection;
Conference_Title :
Education Technology and Computer Science (ETCS), 2010 Second International Workshop on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-6388-6
Electronic_ISBN :
978-1-4244-6389-3
DOI :
10.1109/ETCS.2010.238