DocumentCode :
2019080
Title :
Large-Scale SMS Messages Mining Based on Map-Reduce
Author :
Xia, Tian
Author_Institution :
Key Lab. of Data Eng. & Knowledge Eng., Renmin Univ. of China, Beijing
Volume :
1
fYear :
2008
fDate :
17-18 Oct. 2008
Firstpage :
7
Lastpage :
12
Abstract :
Mining the popular SMS messages in a short period of time is very valuable. However, the traditional OLAP-based mining method is not suitable for such a very large-scale dataset. In this paper, we present a mining approach based on the Map-Reduce parallel framework. First, the original dataset is pre-processed and grouped by the senders' mobile numbers. Second, we apply a transformation that regroups the dataset by short content keys, and then extract the popular messages according to the number of distinct senders that share the same key. Furthermore, we propose a sentence similarity computation method and a novel Forward Merging and K-Neighbor Checking algorithm to merge semantically similar messages. Experimental results show that the final dataset of popular messages is very small yet has a high sending coverage ratio, and can meet real requirements.
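The two grouping stages described in the abstract can be illustrated with a minimal in-memory sketch. This is not the paper's implementation: it runs in plain Python rather than on Hadoop, the function names, the `min_senders` threshold, and the `content_key` definition (first 8 characters of the lowercased, whitespace-stripped text) are all assumptions made for illustration, and the sample records are invented. The similarity-merging step is omitted because the abstract does not give its details.

```python
from collections import defaultdict


def content_key(text, length=8):
    """Hypothetical short content key: lowercase the text, drop all
    whitespace, and keep the first `length` characters."""
    normalized = "".join(text.lower().split())
    return normalized[:length]


def group_by_sender(records):
    """Stage 1: group (sender, text) records by sender number,
    keeping each distinct text once per sender."""
    by_sender = defaultdict(set)
    for sender, text in records:
        by_sender[sender].add(text)
    return by_sender


def regroup_by_key(by_sender, length=8):
    """Stage 2: regroup by content key, collecting the set of
    distinct senders that sent a message with that key."""
    by_key = defaultdict(set)
    for sender, texts in by_sender.items():
        for text in texts:
            by_key[content_key(text, length)].add(sender)
    return by_key


def popular_keys(records, min_senders=2, length=8):
    """Return (key, distinct_sender_count) pairs that reach the assumed
    threshold, most widely sent first (ties broken by key)."""
    by_key = regroup_by_key(group_by_sender(records), length)
    counts = {k: len(s) for k, s in by_key.items() if len(s) >= min_senders}
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented sample data: (sender number, message text).
records = [
    ("13800000001", "Happy New Year to you"),
    ("13800000002", "Happy New Year to you"),
    ("13800000002", "Happy New Year to you"),  # repeat by the same sender
    ("13800000003", "happy new year, friend"),
    ("13800000001", "Meeting at 5pm"),
]

if __name__ == "__main__":
    print(popular_keys(records))
```

Counting distinct senders rather than raw message counts means a single sender repeating one message does not make it look popular, which matches the abstract's "count of different senders which have the same key".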
Keywords :
data mining; electronic messaging; merging; parallel programming; very large databases; forward merging; k-neighbor checking algorithm; large-scale SMS messages mining; map-reduce parallel framework; sentence similarity computation method; Computational intelligence; Data engineering; Data mining; Design engineering; File systems; Laboratories; Large-scale systems; Merging; Phased arrays; Search engines; Hadoop; Map-Reduce; SMS Messages Mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence and Design, 2008. ISCID '08. International Symposium on
Conference_Location :
Wuhan
Print_ISBN :
978-0-7695-3311-7
Type :
conf
DOI :
10.1109/ISCID.2008.9
Filename :
4725545