Title :
CMPK: A High Accuracy Microblog User Classification Method for Professional Analysis
Author :
Ying Peng ; Haiquan Wang
Author_Institution :
Sch. of Comput. Sci. & Eng., Beihang Univ., Beijing, China
Abstract :
Analyzing and mining the massive data recorded on microblogs to discover the characteristics and patterns of individual, group, and interactive behaviors is currently a research hotspot in massive data mining and behavioral analysis. However, existing research has neglected the influence of social attributes, such as a user's occupation, on the user's behavior and social relations. To address this issue, this paper proposes CMPK (Classification Method based on Professional lexicon and K-nearest neighbor algorithm), a high-accuracy microblog user classification method for professional analysis. CMPK combines a vector space model built with a professional lexicon and the KNN (K-Nearest Neighbor) classification algorithm to determine the industry a microblog user belongs to, based on all kinds of information the user posts online. Experiments show that CMPK achieves an accuracy of nearly 90%.
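The combination of a lexicon-restricted vector space model with KNN described in the abstract can be illustrated with the minimal sketch below. It is not the authors' implementation. The lexicon terms, the industry labels, the training texts, the use of raw term frequency (instead of whatever weighting CMPK actually applies), cosine similarity, and k = 3 are all hypothetical choices made for illustration. Whitespace tokenization is also an assumption; real Chinese microblog text would first need word segmentation.

```python
import math
from collections import Counter

# Hypothetical professional lexicon: only these terms become vector dimensions.
LEXICON = ["server", "algorithm", "code",
           "patient", "clinic", "diagnosis",
           "stock", "fund", "market"]

# Hypothetical labeled users: (concatenated user text, industry label).
TRAINING = [
    ("wrote code for the server algorithm", "IT"),
    ("debugging code all night", "IT"),
    ("new algorithm for server load", "IT"),
    ("patient visits at the clinic", "Medicine"),
    ("diagnosis of a difficult patient", "Medicine"),
    ("clinic schedule is full", "Medicine"),
    ("stock market fell today", "Finance"),
    ("fund returns beat the market", "Finance"),
]

def to_vector(text, lexicon=LEXICON):
    """Map text to a raw term-frequency vector over the lexicon terms (assumed weighting)."""
    counts = Counter(text.lower().split())  # naive whitespace tokenization
    return [counts[term] for term in lexicon]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def knn_classify(text, training=TRAINING, k=3):
    """Label text by majority vote among the k most similar training users."""
    query = to_vector(text)
    scored = sorted(
        ((cosine(query, to_vector(t)), label) for t, label in training),
        reverse=True,
    )
    top_labels = [label for _, label in scored[:k]]
    return Counter(top_labels).most_common(1)[0][0]

if __name__ == "__main__":
    print(knn_classify("refactoring the server code"))
    print(knn_classify("the patient needs a diagnosis"))
```

Restricting the dimensions to lexicon terms keeps the vectors small and focused on occupation-related vocabulary; a production system would more likely use TF-IDF weights and a much larger, curated lexicon per industry.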
Keywords :
Web sites; behavioural sciences computing; data analysis; data mining; learning (artificial intelligence); pattern classification; CMPK method; behavioral analysis; classification method based on professional lexicon and k-nearest neighbor algorithm; group behaviors; individual behaviors; interactive behaviors; massive data analysis; massive data mining; microblog; microblog user classification method; professional analysis; Classification algorithms; Computer architecture; Computers; Feature extraction; Industries; Support vector machine classification; Training; K-Nearest Neighbor algorithm; text mining; user classification; vector space model;
Conference_Title :
Cloud and Service Computing (CSC), 2013 International Conference on
Conference_Location :
Beijing
DOI :
10.1109/CSC.2013.28