DocumentCode :
685908
Title :
Topic Detection Based on Group Average Hierarchical Clustering
Author :
Ni Gao ; Ling Gao ; Yiyue He ; Hai Wang ; Qian Sun
Author_Institution :
Dept. of Inf. Sci. & Technol., Northwest Univ., Xi'an, China
fYear :
2013
fDate :
13-15 Dec. 2013
Firstpage :
88
Lastpage :
92
Abstract :
By analyzing the characteristics of the vast volume of disaster news on the Internet, this paper proposes a new topic detection algorithm based on Group Average Hierarchical Clustering (GAHC) that is suited to processing big data on the network. The core idea of GAHC is to divide big data into smaller groups and then cluster the groups hierarchically to generate the final topics. During clustering, the vector space model is used to represent news documents, and a similarity calculation model based on time and place weights is proposed. The new algorithm can automatically organize similar disaster news materials and generate news topics; it can also provide personalized service for users and form a topic detection system for disaster news. Experimental results demonstrate that the algorithm performs well.
Keywords :
Big Data; Internet; disasters; document handling; pattern clustering; Big Data processing; GAHC; Internet; automatic similar disaster news material organization; disaster news character analysis; group average hierarchical clustering; news document representation; news topic generation; personalized service; place weights; similarity calculation model; time weights; topic detection system; vector space model; Algorithm design and analysis; Clustering algorithms; Heuristic algorithms; Internet; Rocks; Vectors; GAHC; big data; clustering; topic detection;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advanced Cloud and Big Data (CBD), 2013 International Conference on
Conference_Location :
Nanjing
Print_ISBN :
978-1-4799-3260-3
Type :
conf
DOI :
10.1109/CBD.2013.38
Filename :
6824578