Author_Institution :
Sch. of Comput. Eng. & Sci., Shanghai Univ., Shanghai, China
Abstract :
With the arrival of the Big Data era, online hot event discovery has emerged as a way to mine social hot spots from large-scale web resources. Hot events naturally evolve over time, and their inherent semantic relations are likely to change along the way. As a result, traditional event detection approaches do not perform well on dynamic web resources. To overcome these limitations, this paper presents a novel framework for detecting hot events online. It consists of three stages: 1) document preprocessing, which selects significant features to represent document content; 2) threshold-resilient document classification, which classifies incoming documents into topically related events while accounting for event evolution; and 3) adaptive splitting document clustering, which clusters newly emerging hot events in a timely manner. Experiments on an online data set from the Baidu website demonstrate that the framework discovers hot events with high accuracy, good scalability, and short runtime.
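As an illustration only, the three-stage flow described in the abstract can be sketched in Python as follows. The abstract does not specify the feature selection method, the similarity measure, or how the threshold-resilient and adaptive splitting steps work, so everything here is an assumption: term-frequency features, cosine similarity, a fixed threshold, and a simple single-pass rule that opens a new event for any unmatched document. The names `preprocess`, `cosine`, `Event`, and `process` are hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for"}


def preprocess(text, top_k=8):
    """Stage 1 (assumed): keep the top_k most frequent non-stopword terms."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    return dict(Counter(tokens).most_common(top_k))


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


class Event:
    """An event represented by the running mean of its document vectors."""

    def __init__(self, vector):
        self.centroid = dict(vector)
        self.size = 1

    def absorb(self, vector):
        # Update the running mean so the centroid drifts as the event evolves.
        self.size += 1
        for term in set(self.centroid) | set(vector):
            old = self.centroid.get(term, 0.0)
            self.centroid[term] = old + (vector.get(term, 0.0) - old) / self.size


def process(events, text, threshold=0.3):
    """Stages 2 and 3 (assumed): assign to the closest event, or open a new one.

    A fixed threshold and single-pass event creation stand in for the paper's
    threshold-resilient classification and adaptive splitting clustering.
    """
    vector = preprocess(text)
    best = max(events, key=lambda e: cosine(vector, e.centroid), default=None)
    if best is not None and cosine(vector, best.centroid) >= threshold:
        best.absorb(vector)
        return best
    new_event = Event(vector)
    events.append(new_event)
    return new_event
```

Calling `process` once per incoming document on a shared `events` list gives the online behavior: related documents accumulate in an existing event, while an unrelated document starts a new one.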
Keywords :
Big Data; Internet; Web sites; data mining; document handling; feature selection; pattern classification; pattern clustering; semantic Web; Baidu Web site; adaptive splitting document clustering; document content representation; document preprocessing; incoming document classification; large-scale Web resources; online hot event detection; online hot event discovery; semantic relations; social hot spot mining; threshold-resilient document classification; Accuracy; Clustering algorithms; Clustering methods; Communities; Event detection; Semantics; Timing; adaptive splitting clustering; event discovery framework; online event detection; threshold-resilient classification