Title :
Automatic Mining of Human Activity Attributes from Weblogs
Author :
The, Nguyen Minh ; Kawamura, Takahiro ; Nakagawa, Hiroyuki ; Tahara, Yasuyuki ; Ohsuga, Akihiko
Author_Institution :
Grad. Sch. of Inf. Syst., Univ. of Electro-Commun., Chofu, Japan
Abstract :
In this paper, we define an activity by five basic attributes: actor, action, object, time, and location. Our goal is to describe a method that automatically extracts all of these attributes from each sentence retrieved from Japanese weblogs. Previous work has several limitations: high setup costs, an inability to extract all attributes, restrictions on the types of sentences that can be handled, and insufficient consideration of interdependencies among attributes. To resolve these problems, this paper proposes a novel approach that uses conditional random fields and self-supervised learning. The approach treats activity extraction as a sequence labeling problem; it is domain-independent and scalable, and it requires no hand-tagged data. Because the positions and the number of attributes in activity sentences need not be fixed, the approach can extract all attributes in a single pass over its corpus. Additionally, by converting complex sentences into simpler ones, the approach can handle complex sentences retrieved from Japanese weblogs. In an experiment, the approach achieved high precision (activity: 88.87%, attributes: over 90%).
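To illustrate what treating activity extraction as sequence labeling can look like, the following minimal sketch groups per-token labels into the five attributes named in the abstract. It is not the authors' implementation: the function name `collect_attributes`, the `O` label for tokens outside any attribute, the English example sentence, and the hand-written labels are all assumptions made for illustration. In the described approach, the labels would come from a trained conditional random field rather than being supplied by hand.

```python
from itertools import groupby

# The five attributes named in the abstract; "O" (an assumed convention)
# marks tokens that belong to no attribute.
ATTRIBUTES = {"actor", "action", "object", "time", "location"}


def collect_attributes(tokens, labels):
    """Group consecutive tokens that share an attribute label into spans."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    activity = {}
    pairs = zip(tokens, labels)
    # groupby merges runs of adjacent tokens carrying the same label.
    for label, group in groupby(pairs, key=lambda pair: pair[1]):
        if label in ATTRIBUTES:
            span = " ".join(token for token, _ in group)
            activity.setdefault(label, []).append(span)
    return activity


# Hypothetical labeled sentence (labels written by hand, not model output).
tokens = ["Yesterday", "I", "bought", "a", "book", "in", "Shinjuku"]
labels = ["time", "actor", "action", "O", "object", "O", "location"]
activity = collect_attributes(tokens, labels)
print(activity)
```

Because each token carries its own label, neither the position nor the number of attributes in a sentence has to be fixed in advance, which is the property the abstract highlights.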
Keywords :
Web sites; data mining; information retrieval; learning (artificial intelligence); semantic networks; text analysis; Japanese Weblogs; activity extraction; automatic mining; human activity attribute; self-supervised learning; semantic network; sentence retrieval; sequence labeling problem; Data mining; Feature extraction; Logic gates; Markov processes; Syntactics; Testing; Training data; Conditional Random Fields; Human Activity; Self-Supervised Learning; Semantic Network; Web Mining;
Conference_Title :
Computer and Information Science (ICIS), 2010 IEEE/ACIS 9th International Conference on
Conference_Location :
Yamagata
Print_ISBN :
978-1-4244-8198-9
DOI :
10.1109/ICIS.2010.44