DocumentCode :
892727
Title :
Mining Generalized Associations of Semantic Relations from Textual Web Content
Author :
Jiang, Tao ; Tan, Ah-Hwee ; Wang, Ke
Author_Institution :
Sch. of Comput. Eng., Nanyang Technol. Univ.
Volume :
19
Issue :
2
fYear :
2007
Firstpage :
164
Lastpage :
179
Abstract :
Traditional text mining techniques transform free text into a flat bag-of-words representation, which does not preserve sufficient semantics for knowledge discovery. In this paper, we present a two-step procedure to mine generalized associations of semantic relations conveyed by the textual content of Web documents. First, RDF (Resource Description Framework) metadata representing semantic relations are extracted from raw text using a myriad of natural language processing techniques. The relation extraction process also creates a term taxonomy in the form of a sense hierarchy inferred from WordNet. Then, a novel generalized association pattern mining algorithm (GP-Close) is applied to discover the underlying relation association patterns in the RDF metadata. To prune the large number of redundant overgeneralized patterns in the relation pattern search space, the GP-Close algorithm adopts the notion of generalization closure for systematic overgeneralization reduction. The efficacy of our approach is demonstrated through empirical experiments conducted on an online database of terrorist activities.
Keywords :
Internet; data mining; natural language processing; RDF mining; WordNet; association rule mining; generalized association pattern mining algorithm; knowledge discovery; resource description framework metadata; semantic relation; text mining; textual Web content; Association rules; Databases; Explosives; Information resources; Resource description framework; Taxonomy; Web sites; relation association
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2007.36
Filename :
4039281