Title :
Large Scale Relation Acquisition Using Class Dependent Patterns
Author :
Saeger, Stijn De ; Torisawa, Kentaro ; Kazama, Junichi ; Kuroda, Kow ; Murata, Masaki
Author_Institution :
Language Infrastruct. Group, Nat. Inst. of Inf. & Commun. Technol. (NICT), Seika, Japan
Abstract :
This paper proposes a minimally supervised method for acquiring high-level semantic relations such as causality and prevention from the Web. Our method learns linguistic patterns that express causality such as "x gave rise to y", and uses them to extract causal noun pairs like (global warming, malaria epidemic) from sentences like "global warming gave rise to a new malaria epidemic". The novelty of our method lies in the use of semantic word classes acquired by large scale clustering for learning class dependent patterns. We demonstrate the effectiveness of this class based approach on three large-scale relation mining tasks from 50 million Japanese Web pages. In two of these tasks we obtained more than 30,000 relation instances with over 80% precision, outperforming a state-of-the-art system by a large margin.
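A toy sketch of what a class dependent pattern means, assuming hand-written word classes and a single hand-written pattern. The class labels (C_phenomenon, C_disease, C_time), the tables WORD_CLASS and CLASS_PATTERNS, and the function extract are hypothetical illustrations. The paper's pattern learning and its large scale clustering are not reproduced here. The sketch shows only the core idea: the same surface pattern is accepted for some (class of x, class of y) combinations and rejected for others.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical toy word classes. In the paper these come from large scale
# clustering; here they are written by hand for illustration only.
WORD_CLASS: Dict[str, str] = {
    "global warming": "C_phenomenon",
    "malaria epidemic": "C_disease",
    "deforestation": "C_phenomenon",
    "soil erosion": "C_phenomenon",
    "tuesday": "C_time",
}

# Hypothetical class dependent patterns: each surface pattern is trusted only
# for the (class of x, class of y) pairs listed with it.
# Templates are assumed to contain plain words only (no regex metacharacters).
CLASS_PATTERNS: Dict[str, Set[Tuple[str, str]]] = {
    "{x} gave rise to {y}": {
        ("C_phenomenon", "C_disease"),
        ("C_phenomenon", "C_phenomenon"),
    },
}


def extract(sentences: List[str]) -> List[Tuple[str, str]]:
    """Return (x, y) noun pairs matched by a pattern whose class pair is allowed."""
    # Longest nouns first so multi-word nouns win over shorter alternatives.
    nouns = sorted(WORD_CLASS, key=len, reverse=True)
    noun_re = "|".join(re.escape(n) for n in nouns)
    pairs: List[Tuple[str, str]] = []
    for template, allowed in CLASS_PATTERNS.items():
        regex = re.compile(
            template.format(
                x=rf"\b(?P<x>{noun_re})\b",
                # Allow a few modifier words (e.g. "a new") before the y noun.
                y=rf"(?:\w+ )*?\b(?P<y>{noun_re})\b",
            )
        )
        for sentence in sentences:
            for m in regex.finditer(sentence.lower()):
                x, y = m.group("x"), m.group("y")
                # The class check is what makes the pattern class dependent.
                if (WORD_CLASS[x], WORD_CLASS[y]) in allowed:
                    pairs.append((x, y))
    return pairs


if __name__ == "__main__":
    print(extract([
        "Global warming gave rise to a new malaria epidemic",
        "Deforestation gave rise to soil erosion",
        "Global warming gave rise to Tuesday",
    ]))
```

On these three sentences the sketch returns (global warming, malaria epidemic) and (deforestation, soil erosion). The third match is dropped because (C_phenomenon, C_time) is not listed for the pattern, even though the surface text "gave rise to" matches.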
Keywords :
Internet; Web sites; causality; data acquisition; data mining; learning (artificial intelligence); natural language processing; pattern clustering; Japanese Web pages; World Wide Web; class dependent patterns; high-level semantic relations; large scale clustering; large scale relation acquisition; large-scale relation mining tasks; linguistic patterns; semantic word classes; supervised method; Communications technology; Data mining; Diseases; Frequency; Global warming; Information retrieval; Large-scale systems; Sea measurements; Text mining; Web pages;
Conference_Title :
Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4244-5242-2
ISSN :
1550-4786
DOI :
10.1109/ICDM.2009.140