DocumentCode :
2772345
Title :
Large Scale Relation Acquisition Using Class Dependent Patterns
Author :
De Saeger, Stijn ; Torisawa, Kentaro ; Kazama, Junichi ; Kuroda, Kow ; Murata, Masaki
Author_Institution :
Language Infrastructure Group, National Institute of Information and Communications Technology (NICT), Seika, Japan
fYear :
2009
fDate :
6-9 Dec. 2009
Firstpage :
764
Lastpage :
769
Abstract :
This paper proposes a minimally supervised method for acquiring high-level semantic relations such as causality and prevention from the Web. Our method learns linguistic patterns that express causality such as "x gave rise to y", and uses them to extract causal noun pairs like (global warming, malaria epidemic) from sentences like "global warming gave rise to a new malaria epidemic". The novelty of our method lies in the use of semantic word classes acquired by large scale clustering for learning class dependent patterns. We demonstrate the effectiveness of this class based approach on three large-scale relation mining tasks from 50 million Japanese Web pages. In two of these tasks we obtained more than 30,000 relation instances with over 80% precision, outperforming a state-of-the-art system by a large margin.
Keywords :
Internet; Web sites; causality; data acquisition; data mining; learning (artificial intelligence); natural language processing; pattern clustering; Japanese Web pages; World Wide Web; causality; class dependent patterns; high-level semantic relations; large scale clustering; large scale relation acquisition; large-scale relation mining tasks; linguistic patterns; semantic word classes; supervised method; Communications technology; Data mining; Diseases; Frequency; Global warming; Information retrieval; Large-scale systems; Sea measurements; Text mining; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on
Conference_Location :
Miami, FL
ISSN :
1550-4786
Print_ISBN :
978-1-4244-5242-2
Electronic_ISBN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2009.140
Filename :
5360308