DocumentCode :
124138
Title :
Collecting Conceptualized Relations from Terabytes of Web Texts for Understanding Unknown Terms
Author :
Shirakawa, Masumi ; Nakayama, Keisuke ; Aramaki, Eiji ; Hara, Tenshi ; Nishio, Shojiro
Author_Institution :
Osaka Univ., Suita, Japan
Volume :
1
fYear :
2014
fDate :
11-14 Aug. 2014
Firstpage :
86
Lastpage :
93
Abstract :
This paper describes our attempt to extract various relations between superordinate concepts from a terabyte-scale Web corpus, with the aim of human-like speculation about the meaning of unknown terms. To discover a wide variety of conceptualized relations, we focus on Web-scale text corpora and introduce a simple string-matching method to process them. To derive relations between concepts, our method first extracts relations between terms and then replaces each term with appropriate concepts using Wikipedia, WordNet, and YAGO knowledge. We extracted over 10 million relations between concepts in a day from more than 10TB of Web texts using 100 machines. Experimental results revealed that the relations extracted by our method contained many more meaningless relations than those extracted by NLP-based methods. Nevertheless, they were useful in an application that speculates the meaning of unknown terms, improving the recall by more than 0.06 points and decreasing the accuracy by only 0.04 points (the improvement of the F1-measure was 0.03 points). The results indicate that the coverage of conceptualized relations is important for improving the precision in this application. This is because a lack of knowledge (conceptualized relations) leads to misunderstanding the meaning of unknown terms, just as we humans misunderstand things when our knowledge is insufficient.
Keywords :
Internet; Web sites; natural language processing; string matching; text analysis; NLP-based method; Web corpus; Web texts; Web-scale text corpora; Wikipedia; WordNet; YAGO knowledge; conceptualized relations; human-like speculation; string-matching method; Accuracy; Companies; Educational institutions; Electronic publishing; Encyclopedias
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on
Conference_Location :
Warsaw
Type :
conf
DOI :
10.1109/WI-IAT.2014.20
Filename :
6927529