Title :
Random walks on adjacency graphs for mining lexical relations from big text data
Author :
Shan Jiang ; Chengxiang Zhai
Author_Institution :
Dept. of Comput. Sci., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
Abstract :
Lexical relations, or semantic relations of words, are useful knowledge fundamental to all applications since they help to capture inherent semantic variations of vocabulary in human languages. Discovering such knowledge in a robust way from arbitrary text data is a significant challenge in big text data mining. In this paper, we propose a novel, general probabilistic approach based on random walks on word adjacency graphs to systematically mine two fundamental and complementary lexical relations, i.e., paradigmatic and syntagmatic relations between words, from arbitrary text data. We show that representing text data as an adjacency graph opens up many opportunities to define interesting random walks for mining lexical relation patterns, and we propose specific random walk algorithms for mining paradigmatic and syntagmatic relations. Evaluation results on multiple corpora show that the proposed random walk-based algorithms can discover meaningful paradigmatic and syntagmatic relations of words from text data.
Keywords :
data mining; data structures; graph theory; graphs; natural language processing; text analysis; word processing; arbitrary text data; big text data mining; human languages; lexical relations mining; paradigmatic relations; random walk algorithms; syntagmatic relations; text data representation; vocabulary; word adjacency graphs; word semantic relations; Algorithm design and analysis; Data mining; Legged locomotion; Natural languages; Probabilistic logic; Semantics; Vocabulary;
Conference_Title :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/BigData.2014.7004272