Title :
Extracting Part-Whole Relations from Unstructured Chinese Corpus
Author :
Cao, Xinyu ; Cao, Cungen ; Wang, Shi ; Lu, Han
Author_Institution :
Key Lab. of Intell. Inf. Process., Chinese Acad. of Sci., Beijing
Abstract :
An important problem in text mining is the automatic extraction of semantic relations. The paper provides a domain-independent method for the automatic extraction of part-whole relations from Chinese corpora. The method consists of three phases. First, a set of lexico-syntactical patterns for part-whole relations is designed by using known pairs of concepts encoding part-whole relations as seeds and manually filtering the extracted sentences. Second, pairs of concepts that may reflect part-whole relations are extracted from a training corpus using the patterns. Finally, the extracted pairs of concepts are further confirmed using a set of heuristic rules derived from an analysis of Chinese syntactical and semantic features. On a test corpus, the method achieves satisfactory results.
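The three phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two patterns, the stop-word heuristic, the function names, and the assumption that sentences are already word-segmented with spaces are all hypothetical choices made for illustration only.

```python
import re

# Hypothetical lexico-syntactic patterns (NOT taken from the paper).
# Each regex captures a (whole, part) pair from space-segmented Chinese text.
PATTERNS = [
    re.compile(r"(?P<whole>\S+) 由 (?P<part>\S+) 组成"),  # "WHOLE is composed of PART"
    re.compile(r"(?P<whole>\S+) 包括 (?P<part>\S+)"),     # "WHOLE includes PART"
]

# Hypothetical heuristic rule: pronouns cannot be a whole or a part.
STOP_WORDS = {"这", "那", "它", "其"}


def sentences_with_seeds(sentences, seed_pairs):
    """Phase 1 helper: keep sentences that mention both concepts of a seed pair.

    A person would then read these sentences and design patterns by hand.
    """
    return [s for s in sentences
            if any(whole in s and part in s for whole, part in seed_pairs)]


def extract_candidates(sentences):
    """Phase 2: apply every pattern and collect candidate (whole, part) pairs."""
    pairs = []
    for sentence in sentences:
        for pattern in PATTERNS:
            for match in pattern.finditer(sentence):
                pairs.append((match.group("whole"), match.group("part")))
    return pairs


def passes_heuristics(whole, part):
    """Phase 3: confirm a candidate pair with simple (assumed) rules."""
    return whole != part and whole not in STOP_WORDS and part not in STOP_WORDS


def extract_part_whole(sentences):
    """Run phases 2 and 3 and return the confirmed pairs."""
    return [(w, p) for w, p in extract_candidates(sentences)
            if passes_heuristics(w, p)]


SENTENCES = [
    "汽车 由 发动机 组成",  # "a car is composed of an engine"
    "它 包括 轮子",         # "it includes wheels" (pronoun, filtered out)
    "计算机 包括 硬盘",     # "a computer includes a hard disk"
]
SEEDS = [("汽车", "发动机")]

seed_sentences = sentences_with_seeds(SENTENCES, SEEDS)
confirmed = extract_part_whole(SENTENCES)
```

In this toy run, the pronoun-headed sentence yields a candidate in phase 2 that the heuristic in phase 3 rejects, which is the role the abstract assigns to the confirmation step. A real system would need proper Chinese word segmentation and a far richer rule set.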
Keywords :
data mining; natural language processing; pattern recognition; text analysis; heuristic rules; lexico-syntactical patterns; semantic relations extraction; text mining; unstructured Chinese corpus; Computers; Data mining; Encoding; Filtering; Filters; Fuzzy systems; Hard disks; Information processing; Text mining; Zinc;
Conference_Title :
Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
Conference_Location :
Jinan, Shandong
Print_ISBN :
978-0-7695-3305-6
DOI :
10.1109/FSKD.2008.142