A New Method of Computing Chinese Word Similarity Based on Statistics

Author

Zhang, Bo ; Hong, Lei ; Song, Shubin ; He, Liang ; Li, Guorong

Author_Institution

Dept. of Comput. Sci. & Technol., East China Normal Univ., Shanghai, China

fYear

2012

fDate

18-21 Aug. 2012

Firstpage

43

Lastpage

46

Abstract

Word semantic similarity is a highly subjective concept, and it is difficult to compute a similarity value that is close to human judgment. Research on Chinese word semantic similarity is relatively scarce owing to its inherent complexity. This paper presents a statistical approach to computing Chinese word semantic similarity that introduces word frequency contrast (WFC-WS). Word semantic vectors are first obtained from co-occurrence and then extended with HIT-IR Tongyici Cilin (Extended). Word frequency contrast is used to filter the semantic vectors. Experiments show that the results of WFC-WS are closer to the manually constructed standard than those of some similar methods.
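The pipeline outlined in the abstract (co-occurrence vectors, synonym extension, frequency-based filtering, similarity) might be sketched roughly as follows. This is only an illustrative sketch, not the authors' WFC-WS method: the abstract does not define the contrast measure, so the ratio test in `frequency_contrast_filter`, the `min_ratio` threshold, the window size, the use of cosine similarity, and the plain dictionary standing in for a Tongyici Cilin lookup are all assumptions, and every function name is hypothetical.

```python
from collections import Counter
from math import sqrt


def cooccurrence_vector(sentences, target, window=2):
    """Count words that appear within `window` tokens of `target`."""
    vec = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vec[tokens[j]] += 1
    return vec


def extend_with_synonyms(vec, synonyms):
    """Give each synonym of a context word the same count.

    `synonyms` is a plain dict standing in for a thesaurus lookup
    (assumption: the real resource would be Tongyici Cilin).
    """
    extended = Counter(vec)
    for word, count in vec.items():
        for syn in synonyms.get(word, ()):
            extended[syn] += count
    return extended


def frequency_contrast_filter(vec, corpus_freq, total_tokens, min_ratio=1.5):
    """Keep context words that are relatively more frequent near the target
    than in the corpus overall (assumed contrast measure, not the paper's).
    Words with no corpus count are dropped.
    """
    local_total = sum(vec.values())
    kept = Counter()
    for word, count in vec.items():
        global_rate = corpus_freq.get(word, 0) / total_tokens
        if global_rate > 0 and (count / local_total) / global_rate >= min_ratio:
            kept[word] = count
    return kept


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy usage on pre-segmented sentences (invented data, for illustration only).
sentences = [["我", "喜欢", "苹果"], ["我", "喜欢", "香蕉"]]
corpus_freq = Counter(tok for s in sentences for tok in s)
total = sum(corpus_freq.values())
vectors = {
    w: frequency_contrast_filter(
        extend_with_synonyms(cooccurrence_vector(sentences, w), {}),
        corpus_freq, total, min_ratio=1.0)
    for w in ("苹果", "香蕉")
}
similarity = cosine(vectors["苹果"], vectors["香蕉"])
```

The filter is applied after the synonym extension here; the abstract does not state the order explicitly, so that ordering is also an assumption.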

Keywords

information filtering; natural language processing; statistical analysis; word processing; Chinese word semantic similarity; HIT-IR; WFC; WS; co-occurrence; statistical method; word frequency contrast; word semantic vector filtering; Dictionaries; Humans; Semantics; Standards; Vectors; Semantic similarity; Tongyici Cilin

fLanguage

English

Publisher

IEEE

Conference_Title

Business Intelligence and Financial Engineering (BIFE), 2012 Fifth International Conference on

Conference_Location

Lanzhou

Print_ISBN

978-1-4673-2092-4

Type

conf

DOI

10.1109/BIFE.2012.17

Filename

6305076