Title :
Similarity Computation of Low-frequency Chinese Words
Author :
Fan, Xinghua ; Chen, Ji
Author_Institution :
Coll. of Comput. Sci. & Technol., Univ. of Posts & Telecommun., Chongqing, China
Abstract :
This paper proposes a novel method for computing the similarity of low-frequency Chinese words. It adopts a combined strategy that exploits the Hownet dictionary and a corpus constructed from Internet search results. The method has three steps: (1) If both words exist in Hownet, the similarity between them is computed from Hownet. (2) If either of the two words a and b does not exist in Hownet, word a, word b, and the word pair a and b are each used as a query to search the Internet, and a corpus is constructed from the search results; the similarity between the two words is then computed from the contexts in which they occur. (3) To ensure that similarities computed from the different sources are comparable, the similarity computed from the constructed corpus is multiplied by a coefficient. Experimental results show that the proposed method effectively solves the problem of computing low-frequency word similarity.
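The three steps in the abstract can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation: the callables `in_hownet`, `hownet_sim`, and `search` are hypothetical placeholders for a Hownet lookup, a Hownet-based similarity measure, and a search-engine wrapper that returns already-segmented snippets. The cosine measure over context-window counts, the window size, and the default coefficient of 0.8 are assumptions, since the abstract does not specify them.

```python
from collections import Counter
from math import sqrt
from typing import Callable, List

Snippet = List[str]  # one search result, already segmented into tokens


def context_vector(word: str, snippets: List[Snippet], window: int = 2) -> Counter:
    """Count the tokens that occur within `window` positions of `word`."""
    counts: Counter = Counter()
    for tokens in snippets:
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def word_similarity(
    a: str,
    b: str,
    in_hownet: Callable[[str], bool],          # hypothetical dictionary lookup
    hownet_sim: Callable[[str, str], float],   # hypothetical Hownet-based measure
    search: Callable[[str], List[Snippet]],    # hypothetical search wrapper
    coefficient: float = 0.8,                  # assumed value, not from the paper
) -> float:
    # Step 1: both words are in the dictionary, so use the dictionary measure.
    if in_hownet(a) and in_hownet(b):
        return hownet_sim(a, b)
    # Step 2: query with a, with b, and with the pair, then pool the results
    # into one small corpus and compare the contexts of the two words.
    corpus = search(a) + search(b) + search(f"{a} {b}")
    corpus_sim = cosine(context_vector(a, corpus), context_vector(b, corpus))
    # Step 3: rescale so corpus-based scores are comparable to dictionary scores.
    return coefficient * corpus_sim
```

In practice the coefficient would have to be chosen so that the corpus-based scores line up with the Hownet-based ones, for example by comparing both measures on word pairs that are present in the dictionary; the abstract does not say how the authors set it.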
Keywords :
Internet; query formulation; Chinese low frequency word similarity computation; constructed corpus; dictionary Hownet; Computer science; Educational institutions; Frequency shift keying; Fuzzy systems; Paper technology; Search engines; Statistics; Taxonomy; Telecommunication computing; low frequency; word similarity;
Conference_Title :
Fuzzy Systems and Knowledge Discovery, 2009. FSKD '09. Sixth International Conference on
Conference_Location :
Tianjin
Print_ISBN :
978-0-7695-3735-1
DOI :
10.1109/FSKD.2009.476