DocumentCode
2539446
Title
Automatic Extraction of Multiword Expressions Combining Statistical and Similarity Approaches
Author
Xu, Jian ; Yu, Jingsong ; Wang, Huilin
Author_Institution
Dept. of Language & Inf. Eng., Peking Univ., Beijing, China
fYear
2010
fDate
13-15 Dec. 2010
Firstpage
256
Lastpage
259
Abstract
Multiword expressions (MWEs) are important for practical applications such as machine translation (henceforth, MT), multilingual information retrieval, data mining, and other natural language processing tasks. A method combining a similarity measure with a statistical tool is proposed for automatically extracting English MWEs from a corpus of Chinese government white papers and work reports from 1991 to 2010. The statistical approach is used to calculate the co-occurrence affinity between two words. In addition, the similarity measure is used to compute the semantic relations between words in order to improve MWE coverage, with the aim of obtaining higher precision and recall in extracting candidate multiword expressions. Experimental results showed that the proposed technique improved MWE extraction efficiently.
Keywords
information retrieval; natural languages; statistical analysis; word processing; Chinese government; English; automatic extraction; multiword expression; semantic relation; similarity approach; statistical approach; white paper; work report; Computational linguistics; Conferences; Data mining; Government; Natural language processing; Pragmatics; Semantics; co-occurrence affinity; multiword expressions; similarity approach; statistical tool;
fLanguage
English
Publisher
IEEE
Conference_Titel
Genetic and Evolutionary Computing (ICGEC), 2010 Fourth International Conference on
Conference_Location
Shenzhen
Print_ISBN
978-1-4244-8891-9
Electronic_ISBN
978-0-7695-4281-2
Type
conf
DOI
10.1109/ICGEC.2010.70
Filename
5715418