DocumentCode :
3278237
Title :
Extracting Chinese abbreviation-definition pairs from anchor texts
Author :
Xie, Li-xing ; Zheng, Ya-bin ; Liu, Zhi-yuan ; Sun, Mao-song ; Wang, Can-hui
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
Volume :
4
fYear :
2011
fDate :
10-13 July 2011
Firstpage :
1485
Lastpage :
1491
Abstract :
This paper proposes an automatic scheme to extract Chinese abbreviations and their corresponding definitions from large-scale anchor texts. The method is motivated by the observation that the more frequently two anchor texts point to the same web page, the more related they are. Since abbreviation-definition pairs are highly related, they can be extracted from these related words. The method involves three steps. First, we use external statistical features to extract candidate abbreviation-definition pairs from anchor texts. Second, we extract internal features from the candidate pairs and adopt Conditional Random Fields (CRFs) to compute a score for each candidate pair. Finally, we combine external and internal features to generate the final pairs. Experimental results show that this method can accurately extract Chinese abbreviation-definition pairs from anchor texts, and that combining external and internal features is effective for extracting abbreviation-definition pairs.
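As a rough illustration of the first step described in the abstract, the sketch below counts how many target pages two anchor texts share and keeps pairs that look like an abbreviation and its definition. Everything here is an assumption made for illustration: the function names, the `min_shared` threshold, the example URLs, and the character-subsequence check, which is a simple stand-in heuristic and not the CRF-based internal scoring the paper uses.

```python
from collections import defaultdict
from itertools import combinations


def co_link_counts(links):
    """Count, for each pair of anchor texts, the pages both point to.

    `links` is an iterable of (anchor_text, target_url) tuples.
    """
    texts_by_url = defaultdict(set)
    for text, url in links:
        texts_by_url[url].add(text)

    counts = defaultdict(int)
    for texts in texts_by_url.values():
        # Sort so each unordered pair always maps to the same key.
        for a, b in combinations(sorted(texts), 2):
            counts[(a, b)] += 1
    return dict(counts)


def is_subsequence(short, long):
    """True if the characters of `short` appear in `long` in order."""
    it = iter(long)
    return all(ch in it for ch in short)


def candidate_pairs(links, min_shared=2):
    """Return (abbreviation, definition, shared_pages) candidates.

    Hypothetical filter: the shorter text must be a character
    subsequence of the longer one, and the two texts must share
    at least `min_shared` target pages.
    """
    pairs = []
    for (a, b), n in co_link_counts(links).items():
        short, long = sorted((a, b), key=len)
        if n >= min_shared and len(short) < len(long) and is_subsequence(short, long):
            pairs.append((short, long, n))
    return sorted(pairs, key=lambda p: -p[2])


# Made-up example data; the URLs are placeholders.
links = [
    ("北大", "pku.example/1"), ("北京大学", "pku.example/1"),
    ("北大", "pku.example/2"), ("北京大学", "pku.example/2"),
    ("清华", "thu.example/1"), ("清华大学", "thu.example/1"),
]
print(candidate_pairs(links))
```

In a fuller pipeline the subsequence check would be replaced by a learned score over internal features, and the shared-page count would be one of several external statistics combined with it.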
Keywords :
Internet; natural language processing; statistical analysis; text analysis; CRF; Chinese abbreviation-definition pairs; anchor texts; conditional random fields; external statistical features; web page; Bipartite graph; Equations; Feature extraction; Labeling; Machine learning; Mathematical model; Training
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Cybernetics (ICMLC), 2011 International Conference on
Conference_Location :
Guilin
ISSN :
2160-133X
Print_ISBN :
978-1-4577-0305-8
Type :
conf
DOI :
10.1109/ICMLC.2011.6016980
Filename :
6016980