DocumentCode :
2836084
Title :
Auto-Identifying Terms Based on a Place-Extending Method
Author :
Zezhi Zheng
Author_Institution :
Dept. of Chinese Language & Literature, Xiamen Univ., Xiamen, China
fYear :
2011
fDate :
17-18 July 2011
Firstpage :
1
Lastpage :
5
Abstract :
The normalized relative frequency ratio is used as the domain differential degree to estimate how domain-specific a string is, and the sequence correlation coefficient is used to judge the stability of a string. Identification proceeds in two steps. 1) Obtain term seeds: extract adjacent character pairs from the domain corpus and the general corpus, then sift these pairs jointly by domain differential degree, mutual information, and a taboo character list. 2) Obtain terms: taking the term seeds as anchor points, extend each seed on both sides one character at a time, filtering every added character in turn with the sequence correlation coefficient, exception-correction rules, and a taboo word list. Terms containing a specific character are taken as an example. In testing, the algorithm reached a precision of 86.73% and a recall of 85.91%.
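As a rough illustration of step 1 only (term-seed selection), the following Python sketch filters adjacent character pairs by a domain score, mutual information, and a taboo character list. The abstract does not give the paper's formulas, so everything here is an assumption: the normalization `r / (1 + r)` of the relative frequency ratio, the add-one smoothing, the use of pointwise mutual information, the threshold values, and all function and parameter names are hypothetical stand-ins rather than the author's definitions.

```python
import math
from collections import Counter


def adjacent_pairs(text):
    """Count adjacent character pairs (character bigrams) in a string."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def domain_differential_degree(pair, domain_counts, general_counts, smoothing=1.0):
    """Assumed form: relative frequency ratio r mapped into (0, 1) as r / (1 + r).

    Add-one style smoothing keeps the ratio finite when a pair is missing
    from one corpus. This is an illustrative choice, not the paper's formula.
    """
    vocab = len(set(domain_counts) | set(general_counts))
    d_total = sum(domain_counts.values())
    g_total = sum(general_counts.values())
    d_freq = (domain_counts[pair] + smoothing) / (d_total + smoothing * vocab)
    g_freq = (general_counts[pair] + smoothing) / (g_total + smoothing * vocab)
    ratio = d_freq / g_freq
    return ratio / (1.0 + ratio)


def pointwise_mutual_information(pair, pair_counts, char_counts):
    """PMI of a pair that occurs at least once: log2(p(xy) / (p(x) * p(y)))."""
    p_xy = pair_counts[pair] / sum(pair_counts.values())
    n_chars = sum(char_counts.values())
    p_x = char_counts[pair[0]] / n_chars
    p_y = char_counts[pair[1]] / n_chars
    return math.log2(p_xy / (p_x * p_y))


def find_term_seeds(domain_text, general_text, taboo_chars,
                    ddd_min=0.7, pmi_min=1.0):
    """Keep domain pairs that pass all three filters (thresholds are made up)."""
    domain_pairs = adjacent_pairs(domain_text)
    general_pairs = adjacent_pairs(general_text)
    char_counts = Counter(domain_text)
    seeds = []
    for pair in domain_pairs:
        # Taboo character list: drop pairs containing a forbidden character.
        if any(ch in taboo_chars for ch in pair):
            continue
        # Domain filter: the pair must be markedly more frequent in the domain corpus.
        if domain_differential_degree(pair, domain_pairs, general_pairs) < ddd_min:
            continue
        # Cohesion filter: the two characters must co-occur more than chance predicts.
        if pointwise_mutual_information(pair, domain_pairs, char_counts) < pmi_min:
            continue
        seeds.append(pair)
    return seeds
```

A call such as `find_term_seeds(domain_text, general_text, taboo_chars={"的"})` would return candidate two-character seeds. The extension step (step 2) is not sketched, because the abstract does not define the sequence correlation coefficient.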
Keywords :
character recognition; correlation methods; feature extraction; sequences; string matching; domain differential degree; place extending method; sequence correlation coefficient; string stability; taboo word list; Correlation; Data mining; Feature extraction; Mutual information; Physics; Time frequency analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Circuits, Communications and System (PACCS), 2011 Third Pacific-Asia Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4577-0855-8
Type :
conf
DOI :
10.1109/PACCS.2011.5990133
Filename :
5990133