DocumentCode :
3456051
Title :
Research on Definition Extraction Based on Over-Sampling Using Distance Distribution Information of Instances
Author :
Pan, Xu ; Gu, Hong-Bin ; Zhao, Zhi-Qing
Author_Institution :
Coll. of Civil Aviation, Nanjing Univ. of Aeronaut. & Astronaut., Nanjing, China
fYear :
2010
fDate :
21-23 Oct. 2010
Firstpage :
1
Lastpage :
6
Abstract :
To extract definitions of all terms from an aviation professional corpus, we introduce a classification method. The method combines three components: a novel approach to over-sampling minority instances using distance distribution information, construction of a balanced training set by random under-sampling of majority instances, and an aggregated classifier built from C4.5 decision trees. The method achieves best scores of 65% in F1-measure and 78% in F2-measure. Finally, we analyse the influence of the feature selection method on the classification results.
Keywords :
decision trees; feature extraction; pattern classification; sampling methods; text analysis; C4.5 decision tree; aviation professional corpus; balanced training set; classification method; definition extraction; distance distribution information; over sampling; over sampling minority; random under sampling majority instance; Bagging; Classification tree analysis; Electronic mail; Feature extraction; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition (CCPR), 2010 Chinese Conference on
Conference_Location :
Chongqing
Print_ISBN :
978-1-4244-7209-3
Electronic_ISBN :
978-1-4244-7210-9
Type :
conf
DOI :
10.1109/CCPR.2010.5659148
Filename :
5659148