DocumentCode :
2248133
Title :
A comparative study on two large-scale hierarchical text classification tasks' solutions
Author :
Zhang, Jian ; Zhao, Hai ; Lu, Bao-Liang
Author_Institution :
Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China
Volume :
6
fYear :
2010
fDate :
11-14 July 2010
Firstpage :
3275
Lastpage :
3280
Abstract :
Patent classification is a large-scale hierarchical text classification (LSHTC) task. Although learning algorithms and feature selection strategies have been compared comprehensively in the text categorization field, little work has been done on LSHTC tasks because of their high computational cost and complicated label structure. For the first time, this paper compares two popular learning frameworks, hierarchical support vector machine (SVM) and k-nearest neighbor (k-NN), applied to an LSHTC task. Experimental results show that the latter outperforms the former on this LSHTC task, which is quite different from the usual results for normal text categorization tasks. The paper then compares different similarity measures and ranking approaches within the k-NN framework for the LSHTC task. We conclude that k-NN is more appropriate for the LSHTC task than hierarchical SVM and that, for this specific LSHTC task, BM25 outperforms the other similarity measures and ListWeak performs better than the other ranking approaches. We also find that using all the labels of the retrieved neighbors remarkably improves classification performance over using only the first label of each retrieved neighbor.
Keywords :
learning (artificial intelligence); patents; pattern classification; support vector machines; text analysis; BM25; ListWeak; feature selection strategies; hierarchical support vector machine; k nearest neighbor; large scale hierarchical text classification tasks; learning algorithms; patent classification; text categorization; Classification algorithms; Nearest neighbor searches; Patents; Support vector machines; Taxonomy; Text categorization; Training; Hierarchical SVM; Hierarchical text classification; Large-scale text classification; Ranking approach; Similarity measure; Text classification; comparative study; k-NN;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics (ICMLC), 2010 International Conference on
Conference_Location :
Qingdao
Print_ISBN :
978-1-4244-6526-2
Type :
conf
DOI :
10.1109/ICMLC.2010.5580696
Filename :
5580696