Title :
Evaluation of the new feature types for question classification with support vector machines
Author :
Skowron, Marcin ; Araki, Kenji
Author_Institution :
Graduate Sch. of Inf. Sci. & Technol., Hokkaido Univ., Sapporo, Japan
Abstract :
Question classification is of crucial importance for question answering. Machine learning algorithms have been found to classify questions significantly more accurately than other approaches. The two key issues in classification with an ML-based approach are classifier design and feature selection. Support vector machines are known to work well for sparse, high-dimensional problems. However, the frequently used bag-of-words approach does not take full advantage of the information contained in a question. To exploit this information, we introduce three new feature types: subordinate word category, question focus, and syntactic-semantic structure. As the results demonstrate, including the new features yields higher question classification accuracy than the standard bag-of-words approach and other ML-based methods such as SVM with the tree kernel, SVM with error-correcting codes, and SNoW. The classification accuracy of 84.6% obtained using the three introduced feature types is, to date, the highest reported in the literature.
Keywords :
classification; information retrieval; learning (artificial intelligence); support vector machines; classifier design; feature selection; machine learning algorithms; question answering; question classification; question focus; subordinate word category; syntactic-semantic structure; Code standards; Error correction codes; Humans; Information science; Internet; Kernel; Machine learning algorithms; Snow; Support vector machines; Taxonomy
Conference_Title :
Communications and Information Technology, 2004. ISCIT 2004. IEEE International Symposium on
Print_ISBN :
0-7803-8593-4
DOI :
10.1109/ISCIT.2004.1413873