Title :
Scaling Text Classification with Relevance Vector Machines
Author :
Silva, Catarina ; Ribeiro, Bernardete
Author_Institution :
Polytech. Inst. of Leiria, Leiria
Abstract :
Text classification (TC) is a complex and ubiquitous task that handles huge amounts of data. Recent research has shown that kernel-based learning methods are quite effective for this problem. Unlike support vector machines (SVM), the relevance vector machine (RVM) yields a probabilistic output while preserving accuracy. However, few research efforts have addressed the scalability issue that arises when RVMs are applied to large-scale problems such as TC. We propose a new model, a two-step RVM classifier that is able to (i) be competitive in processing time, (ii) use all available training examples and (iii) improve RVM classification performance. The paper also shows that a convenient similarity measure among documents can be defined on the whole collection, which not only makes the process faster but also makes it parallelizable. Using REUTERS-21578, we show that the reduction in computational complexity and the improvement in overall performance obtained with the proposed model make it possible to deploy successful real-time applications.
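The abstract states that a document similarity measure defined on the whole collection makes the process parallelizable, but it does not give the measure itself. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes TF-IDF weighting with IDF estimated over the full collection and cosine similarity as the kernel, and all function names (`tfidf_vectors`, `kernel_block`, `kernel_matrix`) are hypothetical. It illustrates why such a measure parallelizes: once the shared vectors are computed, every block of rows of the kernel matrix can be computed independently.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def tfidf_vectors(docs):
    """Hypothetical helper: L2-normalised TF-IDF vectors, IDF taken over the whole collection."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Terms present in every document get idf == 0 and are dropped.
        vec = {t: c * idf[t] for t, c in tf.items() if idf[t] > 0}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Normalising makes the dot product below equal to the cosine similarity.
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Dot product of two sparse, already normalised vectors."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def kernel_block(vectors, rows):
    """One independent block of kernel rows; needs only the shared vectors."""
    return [[cosine(vectors[i], vj) for vj in vectors] for i in rows]


def kernel_matrix(docs, block_size=2, workers=4):
    """Full similarity (kernel) matrix, with row blocks computed concurrently."""
    vectors = tfidf_vectors(docs)
    blocks = [range(s, min(s + block_size, len(docs)))
              for s in range(0, len(docs), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves block order, so rows come back in document order.
        return [row for part in pool.map(lambda rows: kernel_block(vectors, rows), blocks)
                for row in part]


docs = [
    "oil prices rise".split(),
    "crude oil output falls".split(),
    "wheat harvest grows".split(),
]
K = kernel_matrix(docs)
```

Threads are used only to show that the blocks are independent; in CPython the GIL limits the speed-up of this pure-Python loop, so a `ProcessPoolExecutor` or a distributed setting would be the natural choice for real CPU parallelism. A matrix like `K` could then serve as the kernel input to an RVM.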
Keywords :
classification; computational complexity; probability; support vector machines; text analysis; kernel learning based method; relevance vector machine; support vector machine; text classification; Bayesian methods; Cybernetics; Frequency conversion; Informatics; Kernel; Large-scale systems; Scalability; Support vector machine classification; Support vector machines; Text categorization
Conference_Title :
Systems, Man and Cybernetics, 2006. SMC '06. IEEE International Conference on
Conference_Location :
Taipei
Print_ISBN :
1-4244-0099-6
Electronic_ISBN :
1-4244-0100-3
DOI :
10.1109/ICSMC.2006.384791