Title :
Research on Large Scale Hierarchical Classification Based on Candidate Search
Author :
Li He ; Yan Jia ; Zhaoyun Ding ; Weihong Han
Author_Institution :
Sch. of Comput. Sci., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
Large scale hierarchical classification problem researches how to classify web documents into the categories among a class hierarchy. As the class hierarchy is very large that containing thousands or even tens of thousands of categories, the performance of the classification is still lower. While a reduce-and-conquer strategy has been proposed to make the problem tractable, candidate search is a bottleneck in classification. In this paper, we first analyze the computational complexity of category candidate search problem, and prove that it is an NP-hard problem. Then a candidate search algorithm which adopts a greedy strategy is proposed, and we prove that the proposed greedy strategy is a local optimum choice in the heuristic solving process. In the classification stage, we find that ancestor categories may help classification of candidates. Experiments are conducted on the dataset of web pages from the Chinese Simplified branch of the DMOZ directory. The results show that the proposed algorithm achieves a performance improvement for candidate search compared to existing methods, and further improves the classification accuracy of two-stage approaches.
Keywords :
Internet; computational complexity; greedy algorithms; information retrieval; optimisation; pattern classification; text analysis; Chinese Simplified branch; DMOZ directory; NP-hard problem; Web document classification; Web pages; ancestor categories; category candidate search problem; class hierarchy; classification performance improvement; computational complexity; greedy strategy; heuristic solving process; large scale hierarchical classification problem; local optimum choice; two-stage approach classification accuracy; Accuracy; Buildings; Classification algorithms; Heuristic algorithms; NP-hard problem; Search problems; candidate search problem; category candidate; class hierarchy; large scale hierarchical classification; text categorization;
Conference_Titel :
Web Information System and Application Conference (WISA), 2013 10th
Conference_Location :
Yangzhou
Print_ISBN :
978-1-4799-3218-4
DOI :
10.1109/WISA.2013.73