Title :
Genre identification of Chinese finance text using machine learning method
Author :
Xu, Jun ; Ding, Yuxin ; Wang, Xiaolong ; Wu, Yonghui
Author_Institution :
Shenzhen Grad. Sch., Dept. of Comput. Sci. & Technol., Harbin Inst. of Technol., Shenzhen
Abstract :
Document genre is one of the most distinguishing features in information retrieval and helps bring order to search results. Genre classification is concerned not with the topic of a document but with its genre. In this paper, we examine the effectiveness of machine learning techniques for genre classification of Chinese texts that share a single topic, namely finance. Based on the likelihood ratio test, we present a new method for selecting feature terms, which clearly improves classification performance and outperforms other selection methods even when up to 80% of the terms are removed. In experiments with an SVM classifier on real-world corpora, we find that this method selects features more effectively and that the likelihood ratio is a reliable measure for selecting informative features.
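The abstract does not give the details of the selection method, so the following is only a minimal sketch of the general idea it names: scoring each term with a likelihood ratio statistic and keeping the highest-scoring fraction. It assumes the standard log-likelihood ratio (G²) statistic on a 2x2 term-by-genre table of document counts, which may differ from the authors' formulation. The function names, the `keep_fraction` parameter, and the toy English documents are hypothetical and chosen for illustration.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of document counts.

    k11: documents in the genre that contain the term
    k12: documents in the genre that do not contain the term
    k21: documents outside the genre that contain the term
    k22: documents outside the genre that do not contain the term
    (Assumed formulation; the paper's exact statistic is not given.)
    """
    def xlogx_sum(*counts):
        # Sum of k * ln(k), treating 0 * ln(0) as 0.
        return sum(k * math.log(k) for k in counts if k > 0)

    total = k11 + k12 + k21 + k22
    cells = xlogx_sum(k11, k12, k21, k22)
    rows = xlogx_sum(k11 + k12, k21 + k22)
    cols = xlogx_sum(k11 + k21, k12 + k22)
    return 2.0 * (cells - rows - cols + xlogx_sum(total))


def score_terms(docs, labels, genre):
    """Score every term by its association with one genre."""
    in_genre = [set(d) for d, y in zip(docs, labels) if y == genre]
    others = [set(d) for d, y in zip(docs, labels) if y != genre]
    vocab = set().union(*in_genre, *others)
    scores = {}
    for term in vocab:
        k11 = sum(term in d for d in in_genre)
        k21 = sum(term in d for d in others)
        scores[term] = log_likelihood_ratio(
            k11, len(in_genre) - k11, k21, len(others) - k21
        )
    return scores


def select_terms(docs, labels, genre, keep_fraction=0.2):
    """Keep the top-scoring fraction of terms (hypothetical helper)."""
    scores = score_terms(docs, labels, genre)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    keep = max(1, round(keep_fraction * len(ranked)))
    return ranked[:keep]


# Toy tokenized documents (illustrative only, not from the paper's corpus).
docs = [
    ["stock", "rises", "report"],
    ["stock", "falls", "report"],
    ["analysis", "market", "report"],
    ["analysis", "outlook", "report"],
]
labels = ["news", "news", "commentary", "commentary"]
selected = select_terms(docs, labels, "news", keep_fraction=0.3)
```

The statistic is two-sided, so a term can score highly because it is typical of the genre or because it is conspicuously absent from it. The retained terms would then serve as the feature set for a classifier such as an SVM.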
Keywords :
financial data processing; learning (artificial intelligence); pattern classification; support vector machines; text analysis; Chinese finance text; SVMs classifier; genre classification; genre identification; likelihood ratio test; machine learning method; support vector machine; Computer science; Finance; Gain measurement; IEEE news; Information retrieval; Learning systems; Machine learning; Performance evaluation; Search engines; Testing; Genre Classification; Likelihood Ratio Test; Support Vector Machines;
Conference_Title :
Systems, Man and Cybernetics, 2008. SMC 2008. IEEE International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-1-4244-2383-5
Electronic_ISBN :
1062-922X
DOI :
10.1109/ICSMC.2008.4811318