DocumentCode :
3109551
Title :
Genre identification of Chinese finance text using machine learning method
Author :
Xu, Jun ; Ding, Yuxin ; Wang, Xiaolong ; Wu, Yonghui
Author_Institution :
Shenzhen Grad. Sch., Dept. of Comput. Sci. & Technol., Harbin Inst. of Technol., Shenzhen
fYear :
2008
fDate :
12-15 Oct. 2008
Firstpage :
455
Lastpage :
459
Abstract :
Document genre is one of the most distinguishing features in information retrieval and helps bring order to search results. Genre classification is concerned not with the topic of a document but with its genre. In this paper, we examine the effectiveness of machine learning techniques for genre classification of Chinese texts that share a single topic, namely finance. Based on the likelihood ratio test, we present a new method for selecting feature terms, which clearly improves performance and outperforms other methods even when up to 80% of the terms are removed. In experiments with an SVM classifier on real-world corpora, we find that this method selects features more effectively and that the likelihood ratio is a reliable measure for selecting informative features.
Keywords :
financial data processing; learning (artificial intelligence); pattern classification; support vector machines; text analysis; Chinese finance text; SVMs classifier; genre classification; genre identification; likelihood ratio test; machine learning method; support vector machine; Computer science; Finance; Gain measurement; IEEE news; Information retrieval; Learning systems; Machine learning; Performance evaluation; Search engines; Testing; Genre Classification; Likelihood Ratio Test; Support Vector Machines;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Systems, Man and Cybernetics, 2008. SMC 2008. IEEE International Conference on
Conference_Location :
Singapore
ISSN :
1062-922X
Print_ISBN :
978-1-4244-2383-5
Electronic_ISBN :
1062-922X
Type :
conf
DOI :
10.1109/ICSMC.2008.4811318
Filename :
4811318