DocumentCode :
2286467
Title :
Text representation and classification based on multi-instance learning
Author :
He, Wei ; Wang, Yu
Author_Institution :
Sch. of Manage., Dalian Univ. of Technol., Dalian, China
fYear :
2009
fDate :
14-16 Sept. 2009
Firstpage :
34
Lastpage :
39
Abstract :
In multi-instance learning, the training set consists of labeled bags, each composed of unlabeled instances, and the task is to predict the labels of unseen bags. This paper investigates a text mining problem, namely text representation, from a multi-instance view. Each text is regarded as a bag and each of its sentences as an instance. A bag is labeled with its class label, and the similarity between bags is defined in terms of sentence similarity. The text classification problem is thereby translated into a multi-instance learning problem. To solve it, a bag-level Chinese text classifier was built with the KNN algorithm, achieving an average precision of 92.12% and recall of 92.01% in the experiments.
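The bag-of-sentences idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual similarity measure or KNN extension, so everything below is an assumption made for illustration: Jaccard overlap as the sentence similarity, a symmetric best-match average as the bag similarity, and a plain majority vote. The function names (`to_bag`, `sentence_similarity`, `bag_similarity`, `knn_classify`) are hypothetical, and the whitespace tokenizer is a stand-in for the word segmentation that Chinese text would need.

```python
from collections import Counter


def to_bag(text):
    """Turn a text into a bag: one token set per sentence (toy segmentation)."""
    return [frozenset(s.lower().split()) for s in text.split(".") if s.strip()]


def sentence_similarity(a, b):
    """Jaccard overlap between two token sets (assumed measure, not the paper's)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def bag_similarity(bag_x, bag_y):
    """Average each sentence's best match in the other bag, in both directions."""
    if not bag_x or not bag_y:
        return 0.0

    def directed(src, dst):
        return sum(max(sentence_similarity(s, t) for t in dst) for s in src) / len(src)

    return (directed(bag_x, bag_y) + directed(bag_y, bag_x)) / 2


def knn_classify(bag, training, k=3):
    """Majority vote over the k training bags most similar to `bag`."""
    ranked = sorted(training, key=lambda item: bag_similarity(bag, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up toy data, only to exercise the functions.
training = [
    (to_bag("The team won the match. The coach praised the players."), "sports"),
    (to_bag("The striker scored a goal. The match ended late."), "sports"),
    (to_bag("The bank raised interest rates. Markets fell sharply."), "finance"),
    (to_bag("Investors sold shares. The bank reported losses."), "finance"),
]
prediction = knn_classify(to_bag("The players won the match."), training, k=3)
```

Here the label is attached to the whole bag and no sentence carries a label of its own, which matches the multi-instance setting the abstract describes; only the bag-level similarity decides the neighbors.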
Keywords :
classification; data mining; learning (artificial intelligence); natural languages; text analysis; Chinese text classifier; KNN algorithm; extended k-nearest neighbor algorithm; labeled bags sentence; multiinstance learning problem; text classification problem; text mining problem; text representation; training set; unlabeled instance; Conference management; Data mining; Engineering management; Helium; Information resources; Information technology; Management training; Technology management; Text categorization; Text mining; bag of sentences; multi-instance learning; text classification; text representation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Management Science and Engineering, 2009. ICMSE 2009. International Conference on
Conference_Location :
Moscow
Print_ISBN :
978-1-4244-3970-6
Electronic_ISBN :
978-1-4244-3971-3
Type :
conf
DOI :
10.1109/ICMSE.2009.5317537
Filename :
5317537