Regional Information Center for Science and Technology - Text Categorization Based on Granular Partition

DocumentCode :

2740434

Title :

Text Categorization Based on Granular Partition

Author :

Fan, Xinghua ; Chen, Ji

Author_Institution :

Coll. of Comput. Sci. & Technol., Univ. of Posts & Telecommun., Chongqing, China

Volume :

fYear :

2009

fDate :

14-16 Aug. 2009

Firstpage :

305

Lastpage :

310

Abstract :

Two factors strongly influence the quality of text categorization: (1) the class ambiguity of texts, i.e., some texts in one category may be more similar to certain texts in another category; and (2) the differing discriminative power of different feature types. A classification approach that uses the same feature type at every step of classification, or that performs only single-level classification, suffers from the problems these factors cause. To address them, this paper proposes a text categorization model based on granular partition. The model casts text categorization as an optimization problem: given n feature types, find an optimal partition of the collection into sub-parts such that, when each sub-part is represented by the feature type that gives that sub-part the highest categorization performance, the overall categorization performance is the best among all possible partitions. To obtain an approximate solution of the model, a multi-level segmentation algorithm is developed that employs a dimidiate (two-way split) strategy: a classifier using a given feature type classifies the test collection, and the collection is then divided into two parts according to the classifier's output. The part whose classification results are reliable is assigned to that feature type as its matching sub-part, and the other part becomes the new test collection at the next level. The n sub-parts generated for the n feature types are taken as an approximately optimal partition. Experiments show that the proposed method effectively accounts for both factors and achieves better performance.
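The multi-level segmentation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how a classification result is judged "reliable", so the sketch assumes a hypothetical confidence threshold (`threshold`), assumes each feature type comes with a classifier returning a `(label, confidence)` pair (a hypothetical interface), and assumes the feature types are tried in a fixed given order. The last feature type keeps every document that reaches it, so the n sub-parts cover the whole test collection. The toy classifiers `word_clf` and `char_clf` are invented stand-ins for classifiers built on two feature types.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interface: a classifier for one feature type maps a document
# to (predicted_label, confidence in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]


def multilevel_segmentation(
    docs: Sequence[str],
    classifiers: Sequence[Classifier],
    threshold: float = 0.8,  # assumed reliability test, not from the paper
) -> Tuple[List[List[int]], List[Optional[str]]]:
    """Split the test collection into one sub-part per feature type.

    Level i classifies the still-unassigned documents with classifiers[i].
    Documents whose confidence reaches `threshold` form sub-part i; the rest
    become the test collection for level i + 1. The last level accepts all
    remaining documents. Returns (sub-parts as document indices, labels).
    """
    if not classifiers:
        raise ValueError("need at least one feature type")
    remaining = list(range(len(docs)))
    parts: List[List[int]] = []
    predictions: List[Optional[str]] = [None] * len(docs)
    for level, clf in enumerate(classifiers):
        is_last = level == len(classifiers) - 1
        accepted: List[int] = []
        deferred: List[int] = []
        for i in remaining:
            label, confidence = clf(docs[i])
            if is_last or confidence >= threshold:
                accepted.append(i)  # reliable: matched to this feature type
                predictions[i] = label
            else:
                deferred.append(i)  # unreliable: passed to the next level
        parts.append(accepted)
        remaining = deferred
    return parts, predictions


# Toy classifiers standing in for two feature types (purely illustrative).
def word_clf(doc: str) -> Tuple[str, float]:
    if "goal" in doc:
        return "sports", 0.95
    if "stock" in doc:
        return "finance", 0.9
    return "sports", 0.5


def char_clf(doc: str) -> Tuple[str, float]:
    return ("finance", 0.7) if "$" in doc else ("sports", 0.6)


docs = ["late goal wins cup", "stock falls", "shares up $5", "match report"]
parts, preds = multilevel_segmentation(docs, [word_clf, char_clf])
print(parts)  # [[0, 1], [2, 3]]
print(preds)  # ['sports', 'finance', 'finance', 'sports']
```

Raising `threshold` pushes more documents down to later feature types; in this sketch the order of `classifiers` fixes which feature type gets the first chance at each document.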

Keywords :

pattern classification; text analysis; granular partition; multilevel segmentation algorithm; optimization problem; text categorization; text categorization model; Computer science; Educational institutions; Fuzzy systems; Information filtering; Information filters; Machine learning; Natural languages; Partitioning algorithms; Testing; Text categorization; Granular Partition; Multi-level segmentation; optimization model; text classification;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Fuzzy Systems and Knowledge Discovery, 2009. FSKD '09. Sixth International Conference on

Conference_Location :

Tianjin

Print_ISBN :

978-0-7695-3735-1

Type :

conf

DOI :

10.1109/FSKD.2009.484

Filename :

5358576

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2740434