Title of article :
A novel framework for termset selection and weighting in binary text classification
Author/Authors :
Badawi, Dima and Altınçay, Hakan
Issue Information :
Journal with serial issue number, year 2014
Pages :
16
From page :
38
To page :
53
Abstract :
This study presents a new framework for termset selection and weighting based on the joint occurrence statistics of pairs of terms. Each termset is evaluated by taking into account both the simultaneous and the individual occurrences of its terms. Based on the idea that the occurrence of one term but not the other may also convey valuable information for discrimination, conventional term selection schemes are adapted for termset selection. Similarly, the weight of a selected termset is computed as a function of the terms that occur in the document under consideration: a termset is assigned a nonzero weight if either or both of its terms appear in the document. This weight estimation scheme evaluates the individual occurrences of the terms and their co-occurrences separately to compute the document-specific weight of each termset. The proposed termset-based representation is concatenated with the bag-of-words representation to construct the document vectors. Experiments conducted on three widely used datasets have verified the effectiveness of the proposed framework.
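The representation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual weighting formula, so everything below is an assumption made for illustration: the function names, the three occurrence patterns ("both", "first_only", "second_only"), and the numeric pattern weights are hypothetical and are not taken from the paper. The sketch only shows the stated behaviour: a two-term termset receives a nonzero weight when either or both of its terms occur in the document, and the termset features are concatenated with a bag-of-words vector.

```python
from typing import Dict, List, Set, Tuple

Termset = Tuple[str, str]  # a pair of terms, as in the abstract


def occurrence_pattern(termset: Termset, doc_terms: Set[str]) -> str:
    """Classify how the two terms of a termset occur in a document."""
    first, second = termset
    has_first, has_second = first in doc_terms, second in doc_terms
    if has_first and has_second:
        return "both"
    if has_first:
        return "first_only"
    if has_second:
        return "second_only"
    return "neither"


def termset_weight(termset: Termset, doc_terms: Set[str],
                   weights: Dict[str, float]) -> float:
    """Return a document-specific weight for one termset.

    `weights` maps each occurrence pattern to a value; these values are
    hypothetical placeholders, not the paper's weighting scheme.
    """
    pattern = occurrence_pattern(termset, doc_terms)
    if pattern == "neither":
        return 0.0  # neither term occurs, so the feature is zero
    return weights[pattern]


def document_vector(tokens: List[str], vocabulary: List[str],
                    termsets: List[Termset],
                    pattern_weights: Dict[Termset, Dict[str, float]]) -> List[float]:
    """Concatenate raw-count bag-of-words features with termset features."""
    doc_terms = set(tokens)
    bag_of_words = [float(tokens.count(term)) for term in vocabulary]
    termset_features = [
        termset_weight(pair, doc_terms, pattern_weights[pair])
        for pair in termsets
    ]
    return bag_of_words + termset_features


# Hypothetical toy data, for illustration only.
vocabulary = ["oil", "price", "wheat"]
termsets = [("oil", "price"), ("oil", "wheat")]
pattern_weights = {
    ("oil", "price"): {"both": 1.0, "first_only": 0.4, "second_only": 0.3},
    ("oil", "wheat"): {"both": 0.2, "first_only": 0.6, "second_only": 0.5},
}
tokens = ["oil", "price", "oil"]
vec = document_vector(tokens, vocabulary, termsets, pattern_weights)
```

In this toy example the first three entries of `vec` are term counts and the last two are termset features: the pair ("oil", "price") co-occurs, while for ("oil", "wheat") only the first term is present, yet the feature is still nonzero.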
Keywords :
Co-occurrence features, Text categorization, Termset weighting, Termset selection, Document representation
Journal title :
Engineering Applications of Artificial Intelligence
Serial Year :
2014
Record number :
2126263