DocumentCode :
1567047
Title :
Feature Selection For Text Categorisation Using Self-organising Map
Author :
Manomaisupat, Pensiri ; Ahmad, Khadher
Author_Institution :
Dept. of Comput., Surrey Univ., Guildford
Volume :
3
fYear :
2005
Firstpage :
1875
Lastpage :
1880
Abstract :
The categorisation of documents in large, diverse collections poses an acute problem. The choice of a vector to represent a document collection, and the categories of documents within it, is still an art form. We describe a study in which four different types of term-occurrence and document-frequency metrics were used with varying levels of success, measured by classification accuracy statistics and average quantization error: TFIDF and its variant, term relevance, were used together with a metric based on contrastive linguistics and another that uses a finely classified terminology database. A novel method of term representation was used: each element of the vector corresponds to the absence or presence of a set of terms co-located within the element on the basis of frequency. In addition, we defined a new baseline for comparison: a randomly selected set of terms, drawn from within the collection, for constructing a representative vector. Categorisation was performed using classic self-organising maps. We confirm that an optimum size of the input vector (c. 100-200 terms) exists for each of the term-occurrence/document-frequency metrics, and there appears to be a saturation point beyond that optimal limit.
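As an illustration only, the following minimal Python sketch shows how terms might be ranked by a TF-IDF score, turned into presence/absence vectors, and evaluated with an average quantization error against a set of map units. The function names, the specific weighting (tf * log(N/df), summed over documents) and the toy inputs are assumptions made for this sketch. The record gives no details of the paper's term-relevance metric, its frequency-based grouping of terms, or its SOM training, so none of these are reproduced here.

```python
import math
from collections import Counter

def tfidf_scores(docs):
    """Sum tf * log(N / df) per term over a tokenised corpus (assumed weighting)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, tf in Counter(doc).items():
            scores[term] += tf * math.log(n_docs / df[term])
    return scores

def select_terms(docs, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(tfidf_scores(docs).items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

def presence_vector(doc, terms):
    """Binary vector: 1.0 if the term occurs in the document, else 0.0."""
    present = set(doc)
    return [1.0 if term in present else 0.0 for term in terms]

def average_quantization_error(vectors, codebook):
    """Mean Euclidean distance from each input vector to its nearest map unit."""
    return sum(min(math.dist(v, w) for w in codebook) for v in vectors) / len(vectors)
```

In this sketch, `codebook` stands for the weight vectors of an already trained map. A random-term baseline of the kind the abstract mentions could be approximated by replacing `select_terms` with `random.sample` over the vocabulary.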
Keywords :
information filtering; pattern classification; self-organising feature maps; support vector machines; text analysis; average quantization error; classification accuracy statistics; document categorisation; feature selection; self-organising map; text categorisation; Art; Computer science; Error analysis; Filtering; Frequency measurement; Quantization; Routing; Terminology; Text categorization; Thesauri
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks and Brain, 2005. ICNN&B '05. International Conference on
Conference_Location :
Beijing
Print_ISBN :
0-7803-9422-4
Type :
conf
DOI :
10.1109/ICNNB.2005.1614991
Filename :
1614991