Title :
Text classification and minimal-bias training vectors
Author :
Ahmad, Khurshid ; Bale, Tracey A. ; Burford, Darren
Author_Institution :
Surrey Univ., Guildford, UK
Abstract :
The categorisation of text using neural networks has been pursued in the context of emerging digital libraries. The frequencies of key terms in a document set are used to create training vectors, and a feature map can be trained so that texts are topologically ordered. The choice of training vectors and training regimes remains an open question in neural network research. To minimise training bias, we have developed a method that uses the linguistic characteristics of domain-specific texts to create training vectors. This method has been evaluated by classifying a standard free-text document set (TIPSTER's SUMMAC AP news wire collection) using a Kohonen feature map. The method is particularly relevant to financial prediction, given the large volume of news reports available to financial analysts.
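The abstract describes a general pipeline: key-term frequencies become training vectors, and a Kohonen feature map is trained on them so that similar texts land near each other. The Python sketch that follows illustrates that pipeline under loudly stated assumptions. The key-term list KEY_TERMS, the toy documents, the 3 x 3 map size, and the learning-rate and neighbourhood schedules are all hypothetical. The update rule is a standard online self-organising map, not the authors' minimal-bias procedure for choosing training vectors, which the abstract does not specify.

```python
import math
import random
from collections import Counter

# HYPOTHETICAL key-term list. The paper derives its terms from the linguistic
# characteristics of domain-specific texts; this sketch does not reproduce that step.
KEY_TERMS = ["shares", "profit", "bomb", "police", "drug", "market"]


def term_frequency_vector(text, terms=KEY_TERMS):
    """Relative frequency of each key term in a document, L2-normalised."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty documents
    vec = [counts[t] / total for t in terms]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def best_matching_unit(weights, x):
    """Grid position of the node whose weight vector is closest to x (squared Euclidean)."""
    return min(weights, key=lambda n: sum((w - xi) ** 2 for w, xi in zip(weights[n], x)))


def train_som(data, rows=3, cols=3, epochs=200, lr0=0.5, seed=0):
    """Train a small rectangular Kohonen map with a Gaussian neighbourhood (generic SOM)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)               # learning rate decays linearly towards 0
        sigma = sigma0 * (1.0 - frac) + 0.5   # neighbourhood radius shrinks towards 0.5
        for x in rng.sample(data, len(data)):  # present documents in shuffled order
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma * sigma))  # Gaussian neighbourhood weight
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])       # pull node towards the input
    return weights


# Toy, made-up documents for illustration only (not from the SUMMAC collection).
docs = [
    "shares rose as market profit beat forecasts",
    "market shares fell despite record profit",
    "police found a bomb near the station",
    "police seized the drug shipment and a bomb",
]
vectors = [term_frequency_vector(d) for d in docs]
som = train_som(vectors)
for d, v in zip(docs, vectors):
    print(best_matching_unit(som, v), d)
```

Because the learning rate and the neighbourhood radius both shrink during training, early epochs arrange the map globally while later epochs mostly adjust individual nodes. Documents that share key terms tend to be mapped to nearby nodes, which is the kind of topological ordering the abstract refers to.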
Keywords :
classification; financial data processing; learning (artificial intelligence); self-organising feature maps; text analysis; topology; Kohonen feature map; SUMMAC AP news wire collection; TIPSTER; digital libraries; domain-specific texts; financial analysts; financial prediction; key term frequency; minimal-bias training vectors; neural networks; text categorisation; text classification; topological ordering; training bias minimisation; training vectors; Drugs; Environmental factors; Frequency; Instruments; Neural networks; Software libraries; Terrorism; Text categorization; Vocabulary; Wire;
Conference_Title :
Neural Networks, 1999. IJCNN '99. International Joint Conference on
Conference_Location :
Washington, DC
Print_ISBN :
0-7803-5529-6
DOI :
10.1109/IJCNN.1999.833528