DocumentCode :
2910172
Title :
Using genetic algorithms in word-vector optimisation
Author :
Smith, Peter W H
Author_Institution :
Dept. of Comput., City Univ. London, London, UK
fYear :
2010
fDate :
8-10 Sept. 2010
Firstpage :
1
Lastpage :
5
Abstract :
Word vectors and sets of words are used in a wide range of text-based applications, yet these word sets are often chosen on an ad hoc basis. In this study, we examine two text-based applications that use word sets, authorship attribution and sentiment analysis, and in both cases find that classification performance, measured as classification precision, can be improved using a fairly simple genetic algorithm. In authorship attribution, the trend in recent years has been towards ever larger word vectors. We suggest that this may be counter-productive, as it can easily lead to inaccuracy caused by overfitting or vector-space sparsity (the curse of dimensionality). In sentiment analysis, precision is the main issue, as rates above 80-85% are not easy to achieve.
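The abstract does not specify the chromosome encoding, classifier, fitness function, or genetic operators used in the paper. The block below is therefore only a minimal sketch, under stated assumptions, of one common way a genetic algorithm can optimise a word set. Each individual is a bit mask over a candidate vocabulary, and fitness is the leave-one-out accuracy of a nearest-centroid classifier on relative word frequencies. A small size penalty is included as an assumption that echoes the abstract's concern about overlarge vectors. The toy corpus, `SIZE_PENALTY`, `fitness`, `evolve`, and all parameter values are hypothetical illustrations, not data or settings from the paper.

```python
import random
from collections import Counter

# Hypothetical toy corpus of (text, label) pairs; NOT data from the paper.
DOCS = [
    ("the plot was great and the acting superb", "pos"),
    ("a great film with a superb cast", "pos"),
    ("i loved it, truly great", "pos"),
    ("the plot was dull and the acting poor", "neg"),
    ("a dull film with a poor script", "neg"),
    ("i hated it, truly awful", "neg"),
]
CANDIDATES = sorted({w for text, _ in DOCS for w in text.replace(",", "").split()})
LABELS = sorted({y for _, y in DOCS})  # sorted so tie-breaking is deterministic
SIZE_PENALTY = 0.01  # hypothetical cost for larger word sets


def vectorise(text, words):
    """Relative frequency of each selected word in the text."""
    tokens = text.replace(",", "").split()
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in words]


def fitness(mask):
    """Leave-one-out nearest-centroid accuracy, minus a small size penalty."""
    words = [w for w, bit in zip(CANDIDATES, mask) if bit]
    if not words:
        return 0.0
    vecs = [(vectorise(t, words), y) for t, y in DOCS]
    correct = 0
    for i, (x, y) in enumerate(vecs):
        # Build one centroid per class from every document except document i.
        centroids = {}
        for label in LABELS:
            members = [v for j, (v, lab) in enumerate(vecs) if j != i and lab == label]
            centroids[label] = [sum(col) / len(members) for col in zip(*members)]
        # Predict the class whose centroid is nearest in squared Euclidean distance.
        pred = min(LABELS, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))
        correct += int(pred == y)
    return correct / len(vecs) - SIZE_PENALTY * len(words) / len(CANDIDATES)


def evolve(pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Simple generational GA over word-selection bit masks."""
    rng = random.Random(seed)
    n = len(CANDIDATES)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [ranked[0][:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each gene flips with probability p_mut.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    best = max(pop, key=fitness)
    return [w for w, bit in zip(CANDIDATES, best) if bit], fitness(best)
```

Calling `words, score = evolve()` returns the selected word set and its fitness. Truncation selection with elitism keeps the sketch short. Roulette-wheel or tournament selection, a different classifier (for example, an SVM), or a precision-based fitness would be equally plausible choices, and the abstract does not indicate which of these the paper used.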
Keywords :
genetic algorithms; pattern classification; text analysis; word processing; authorship attribution; classification performance; classification precision; genetic algorithm; sentiment analysis; sentiment analysis precision; text based application; vector space sparsity; word set; word vector optimisation; word vectors; Accuracy; Classification algorithms; Euclidean distance; Frequency measurement; Optimization; Presses; Support vector machine classification;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Intelligence (UKCI), 2010 UK Workshop on
Conference_Location :
Colchester
Print_ISBN :
978-1-4244-8774-5
Electronic_ISBN :
978-1-4244-8773-8
Type :
conf
DOI :
10.1109/UKCI.2010.5625589
Filename :
5625589