Title :
Genetic extraction of text category descriptions
Author :
Serrano, J.I. ; del Castillo, M.D.
Author_Institution :
Instituto de Automatica Industrial, CSIC. Ctra. Campo Real, km. 0.200. La Poveda, Arganda del Rey, 28500 Madrid, SPAIN
fDate :
June 28 2004-July 1 2004
Abstract :
This paper deals with a supervised learning method devoted to producing categorization models of text documents. The goal of the method is to use a suitable numerical measurement of example similarity to find centroids describing different categories of examples. The centroids are neither abstract nor statistical models, but rather consist of bits of examples. The centroid-learning method is based on a genetic algorithm, the GAT. The categorization system infers a model by applying the GAT to the set of preclassified documents. The models thus obtained are the category centroids that are used to predict the category of a new document.
Keywords :
Genetic algorithms; Humans; Internet; Machine learning; Natural languages; Organizing; Predictive models; Supervised learning; Terminology; Text categorization; centroid; evolutionary learning; similarity function; text classification;
Conference_Titel :
Automation Congress, 2004. Proceedings. World
Conference_Location :
Seville
Print_ISBN :
1-889335-21-5