Title :
An Analysis of Constructed Categories for Textual Classification Using Fuzzy Similarity and Agglomerative Hierarchical Methods
Author :
Guelpeli, Marcus Vinicius C ; Garcia, Ana Cristina Bicharra
Author_Institution :
Dept. de Cienc. da Comput., Univ. Fed. Fluminense, Niteroi
Abstract :
Ambiguity is a challenge for any system that handles natural language. To mitigate the linguistic ambiguity found in text classification, this work proposes a text categorizer based on Fuzzy Similarity. The grouping algorithms Stars and Cliques are adopted within the Agglomerative Hierarchical method; they identify groups of texts by specifying some type of relationship rule, and create categories from a similarity analysis of the textual terms. The proposal is that, with the suggested methodology, categories can be created from the degree of similarity among the texts to be classified, without needing to fix the number of categories in advance. The combination of techniques used in the categorizer's phases produced satisfactory results and proved efficient for textual classification.
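The grouping step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the similarity measure, so the sketch assumes a fuzzy Jaccard ratio over normalized term frequencies, and the function names (`term_weights`, `fuzzy_similarity`, `stars`, `cliques`), the threshold value, and the sample texts are all hypothetical. The Stars and Cliques rules follow their common textbook form: a star attaches texts that are similar to its centre, while a clique admits a text only if it is similar to every current member.

```python
from collections import Counter


def term_weights(text):
    """Term frequencies scaled to [0, 1], used as fuzzy membership degrees.
    (Assumed weighting; the abstract does not specify one.)"""
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    top = max(counts.values())
    return {term: n / top for term, n in counts.items()}


def fuzzy_similarity(a, b):
    """Fuzzy Jaccard ratio: sum of min memberships over sum of max memberships.
    (Assumed measure, chosen only for illustration.)"""
    terms = set(a) | set(b)
    inter = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    union = sum(max(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    return inter / union if union else 0.0


def stars(sim, threshold):
    """Each unassigned text becomes a star centre and pulls in every
    unassigned text whose similarity to the centre meets the threshold."""
    unassigned = list(range(len(sim)))
    groups = []
    while unassigned:
        centre = unassigned.pop(0)
        members = [j for j in unassigned if sim[centre][j] >= threshold]
        unassigned = [j for j in unassigned if j not in members]
        groups.append([centre] + members)
    return groups


def cliques(sim, threshold):
    """Like stars, but a text joins a group only if it is similar
    to every text already in that group."""
    unassigned = list(range(len(sim)))
    groups = []
    while unassigned:
        group = [unassigned.pop(0)]
        for j in list(unassigned):
            if all(sim[i][j] >= threshold for i in group):
                group.append(j)
                unassigned.remove(j)
        groups.append(group)
    return groups


# Hypothetical sample texts; the number of categories is never supplied.
texts = [
    "fuzzy similarity text mining",
    "text mining fuzzy clustering",
    "soccer match goal",
    "soccer goal keeper",
]
weights = [term_weights(t) for t in texts]
sim = [[fuzzy_similarity(a, b) for b in weights] for a in weights]

print(stars(sim, 0.4))
print(cliques(sim, 0.4))
```

On these sample texts both rules place the first two texts in one group and the last two in another, and the number of groups emerges from the similarity matrix and the threshold rather than being set beforehand. The two rules diverge when similarity is not transitive: a star keeps every text close to its centre, whereas a clique splits off any text that is not close to all members.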
Keywords :
pattern classification; text analysis; agglomerative hierarchical methods; fuzzy similarity; grouping algorithms; text categorizer; textual classification; Content management; Data mining; Databases; Internet; Natural languages; Proposals; Signal analysis; Statistics; Text categorization; Text mining; Agglomerative Hierarchical; Similarity Matrix
Conference_Title :
Signal-Image Technologies and Internet-Based System, 2007. SITIS '07. Third International IEEE Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-0-7695-3122-9
DOI :
10.1109/SITIS.2007.109