Title of article :
Representation of Textual Documents by the Approach Wordnet and N-grams for the Unsupervised Classification (Clustering) with 2D Cellular Automata: A Comparative Study
Author/Authors :
HAMOU Reda Mohamed; Ahmed Lehireche and Abdellatif Rahmoun; LOKBANI Ahmed Chaouki; RAHMANI Mohamed
Issue Information :
Journal with serial issue number, year 2010
Abstract :
In this article we present a 2D cellular automaton (Class_AC) that addresses a text mining problem: unsupervised classification (clustering). Before experimenting with the cellular automaton, we vectorized our data by indexing textual documents from the Reuters-21578 database using two representations, the Wordnet approach and the n-gram method. Our work is a comparative study of these two approaches to representation: the conceptual approach (Wordnet) and n-grams. Section 1 gives an introduction to biomimetics and text mining, Section 2 presents the representation of texts based on the Wordnet approach and n-grams, Section 3 describes the cellular automaton for clustering, Section 4 presents the experiments and comparison results, and finally Section 5 gives a conclusion and perspectives.
Keywords :
Unsupervised classification , Biomimetic methods , Clustering and segmentation , Data classification , Data mining , Cellular automata
Journal title :
Computer and Information Science