Title :
CASIT: Content Based Identification of Textual Information in a Large Database
Author :
Guezouli, Larbi ; Essafi, Hassane
Author_Institution :
Comput. Sci. Dept., Batna Univ., Batna, Algeria
Abstract :
This paper describes the CASIT model (CAlculation of SImilarity of Text). Starting from a coarse comparison of text documents based on the Latent Semantic Indexing (LSI) model, the CASIT method computes a finer-grained similarity rate between model text documents and the documents compared against them. Our approach takes into account the neighbourhood of each word, which makes it possible to weight the words in the calculation of the score.
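As a rough illustration of the coarse LSI stage only, the sketch below compares documents in a rank-k latent space. It is not the authors' implementation, and it does not attempt CASIT's finer neighbourhood-based scoring, which the abstract does not specify. The function names, whitespace tokenisation, raw term counts (rather than a weighted scheme such as TF-IDF), and the use of power iteration in place of a library SVD are all assumptions made to keep the example dependency-free.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Raw-count matrix: one row per vocabulary term, one column per document."""
    counts = [Counter(doc.lower().split()) for doc in docs]  # assumed tokenisation
    vocab = sorted(set().union(*counts))
    return [[c[term] for c in counts] for term in vocab]


def gram_matrix(a, n_docs):
    """Document-by-document matrix A^T A of the term-document matrix A."""
    return [[sum(row[j] * row[l] for row in a) for l in range(n_docs)]
            for j in range(n_docs)]


def matvec(g, v):
    return [sum(gij * vj for gij, vj in zip(row, v)) for row in g]


def top_eigenpairs(gram, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(gram)
    g = [row[:] for row in gram]
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.01 * i for i in range(n)]  # fixed, deterministic start vector
        for _step in range(iters):
            w = matvec(g, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining matrix is numerically zero
                break
            v = [x / norm for x in w]
        lam = sum(vi * wi for vi, wi in zip(v, matvec(g, v)))  # Rayleigh quotient
        pairs.append((max(lam, 0.0), v))
        # Deflate: remove the component just found before looking for the next one.
        g = [[g[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def cosine(x, y):
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)


def lsi_similarities(docs, k=2):
    """Pairwise cosine similarity of documents in a rank-k LSI space.

    The eigenpairs (lambda_i, v_i) of A^T A give the singular values
    sqrt(lambda_i) and right singular vectors of A, so document j has latent
    coordinates [sqrt(lambda_i) * v_i[j] for i in range(k)].
    """
    n = len(docs)
    a = term_document_matrix(docs)
    pairs = top_eigenpairs(gram_matrix(a, n), min(k, n))
    coords = [[math.sqrt(lam) * v[j] for lam, v in pairs] for j in range(n)]
    return [[cosine(coords[i], coords[j]) for j in range(n)] for i in range(n)]
```

Calling `lsi_similarities(docs, k)` on a list of strings returns a square matrix whose entry `[i][j]` is the coarse similarity between documents `i` and `j`. In a two-stage scheme like the one the abstract outlines, such scores would only shortlist candidates for the finer comparison.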
Keywords :
text analysis; CASIT model; calculation of similarity of text; content based identification; latent semantic indexing model; text documents; textual information; Application software; Computer science; Conferences; Databases; Filters; Frequency; Indexing; Information retrieval; Large scale integration; Matrix decomposition; CASIT; Component; LSI; textual research; vectorial model;
Conference_Title :
Advanced Information Networking and Applications Workshops (WAINA), 2010 IEEE 24th International Conference on
Conference_Location :
Perth, WA
Print_ISBN :
978-1-4244-6701-3
DOI :
10.1109/WAINA.2010.133