DocumentCode :
2456961
Title :
Clustering of Short Strings in Large Databases
Author :
Kazimianec, Michail ; Mazeika, Arturas
Author_Institution :
Fac. of Comput. Sci., Free Univ. of Bozen-Bolzano, Bolzano, Italy
fYear :
2009
fDate :
Aug. 31 2009-Sept. 4 2009
Firstpage :
368
Lastpage :
372
Abstract :
A novel method, CLOSS, is proposed for clustering short strings in textual databases. It identifies clusters of misspelled strings even when the cluster border is not prominent. The method represents strings with q-grams and uses a string proximity graph to find the clusters. The contribution addresses short-string clustering in text mining when the proximity graph has multiple horizontal lines or no horizontal line at all.
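The two building blocks named in the abstract, q-gram representation and a string proximity graph, can be illustrated with a small sketch. It is not the CLOSS algorithm. It rests on these assumptions: padded bigrams (q = 2), q-gram sets rather than multisets, and a proximity graph defined as the number of strings that share at least k q-grams with a chosen center string, for k from high to low. The function names `qgrams` and `proximity_graph` are hypothetical.

```python
from typing import Dict, List, Set


def qgrams(s: str, q: int = 2, pad: str = "#") -> Set[str]:
    """Return the set of q-grams of s, padded with q-1 pad characters on each side.

    Assumption: sets are used (duplicate q-grams collapse); padding and
    lowercasing are illustrative choices, not taken from the paper.
    """
    padded = pad * (q - 1) + s.lower() + pad * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def proximity_graph(center: str, strings: List[str], q: int = 2) -> Dict[int, int]:
    """Hypothetical proximity graph: for each threshold k, from the number of
    q-grams of the center down to 1, count the strings sharing at least k
    q-grams with the center."""
    c = qgrams(center, q)
    overlaps = [len(c & qgrams(s, q)) for s in strings]
    return {k: sum(1 for o in overlaps if o >= k) for k in range(len(c), 0, -1)}


if __name__ == "__main__":
    words = ["berlin", "berlim", "berln", "bern", "paris"]
    print(proximity_graph("berlin", words))
```

In this sketch, the center "berlin" yields the counts {7: 1, 6: 1, 5: 3, 4: 4, 3: 4, 2: 4, 1: 4}. The flat run of 4 from k = 4 down to 1 is a horizontal line of the kind the abstract mentions: it separates the three misspelled variants from the unrelated "paris".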
Keywords :
data mining; pattern clustering; string matching; text analysis; very large databases; CLOSS; cluster border; clustering of short strings; large databases; q-gram approach; string proximity graph; text mining; textual databases; Application software; Clustering methods; Computer science; Databases; Detection algorithms; Expert systems; Robustness; Smoothing methods; Tagging; Text mining; clustering; q-grams; short strings;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Database and Expert Systems Application, 2009. DEXA '09. 20th International Workshop on
Conference_Location :
Linz
ISSN :
1529-4188
Print_ISBN :
978-0-7695-3763-4
Type :
conf
DOI :
10.1109/DEXA.2009.73
Filename :
5337105