DocumentCode
2315561
Title
Exploring the use of fuzzy signature for text mining
Author
Wong, Kok Wai ; Chumwatana, Todsanai ; Tikk, Domonkos
Author_Institution
Sch. of Inf. Technol., Murdoch Univ., Murdoch, WA, Australia
fYear
2010
fDate
18-23 July 2010
Firstpage
1
Lastpage
5
Abstract
The classical approaches to traditional text mining problems, such as document indexing, document clustering, and text classification, represent text as a bag of words. Words, the units of the representation, are determined by tokenization, using e.g. whitespace and punctuation characters as separators. Bag-of-words methods face problems with non-segmented text, which is typical of some Asian languages, because tokenization can no longer be applied to determine the representation units. Several solutions have been proposed so far; among them, frequent max substring mining is adopted here because of its language independence and its favourable speed and storage requirements. In this paper we present a fuzzy-signature-based solution that uses frequent max substrings for non-segmented document representation, and propose how it could be applied to some typical text mining tasks. We show how the flexibility of fuzzy signatures can be exploited for these tasks. With the proposed concept, complex decision models in text mining may be constructed more effectively in future.
Keywords
data mining; fuzzy set theory; pattern classification; text analysis; document clustering; document indexing; fuzzy signature; punctuation characters; text classification; text mining; whitespace characters; Education; Fuzzy sets; Humans; Indexing; Lattices; Medals; Support vector machine classification
fLanguage
English
Publisher
IEEE
Conference_Titel
Fuzzy Systems (FUZZ), 2010 IEEE International Conference on
Conference_Location
Barcelona
ISSN
1098-7584
Print_ISBN
978-1-4244-6919-2
Type
conf
DOI
10.1109/FUZZY.2010.5584873
Filename
5584873