DocumentCode :
1930762
Title :
Finding patterns in strings using suffixarrays
Author :
Stehouwer, Herman ; van Zaanen, M.
Author_Institution :
Tilburg Centre for Cognition & Commun., Tilburg Univ., Tilburg, Netherlands
fYear :
2010
fDate :
18-20 Oct. 2010
Firstpage :
505
Lastpage :
511
Abstract :
Finding regularities in large data sets requires implementations of systems that are efficient in both time and space requirements. Here, we describe a newly developed system that exploits the internal structure of the enhanced suffixarray to find significant patterns in a large collection of sequences. The system searches exhaustively for all significantly compressing patterns where patterns may consist of symbols and skips or wildcards. We demonstrate a possible application of the system by detecting interesting patterns in a Dutch and an English corpus.
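The abstract says the system uses the internal structure of an enhanced suffix array to find repeated, compressing patterns. An enhanced suffix array is a suffix array plus auxiliary tables, the most basic of which is the LCP (longest common prefix) array. The Python sketch below shows the general technique under several assumptions. It is not the authors' implementation. It handles contiguous token patterns only, so it ignores the skips and wildcards the paper supports. It also uses naive suffix sorting. The functions `repeated_patterns` and `compression_gain` are hypothetical names, and `compression_gain` is a simple stand-in score, not the significance measure used in the paper.

```python
def suffix_array(seq):
    """Naive construction: sort suffix start positions lexicographically.
    Works on strings or on lists of tokens (e.g. words from a corpus)."""
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def lcp_array(seq, sa):
    """Kasai et al. linear-time LCP: lcp[k] is the length of the longest
    common prefix of suffixes sa[k-1] and sa[k]; lcp[0] is 0."""
    n = len(seq)
    rank = [0] * n
    for k, start in enumerate(sa):
        rank[start] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix immediately before i in sorted order
            while i + h < n and j + h < n and seq[i + h] == seq[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # next suffix keeps at least h - 1 matched symbols
        else:
            h = 0
    return lcp


def repeated_patterns(seq, max_len):
    """Hypothetical helper: every contiguous pattern of length 1..max_len that
    occurs at least twice, mapped to its occurrence count. All occurrences of a
    pattern of length L form one run of adjacent suffix-array entries whose
    LCP values are >= L, so each run yields one pattern and its count."""
    sa = suffix_array(seq)
    lcp = lcp_array(seq, sa)
    n = len(seq)
    found = {}
    for length in range(1, max_len + 1):
        run_start = 0
        for k in range(1, n + 1):
            if k < n and lcp[k] >= length:
                continue  # still inside the current run
            size = k - run_start
            if size >= 2:
                p = sa[run_start]
                found[tuple(seq[p:p + length])] = size
            run_start = k
    return found


def compression_gain(pattern, count):
    """Hypothetical score: symbols saved by replacing each occurrence with one
    new symbol, minus the cost of storing the pattern once."""
    return count * len(pattern) - (count + len(pattern))


if __name__ == "__main__":
    tokens = "the cat sat on the mat the cat ran".split()
    patterns = repeated_patterns(tokens, 3)
    for pat, cnt in sorted(patterns.items()):
        print(pat, cnt, compression_gain(pat, cnt))
```

On the toy token list, the sketch finds `("the",)` three times and both `("cat",)` and `("the", "cat")` twice. A real system would build the suffix array in linear or near-linear time and use a proper significance test to decide which patterns actually compress the corpus.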
Keywords :
data compression; natural language processing; string matching; Dutch corpus; English corpus; compressing patterns; interesting patterns; large data sets; strings; suffixarrays; Arrays; Buildings; Cognition; Natural languages; Software; Sorting;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science and Information Technology (IMCSIT), Proceedings of the 2010 International Multiconference on
Conference_Location :
Wisla
ISSN :
2157-5525
Print_ISBN :
978-1-4244-6432-6
Type :
conf
DOI :
10.1109/IMCSIT.2010.5679928
Filename :
5679928