Title :
Efficient mining of textual associations
Author :
Gil, Alexandre ; Dias, Gael
Author_Institution :
Comput. Sci. Dept., Beira Interior Univ., Covilha, Portugal
Abstract :
We describe an efficient implementation for mining textual associations from text corpora. To tackle real-world applications, efficient algorithms and data structures are needed to manage, in reasonable time and space, the ever-growing volume of text data. For that purpose, we introduce a global architecture based on masks, suffix arrays and multidimensional arrays to implement the SENTA extractor (Dias, 2002). In particular, SENTA has shown great flexibility and accuracy for mining textual associations such as collocations, cognates, morphemes and chunks. Our solution has O(h(F) N log N) time complexity and O(N) space complexity, where N is the size of the corpus and h(F) is a function of the context window size.
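As a rough illustration of why suffix arrays suit this kind of frequency counting, the following minimal sketch builds a word-level suffix array and counts occurrences of a contiguous n-gram by binary search. It is not the SENTA implementation: the function names (build_suffix_array, ngram_frequency) are hypothetical, the construction uses a plain sort rather than a dedicated suffix array algorithm, and the masks that the abstract mentions for handling context windows are not modelled.

```python
def build_suffix_array(tokens):
    """Return suffix start positions sorted by the token sequence at each one.

    Illustrative only: sorting slices is simple but slower than a dedicated
    suffix array construction algorithm.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def _bound(tokens, sa, ngram, upper):
    """Binary search over the suffix array on the first len(ngram) tokens.

    Returns the first index whose prefix is >= ngram (upper=False)
    or > ngram (upper=True).
    """
    n = len(ngram)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = tokens[sa[mid]:sa[mid] + n]
        if prefix < ngram or (upper and prefix == ngram):
            lo = mid + 1
        else:
            hi = mid
    return lo


def ngram_frequency(tokens, sa, ngram):
    """Count occurrences of a contiguous n-gram using two binary searches."""
    ngram = list(ngram)  # compare lists with lists
    return _bound(tokens, sa, ngram, True) - _bound(tokens, sa, ngram, False)


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ran".split()
    sa = build_suffix_array(corpus)
    print(ngram_frequency(corpus, sa, ("the", "cat")))
```

All suffixes that share a given n-gram prefix sit next to each other in the sorted array, so once the array is built each frequency lookup needs only a logarithmic number of prefix comparisons and the index itself stores one integer per corpus position.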
Keywords :
computational complexity; data mining; data structures; natural languages; text analysis; SENTA software architecture; data structure; multidimensional array; natural language; space complexity; suffix array; text corpora; textual association mining; time complexity; Application software; Computer architecture; Computer science; Data mining; Data structures; Gas insulated transmission lines; Neural networks; Oceans; Testing; Text mining;
Conference_Title :
Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
Conference_Location :
Beijing, China
Print_ISBN :
0-7803-7902-0
DOI :
10.1109/NLPKE.2003.1275966