DocumentCode :
2665591
Title :
Efficient mining of textual associations
Author :
Gil, Alexandre ; Dias, Gael
Author_Institution :
Comput. Sci. Dept., Beira Interior Univ., Covilha, Portugal
fYear :
2003
fDate :
26-29 Oct. 2003
Firstpage :
549
Lastpage :
554
Abstract :
We describe an efficient implementation for mining textual associations from text corpora. To tackle real-world applications, efficient algorithms and data structures are needed to manage, in reasonable time and space, the ever-growing volume of text data. For that purpose, we introduce a global architecture based on masks, suffix arrays and multidimensional arrays to implement the SENTA extractor (Dias, 2002). In particular, SENTA has shown great flexibility and accuracy for mining textual associations such as collocations, cognates, morphemes and chunks. Our solution has O(h(F) N log N) time complexity and O(N) space complexity, where N is the size of the corpus and h(F) is a function of the context window size.
Keywords :
computational complexity; data mining; data structures; natural languages; text analysis; SENTA software architecture; data structure; multidimensional array; natural language; space complexity; suffix array; text corpora; textual association mining; time complexity; Application software; Computer architecture; Computer science; Data mining; Data structures; Gas insulated transmission lines; Neural networks; Oceans; Testing; Text mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
Conference_Location :
Beijing, China
Print_ISBN :
0-7803-7902-0
Type :
conf
DOI :
10.1109/NLPKE.2003.1275966
Filename :
1275966