Regional Information Center for Science and Technology

DocumentCode :

2665591

Title :

Efficient mining of textual associations

Author :

Gil, Alexandre ; Dias, Gael

Author_Institution :

Comput. Sci. Dept., Beira Interior Univ., Covilha, Portugal

fYear :

2003

fDate :

26-29 Oct. 2003

Firstpage :

549

Lastpage :

554

Abstract :

We describe an efficient implementation for mining textual associations from text corpora. To tackle real-world applications, efficient algorithms and data structures are needed to manage, in reasonable time and space, the ever-growing volume of text data. For that purpose, we introduce a global architecture based on masks, suffix arrays and multidimensional arrays to implement the SENTA extractor (Dias, 2002). In particular, SENTA has shown great flexibility and accuracy for mining textual associations such as collocations, cognates, morphemes and chunks. Our solution shows O(h(F) N log N) time complexity and O(N) space complexity, where N is the size of the corpus and h(F) is a function of the context window size.
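
The abstract names suffix arrays as one building block for counting word sequences in a corpus. The following is only a minimal sketch of that general idea, not the SENTA implementation: the function names `build_suffix_array` and `ngram_frequency` are hypothetical, the construction is a naive sort rather than the O(N log N) approach the paper reports, and only contiguous n-grams are handled (the mask mechanism for non-contiguous associations is not modelled).

```python
def build_suffix_array(tokens):
    """Return the start positions of all token suffixes in lexicographic order.

    Naive construction for illustration only: each comparison may scan a whole
    suffix, so this is slower than a dedicated suffix-array algorithm.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def _bound(tokens, suffix_array, ngram, upper):
    """Binary search for the first suffix whose n-token prefix is
    >= ngram (upper=False) or > ngram (upper=True)."""
    n = len(ngram)
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        prefix = tokens[start:start + n]
        if prefix < ngram or (upper and prefix == ngram):
            lo = mid + 1
        else:
            hi = mid
    return lo


def ngram_frequency(tokens, suffix_array, ngram):
    """Count occurrences of a contiguous n-gram using two binary searches.

    All suffixes that begin with the n-gram are adjacent in the suffix array,
    so the frequency is the width of that block.
    """
    ngram = list(ngram)
    first = _bound(tokens, suffix_array, ngram, upper=False)
    last = _bound(tokens, suffix_array, ngram, upper=True)
    return last - first


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat".split()
    sa = build_suffix_array(corpus)
    print(ngram_frequency(corpus, sa, ["the", "cat"]))
```

Because the suffix array stores one integer per token, its size stays linear in the corpus length, which is consistent with the O(N) space figure quoted in the abstract.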

Keywords :

computational complexity; data mining; data structures; natural languages; text analysis; SENTA software architecture; data structure; multidimensional array; natural language; space complexity; suffix array; text corpora; textual association mining; time complexity; Application software; Computer architecture; Computer science; Data mining; Data structures; Gas insulated transmission lines; Neural networks; Oceans; Testing; Text mining;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on

Conference_Location :

Beijing, China

Print_ISBN :

0-7803-7902-0

Type :

conf

DOI :

10.1109/NLPKE.2003.1275966

Filename :

1275966

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2665591