Title :
Compute the Term Contributed Frequency
Author :
Sung, Cheng-Lung ; Yen, Hsu-Chun ; Hsu, Wen-Lian
Author_Institution :
Dept. of Electr. Eng., Nat. Taiwan Univ.
Abstract :
In this paper, we propose an algorithm and a data structure for computing the term contributed frequency (tcf) of all N-grams in a text corpus. Although term frequency is one of the standard notions of frequency in corpus-based natural language processing (NLP), applying it to N-gram approaches raises problems, such as the distortion of phrase frequencies. We attempt to overcome this drawback by building a directed acyclic graph (DAG) that contains the proposed data structure and using it to retrieve more reliable term frequencies. The proposed algorithm and data structure are more efficient than traditional term frequency extraction approaches and are portable to various languages.
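The distortion the abstract mentions can be shown on a toy corpus: every occurrence of a longer repeated phrase also counts toward each of its sub-N-grams, which inflates their raw frequencies. The sketch below is a minimal brute-force illustration, not the DAG-based algorithm of the paper. The abstract does not define tcf, so the "adjusted" count used here (occurrences not contained in any longer repeated N-gram) is an assumption made only for illustration, and the names `ngram_positions`, `raw_and_adjusted`, `max_n`, and `min_freq` are hypothetical.

```python
def ngram_positions(tokens, max_n):
    """Map each n-gram (a tuple of tokens) to the start offsets where it occurs."""
    positions = {}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            positions.setdefault(tuple(tokens[i:i + n]), []).append(i)
    return positions


def raw_and_adjusted(tokens, max_n=3, min_freq=2):
    """Return (raw, adjusted) frequency dicts for all n-grams up to max_n.

    ASSUMPTION (not from the paper): an occurrence is dropped from the
    adjusted count when it lies inside an occurrence of a strictly longer
    n-gram that itself appears at least min_freq times.
    """
    positions = ngram_positions(tokens, max_n)
    raw = {gram: len(starts) for gram, starts in positions.items()}

    # Token spans [start, end) of every occurrence of a repeated n-gram.
    repeated_spans = [
        (start, start + len(gram))
        for gram, starts in positions.items()
        if len(starts) >= min_freq
        for start in starts
    ]

    adjusted = {}
    for gram, starts in positions.items():
        n = len(gram)
        adjusted[gram] = sum(
            1
            for i in starts
            if not any(
                s <= i and i + n <= e and (e - s) > n
                for s, e in repeated_spans
            )
        )
    return raw, adjusted


tokens = ("new york city is big new york city is old "
          "new york is a state").split()
raw, adjusted = raw_and_adjusted(tokens)
print(raw[("new", "york")], adjusted[("new", "york")])
```

In this toy corpus, two of the three occurrences of "new york" sit inside the repeated phrase "new york city", so the raw count overstates how often "new york" occurs as a phrase in its own right. The sketch compares every occurrence against every repeated span and is therefore quadratic; it is meant only to make the problem concrete, not to reflect the efficiency claimed for the proposed method.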
Keywords :
data structures; directed graphs; information retrieval; natural language processing; text analysis; corpus-based natural language processing; data structure; directed acyclic graph; term contributed frequency; Buildings; Computer applications; Data mining; Data structures; Frequency; Information retrieval; Intelligent systems; Natural language processing; Natural languages; Rails; suffix array; term contributed frequency; term frequency;
Conference_Title :
Intelligent Systems Design and Applications, 2008. ISDA '08. Eighth International Conference on
Conference_Location :
Kaohsiung
Print_ISBN :
978-0-7695-3382-7
DOI :
10.1109/ISDA.2008.152