DocumentCode :
1634320
Title :
Compute the Term Contributed Frequency
Author :
Sung, Cheng-Lung ; Yen, Hsu-Chun ; Hsu, Wen-Lian
Author_Institution :
Dept. of Electr. Eng., Nat. Taiwan Univ.
Volume :
2
fYear :
2008
Firstpage :
325
Lastpage :
328
Abstract :
In this paper, we propose an algorithm and data structure for computing the term contributed frequency (tcf) for all N-grams in a text corpus. Although term frequency is one of the standard notions of frequency in corpus-based natural language processing (NLP), applying the concept to N-gram approaches raises problems, such as the distortion of phrase frequencies. We attempt to overcome this drawback by building a directed acyclic graph (DAG) containing the proposed data structure and using it to retrieve more reliable term frequencies. Our proposed algorithm and data structure are more efficient than traditional term frequency extraction approaches and are portable to various languages.
Keywords :
data structures; directed graphs; information retrieval; natural language processing; text analysis; corpus-based natural language processing; data structure; directed acyclic graph; term contributed frequency; Buildings; Computer applications; Data mining; Data structures; Frequency; Information retrieval; Intelligent systems; Natural language processing; Natural languages; Rails; suffix array; term frequency
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Systems Design and Applications, 2008. ISDA '08. Eighth International Conference on
Conference_Location :
Kaohsiung
Print_ISBN :
978-0-7695-3382-7
Type :
conf
DOI :
10.1109/ISDA.2008.152
Filename :
4696352