DocumentCode :
537583
Title :
Optimal Hash List for Word Frequency Analysis
Author :
Sheng-Lan Peng
Author_Institution :
Dept. of Inf. Eng., JDZ Ceramic Inst., Jingdezhen, China
Volume :
1
fYear :
2010
fDate :
23-24 Oct. 2010
Firstpage :
242
Lastpage :
245
Abstract :
Word frequency analysis plays an essential role in many data mining tasks on large-scale text corpora, and the hash list is a simple but efficient structure for frequent pattern discovery. In this paper, a Poisson approximation approach is used to analyze, probabilistically, the space efficiency of the hash list under different parameter settings. Based on this theoretical model, an optimal parameter setting for the hash list is given. Experimental results on real data show that a hash list with the optimal parameter setting reaches minimum or near-minimum memory cost.
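The abstract does not spell out the model, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes that a "hash list" means a hash table with separate chaining and a uniform hash function. Under that assumption, with n distinct words spread over m buckets, the number of words landing in one bucket is approximately Poisson with mean n/m, so the expected fraction of empty (wasted) buckets is about exp(-n/m). The function names `build_hash_list`, `poisson_empty_fraction` and `observed_empty_fraction` are hypothetical illustrations.

```python
import math
import random
from collections import Counter


def build_hash_list(words, num_buckets):
    """Count word frequencies in a chained hash table (hypothetical sketch).

    Each bucket is a Python list of [word, count] pairs (separate chaining).
    """
    buckets = [[] for _ in range(num_buckets)]
    for w in words:
        chain = buckets[hash(w) % num_buckets]  # % with m > 0 is never negative
        for entry in chain:
            if entry[0] == w:
                entry[1] += 1  # word already stored: bump its frequency
                break
        else:
            chain.append([w, 1])  # first occurrence: append a new node
    return buckets


def poisson_empty_fraction(num_keys, num_buckets):
    """Poisson approximation of P(bucket is empty): exp(-n/m)."""
    return math.exp(-num_keys / num_buckets)


def observed_empty_fraction(buckets):
    """Fraction of buckets that hold no words at all."""
    return sum(1 for b in buckets if not b) / len(buckets)


if __name__ == "__main__":
    # Synthetic corpus (not the paper's data): 4000 distinct words plus repeats.
    random.seed(0)
    vocab = [f"word{i}" for i in range(4000)]
    corpus = vocab + [random.choice(vocab) for _ in range(20000)]
    n = len(set(corpus))
    for m in (1000, 4000, 16000):
        table = build_hash_list(corpus, m)
        print(m, observed_empty_fraction(table), poisson_empty_fraction(n, m))
```

Larger bucket arrays leave more buckets empty but keep chains short; a space-cost model would weigh the empty-bucket fraction against per-node overhead, and the specific cost terms and optimum used in the paper are not reproduced here.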
Keywords :
approximation theory; stochastic processes; text analysis; word processing; Poisson approximation approach; data mining tasks; frequent pattern discovery; hash list; text corpus; word frequency analysis; Poisson approximation; space efficiency; word frequency;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Information Systems and Mining (WISM), 2010 International Conference on
Conference_Location :
Sanya
Print_ISBN :
978-1-4244-8438-6
Type :
conf
DOI :
10.1109/WISM.2010.59
Filename :
5662319