DocumentCode :
2944647
Title :
Coding of Sets of Words
Author :
Reznik, Yuriy A.
Author_Institution :
Qualcomm Inc., San Diego, CA, USA
fYear :
2011
fDate :
29-31 March 2011
Firstpage :
43
Lastpage :
52
Abstract :
We study the problem of coding of unordered sets of words, appearing in natural language processing, retrieval, machine learning, computer vision, and other fields. We note that this problem is different from the problem of coding of a particular sequence of same words, and show that up to log(m!) bits (where m is the number of words in the set) can be saved by specialized codes for sets. We propose one possible design of such codes, and prove its asymptotic optimality in the memoryless model.
Keywords :
encoding; memoryless systems; optimisation; set theory; word processing; asymptotic optimality; coding of sets of words; memoryless model; natural language processing; unordered sets; Algorithm design and analysis; Binary trees; Channel coding; Indexes; Source coding; asymptotic analysis; data compression; digital search tree; entropy; information theory; multisets; source coding; tree enumeration; unordered sets;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Compression Conference (DCC), 2011
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
978-1-61284-279-0
Type :
conf
DOI :
10.1109/DCC.2011.12
Filename :
5749462
Link To Document :
بازگشت