Title :
Coding of Sets of Words
Author :
Reznik, Yuriy A.
Author_Institution :
Qualcomm Inc., San Diego, CA, USA
Abstract :
We study the problem of coding of unordered sets of words, appearing in natural language processing, retrieval, machine learning, computer vision, and other fields. We note that this problem is different from the problem of coding of a particular sequence of same words, and show that up to log(m!) bits (where m is the number of words in the set) can be saved by specialized codes for sets. We propose one possible design of such codes, and prove its asymptotic optimality in the memoryless model.
Keywords :
encoding; memoryless systems; optimisation; set theory; word processing; asymptotic optimality; coding of sets of words; memoryless model; natural language processing; unordered sets; Algorithm design and analysis; Binary trees; Channel coding; Indexes; Source coding; asymptotic analysis; data compression; digital search tree; entropy; information theory; multisets; source coding; tree enumeration; unordered sets;
Conference_Titel :
Data Compression Conference (DCC), 2011
Conference_Location :
Snowbird, UT
Print_ISBN :
978-1-61284-279-0
DOI :
10.1109/DCC.2011.12