DocumentCode
1965211
Title
Efficient fuzzy search enabled hash map
Author
Topac, V.
Author_Institution
Dept. of Comput. Sci. & Inf. Technol., Politeh. Univ. of Timisoara, Timisoara, Romania
fYear
2010
fDate
15-17 July 2010
Firstpage
39
Lastpage
44
Abstract
Hash maps are data structures widely used in modern programming languages like Java for their simplicity and efficiency. When fuzzy string search is needed (like in natural language processing) finding an approximate key match in a regular Java HashMap is a trivial task. It usually requires the brute force method of iterating trough the set of keys and use of string metrics methods. Although this approach works it is time consuming and loses the hashing advantage of the hash map. Another option is to use a different data structure like TreeMap, which is faster, but also have limitations on fuzzy string search. This article presents FuzzyHashMap, an extension to the regular Java HashMap data structure allowing highly efficient fuzzy string key search. Based on object oriented principles this extended hash map uses a custom key that enables different types of pre-hashing functions and different types of dynamic programming algorithms for approximate string matching. Customizable algorithms and settings bring flexibility to this new data structure, making it adaptable to each specific use case. Fuzzy string search performance comparison between FuzzyHashMap and the regular HashMap are presented for both accuracy and time consumption. Results show very good performance for FuzzyHashMap compared to the regular HashMap. Some real use cases for the extended hash map are listed. All the work described in this paper is released as open source, making it easy for the community to use and extend the capabilities of the current implementation.
Keywords
Java; data structures; dynamic programming; fuzzy set theory; public domain software; string matching; FuzzyHashMap; Java; TreeMap; data structures; dynamic programming algorithms; fuzzy string search; natural language processing; open source; programming languages; string matching; string metrics methods; Approximation algorithms; Communities; Hash Map; Java; approximate string matching; fuzzy search;
fLanguage
English
Publisher
ieee
Conference_Titel
Soft Computing Applications (SOFA), 2010 4th International Workshop on
Conference_Location
Arad
Print_ISBN
978-1-4244-7985-6
Type
conf
DOI
10.1109/SOFA.2010.5565628
Filename
5565628
Link To Document