Title :
Compressing string data in a database by using q-grams
Author :
Danesh, Amir Seyed ; Ahy, Abdollatif
Author_Institution :
Dept. of Software Eng., Univ. of Malaya, Kuala Lumpur, Malaysia
Abstract :
String data is ubiquitous, and managing it is particularly important. This is especially true in databases, where string data makes up a large share of typical contents, which motivates us to look for ways to compress it while keeping it manageable. Many kinds of algorithms are available for this today. In this paper we present a new algorithm for compressing string data based on approximate string processing. Commercial databases do not support approximate string queries directly, and implementing this functionality efficiently with user-defined functions is a challenge. To address this, we use small substrings of each string, called q-grams, and process them using standard methods available in the DBMS. This allows the functionality to be implemented on top of commercial databases by exploiting facilities they already provide.
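The abstract does not spell out the compression algorithm itself, so the sketch below illustrates only the q-gram machinery it builds on: strings are split into padded q-grams, the q-grams are stored in an ordinary table, and plain SQL joins with GROUP BY/HAVING select candidates for an approximate (edit distance) lookup, which are then verified exactly. It uses the well-known length and count filters: a string of length n padded with q-1 characters on each side has n+q-1 q-grams, and each edit destroys at most q of them, so two strings within edit distance k share at least max(|s1|,|s2|)+q-1-kq q-grams. This is a generic technique, not the authors' method. SQLite stands in for a commercial DBMS, and the table names strings, qgrams and query_grams, the padding characters '#' and '$', and q=3 are hypothetical choices for illustration.

```python
import sqlite3


def qgrams(s, q=3):
    """Split s into overlapping q-grams, padding with q-1 '#' and '$' so edge characters are covered."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]


def edit_distance(a, b):
    """Dynamic-programming Levenshtein distance, used to verify the SQL candidates."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_tables(conn, strings, q=3):
    """Store each string and its q-grams in ordinary relational tables (hypothetical schema)."""
    conn.execute("CREATE TABLE strings (id INTEGER PRIMARY KEY, s TEXT)")
    conn.execute("CREATE TABLE qgrams (id INTEGER, gram TEXT)")
    conn.execute("CREATE INDEX qgrams_gram ON qgrams (gram)")
    for i, s in enumerate(strings):
        conn.execute("INSERT INTO strings VALUES (?, ?)", (i, s))
        conn.executemany("INSERT INTO qgrams VALUES (?, ?)",
                         [(i, g) for g in qgrams(s, q)])


# Duplicate q-grams can inflate COUNT(*), which only loosens the filter (no false negatives).
CANDIDATE_SQL = """
SELECT s.id, s.s FROM strings s
JOIN qgrams g ON g.id = s.id
JOIN query_grams qg ON qg.gram = g.gram
WHERE ABS(LENGTH(s.s) - :n) <= :k
GROUP BY s.id, s.s
HAVING COUNT(*) >= MAX(LENGTH(s.s), :n) + :q - 1 - :k * :q
UNION
SELECT id, s FROM strings
WHERE ABS(LENGTH(s) - :n) <= :k
  AND MAX(LENGTH(s), :n) + :q - 1 - :k * :q <= 0
"""


def approximate_lookup(conn, query, k, q=3):
    """Return stored strings within edit distance k of query: SQL length/count filters, then exact check."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS query_grams (gram TEXT)")
    conn.execute("DELETE FROM query_grams")
    conn.executemany("INSERT INTO query_grams VALUES (?)",
                     [(g,) for g in qgrams(query, q)])
    params = {"n": len(query), "k": k, "q": q}
    candidates = conn.execute(CANDIDATE_SQL, params).fetchall()
    return sorted(s for _, s in candidates if edit_distance(s, query) <= k)


conn = sqlite3.connect(":memory:")
build_tables(conn, ["database", "databse", "dataset", "q-gram", "qgram"])
print(approximate_lookup(conn, "database", k=1))  # prints ['database', 'databse']
```

The second SELECT in the UNION covers short strings for which the count bound is zero or negative; such strings may share no q-grams with the query and would otherwise be missed by the join.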
Keywords :
data compression; database management systems; query processing; DBMS; approximate string processing; approximate string query; databases; q-grams; string data compression; string data management; user-defined functions; Computer science; Data engineering; Database systems; Engineering management; Information retrieval; Information technology; Protection; Software engineering; Technology management; compressing; database; q-gram; string
Conference_Title :
Computer Engineering and Technology (ICCET), 2010 2nd International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-6347-3
DOI :
10.1109/ICCET.2010.5485484