DocumentCode :
2684762
Title :
Data compression using long common strings
Author :
Bentley, Jon ; McIlroy, Douglas
Author_Institution :
AT&T Bell Labs., Murray Hill, NJ, USA
fYear :
1999
fDate :
29-31 Mar 1999
Firstpage :
287
Lastpage :
295
Abstract :
We describe a precompression algorithm that effectively represents any long common strings that appear in a file. The algorithm interacts well with standard compression algorithms that represent shorter strings that are near in the input text. Our experiments show that some real data sets do indeed contain many long common strings. We extend the fingerprint mechanisms of our algorithm to a program that identifies long common strings in an input file. This program gives interesting insights into the structure of real data files that contain long common strings.
Keywords :
data compression; data structures; string matching; text analysis; data file structure; data sets; fingerprint mechanisms; long common strings; precompression algorithm; string representation; text; Code standards; Compression algorithms; Computer science; Constitution; Data compression; Educational institutions; Fingerprint recognition; Plagiarism; Software libraries; Software systems
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Compression Conference, 1999. Proceedings. DCC '99
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
0-7695-0096-X
Type :
conf
DOI :
10.1109/DCC.1999.755678
Filename :
755678