Title :
Compressing Yahoo Mail
Author :
Bergman, Aran ; Zohar, Eyal
Author_Institution :
Dept. of Electr. Eng., Technion - Israel Inst. of Technol., Haifa, Israel
Abstract :
Yahoo Mail servers have been receiving an enormous number of messages each day for the past 17 years. The vast majority of today's messages (about 90%) are machine-generated, based on a boilerplate with a small number of specific per-recipient changes. We show that the popular Zlib compression to gzip format fails to fully utilize the high similarity between these machine-generated messages. In this paper, we analyze the data redundancy in Yahoo Mail and present methods to reduce its space requirements while using the standard Zlib library. Our results show that we can further reduce the compressed data size by a factor of almost 2.5 compared to traditional gzip compression.
Keywords :
data compression; electronic mail; reliability; software libraries; Yahoo mail servers; Zlib compression; Zlib library; boilerplate; data redundancy; gzip compression; gzip format; machine-generated messages; specific per-recipient changes; Electronic mail; Libraries; Postal services; Redundancy; Servers; Size measurement; Standards; Compression; Deflate; Mail; Yahoo; Zlib; gzip;
Conference_Title :
Data Compression Conference (DCC), 2015
Conference_Location :
Snowbird, UT
DOI :
10.1109/DCC.2015.15
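Illustration :
The abstract does not say which methods the paper uses, so the following is only a minimal sketch of one way the standard Zlib library can exploit similarity between boilerplate messages: a preset dictionary. It is an assumption for illustration, not the authors' technique. The template text, recipient names, and helper function names are hypothetical. The sketch also uses the zlib stream format rather than the gzip container, because zlib does not accept preset dictionaries for gzip-wrapped streams.

```python
import zlib

# Hypothetical boilerplate; the field names and wording are invented for this sketch.
TEMPLATE = (
    "Subject: Your order {order} has shipped\r\n"
    "\r\n"
    "Hello {name},\r\n"
    "Thank you for shopping with us. Your order {order} left our warehouse\r\n"
    "today and should arrive within five business days. You can track the\r\n"
    "parcel at any time from the Orders page of your account.\r\n"
    "If you did not place this order, please contact customer support.\r\n"
)

def compress_plain(msg: bytes) -> bytes:
    """Compress one message on its own, as a per-message baseline."""
    return zlib.compress(msg, 9)

def compress_with_dict(msg: bytes, zdict: bytes) -> bytes:
    """Compress one message, letting deflate reference a shared preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=zdict)
    return comp.compress(msg) + comp.flush()

def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Decompress a message that was compressed against the same dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()

# An earlier message from the same (hypothetical) sender serves as the dictionary.
zdict = TEMPLATE.format(name="Alice", order="A-1001").encode("utf-8")
# A later message differs only in the per-recipient fields.
msg = TEMPLATE.format(name="Bob", order="B-2002").encode("utf-8")

plain = compress_plain(msg)
shared = compress_with_dict(msg, zdict)
print(len(msg), len(plain), len(shared))
```

The dictionary must be stored once and supplied again at decompression time, so any saving only pays off when many messages share the same boilerplate.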