• DocumentCode
    3227872
  • Title
    Data preprocessing by sequential pattern mining for LZW
  • Author
    Vergara-Villegas, Osslan O. ; García-Hernández, René A. ; Carrasco-Ochoa, J. Ariel ; Elías, Raul Pinto ; Martínez-Trinidad, José F.
  • Author_Institution
    Centro Nacional de Investigación y Desarrollo Tecnológico, Instituto Nacional de Astrofísica, Óptica y Electrónica, Mexico
  • fYear
    2005
  • fDate
    26-30 Sept. 2005
  • Firstpage
    82
  • Lastpage
    87
  • Abstract
    LZW is a lossless data compression algorithm that has been incorporated into a standard of the International Telegraph and Telephone Consultative Committee (CCITT). LZW is also used to create GIF, TIFF and PDF files. In this paper, we propose an improvement to LZW based on ideas from sequential pattern mining, whose goal is to find all the maximal frequent sequences (MFSs), i.e., sequences that appear at least β times and are not subsequences of any other frequent sequence. We preprocess the data with an algorithm that finds all the MFSs, and we then include these MFSs in the LZW dictionary according to their frequency. This modification yields a new variant of the LZW algorithm. We report experiments on text files showing the compression rates achieved by the proposed algorithm.
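    The abstract does not specify the mining algorithm or exactly how the MFSs are placed in the dictionary, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It assumes that MFSs are contiguous character substrings of a single Latin-1 text, counted with overlapping occurrences; that they are added after the 256 single-byte codes in descending order of frequency; and that every prefix of an MFS is added too, so the greedy longest-match loop of LZW can actually reach them. The helper names (frequent_substrings, maximal_frequent, build_dictionary, lzw_compress, lzw_decompress) and the max_len cap are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def frequent_substrings(text: str, beta: int, max_len: int = 8) -> Dict[str, int]:
    """Contiguous substrings (length >= 2) occurring at least `beta` times.

    Occurrences may overlap. Every prefix of a frequent substring is also
    frequent, so the search stops at the first length with no frequent n-gram.
    max_len is a hypothetical cap, not taken from the paper.
    """
    freq: Dict[str, int] = {}
    for n in range(2, max_len + 1):
        counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
        level = {s: c for s, c in counts.items() if c >= beta}
        if not level:
            break
        freq.update(level)
    return freq


def maximal_frequent(freq: Dict[str, int]) -> Dict[str, int]:
    """Keep only frequent substrings not contained in a longer frequent one."""
    return {s: c for s, c in freq.items()
            if not any(s != t and s in t for t in freq)}


def build_dictionary(seeds: Dict[str, int]) -> Dict[str, int]:
    """Codes 0-255 for single bytes, then each seed (most frequent first)
    together with all its prefixes, keeping the dictionary prefix-closed."""
    dictionary = {chr(i): i for i in range(256)}
    for seq, _ in sorted(seeds.items(), key=lambda kv: (-kv[1], kv[0])):
        for end in range(2, len(seq) + 1):
            dictionary.setdefault(seq[:end], len(dictionary))
    return dictionary


def lzw_compress(text: str, seeds: Optional[Dict[str, int]] = None) -> List[int]:
    """Standard LZW encoding, starting from the (possibly seeded) dictionary.
    Assumes every character of `text` has a code point below 256."""
    dictionary = build_dictionary(seeds or {})
    w, out = "", []
    for c in text:
        if w + c in dictionary:
            w += c  # extend the current longest match
        else:
            out.append(dictionary[w])
            dictionary[w + c] = len(dictionary)  # learn a new entry
            w = c
    if w:
        out.append(dictionary[w])
    return out


def lzw_decompress(codes: List[int], seeds: Optional[Dict[str, int]] = None) -> str:
    """Standard LZW decoding; the decoder must rebuild the same seeded dictionary."""
    if not codes:
        return ""
    table = {code: s for s, code in build_dictionary(seeds or {}).items()}
    prev = table[codes[0]]
    parts = [prev]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = prev + prev[0]  # code defined by this very step (KwKwK case)
        else:
            raise ValueError(f"invalid LZW code {code}")
        parts.append(entry)
        table[len(table)] = prev + entry[0]  # mirror the encoder's new entry
        prev = entry
    return "".join(parts)


if __name__ == "__main__":
    text = "abcabcabc"
    seeds = maximal_frequent(frequent_substrings(text, beta=3))
    print(seeds)                      # {'abc': 3}
    print(lzw_compress(text))         # [97, 98, 99, 256, 258, 257]
    print(lzw_compress(text, seeds))  # [257, 258, 98, 99]
```

    On this toy input, the seeded dictionary emits 4 codes instead of 6. The sketch ignores the cost of transmitting the MFS list to the decoder, which a real scheme would have to account for when measuring compression rates.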
  • Keywords
    data compression; data mining; pattern recognition; Lempel-Ziv-Welch method; data preprocessing; graphic interchange format file; lossless data compression algorithm; maximal frequent sequence; portable document format file; sequential pattern mining; tagged image file format; text file; Bandwidth; Compression algorithms; Compressors; Data compression; Data preprocessing; Dictionaries; Frequency; Hard disks; Telegraphy; Telephony;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computer Science, 2005. ENC 2005. Sixth Mexican International Conference on
  • ISSN
    1550-4069
  • Print_ISBN
    0-7695-2454-0
  • Type
    conf
  • DOI
    10.1109/ENC.2005.17
  • Filename
    1592204