• DocumentCode
    1563696
  • Title

    Approximate Searching on Compressed Text

  • Author

    Pérez, Carlos Avendaño ; Uribe, Claudia Feregrino

  • Author_Institution
    INAOE, Puebla, Mexico
  • fYear
    2005
  • Firstpage
    258
  • Lastpage
    261
  • Abstract
    The approximate searching problem on compressed text tries to find all the matches of a pattern in a compressed text, without decompressing it and considering that the match of the pattern with the text can have a limited number of differences. This problem has diverse applications in information retrieval, computational biology and signal processing, among others. One of the best solutions to this problem is to execute a multipattern search of a set of pieces of the pattern, followed by a local decompression and a direct verification in the decompressed areas. In this work an improvement to this solution concerning verification is presented, where instead of executing a decompression process and searching for the pattern, bit-parallel automata are constructed that recognize the pattern. In this way, we perform the entire searching process without decompressing the text and obtain competitive times, compared to decompressing text and searching it with the best existing algorithms.
  • Keywords
    data compression; search problems; string matching; text analysis; approximate searching; bit-parallel automata; compressed text; computational biology; direct verification; information retrieval; local decompression; multipattern search; pattern matching; signal processing; Automata; Biomedical signal processing; Computational biology; Concurrent computing; Dynamic programming; Information retrieval; Pattern matching; Pattern recognition; Signal processing algorithms;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Electronics, Communications and Computers, 2005. CONIELECOMP 2005. Proceedings. 15th International Conference on
  • Print_ISBN
    0-7695-2283-1
  • Type

    conf

  • DOI
    10.1109/CONIEL.2005.23
  • Filename
    1488570