• DocumentCode
    2315602
  • Title

    An Efficient String Matching Algorithm Using Super Alphabets

  • Author

    Prasad, Rajesh ; Agarwal, Suneeta

  • Author_Institution
    LDC Inst. of Tech. Studies, Allahabad
  • fYear
    2008
  • fDate
    16-18 July 2008
  • Firstpage
    1181
  • Lastpage
    1186
  • Abstract
    In the parameterized string matching, a given pattern P is said to match with a substring t of the text T, if there exist a bijection from the symbols of P to the symbols of t. This problem has an important application in software maintenance, where we wish to find the equivalency between two sections of codes. Two sections of codes are said to be equivalent, if one can be transformed into the other by renaming the identifiers and variables. In this paper, we extend parameterized shift-or (PSO) string matching algorithm (Kimmo Fredriksson & Maxim Mozgovoy, 2006) by using the concept of super alphabets. This new algorithm we call as parameterized super alphabet shift-or (PSSO) algorithm. Implementation results show that by using a super alphabet of size s, the PSO algorithm is speeded-up by a factor of s, where s is the size of the super alphabet (i.e. s is the number of characters processed simultaneously). We also show the performance of PSSO algorithm with respect to duplicity present in the code. However, PSSO algorithm is applicable only when pattern length (m) is less than or equal to word length (w) of computer used (i.e. m les w).
  • Keywords
    string matching; parameterized super alphabet shift-or algorithm; pattern length; software maintenance; super alphabets; Application software; Automata; Pattern matching; Plagiarism; Software maintenance; Algorithm; bit-parallelism; finite automata; parameterized matching; shift-or; string matching; super alphabet;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Emerging Trends in Engineering and Technology, 2008. ICETET '08. First International Conference on
  • Conference_Location
    Nagpur, Maharashtra
  • Print_ISBN
    978-0-7695-3267-7
  • Electronic_ISBN
    978-0-7695-3267-7
  • Type

    conf

  • DOI
    10.1109/ICETET.2008.146
  • Filename
    4580083