Title :
On optimizing syntactic pattern recognition using tries and AI-based heuristic-search strategies
Author :
Badr, Ghada ; Oommen, B. John
Author_Institution :
Sch. of Comput. Sci., Carleton Univ., Ottawa, Ont., Canada
fDate :
6/1/2005 12:00:00 AM
Abstract :
This paper deals with the problem of estimating, using enhanced artificial-intelligence (AI) techniques, a transmitted string X* by processing the corresponding string Y, which is a noisy version of X*. It is assumed that Y contains substitution, insertion, and deletion (SID) errors. The best estimate X+ of X* is defined as that element of a dictionary H that minimizes the generalized Levenshtein distance (GLD) D(X,Y) between X and Y, for all X∈H. In this paper, it is shown how to evaluate D(X,Y) for every X∈H simultaneously, when the edit distances are general and the maximum number of errors is not given a priori, and when H is stored as a trie. A new scheme called clustered beam search (CBS) is first introduced, which is a heuristic-based search approach that enhances the well-known beam-search (BS) techniques used in AI. The new scheme is then applied to the approximate string-matching problem when the dictionary is stored as a trie. The new technique is compared with the benchmark depth-first search (DFS) trie-based technique (with respect to time and accuracy) using large and small dictionaries. The results demonstrate a marked improvement of up to 75% with respect to the total number of operations needed on three benchmark dictionaries, while yielding an accuracy comparable to the optimal. Experiments are also done to show the benefits of the CBS over the BS when the search is done on the trie. The results also demonstrate a marked improvement (more than 91%) for large dictionaries.
Keywords :
heuristic programming; learning (artificial intelligence); string matching; tree searching; AI-based heuristic-search strategy; CBS; DFS; GLD; clustered beam search; depth-first search; enhanced artificial-intelligence techniques; generalized Levenshtein distance; optimizing syntactic pattern recognition; string-matching problem; trie-based technique; Artificial intelligence; Computer science; Costs; Data structures; Dictionaries; Pattern matching; Pattern recognition; Approximate string matching; artificial intelligence (AI); local beam search (BS); noisy syntactic recognition using tries; trie-based syntactic pattern recognition (PR); Algorithms; Artificial Intelligence; Information Storage and Retrieval; Language; Natural Language Processing; Pattern Recognition, Automated; Speech Recognition Software;
Journal_Title :
Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on
DOI :
10.1109/TSMCB.2005.861860