Title :
Applying Bayesian belief networks in approximate string matching for robust keyword-based retrieval
Author :
Schuller, Björn ; Müller, Ronald ; Rigoll, Gerhard ; Lang, Manfred
Author_Institution :
Inst. for Human-Machine Commun., Technische Univ. München, Germany
Abstract :
We present a novel approach to robust keyword-based retrieval: Bayesian belief networks are applied in a word-model-based approximate string matching algorithm. Besides performing reliably in a working implementation on standard sources such as digital text, the fully probabilistic modeling allows confidence measures and hypotheses from preprocessing stages, such as handwriting recognition or optical character recognition, to be integrated, so that uncertainties at the lower levels are taken into account. Furthermore, a flexible method is provided for modeling specific error types originating from humans and from various input sources. The performance of the presented algorithms was assessed in an extensive evaluation against the Levenshtein distance, which can be regarded as the basis of state-of-the-art methods in this research field. The tests ran on a database of 14 K common international music titles and on four 10 K databases consisting of the most frequently used words in English, German, French and Dutch.
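For reference, the Levenshtein distance used as the evaluation baseline counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The following is a minimal Python sketch of the standard dynamic-programming computation, plus a hypothetical `best_match` helper that picks the closest database entry for a query. It is illustrative only: it is not the authors' implementation and not their Bayesian belief network matcher.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions (each with unit cost) needed to turn a into b."""
    # Keep only the previous row of the DP table: O(len(b)) memory.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def best_match(query: str, vocabulary: list[str]) -> str:
    """Hypothetical helper: return the vocabulary entry with the smallest
    Levenshtein distance to query (ties go to the earliest entry)."""
    return min(vocabulary, key=lambda word: levenshtein(query, word))


print(levenshtein("kitten", "sitting"))
print(best_match("Beatels", ["Beach Boys", "Beatles", "Bee Gees"]))
```

Because every edit costs the same here, this baseline cannot weight likely confusions (for example, typical OCR or handwriting-recognition errors) differently from unlikely ones; accommodating such error types and input confidences is what the probabilistic modeling described above is intended to do.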
Keywords :
approximation theory; belief networks; information retrieval; natural languages; string matching; text analysis; 10 K; 14 K; Bayesian belief networks; Dutch; English; French; German; Levenshtein distance; approximate string matching; confidence measures; digital text; handwriting recognition; international music titles; optical character recognition; probabilistic modeling; robust keyword-based retrieval; Bayesian methods; Character recognition; Databases; Handwriting recognition; Humans; Integrated optics; Measurement standards; Optical character recognition software; Robustness; Testing;
Conference_Title :
Multimedia and Expo, 2004. ICME '04. 2004 IEEE International Conference on
Print_ISBN :
0-7803-8603-5
DOI :
10.1109/ICME.2004.1394655