Abstract :
A single-stage speech understanding module is presented as part of a framework for a multimodal mathematical formula editor that will support natural speech and handwriting interaction. The module is based on a multilevel statistical, expectation-driven approach. Realistic, completely spoken formulas were examined, containing basic arithmetic operations, roots, indexed sums, integrals, trigonometric functions, logarithms, convolutions, Fourier transforms, exponentiations, and indexing, among others. The speaker-specific and formula-specific structural recognition accuracies reach up to 90% and 100%, respectively. For visualization and postprocessing purposes, a transformation into Adobe(R) FrameMaker(R) documents is performed. An advanced variant of this architecture will further serve as the basis for a multimodal semantic decoder that combines script and speech analysis. It will incorporate a so-called multimodal probabilistic grammar, which will be trained via multimodal usability tests.
Keywords :
grammars; handwritten character recognition; mathematics computing; speech recognition; speech-based user interfaces; Adobe FrameMaker documents; arithmetic; handwriting recognition; multimodal mathematical formula editor; multimodal probabilistic grammar; multimodal semantic decoder; speech understanding; statistical expectation driven approach; usability; visualization; Decoding; Desktop publishing; Educational institutions; Indexing; Lungs; Man machine systems; Natural languages; Speech; Testing; Typesetting
Conference_Title :
Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on