Title :
Towards spoken-document retrieval for the enterprise: Approximate word-lattice indexing with text indexers
Author :
Seide, Frank ; Yu, Peng ; Shi, Yu
Author_Institution :
Microsoft Res. Asia, Beijing
Abstract :
Enterprise-scale search engines are generally designed for linear text. Linear text is suboptimal for audio search, where accuracy can be significantly improved if the search includes alternate recognition candidates, commonly represented as word lattices. We propose two methods to enable text indexers to approximately index lattices with little or no code change: "TMI" (Time-based Merging for Indexing) aims at lattice-index size reduction, and the "sausage"-like "TALE" (Time-Anchored Lattice Expansion) approximation requires no indexer-code or data-format changes at all. On four enterprise-type data sets (meetings, phone calls, lectures, and voicemail), TMI and TALE improve accuracy by 30-60% for multi-word phrase searches and by 130% for two-term AND queries, compared to indexing linear text.
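The benefit of indexing alternate recognition candidates can be illustrated with a toy positional index. The sketch below is not the TMI or TALE algorithm from the paper. It only assumes a "sausage"-like layout in which several candidate words may share one token position, and it ignores posteriors and time information. The function names, variable names, and sample words are all hypothetical.

```python
from collections import defaultdict


def build_index(slots):
    """Build a positional inverted index.

    slots[i] is the list of candidate words at token position i
    (a single-element list for plain linear text).
    """
    index = defaultdict(set)
    for pos, candidates in enumerate(slots):
        for word in candidates:
            index[word].add(pos)
    return index


def phrase_search(index, phrase):
    """Return start positions where the phrase words fill consecutive slots."""
    words = phrase.split()
    if not words:
        return []
    starts = set(index.get(words[0], set()))
    for offset, word in enumerate(words[1:], start=1):
        positions = index.get(word, set())
        starts = {s for s in starts if s + offset in positions}
    return sorted(starts)


# Hypothetical recognizer output: the top-1 transcript contains the
# misrecognition "for", while the correct word "four" survives only
# as an alternate candidate at the same position.
linear = [["meeting"], ["at"], ["for"], ["pm"]]
expanded = [["meeting"], ["at"], ["for", "four"], ["pm"]]

print(phrase_search(build_index(linear), "at four pm"))    # no match
print(phrase_search(build_index(expanded), "at four pm"))  # match at position 1
```

Because the alternates share a position with the top-1 word, an unmodified phrase matcher that checks consecutive positions can find the phrase without any change to its matching logic.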
Keywords :
indexing; information retrieval; text analysis; linear text; spoken-document retrieval; text indexers; time-anchored lattice expansion; time-based merging for indexing; word-lattice indexing; Broadcasting; Indexing; Information retrieval; Internet; Lattices; Merging; Search engines; Speech recognition; Videos; Voice mail; Audio indexing; keyword spotting; lattice; posterior
Conference_Title :
Automatic Speech Recognition & Understanding, 2007. ASRU. IEEE Workshop on
Conference_Location :
Kyoto
Print_ISBN :
978-1-4244-1746-9
Electronic_ISBN :
978-1-4244-1746-9
DOI :
10.1109/ASRU.2007.4430185