DocumentCode :
3108607
Title :
Answering English Queries in Automatically Transcribed Arabic Speech
Author :
Nwesri, Abdusalam F A ; Tahaghoghi, S.M.M. ; Scholer, Falk
Author_Institution :
RMIT Univ., Melbourne
fYear :
2007
fDate :
11-13 July 2007
Firstpage :
11
Lastpage :
16
Abstract :
There are several well-known approaches to parsing Arabic text in preparation for indexing and retrieval. Techniques such as stemming and stopping have been shown to improve search results on written newswire dispatches, but few comparisons are available on other data sources. In this paper, we apply several alternative stemming and stopping approaches to Arabic text automatically extracted from the audio soundtrack of news video footage, and compare these with approaches that rely on machine translation of the underlying text. Using the TRECVID video collection and queries, we show that normalisation, stopword removal, and light stemming increase retrieval precision, but that heavy stemming and trigrams have a negative effect. We also show that the choice of machine translation engine plays a major role in retrieval effectiveness.
Keywords :
grammars; indexing; language translation; natural languages; speech recognition; text analysis; video retrieval; Arabic information retrieval; Arabic text; English query answering; audio soundtrack; automatically transcribed Arabic speech; indexing; machine translation engine; news video footage; parsing; Acoustic noise; Australia; Automatic speech recognition; Computer science; Data mining; Engines; Indexing; Information retrieval; Information technology; Shape; Arabic information retrieval; Cross-language; Machine translation; information retrieval
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer and Information Science, 2007. ICIS 2007. 6th IEEE/ACIS International Conference on
Conference_Location :
Melbourne, Vic.
Print_ISBN :
0-7695-2841-4
Type :
conf
DOI :
10.1109/ICIS.2007.61
Filename :
4276350