DocumentCode :
2547955
Title :
Experiment analysis in newspaper topic detection
Author :
Brun, Armelle ; Smaili, Kamel ; Haton, Jean-Paul
Author_Institution :
LORIA INRIA-Lorraine, Vandoeuvre-les-Nancy, France
fYear :
2000
fDate :
2000
Firstpage :
55
Lastpage :
64
Abstract :
We present several methods for topic detection on newspaper articles, using either a general vocabulary or topic-specific vocabularies. Specific vocabularies are determined manually or statistically. In both cases, we aim to find the most representative words of a topic. Several methods have been tested. The first one is based on perplexity; this method achieves a 100% topic identification rate on large test corpora when the first two propositions are taken into account. Other methods are based on statistical counts and achieve a 94% identification rate on smaller test corpora. The major challenge of this work is to identify topics with only a few words, in order to be able, during speech recognition, to determine the most adequate language model.
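The abstract does not say which language models underlie the perplexity method, so the following is only a minimal sketch, assuming per-topic unigram models with add-one smoothing (a hypothetical stand-in, not the authors' implementation). It shows how topics can be ranked by the perplexity each topic model assigns to a text, so that the first two propositions can be inspected. The toy training texts and the helper names `train_unigram`, `perplexity`, and `rank_topics` are invented for illustration.

```python
import math
from collections import Counter

def train_unigram(texts):
    """Hypothetical per-topic model: raw word counts plus total token count."""
    counts = Counter(w for t in texts for w in t.lower().split())
    return counts, sum(counts.values())

def perplexity(model, words, vocab_size):
    """Perplexity of a word sequence under an add-one-smoothed unigram model."""
    if not words:
        raise ValueError("cannot compute perplexity of an empty text")
    counts, total = model
    log_prob = 0.0
    for w in words:
        # Add-one (Laplace) smoothing so unseen words get a non-zero probability.
        log_prob += math.log((counts[w] + 1) / (total + vocab_size))
    return math.exp(-log_prob / len(words))

def rank_topics(models, text, vocab_size):
    """Return topic names sorted from lowest to highest perplexity."""
    words = text.lower().split()
    scores = {topic: perplexity(m, words, vocab_size) for topic, m in models.items()}
    return sorted(scores, key=scores.get)

# Toy data (invented for illustration only).
training = {
    "sports": ["the team won the match", "the player scored a goal"],
    "finance": ["the market fell today", "stock prices and the market rose"],
    "politics": ["the minister signed the law", "parliament voted on the new law"],
}
models = {topic: train_unigram(texts) for topic, texts in training.items()}
# Shared vocabulary size across all topics, plus one slot for unknown words.
vocab = {w for texts in training.values() for t in texts for w in t.lower().split()}
vocab_size = len(vocab) + 1

ranking = rank_topics(models, "the team scored a goal", vocab_size)
top_two = ranking[:2]  # the "first two propositions"
print(ranking)
```

Counting a detection as correct when the true topic appears in `top_two` mirrors the "first two propositions" criterion mentioned in the abstract; the smoothing scheme and model order here are assumptions chosen only to keep the sketch short.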
Keywords :
natural languages; speech recognition; vocabulary; experiment analysis; language model; large test corpora; newspaper topic detection; perplexity; representative words; statistical counts; Acoustic testing; Automatic speech recognition; Character recognition; History; Natural languages; Predictive models; Speech recognition; Stochastic processes; Text recognition; Vocabulary
fLanguage :
English
Publisher :
IEEE
Conference_Title :
String Processing and Information Retrieval, 2000. SPIRE 2000. Proceedings. Seventh International Symposium on
Conference_Location :
A Coruña
Print_ISBN :
0-7695-0746-8
Type :
conf
DOI :
10.1109/SPIRE.2000.878180
Filename :
878180