DocumentCode :
477745
Title :
A Hybrid Statistical Language Model Applied to the Domain Specific Information Retrieval
Author :
Wang, Wei ; Lin, Kunhui ; Zhou, Changle
Author_Institution :
Software Sch., Xiamen Univ., Xiamen
Volume :
2
fYear :
2008
fDate :
18-20 Oct. 2008
Firstpage :
3
Lastpage :
7
Abstract :
Traditional language models take a multi-topic document corpus as the research target. To avoid the interference caused by multiple topics, this paper focuses on domain-specific information retrieval (IR). In domain-specific IR, different terms are considered to contribute to the final query result to different degrees, so the terms in a document can be divided into categories according to their degree of contribution. The statistical information of a term, mainly its probabilities, is then computed with different methods and smoothing strategies according to its category. This paper proposes an improved hybrid statistical language model for domain-specific IR. In the experiments, the new model improves performance by about 9% to 10%. Finally, some challenges and research directions for statistical language models are presented.
Keywords :
information retrieval; probability; query languages; statistical analysis; domain specific information retrieval; multitopics document corpus; probability; query language; smooth strategy; statistical information; statistical language model; Computer science; Fuzzy systems; Handwriting recognition; Information retrieval; Interference; Natural languages; Probability distribution; Space technology; Speech recognition; Statistics;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
Conference_Location :
Shandong
Print_ISBN :
978-0-7695-3305-6
Type :
conf
DOI :
10.1109/FSKD.2008.240
Filename :
4666069