DocumentCode
477745
Title
A Hybrid Statistical Language Model Applied to the Domain Specific Information Retrieval
Author
Wang, Wei ; Lin, Kunhui ; Zhou, Changle
Author_Institution
Software Sch., Xiamen Univ., Xiamen
Volume
2
fYear
2008
fDate
18-20 Oct. 2008
Firstpage
3
Lastpage
7
Abstract
Traditional language models take a multi-topic document corpus as their research target. To avoid the interference introduced by the multi-topic problem, this paper focuses on domain-specific information retrieval (IR). In domain-specific IR, different terms are considered to contribute to the final query result to different degrees, so the terms in a document can be divided into categories according to their degree of contribution. The statistical information of a term, mainly its probabilities, is then computed using different methods and smoothing strategies according to its category. This paper proposes an improved hybrid statistical language model for domain-specific IR. In the experiments, the new model achieves a performance improvement of about 9% to 10%. Finally, some challenges and research directions for statistical language modeling are presented.
Keywords
information retrieval; probability; query languages; statistical analysis; domain specific information retrieval; multitopics document corpus; query language; smooth strategy; statistical information; statistical language model; Computer science; Fuzzy systems; Handwriting recognition; Interference; Natural languages; Probability distribution; Space technology; Speech recognition; Statistics
fLanguage
English
Publisher
IEEE
Conference_Titel
Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
Conference_Location
Shandong
Print_ISBN
978-0-7695-3305-6
Type
conf
DOI
10.1109/FSKD.2008.240
Filename
4666069