DocumentCode :
507588
Title :
Text Surface Features for Genre Request in Information Retrieval
Author :
Xiong, Wenxin
Author_Institution :
Nat. Res. Center for Foreign Language Educ., Beijing Foreign Studies Univ., Beijing, China
Volume :
1
fYear :
2009
fDate :
Nov. 30 2009-Dec. 1 2009
Firstpage :
287
Lastpage :
290
Abstract :
Traditional information retrieval focuses on topic relevance, computing the similarity between a query and texts with a content-based bag-of-words (BOW) strategy. This approach cannot handle terms in a user's query that express a genre request, because such terms may not occur in the target texts in the same form or in any derivative form. We propose post text classification, applied to the results returned by search engines, to meet the request expressed by genre vocabulary in queries. Several text surface features for genre detection are examined, namely average sentence length and the counts of specific parts of speech and punctuation marks. An experiment using these variables to distinguish narrative texts from commentaries about a news event yielded an encouraging result.
Keywords :
content-based retrieval; search engines; text analysis; commentary identification; content-based bag of words strategy; genre request; genre vocabulary; information retrieval; narrative text identification; news event; post text classification; query; search engines; text surface features; topic relevance; Concrete; Content based retrieval; Frequency; Information retrieval; Knowledge acquisition; Libraries; Search engines; Statistics; Text categorization; Vocabulary; genre; information retrieval; query; text surface features;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Knowledge Acquisition and Modeling, 2009. KAM '09. Second International Symposium on
Conference_Location :
Wuhan
Print_ISBN :
978-0-7695-3888-4
Type :
conf
DOI :
10.1109/KAM.2009.263
Filename :
5362188