Title :
Text Surface Features for Genre Request in Information Retrieval
Author_Institution :
Nat. Res. Center for Foreign Language Educ., Beijing Foreign Studies Univ., Beijing, China
Date :
Nov. 30 2009-Dec. 1 2009
Abstract :
Traditional information retrieval focuses on topic relevance, computing the similarity between a query and texts with a content-based bag-of-words (BOW) strategy. This approach cannot handle query terms that express a genre request, because such terms may not occur in the target texts, either in the same form or in a derivative form. We propose post text classification, i.e. classifying the results returned by search engines after retrieval, to meet the request expressed by genre vocabulary in queries. We examine several statistics of text surface features for genre detection, namely average sentence length and the number of specific parts of speech and punctuation marks. An experiment using these variables to distinguish narrative texts from commentaries about a news event yielded encouraging results.
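The surface features named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `surface_features`, the regex-based sentence splitter, and the word pattern are all assumptions made for illustration. Part-of-speech counts are left out because they require a tagger, which the abstract does not specify.

```python
import re
import string


def surface_features(text: str) -> dict:
    """Compute simple surface statistics for a text (illustrative only)."""
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Assumption: a word is a run of letters, digits, or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Count each ASCII punctuation mark separately.
    punct_counts = {}
    for ch in text:
        if ch in string.punctuation:
            punct_counts[ch] = punct_counts.get(ch, 0) + 1
    # Average sentence length in words; 0.0 for empty input.
    avg_len = len(words) / len(sentences) if sentences else 0.0
    return {
        "num_sentences": len(sentences),
        "avg_sentence_length": avg_len,
        "punctuation": punct_counts,
    }


sample = "The fire began at noon. Why did it spread so fast? Officials must answer!"
features = surface_features(sample)
print(features)
```

A feature vector like this could be fed to any standard classifier to separate narrative texts from commentaries; which classifier and which thresholds the paper used is not stated in the abstract.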
Keywords :
content-based retrieval; search engines; text analysis; commentary identification; content-based bag of words strategy; genre request; genre vocabulary; information retrieval; narrative text identification; news event; post text classification; query; text surface features; topic relevance; Concrete; Frequency; Knowledge acquisition; Libraries; Statistics; Text categorization; Vocabulary; genre
Conference_Title :
Knowledge Acquisition and Modeling, 2009. KAM '09. Second International Symposium on
Conference_Location :
Wuhan
Print_ISBN :
978-0-7695-3888-4
DOI :
10.1109/KAM.2009.263