Mining Relevant Text Features for Retrieving Web Information

Author

Pipanmaekaporn, Luepol ; Kamolsantiroj, Suwatchai

Author_Institution

Dept. of Comput. & Inf. Sci., King Mongkut's Univ. of Technol. North Bangkok, Bangkok, Thailand

fYear

2014

fDate

Aug. 31 2014-Sept. 4 2014

Firstpage

447

Lastpage

452

Abstract

It is a big challenge to develop effective methods that can discover high-quality, useful features in text documents. Most existing information retrieval and text mining methods focus on term-based approaches, which often suffer from the problems of term variation and noise. This paper presents an innovative approach that discovers relevant knowledge to precisely describe text features for retrieving web information. In particular, it extracts precise text patterns by considering both relevant and irrelevant documents. The discovered patterns are then used to find accurate relevant features in a training set. The proposed approach is evaluated by implementing a novel information filtering model and comparing it against state-of-the-art models. Experimental results on the Reuters Corpus Volume 1 and TREC topics show that the proposed approach significantly outperforms the best baseline method.

Keywords

Internet; information filtering; pattern recognition; text analysis; Reuters corpus volume 1; TREC topics; Web information retrieval; term-based approach; text documents; text feature mining; text patterns; Data collection; Feature extraction; Noise measurement; Support vector machines; Text mining; Training; Feature Selection; Relevance Feedback

fLanguage

English

Publisher

IEEE

Conference_Titel

Advanced Applied Informatics (IIAI-AAI), 2014 IIAI 3rd International Conference on

Conference_Location

Kitakyushu

Print_ISBN

978-1-4799-4174-2

Type

conf

DOI

10.1109/IIAI-AAI.2014.96

Filename

6913340