• DocumentCode
    259270
  • Title

    Mining Relevant Text Features for Retrieving Web Information

  • Author

    Pipanmaekaporn, Luepol ; Kamolsantiroj, Suwatchai

  • Author_Institution
    Dept. of Comput. & Inf. Sci., King Mongkut's Univ. of Technol. North Bangkok, Bangkok, Thailand
  • fYear
    2014
  • fDate
    Aug. 31 2014-Sept. 4 2014
  • Firstpage
    447
  • Lastpage
    452
  • Abstract
    Developing effective methods that can discover high-quality, useful features in text documents is a major challenge. Most existing information retrieval and text mining methods focus on term-based approaches, which often suffer from term variation and noise. This paper presents an approach that discovers relevant knowledge to describe text features precisely for retrieving web information. In particular, it extracts precise text patterns by considering both relevant and irrelevant documents. The discovered patterns are then used to find accurate relevant features in a training set. The proposed approach is evaluated by implementing a novel information filtering model and comparing it against state-of-the-art models. Experimental results on the Reuters Corpus Volume 1 and TREC topics show that the proposed approach significantly outperforms the best baseline method.
  • Keywords
    Internet; information filtering; pattern recognition; text analysis; Reuters corpus volume 1; TREC topics; Web information retrieval; term-based approach; text documents; text feature mining; text patterns; Data collection; Feature extraction; Noise measurement; Support vector machines; Text mining; Training; Feature Extraction; Feature Selection; Relevance Feedback and Text Mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advanced Applied Informatics (IIAI-AAI), 2014 IIAI 3rd International Conference on
  • Conference_Location
    Kitakyushu
  • Print_ISBN
    978-1-4799-4174-2
  • Type

    conf

  • DOI
    10.1109/IIAI-AAI.2014.96
  • Filename
    6913340