• DocumentCode
    1880784
  • Title
    Rule-based pattern extractor and named entity recognition: A hybrid approach
  • Author
    Sari, Yunita ; Hassan, Mohd Fadzil ; Zamin, Norshuhani
  • Author_Institution
    Comput. & Inf. Sci. Dept., Univ. Teknol. PETRONAS, Tronoh, Malaysia
  • Volume
    2
  • fYear
    2010
  • fDate
    15-17 June 2010
  • Firstpage
    563
  • Lastpage
    568
  • Abstract
    Named Entity Recognition (NER) is one of the important tasks in Information Extraction (IE) research that has been flourishing for more than fifteen years. NER enables an IE system to recognize and classify information units in unstructured text. This paper presents a rule-based pattern extractor and a semi-supervised NER approach to automatically generate extraction patterns from a limited corpus and label the predefined entities in a collection of accident documents. The Link Grammar parser and the Stanford Part-of-Speech tagger are used in the pattern extractor to identify named entities and construct extraction patterns. The extraction patterns are then fed to the semi-supervised NER to categorize the entities into predefined categories. Performance is evaluated using Exact Match evaluation and tested on two entity types, DATE and LOCATION. Using only two features, the system shows promising results.
  • Keywords
    category theory; data mining; feature extraction; grammars; natural language processing; pattern classification; text analysis; Exact Match evaluation; Stanford Part-of-Speech tagger; accident documents; entity categorization; information extraction research; information unit classification; information unit recognition; link grammar parser; named entity recognition; rule-based pattern extractor; unstructured text; Artificial neural networks; Feature extraction; Nickel; Software; Training; Link Grammar; Self-Training Algorithm; Stanford POS Tagger;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology (ITSim), 2010 International Symposium in
  • Conference_Location
    Kuala Lumpur
  • ISSN
    2155-897
  • Print_ISBN
    978-1-4244-6715-0
  • Type
    conf
  • DOI
    10.1109/ITSIM.2010.5561392
  • Filename
    5561392