• DocumentCode
    2752105
  • Title

    A Hybrid Approach to Semi-supervised Named Entity Recognition in Health, Safety and Environment Reports

  • Author

    Sari, Yunita ; Hassan, M. Fadzil ; Zamin, Norshuhani

  • Author_Institution
    Dept. of Inf. & Comput. Sci., Univ. Teknol. PETRONAS, Tronoh, Malaysia
  • fYear
    2009
  • fDate
    3-5 April 2009
  • Firstpage
    599
  • Lastpage
    602
  • Abstract
    In the last few years, text mining has become an area of interest in Natural Language Processing (NLP). Text mining systems share a common idea, i.e., to extract important facts from unstructured text, which can later be used to populate database entries. Named Entity Recognition (NER) is one of the main tasks needed to develop text mining systems; it is used to identify and classify entities in text into predefined categories such as the names of persons, organizations, locations, dates, times, quantities, monetary values, percentages, etc. This paper focuses on identifying an optimal approach to NER. To this end, Health, Safety and Environment (HSE) reports available from Universiti Teknologi PETRONAS (UTP) are chosen as the case study. UTP's HSE reports are investigation reports that contain information on incidents and accidents that occurred during daily operations. Many algorithms have been reported for NER, ranging from simple statistical methods to advanced NLP methods. This paper explores the possibility of applying Link Grammar (LG) and the Basilisk algorithm to NER.
  • Keywords
    data mining; environmental factors; health and safety; natural language processing; text analysis; basilisk algorithm; database entries; environment reports; health reports; link grammar; name entity recognition; natural language processing; predefined categories; safety reports; semisupervised named entity recognition; statistical method; text mining systems; Accidents; Data mining; Databases; Dictionaries; Health and safety; Natural language processing; Pattern analysis; Statistical analysis; Text mining; Text recognition; Basilisk Algorithm; Health and Safety Environment; Link Grammar; Name Entity Recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Future Computer and Communication, 2009. ICFCC 2009. International Conference on
  • Conference_Location
    Kuala Lumpur
  • Print_ISBN
    978-0-7695-3591-3
  • Type

    conf

  • DOI
    10.1109/ICFCC.2009.52
  • Filename
    5189853