DocumentCode :
3667303
Title :
A hybrid method for Persian Named Entity Recognition
Author :
Farid Ahmadi;Hamed Moradi
Author_Institution :
Department of Information Technology, Urmia University of Technology, Iran
fYear :
2015
fDate :
5/1/2015 12:00:00 AM
Firstpage :
1
Lastpage :
7
Abstract :
Named Entity Recognition (NER) is an information extraction subtask that attempts to recognize named entities in unstructured text and classify them into predefined categories such as the names of people, organizations, and locations. Recently, machine learning approaches such as the Hidden Markov Model (HMM), as well as hybrid methods, have frequently been used for Named Entity Recognition. Because no publicly available data sets exist for NER in Persian, to our knowledge there is no machine-learning-based Persian NER system. Because of the innate weaknesses of HMMs, in this paper we use both a Hidden Markov Model and a rule-based method to recognize named entities in Persian texts. The combination of the rule-based method and the machine learning method yields highly accurate recognition. In its machine learning section, the proposed system uses an HMM and the Viterbi algorithm; in its rule-based section, it employs a set of lexical resources and pattern bases to recognize named entities, including the names of people, locations, and organizations. During this study, we annotated our own training and testing data sets for use in the related phases. On Persian text, our hybrid approach achieves 89.73% precision, 82.44% recall, and an 85.93% F-measure on an annotated test corpus of 32,606 tokens.
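The three scores reported in the abstract can be cross-checked against each other. The sketch below assumes the reported F-measure is the balanced F1 score, the harmonic mean of precision and recall. The abstract does not state the weighting, so the `beta` parameter and the function name `f_measure` are illustrative assumptions, not part of the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1.0 gives the balanced F1 score (an assumption here; the
    abstract only says "F-measure").
    """
    if precision <= 0 or recall <= 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures quoted in the abstract, in percent.
precision = 89.73
recall = 82.44

f1 = f_measure(precision, recall)
print(round(f1, 2))  # 85.93
```

Under this assumption the computed value matches the 85.93% given in the abstract, which suggests that the reported F-measure is indeed the balanced F1.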
Keywords :
"Hidden Markov models","Organizations","Decoding","Markov processes"
Publisher :
IEEE
Conference_Title :
Information and Knowledge Technology (IKT), 2015 7th Conference on
Print_ISBN :
978-1-4673-7483-5
Type :
conf
DOI :
10.1109/IKT.2015.7288806
Filename :
7288806