DocumentCode :
228953
Title :
Defined entity extraction based on Indonesian text document
Author :
Mangasi, Tito ; Erwin, Alva ; Ipung, Heru Purnomo
Author_Institution :
Dept. of Inf. Technol., Swiss German Univ., Tangerang, Indonesia
fYear :
2014
fDate :
24-25 Sept. 2014
Firstpage :
61
Lastpage :
65
Abstract :
Entity extraction is one step in the process of extracting information from unstructured metadata text documents. It is important to know whether the words in a document are useful and contain important information. With the growth of technology, including websites and the Internet, semantic and technical challenges arise in making entity extraction more efficient. Several tools support name finder extraction, and OpenNLP is a good instrument for this purpose. Extracting entities such as person names, locations, and organizations defines the field of entity extraction. To generate a model from a training set, the Indonesian articles and documents must be plentiful and diverse so that the entity types can be clearly differentiated from one another. Several problems, such as those of accuracy and efficiency, need to be minimized. The training set also needs a higher percentage of custom and unique sentences. The results shown are based on the training set and the generated model. The articles are mainly in Indonesian, a language for which no OpenNLP model has yet been created.
Keywords :
Internet; natural language processing; text analysis; Indonesian articles; Indonesian language; Indonesian text document extraction; Internet; OpenNLP models; Web site; defined entity extraction; name finder extraction; person names; unstructured metadata text documents; Data mining; Entropy; Feature extraction; Information retrieval; Natural language processing; Organizations; Training; Entity Extraction; Entity Models; OpenNLP; Training Set;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
ICT For Smart Society (ICISS), 2014 International Conference on
Conference_Location :
Bandung
Type :
conf
DOI :
10.1109/ICTSS.2014.7013152
Filename :
7013152