DocumentCode
228953
Title
Defined entity extraction based on Indonesian text document
Author
Mangasi, Tito ; Erwin, Alva ; Ipung, Heru Purnomo
Author_Institution
Dept. of Inf. Technol., Swiss German Univ., Tangerang, Indonesia
fYear
2014
fDate
24-25 Sept. 2014
Firstpage
61
Lastpage
65
Abstract
Entity extraction is part of the process of extracting information from unstructured metadata text documents. It is important to know whether the words in a document are useful and carry important information. With the growth of technology, including websites and the Internet, semantic and technical challenges arise in making entity extraction more efficient. Several tools already support name finder extraction, and OpenNLP is a good instrument for this purpose. Extracting entities such as person names, locations, and organizations defines the field of entity extraction. To generate a model from a training set, Indonesian articles and documents need to be plentiful and diverse so that the entities can be clearly differentiated from one another. Several problems need to be minimized, such as those of accuracy and efficiency. The training set also needs a larger share of custom and unique sentences. The results shown are based on the training set and the generated model. The articles are mainly in Indonesian, a language for which OpenNLP models have not yet been created.
Keywords
Internet; natural language processing; text analysis; Indonesian articles; Indonesian language; Indonesian text document extraction; Internet; OpenNLP models; Web site; defined entity extraction; name finder extraction; person names; unstructured metadata text documents; Data mining; Entropy; Feature extraction; Information retrieval; Natural language processing; Organizations; Training; Entity Extraction; Entity Models; OpenNLP; Training Set;
fLanguage
English
Publisher
IEEE
Conference_Titel
ICT For Smart Society (ICISS), 2014 International Conference on
Conference_Location
Bandung
Type
conf
DOI
10.1109/ICTSS.2014.7013152
Filename
7013152