DocumentCode :
2652947
Title :
A Combined Template-Based and Case-Based Metadata Extraction for Heterogeneous Thai Documents
Author :
Khankasikam, Krisda ; Chakpitak, Nopasit ; Udomsripaiboon, Thana
Author_Institution :
Sch. of Inf. Commun. & Technol., Naresuan Univ. Phayao, Phayao
fYear :
2009
fDate :
22-24 Jan. 2009
Firstpage :
292
Lastpage :
296
Abstract :
A growing number of universities, laboratories, government agencies, and companies are placing their documents online and making them searchable, now that the Internet infrastructure for global data access is fully functional. However, many organizations have documents that lack metadata. This lack of metadata hinders not only the discovery and dissemination of these documents over the Internet, but also their connectivity with other documents. Unfortunately, manual metadata extraction is expensive and time-consuming for large documents, and most existing automated metadata extraction approaches focus on specific domains and homogeneous documents. In this paper, we propose a combined case-based and template-based metadata extraction approach to address these issues. The key idea for handling heterogeneity is to classify documents into equivalent groups so that each group contains only similar documents. Then, for each document group, a template from a previous case provides the process for extracting metadata from the documents in that group.
Keywords :
classification; document handling; information retrieval; meta data; case-based metadata extraction; document classification; heterogeneous Thai document; template-based metadata extraction; Art; Artificial intelligence; Communication system control; Communications technology; Data mining; Educational institutions; Government; Internet; Laboratories; Nonhomogeneous media; Case-based Reasoning; Heterogeneous Documents; Metadata Extraction; Template-based Reasoning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advanced Computer Control, 2009. ICACC '09. International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-1-4244-3330-8
Type :
conf
DOI :
10.1109/ICACC.2009.88
Filename :
4777353