Title : 
Automatic metadata extraction and classification of spreadsheet documents based on layout similarity
         
        
            Author : 
Chatvichienchai, Somchai
         
        
            Author_Institution : 
Dept. of Inf. & Media Studies, Univ. of Nagasaki, Nagasaki, Japan
         
        
        
            fDate : 
Nov. 29 2011-Dec. 1 2011
         
        
        
        
            Abstract : 
Effective information search is becoming a key success factor for business. Metadata is an essential part of modern information systems because it helps people find relevant documents in disparate repositories. Automatic document metadata extraction has received attention in recent years because it is an important task in generating powerful search indices to support effective information search. This paper proposes a method that automatically extracts and classifies metadata from spreadsheets whose layout is similar to that of a given sample spreadsheet whose metadata has been previously defined. Metadata is classified by document type (e.g., purchase order, sales report) and data context (e.g., customer name, order date) so that users can specify the meanings of the keywords in a search query. As a result, the search engine of this work returns results that match the user's search intention more closely than those of conventional keyword search engines.
         
        
            Keywords : 
classification; document handling; meta data; query processing; search engines; spreadsheet programs; automatic document metadata extraction; information search; layout similarity; metadata classification; search engine; search query; spreadsheet document classification; Crawlers; Data mining; Indexes; Layout; Organizations; Search problems; XML;
         
        
        
        
            Conference_Title : 
Advanced Information Management and Service (ICIPM), 2011 7th International Conference on
         
        
            Conference_Location : 
Jeju
         
        
            Print_ISBN : 
978-1-4577-0471-0