Title : 
ViQueL: A Spatial Query Language for Presentation-Oriented Documents
         
        
            Author : 
Ora, E. ; Riccetti, Francesco ; Ruffolo, Massimo
         
        
            Author_Institution : 
DEIS, Altilia srl, Univ. of Calabria, Rende, Italy
         
        
        
        
        
        
            Abstract : 
In recent years, the importance of accessing and acquiring information made available by Web pages and business documents has grown considerably. Thus, wrapping information from documents in HTML and PDF formats is receiving increasing interest. In this paper we present a textual query language, named ViQueL, that allows for querying information in both Web and PDF documents on the basis of its spatial arrangement. The proposed language is founded on spatial grammars, i.e., context-free grammars extended by spatial constructs. The main feature of ViQueL is that it makes it possible to identify and extract relevant information from HTML and PDF documents on the basis of their visual appearance by using easy-to-write queries. Despite its considerable expressive power, the combined complexity of ViQueL is in P-Time. Moreover, experiments show that ViQueL is reasonably efficient for real-life extraction tasks.
         
        
            Keywords : 
hypermedia markup languages; query languages; HTML; PDF document; ViQueL; Web page; information acquiring; presentation oriented document; spatial arrangement; spatial query language; Data mining; Database languages; Grammar; Visualization; Web pages; Wrapping; Context Free Grammars; Information Extraction; Qualitative Spatial Reasoning
         
        
        
        
            Conference_Title : 
Tools with Artificial Intelligence (ICTAI), 2010 22nd IEEE International Conference on
         
        
            Conference_Location : 
Arras
         
        
        
            Print_ISBN : 
978-1-4244-8817-9
         
        
        
            DOI : 
10.1109/ICTAI.2010.121