DocumentCode :
2736868
Title :
Automatic Hypertext Table Understanding by using Logical Structure Description Algorithm
Author :
Huang, Chiung-Wei ; Chien, Chih-Yuan ; Hsu, Chun-Nan ; Lee, Hahn-Ming
Author_Institution :
Nat. Taiwan Univ. of Sci. & Technol., Taipei
fYear :
2007
fDate :
5-7 Sept. 2007
Firstpage :
247
Lastpage :
247
Abstract :
Because they rely on template matching, conventional approaches are limited when faced with complex and varied layout structures. This paper proposes a novel and efficient logical structure description algorithm, named the structure description algorithm, to automatically extract logical structures from hypertext (Web) tables. Based on table field relationships, the approach starts from each data cell and searches leftward and upward for its correlated headers. Rules describing the logical structure can then be generated without defining the layout structure pattern in advance. In addition, with the help of a table translation strategy, the method outputs a relational table that can be fed directly into an SQL database for information query and processing. Experimental results show that the proposed method not only retains the logical structure in the output relational table, but also outperforms two major methods in handling very complex Web tables.
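The leftward and upward header search mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the algorithm's details, so everything here is an assumption made for illustration: the table is taken to be already expanded into a rectangular grid of strings, the first `header_rows` rows and first `header_cols` columns are taken to be headers, and the names `find_headers`, `grid`, and `records` are hypothetical rather than the authors' own.

```python
from typing import List, Tuple

Record = Tuple[Tuple[str, ...], Tuple[str, ...], str]


def find_headers(grid: List[List[str]], header_rows: int, header_cols: int) -> List[Record]:
    """For each data cell, collect headers found by walking left and up.

    Assumption (not from the paper): header cells are exactly those in the
    first `header_rows` rows and the first `header_cols` columns.
    """
    records: List[Record] = []
    for r in range(header_rows, len(grid)):
        for c in range(header_cols, len(grid[r])):
            # Walk leftward along the row, keeping non-empty header cells.
            left = [grid[r][cc] for cc in range(c - 1, -1, -1)
                    if cc < header_cols and grid[r][cc]]
            # Walk upward along the column, keeping non-empty header cells.
            up = [grid[rr][c] for rr in range(r - 1, -1, -1)
                  if rr < header_rows and grid[rr][c]]
            # Reverse so headers are ordered outermost first.
            records.append((tuple(reversed(left)), tuple(reversed(up)), grid[r][c]))
    return records


# Hypothetical example table: two header columns, one header row.
grid = [
    ["Region", "City", "Q1", "Q2"],
    ["North", "Taipei", "10", "12"],
    ["North", "Keelung", "3", "4"],
]
records = find_headers(grid, header_rows=1, header_cols=2)

# Flatten each record into one relational row: row headers, column headers, value.
relational_rows = [row_h + col_h + (value,) for row_h, col_h, value in records]
```

Each entry of `relational_rows` pairs a data value with the headers that qualify it, which is roughly the shape a relational table would need before being loaded into an SQL database. The paper's table translation strategy may well differ from this flattening.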
Keywords :
Internet; SQL; query processing; relational databases; SQL database; Web tables; automatic hypertext table understanding; information query; logical structure description algorithm; structure description algorithm; template matching; Computer science; Data mining; HTML; Hydrogen; Information processing; Information retrieval; Pattern matching; Production; Relational databases; Service oriented architecture;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Innovative Computing, Information and Control, 2007. ICICIC '07. Second International Conference on
Conference_Location :
Kumamoto
Print_ISBN :
0-7695-2882-1
Type :
conf
DOI :
10.1109/ICICIC.2007.190
Filename :
4427892