Title :
Three level method using machine learning and rule based approach for extracting Web-table information
Author :
Jung, Sung-Wong ; Lim, Sung-Shin ; Kwon, Hyuk-Chul
Author_Institution :
Dept. of Comput. Sci. & Eng., Pusan Nat. Univ., South Korea
Abstract :
Authors of HTML documents use various methods to convey their intent clearly. The table is preeminent among these because it presents meaningful data in a structure of rows and columns. On the Internet, however, tables are used for document layout and design as well as for structuring knowledge. Distinguishing these two kinds of tables is not an easy task because HTML does not separate presentation from structure, and this makes extracting information from tables more difficult. In this paper, we therefore first classify tables into two types, meaningful tables and decorative tables, and then extract information from the meaningful tables.
Keywords :
Internet; hypermedia markup languages; information retrieval; knowledge based systems; learning (artificial intelligence); HTML documents; Web-table information extracting; decorative tables; documents design; knowledge structuring; machine learning; meaningful tables; preeminent method; rule based approach; Animation; Computer science; Data mining; HTML; Pressing; Protocols; Shape; Stochastic processes;
Conference_Title :
Industrial Electronics Society, 2004. IECON 2004. 30th Annual Conference of IEEE
Print_ISBN :
0-7803-8730-9
DOI :
10.1109/IECON.2004.1432313