DocumentCode :
1649700
Title :
Extracting structures of HTML documents
Author :
Lim, Seung-Jin ; Ng, Yiu-Kai
Author_Institution :
Dept. of Comput. Sci., Brigham Young Univ., Provo, UT, USA
fYear :
1998
Firstpage :
420
Lastpage :
426
Abstract :
Information on the Web, a conglomeration of heterogeneous data such as text, images, and audio clips, is often accessed through documents written according to the HTML specification. According to that specification, HTML documents are semistructured in nature. We propose a high-level stack machine (HSM) which accesses an HTML document through its URL and constructs a semistructured data graph (SDG) of the document. The SDG of an HTML document H precisely captures the structure of the semistructured data embedded in H, based on the dependency relationships among the data objects in H. HSM is configurable to accommodate a user's interest with respect to the HTML elements in H to be considered during the construction of the SDG of H.
Keywords :
Internet; hypermedia; query languages; HTML documents; HTML specification; Web; high-level stack machine; semistructured data graph; Computer science; Database languages; Electrical capacitance tomography; HTML; Hip; Information retrieval; Lips; Navigation; Uniform resource locators; World Wide Web;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Networking, 1998. (ICOIN-12) Proceedings., Twelfth International Conference on
Conference_Location :
Tokyo
Print_ISBN :
0-8186-7225-0
Type :
conf
DOI :
10.1109/ICOIN.1998.648420
Filename :
648420