Regional Information Center for Science and Technology

DocumentCode :

1649700

Title :

Extracting structures of HTML documents

Author :

Lim, Seung-Jin ; Ng, Yiu-Kai

Author_Institution :

Dept. of Comput. Sci., Brigham Young Univ., Provo, UT, USA

fYear :

1998

Firstpage :

420

Lastpage :

426

Abstract :

Information on the Web, a conglomeration of heterogeneous data such as text, images, and audio clips, is often accessed through documents written according to the HTML specification. By that specification, HTML documents are semistructured in nature. We propose a high-level stack machine (HSM) that accesses an HTML document through its URL and constructs a semistructured data graph (SDG) of the document. The SDG of an HTML document H precisely captures the structure of the semistructured data embedded in H, based on the dependency relationships among the data objects in H. The HSM is configurable to accommodate a user's interest in which HTML elements in H are considered during construction of the SDG of H.
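To make the idea in the abstract concrete, the following is a minimal sketch of a stack-driven pass over HTML that emits a graph of elements. It is not the authors' HSM: the class `StackGraphBuilder`, the function `build_graph`, and the `tags_of_interest` parameter are hypothetical names, element nesting is used as a simple stand-in for the paper's dependency relationship, and the HTML is passed in as a string rather than fetched through a URL.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (a partial list, for illustration).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class StackGraphBuilder(HTMLParser):
    """Hypothetical sketch: builds parent -> child edges using a stack.

    Assumption: nesting of elements approximates the dependency
    relationship described in the abstract.
    """

    def __init__(self, tags_of_interest):
        super().__init__()
        self.tags_of_interest = set(tags_of_interest)
        self.nodes = ["document"]  # index is the node id; node 0 is the root
        self.edges = []            # (parent_id, child_id) pairs
        self.stack = [0]           # ids of the currently open nodes

    def handle_starttag(self, tag, attrs):
        if tag not in self.tags_of_interest:
            return  # elements the user did not select are skipped
        node_id = len(self.nodes)
        self.nodes.append(tag)
        self.edges.append((self.stack[-1], node_id))
        if tag not in VOID_TAGS:
            self.stack.append(node_id)

    def handle_endtag(self, tag):
        if tag not in self.tags_of_interest or tag in VOID_TAGS:
            return
        # Pop back to the nearest open node with this tag; the root stays.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.nodes[self.stack[i]] == tag:
                del self.stack[i:]
                break


def build_graph(html, tags_of_interest):
    """Return (nodes, edges) for the selected elements of an HTML string."""
    builder = StackGraphBuilder(tags_of_interest)
    builder.feed(html)
    builder.close()
    return builder.nodes, builder.edges


sample = "<html><body><h1>T</h1><ul><li>a</li><li>b</li></ul></body></html>"
nodes, edges = build_graph(sample, {"h1", "ul", "li"})
```

Passing a different `tags_of_interest` set changes which elements appear in the graph, which loosely mirrors the configurability the abstract mentions; unselected elements such as `html` and `body` are skipped, so their selected descendants attach to the nearest selected ancestor or to the root.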

Keywords :

Internet; hypermedia; query languages; HTML documents; HTML specification; Web; high-level stack machine; semistructured data graph; Computer science; Database languages; Electrical capacitance tomography; HTML; Hip; Information retrieval; Lips; Navigation; Uniform resource locators; World Wide Web

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information Networking, 1998. (ICOIN-12) Proceedings., Twelfth International Conference on

Conference_Location :

Tokyo

Print_ISBN :

0-8186-7225-0

Type :

conf

DOI :

10.1109/ICOIN.1998.648420

Filename :

648420

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1649700