DocumentCode :
2036782
Title :
New techniques for the discovery of logical documents in Web
Author :
Tajima, Keishi ; Tanaka, Katsumi
Author_Institution :
Dept. of Comput. & Syst. Eng., Kobe Univ., Japan
fYear :
1999
fDate :
1999
Firstpage :
125
Lastpage :
132
Abstract :
We propose a method of identifying logical documents in Web data. Pages in Web data are sometimes designed for presentation and do not always reflect logical structure, whereas a logical document is a data unit that represents logical structure. One logical document often corresponds to a connected subgraph consisting of multiple pages. Therefore, for Web data processing tasks that need to capture logical structure, such as querying facilities, extended support for user navigation, and Web structure analysis, logical documents are more appropriate data units than pages. We develop a method of identifying such logical documents in existing Web data. Our method uses three kinds of information: link structure, the directory structure embedded in URIs, and page contents.
Keywords :
Internet; data mining; information resources; information retrieval; Web data processing; Web logical document discovery; Web pages; connected subgraph; directory structure; link structure; logical structure; page contents; querying; user navigation; Data processing; Legged locomotion; Uniform resource locators;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Database Applications in Non-Traditional Environments, 1999. (DANTE '99) Proceedings. 1999 International Symposium on
Conference_Location :
Kyoto
Print_ISBN :
0-7695-0496-5
Type :
conf
DOI :
10.1109/DANTE.1999.844950
Filename :
844950