DocumentCode :
441867
Title :
Algorithms of mining intact record from isomorphic Web page
Author :
Qiu, Yong ; Lan, Yong-Jie
Author_Institution :
Sch. of Inf. & Electron. Eng., Shanghai Inst. of Bus. & Technol., China
Volume :
4
fYear :
2005
fDate :
18-21 Aug. 2005
Firstpage :
2373
Abstract :
The huge amount of information available on the Web has prompted much research into tools that extract data from Web pages. Many Web pages are generated automatically from an underlying database, so their HTML structure is fairly specific and regular. Existing algorithms such as OMINI and MDR can extract information from multi-record Web pages; their main idea is to identify the repetitive record structure automatically. However, Web pages that contain multiple records are actually directory pages, and the information in a directory page is not intact: the intact information resides in a lower-level Web page, called a detailed page. A detailed page holds the information of only one record, so it cannot be extracted with a duplicated-record-finding algorithm. To solve this problem of extracting intact information from the Web, the concept of an isomorphic Web page is introduced and two algorithms are proposed: one for finding directories that have isomorphic Web pages, the other for mining record information from isomorphic Web pages.
Keywords :
Internet; data mining; hypermedia markup languages; information retrieval; HTML; detailed page; directory page; duplicated record finding algorithm; isomorphic Web page; Data engineering; Data mining; Databases; Electronic mail; HTML; Local area networks; Machine learning; Software systems; Web mining; Web pages; Information Extracting; WEB; WEB mining; isomorphic webpage; webpage;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on
Conference_Location :
Guangzhou, China
Print_ISBN :
0-7803-9091-1
Type :
conf
DOI :
10.1109/ICMLC.2005.1527341
Filename :
1527341