DocumentCode :
1696044
Title :
Removing non-informative blocks from the web pages
Author :
Gunasundari, R. ; Karthikeyan, S.
Author_Institution :
Karpagam Univ., Coimbatore, India
fYear :
2010
Firstpage :
810
Lastpage :
814
Abstract :
With the enormous growth of the web, users easily get lost in its rich hyper structure. Developing user-friendly, automated tools that give users relevant information without redundant links is therefore the primary task for website owners. Users, however, are interested only in the informative content and not in the non-informative content blocks. Web pages often contain navigation sidebars, advertisements, search blocks, copyright notices, etc., which are not content blocks. The information contained in these non-content blocks can harm web mining, so it is important to separate the informative primary content blocks from the non-informative blocks. This paper proposes three different algorithms for removing non-content blocks from web pages. Removing non-informative content blocks from web pages can yield significant savings in storage and time.
Keywords :
Web services; Web sites; content management; data mining; information retrieval; Web blocks; Web mining; Web pages; Website; informative contents; noisy blocks; non-informative content; Algorithm design and analysis; Entropy; Feature extraction; HTML; Web content mining; Web documents
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Communication Control and Computing Technologies (ICCCCT), 2010 IEEE International Conference on
Conference_Location :
Ramanathapuram
Print_ISBN :
978-1-4244-7769-2
Type :
conf
DOI :
10.1109/ICCCCT.2010.5670731
Filename :
5670731