DocumentCode :
3637592
Title :
Text-Based Web Page Classification with Use of Visual Information
Author :
Vladimír Bartík
Author_Institution :
Dept. of Inf. Syst., Brno Univ. of Technol., Brno, Czech Republic
fYear :
2010
Firstpage :
416
Lastpage :
420
Abstract :
As the number of pages on the web is constantly increasing, there is a need to classify pages into categories to facilitate indexing and searching them. The method proposed here uses both textual and visual information to find a suitable representation of web page content. Several term weights are proposed, each a modification of TF or TF-IDF weighting. The modification is based on the visual areas in which the text appears and on the visual properties of those areas. Results of experiments are included in the final part of the paper.
Keywords :
"Visualization","Web pages","Classification algorithms","Accuracy","Support vector machine classification","HTML","Equations"
Publisher :
IEEE
Conference_Titel :
Advances in Social Networks Analysis and Mining (ASONAM), 2010 International Conference on
Print_ISBN :
978-1-4244-7787-6
Type :
conf
DOI :
10.1109/ASONAM.2010.34
Filename :
5563068