DocumentCode :
2319368
Title :
Automated quality assessment of web pages from textual content
Author :
Wang, Xiao-lin ; Zha, Hal ; Lu, Bao-liang
Author_Institution :
Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China
Volume :
5
fYear :
2012
fDate :
15-17 July 2012
Firstpage :
2000
Lastpage :
2006
Abstract :
Given the vastness of the Internet, search engines have to find web pages that are not only relevant but also of high quality to satisfy users' information needs. At present, most quality assessment methods for web pages are based on link analysis and user feedback. Considering that users acquire information from web pages mainly by reading their text, this paper addresses automated quality assessment of web pages from textual content. It surveys related work on assessing text quality, summarizes quality-related features, and examines them on a real-world data set. Experimental results show that features based on text length are the most effective, and that combining length features with other features, such as part-of-speech tags and readability, can further improve accuracy.
Keywords :
Internet; Web sites; information needs; quality management; search engines; Internet; Web page; automated quality assessment; information need; link analysis; part-of-speech tags; search engine; textual content; user feedback; Abstracts; Catalogs; Electronic publishing; Information services; Internet; Web pages; Quality assessment; information retrieval; supervised learning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics (ICMLC), 2012 International Conference on
Conference_Location :
Xi'an
ISSN :
2160-133X
Print_ISBN :
978-1-4673-1484-8
Type :
conf
DOI :
10.1109/ICMLC.2012.6359683
Filename :
6359683