Title :
Automated quality assessment of web pages from textual content
Author :
Wang, Xiao-lin ; Zha, Hal ; Lu, Bao-liang
Author_Institution :
Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China
Abstract :
Given the vastness of the Internet, search engines must find web pages that are not only relevant but also of high quality to satisfy users' information needs. At present, most methods for assessing the quality of web pages are based on link analysis and user feedback. Considering that users acquire information from web pages mainly by reading their text, this paper addresses automated quality assessment of web pages from their textual content. The paper surveys related work on assessing text quality, summarizes quality-related features, and examines them on a real-world data set. Experimental results show that features based on text length are the most effective, while combining length features with other features, such as part-of-speech tags and readability, can further improve accuracy.
Keywords :
Internet; Web sites; information needs; quality management; search engines; Internet; Web page; automated quality assessment; information need; link analysis; part-of-speech tags; search engine; textual content; user feedback; Abstracts; Catalogs; Electronic publishing; Information services; Internet; Web pages; Quality assessment; information retrieval; supervised learning;
Conference_Title :
Machine Learning and Cybernetics (ICMLC), 2012 International Conference on
Conference_Location :
Xi'an
Print_ISBN :
978-1-4673-1484-8
DOI :
10.1109/ICMLC.2012.6359683