DocumentCode :
1973563
Title :
Research on Web Document Summarization
Author :
Geng, Zengmin ; Zhang, Jujian ; Li, Xuefei ; Du, Jianxia ; Liu, Zhengdong
Author_Institution :
Comput. Inf. Center, Beijing Inst. of Fashion Technol., Beijing, China
fYear :
2010
fDate :
20-22 Aug. 2010
Firstpage :
1
Lastpage :
4
Abstract :
Web document summarization (WDS) is becoming one of the hot subjects in the text summarization field due to the rapidly increasing number of documents on the Web. WDS differs from traditional text summarization because it must process hyperlinked texts. This paper first analyses the features of Web documents, then gives a definition of WDS, and finally presents an algorithm for WDS based on sentence extraction. Each sentence's weight is a weighted sum of its words' weight and its sentence-structure weight. The former is adjusted by a document class graph, and the latter considers both the Web formats and hyperlink attributes. The relative proportion of the word and structure weights is learned by a machine learning approach. Experiments on 2,000 Web documents show that our algorithm is feasible.
Keywords :
Internet; feature extraction; text analysis; Web document summarization; Web format; document class graph; hyperlinked text; machine learning approach; sentence structure weight; sentences extraction; text summarization; Algorithm design and analysis; Clothing; Computers; Feature extraction; Joining processes; Tagging; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Internet Technology and Applications, 2010 International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-5142-5
Electronic_ISBN :
978-1-4244-5143-2
Type :
conf
DOI :
10.1109/ITAPP.2010.5566074
Filename :
5566074