DocumentCode :
3038595
Title :
Extraction technology of blog comments based on functional semantic units
Author :
Chun-long, Fan ; Hui, Meng
Author_Institution :
Dept. of Comput., Shenyang Aerosp. Univ., ShenYang, China
Volume :
3
fYear :
2012
fDate :
25-27 May 2012
Firstpage :
422
Lastpage :
426
Abstract :
Blogs are an important kind of network information resource, and extracting their comment information is essential for research on public opinion analysis and related areas. In this paper, we summarize the prevalent extraction algorithms for blog comments and describe how page structure can be used in information extraction. Indicator phrases such as "Home" carry clear semantics and a clear functional indication when people read web pages; these indicator phrases are called Functional Semantic Units (FSU). Based on the characteristics of FSUs, we propose a comment information extraction model and present its design rationale and implementation process in detail, including linearization of the page structure, identification of functional semantic units, recognition of the main text, and the comment extraction algorithm. Finally, experiments show that the comment information extraction model is effective and achieves better identification results.
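The abstract names three of the model's steps (linearizing the page structure, distinguishing FSUs, extracting comments) without showing how they fit together. The following is a minimal sketch of that general idea under stated assumptions, not the authors' implementation: the FSU lexicon (FSU_LEXICON), the role names, and the rule "comments are the plain text blocks between a comment-start FSU and the next FSU" are all hypothetical, and the main-text recognition step is omitted.

```python
from html.parser import HTMLParser

# Hypothetical FSU lexicon: indicator phrases mapped to the page function they signal.
# These phrases and roles are illustrative assumptions, not the paper's lexicon.
FSU_LEXICON = {
    "home": "navigation",
    "comments": "comment_start",
    "post a comment": "comment_end",
}


class Linearizer(HTMLParser):
    """Flatten an HTML page into non-empty text blocks in document order."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._skip_depth = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text and not self._skip_depth:
            self.blocks.append(text)


def linearize(html):
    """Return the page as a flat list of text blocks."""
    parser = Linearizer()
    parser.feed(html)
    parser.close()
    return parser.blocks


def classify(block):
    """Return the FSU role of a block, or None if it is ordinary content."""
    return FSU_LEXICON.get(block.lower().rstrip(":"))


def extract_comments(html):
    """Collect the plain blocks between a 'comment_start' FSU and the next FSU."""
    comments, inside = [], False
    for block in linearize(html):
        role = classify(block)
        if role == "comment_start":
            inside = True
        elif role is not None:
            inside = False  # any other FSU closes the comment region
        elif inside:
            comments.append(block)
    return comments


if __name__ == "__main__":
    page = ("<div><a>Home</a></div><div><p>Main post text.</p></div>"
            "<h3>Comments:</h3><ul><li>Great post!</li><li>I disagree.</li></ul>"
            "<a>Post a comment</a>")
    print(extract_comments(page))
```

Running the script prints ['Great post!', 'I disagree.']. Using indicator phrases rather than fixed tag paths is what makes this style of rule portable across blog templates; a realistic lexicon would need many more phrases (including non-English ones) and a main-text recognizer to separate the post body from surrounding blocks.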
Keywords :
Web sites; information retrieval; FSU; Web pages; blog comments; comment information extracting model; comments extraction algorithm; extraction technology; functional semantic units; network information resources; page structure; public opinion analysis; Arrays; Blogs; Data mining; Feature extraction; Layout; Semantics; Web pages; blog; comment; functional semantic unit; information extraction;
fLanguage :
English
Publisher :
ieee
Conference_Title :
Computer Science and Automation Engineering (CSAE), 2012 IEEE International Conference on
Conference_Location :
Zhangjiajie
Print_ISBN :
978-1-4673-0088-9
Type :
conf
DOI :
10.1109/CSAE.2012.6272985
Filename :
6272985