DocumentCode
2028199
Title
Automatically mining review records from forum Web sites
Author
Liu, Wei ; Yan, Hualiang ; Xiao, Jianguo
Volume
5
fYear
2010
fDate
10-12 Aug. 2010
Firstpage
2450
Lastpage
2455
Abstract
The rapid development of Web 2.0 has brought a flourishing of web reviews. Web reviews are usually published in the form of structured records. As an important information source for many popular applications (e.g., monitoring and analysis of public opinion), review records need to be extracted accurately from web pages. To the best of our knowledge, little work in the literature has systematically investigated this problem. Besides the variety of web page templates, user-generated review content raises a new challenge. The inconsistency of review content in both the DOM tree and visual appearance impairs the similarity among review records, which seriously degrades the performance of existing solutions for web data record extraction. To tackle this challenge, we propose a novel approach that performs automatic extraction of review records by employing sophisticated techniques. Our experimental results over 20 forum web sites indicate that the proposed approach can achieve high extraction accuracy.
Keywords
Internet; Web sites; data mining; DOM tree; Web 2.0; Web reviews; automatically mining review records; forum Web sites; information source; public opinion; user-generated review contents; visual appearance; web data record extraction; web page templates; Book reviews; Data mining; Feature extraction; Noise; Semantics; Visualization; Web pages; opinion mining; review record; web data extraction;
fLanguage
English
Publisher
IEEE
Conference_Titel
Fuzzy Systems and Knowledge Discovery (FSKD), 2010 Seventh International Conference on
Conference_Location
Yantai, Shandong
Print_ISBN
978-1-4244-5931-5
Type
conf
DOI
10.1109/FSKD.2010.5569292
Filename
5569292