Title :
Automatically mining review records from forum Web sites
Author :
Liu, Wei ; Yan, Hualiang ; Xiao, Jianguo
Abstract :
The rapid development of Web 2.0 has brought a proliferation of web reviews, which are usually published in the form of structured records. As an important information source for many popular applications (e.g., monitoring and analysis of public opinion), review records need to be extracted accurately from web pages. To the best of our knowledge, little work in the literature has systematically investigated this problem. Beyond the variety of web page templates, user-generated review content raises a new challenge: its inconsistency in both the DOM tree and visual appearance reduces the similarity among review records, which seriously degrades the performance of existing solutions for web data record extraction. To tackle this challenge, we propose a novel approach that performs automatic extraction of review records by employing sophisticated techniques. Our experimental results over 20 forum web sites indicate that the proposed approach can achieve high extraction accuracy.
Keywords :
Internet; Web sites; data mining; DOM tree; Web 2.0; Web reviews; automatically mining review records; forum Web sites; information source; public opinion; user-generated review contents; visual appearance; web data record extraction; web page templates; Book reviews; Data mining; Feature extraction; Noise; Semantics; Visualization; Web pages; opinion mining; review record; web data extraction
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2010 Seventh International Conference on
Conference_Location :
Yantai, Shandong
Print_ISBN :
978-1-4244-5931-5
DOI :
10.1109/FSKD.2010.5569292