Title :
Research and implementation of the technology supporting MicroBlog data collection based on web crawler
Author :
Yuan Xiaohong ; Zhou Sisi
Author_Institution :
College of Computer Science and Information Technology, Central South University of Forestry and Technology, Changsha, Hunan 410004, China
Abstract :
MicroBlog is an effective vehicle for online public opinion and plays an important role in its dissemination. A crawler for MicroBlog, consisting of a user-crawling component and a content-crawling component, is designed. The crawler uses a protocol-driven strategy, an event-driven strategy, and template-based extraction to extract and store the data. Experiments show that the crawler collects information more efficiently and more completely than a breadth-first search (BFS) crawler. As DOM trees become more complex, a more flexible crawler will be needed.
Keywords :
AJAX; MicroBlog; crawler; web information extraction;
Conference_Title :
Automatic Control and Artificial Intelligence (ACAI 2012), International Conference on
Conference_Location :
Xiamen
Electronic_ISBN :
978-1-84919-537-9
DOI :
10.1049/cp.2012.1307