DocumentCode
2575168
Title
Scrapy-Based Crawling and User-Behavior Characteristics Analysis on Taobao
Author
Wang, Jing ; Guo, Yuchun
Author_Institution
Sch. of Electron. & Inf. Eng., Beijing Jiaotong Univ., Beijing, China
fYear
2012
fDate
10-12 Oct. 2012
Firstpage
44
Lastpage
52
Abstract
The widespread use of the Internet provides a good environment for e-commerce. Studies of e-commerce network characteristics often focus on Taobao. So far, research based on Taobao has addressed the credit rating system, marketing strategy, the characteristics of sellers, and so on. The purpose of all these studies is to analyze online marketing transactions in e-commerce. In this paper, we analyze the e-commerce network from the perspective of graph theory. Our contributions are twofold. (1) We crawl the Taobao share-platform using the Scrapy crawling architecture. After analyzing the format of Taobao web pages in depth, and combining two sampling methods, breadth-first search (BFS) and Metropolis-Hastings random walk (MHRW), we ran the crawler on five PCs for 30 days. We also list the major problems encountered in the crawling process and give our final solutions. In addition, we crawled the data of one type of seller in order to analyze relationships between sellers and buyers. (2) We analyze the characteristics of user behavior on the Taobao share-platform based on the obtained dataset. We aim to find the relationships between sellers and buyers connected by items on the share-platform. Surprisingly, we find that the share-platform is a tool for some buyers to advertise items for sellers who have high credit scores, while other buyers only help them to support the platform.
Keywords
Internet; electronic commerce; marketing; sampling methods; BFS; MHRW; Scrapy crawl architecture; Web pages; crawl Taobao share-platform; credit rating system; e-commerce network characteristics; graph theory; marketing strategy; online marketing transactions; scrapy-based crawling; share-platform; user-behavior characteristics analysis; Communities; Crawlers; Engines; Marketing and sales; Social network services; Scrapy; Taobao; bipartite graph; user behavior
fLanguage
English
Publisher
IEEE
Conference_Titel
Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2012 International Conference on
Conference_Location
Sanya
Print_ISBN
978-1-4673-2624-7
Type
conf
DOI
10.1109/CyberC.2012.17
Filename
6384943