• DocumentCode
    2575168
  • Title
    Scrapy-Based Crawling and User-Behavior Characteristics Analysis on Taobao
  • Author
    Wang, Jing; Guo, Yuchun
  • Author_Institution
    School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing, China
  • fYear
    2012
  • fDate
    10-12 Oct. 2012
  • Firstpage
    44
  • Lastpage
    52
  • Abstract
    The widespread use of the Internet provides a good environment for e-commerce. Studies of e-commerce network characteristics usually focus on Taobao. So far, research on Taobao has addressed its credit rating system, marketing strategies, seller characteristics, and similar topics. All of these studies aim to analyze online marketing transactions in e-commerce. In this paper, we analyze the e-commerce network from the perspective of graph theory. Our contributions are twofold. (1) We crawl the Taobao share-platform using the Scrapy crawling architecture. After analyzing the format of Taobao web pages in depth, we combined two sampling methods, breadth-first search (BFS) and Metropolis-Hastings random walk (MHRW), and ran the crawler on five PCs for 30 days. We also describe the major problems encountered during crawling and the solutions we finally adopted. In addition, we crawled data for one type of seller in order to analyze the relationships between sellers and buyers. (2) We analyze the characteristics of user behavior on the Taobao share-platform based on the collected dataset, aiming to identify the relationships between sellers and buyers that are connected through items on the share-platform. Surprisingly, we find that some buyers use the share-platform as a tool to advertise items for sellers with high credit scores, while other buyers use it only to help support the platform.
  • Keywords
    Internet; electronic commerce; marketing; sampling methods; BFS; MHRW; Scrapy crawl architecture; Web pages; crawl Taobao share-platform; credit rating system; e-commerce network characteristics; graph theory; marketing strategy; online marketing transactions; scrapy-based crawling; share-platform; user-behavior characteristics analysis; Communities; Crawlers; Engines; Marketing and sales; Social network services; Scrapy; Taobao; bipartite graph; sampling method; user behavior
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2012 International Conference on
  • Conference_Location
    Sanya
  • Print_ISBN
    978-1-4673-2624-7
  • Type
    conf
  • DOI
    10.1109/CyberC.2012.17
  • Filename
    6384943
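The abstract names two graph-sampling strategies, BFS and MHRW (Metropolis-Hastings random walk), without describing how they work. The Python below is a minimal, illustrative sketch of both on a toy in-memory graph; it is not the authors' crawler and does not use Scrapy. The adjacency-dict representation and the function names `bfs_sample` and `mhrw_sample` are hypothetical, and the sketch assumes an undirected, connected graph in which every node has at least one neighbor.

```python
import random
from collections import deque

# Hypothetical representation: an undirected graph as an adjacency dict that
# maps each node ID to the list of its neighbors (every node needs >= 1 neighbor).
Graph = dict[str, list[str]]


def bfs_sample(graph: Graph, seed: str, budget: int) -> list[str]:
    """Return up to `budget` nodes in breadth-first order starting at `seed`."""
    visited = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(visited) < budget:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr in seen:
                continue
            seen.add(nbr)
            visited.append(nbr)
            queue.append(nbr)
            if len(visited) >= budget:
                break
    return visited


def mhrw_sample(graph: Graph, seed: str, steps: int, rng: random.Random) -> list[str]:
    """Metropolis-Hastings random walk.

    At node v, propose a neighbor w uniformly at random and accept the move
    with probability min(1, deg(v) / deg(w)); on rejection, stay at v.
    On a connected graph, the stationary distribution is uniform over nodes.
    """
    current = seed
    walk = [current]
    for _ in range(steps):
        proposal = rng.choice(graph[current])
        accept = min(1.0, len(graph[current]) / len(graph[proposal]))
        if rng.random() < accept:
            current = proposal
        walk.append(current)  # a rejected proposal repeats the current node
    return walk


if __name__ == "__main__":
    # Toy graph (hypothetical data, not from the paper).
    toy = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a", "d"], "d": ["a", "c"]}
    print(bfs_sample(toy, "a", 3))  # ['a', 'b', 'c']
    print(mhrw_sample(toy, "a", 10, random.Random(42)))
```

The two strategies trade off differently: a BFS crawl that stops early is known to over-represent nodes near the seed and high-degree nodes, whereas the MHRW acceptance rule corrects the degree bias of a plain random walk so that, in the limit, each node is visited with equal probability.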