• DocumentCode
    655274
  • Title
    Detection of E-Commerce Systems with Sparse Features and Supervised Classification
  • Author
    Stoll, Kurt Uwe ; Hepp, Martin
  • Author_Institution
    E-Bus. & Web Sci. Res. Group, Univ. der Bundeswehr München, Neubiberg, Germany
  • fYear
    2013
  • fDate
    11-13 Sept. 2013
  • Firstpage
    199
  • Lastpage
    206
  • Abstract
    Enriching web shop pages with structured data has recently become popular in e-commerce, driven mainly by search engines favouring those pages. While structured data in e-commerce is mainly generated automatically by shop extensions, this data covers only a small share of the market, which is a major obstacle for applications operating on aggregated data. In this context, more than 90% of product detail pages on the web are generated by only 7 e-commerce systems. Meanwhile, little research addresses methods to automatically detect e-commerce systems. Automated detection would allow the design of system-specific extractors able to grow the amount of structured data in e-commerce. Therefore, we propose a novel approach to this problem, which filters features generated from HTML tag attributes with an e-commerce-specific white list. We evaluate 6 classification algorithms on the problem and discuss computational effort. We show that this approach is capable of detecting the 6 most important e-commerce systems with an F1-score of 0.9 by analyzing only one HTML page per web shop. We evaluate our findings on an independent dataset and on reference shop sites.
  • Keywords
    Internet; classification; electronic commerce; hypermedia markup languages; retail data processing; search engines; F1-score; HTML page; HTML tag attributes; Web shop pages; aggregated data; automated detection; classification algorithms; e-commerce systems detection; shop extensions; shop sites; sparse features; structured data; supervised classification; system-specific extractors; Algorithm design and analysis; Business; HTML; Radio frequency; Support vector machines; Training; Web pages; e-commerce systems; supervised machine learning; web page classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    e-Business Engineering (ICEBE), 2013 IEEE 10th International Conference on
  • Conference_Location
    Coventry
  • Type
    conf
  • DOI
    10.1109/ICEBE.2013.30
  • Filename
    6686263