  • DocumentCode
    2453104
  • Title
    A Study of Smoothing Algorithms for Item Categorization on e-Commerce Sites
  • Author
    Shen, Dan ; Ruvini, Jean-David ; Mukherjee, Rajyashree ; Sundaresan, Neel
  • Author_Institution
    eBay Res. Labs., Shanghai, China
  • fYear
    2010
  • fDate
    12-14 Dec. 2010
  • Firstpage
    23
  • Lastpage
    28
  • Abstract
    One central issue in a long-tail online marketplace such as eBay is automatically placing users' self-entered items into a catalog in real time. This task is extremely challenging as the inventory scales up, the items become ephemeral, and the user input remains noisy. Indeed, catalog learning has emerged as a key technical property for other major online ecommerce applications, including search and recommendation. We formulate the item cataloging task as a Bayesian classification problem, which should scale well to very large data sets and have good online prediction performance. The inherent data sparseness issue, especially for tail categories, is key to overall model performance. We address it by adapting statistically sound smoothing methods that are well studied in language modeling tasks. However, there are data characteristics specific to the ecommerce domain, including short yet focused item descriptions, a very large and hierarchical catalog taxonomy, and a highly skewed distribution over item types. We investigate these domain-specific regularities empirically and report practically significant results on real-world, true-scale data.
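    The abstract does not say which smoothing methods the paper evaluates. As a rough illustration of the general idea only, the sketch below classifies short item titles with a multinomial naive Bayes model whose per-category word probabilities are interpolated with a background (all-category) distribution, i.e. Jelinek-Mercer smoothing as used in language modeling. The function names, the weight `lam`, the add-one floor on the background model, and the toy titles and categories are all assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train(items):
    """Collect counts from (title, category) pairs."""
    cat_counts = Counter()              # number of items per category
    word_counts = defaultdict(Counter)  # word counts within each category
    background = Counter()              # word counts over all categories
    for title, cat in items:
        words = title.lower().split()
        cat_counts[cat] += 1
        word_counts[cat].update(words)
        background.update(words)
    return cat_counts, word_counts, background


def classify(title, model, lam=0.5):
    """Return the category with the highest smoothed log-posterior.

    lam (hypothetical default) weights the category model against the
    background model; it must lie strictly between 0 and 1.
    """
    cat_counts, word_counts, background = model
    n_items = sum(cat_counts.values())
    n_bg = sum(background.values())
    vocab = len(background)
    best_cat, best_score = None, float("-inf")
    for cat in cat_counts:
        n_cat = sum(word_counts[cat].values())
        score = math.log(cat_counts[cat] / n_items)  # log prior
        for w in title.lower().split():
            p_cat = word_counts[cat][w] / n_cat if n_cat else 0.0
            # Add-one floor so words never seen in training avoid log(0).
            p_bg = (background[w] + 1) / (n_bg + vocab + 1)
            score += math.log(lam * p_cat + (1 - lam) * p_bg)
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat


# Toy, made-up training data for illustration only.
toy_items = [
    ("apple iphone 4 black 16gb", "Cell Phones"),
    ("nokia phone unlocked", "Cell Phones"),
    ("harry potter hardcover book", "Books"),
    ("vintage cookbook", "Books"),
]
toy_model = train(toy_items)
```

    With this toy model, `classify("unlocked iphone", toy_model)` selects "Cell Phones" and `classify("potter book", toy_model)` selects "Books". Interpolating with the background distribution keeps a sparse tail category from assigning zero probability to a title just because one of its words never appeared in that category's training items.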
  • Keywords
    Web sites; belief networks; cataloguing; electronic commerce; Bayesian classification problem; adapting statistically sound smoothing methods; catalog learning; data sparseness; e-commerce sites; hierarchical catalog taxonomy; highly skewed distribution; inventory; item cataloging task; item categorization; language modeling tasks; long-tail online marketplace; online ecommerce; online prediction; real-world true-scale data; smoothing algorithms; Bayesian methods; Catalogs; Maximum likelihood estimation; Smoothing methods; Training; Training data; Vocabulary; catalog; hierarchy; item categorization; smoothing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications (ICMLA), 2010 Ninth International Conference on
  • Conference_Location
    Washington, DC
  • Print_ISBN
    978-1-4244-9211-4
  • Type
    conf
  • DOI
    10.1109/ICMLA.2010.11
  • Filename
    5708808