• DocumentCode
    589179
  • Title
    An NER-based Product Identification and Lucene-based Product Linking Approach to CPROD1 Challenge: Description of Submission System to CPROD1 Challenge
  • Author
    Zhiqiang Toh ; Wenting Wang ; Man Lan ; Xiaoli Li
  • Author_Institution
    Inst. for Infocomm Res., Singapore, Singapore
  • fYear
    2012
  • fDate
    10-10 Dec. 2012
  • Firstpage
    869
  • Lastpage
    871
  • Abstract
    This paper presents our methodology for the CPROD1 Challenge, whose task is to identify product mentions in text and then link each mention to entries in the catalog file. Our solution follows two steps. First, we use processing pipelines to extract product mentions by combining multiple techniques, including traditional named entity recognition (NER), regular expression rules, and gazetteer-based string matching. Second, we cast the product linking task as an information retrieval (IR) problem, in which the description catalog file is loaded into a database. Each product mention then acts as a search query, and the results returned from the catalog entry database serve as the links. The F1 scores of our submission on the public and private test data are 24.82% and 16.04%, respectively.
  • Keywords
    cataloguing; file organisation; query processing; string matching; text analysis; CPROD1 Challenge; F1 scores; IR problem; Lucene-based product linking approach; NER; NER-based product identification; catalog entry database; catalog file; description catalog file; gazetteer-based string matching; information retrieval problem; named entity recognition; pipeline processing; private test data; product extraction; public test data; regular expression rules; search query; text analysis; Catalogs; Data mining; Feature extraction; Indexing; Information retrieval; Joining processes; Training data; named entity recognition; product disambiguation; product identification; product linking;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on
  • Conference_Location
    Brussels
  • Print_ISBN
    978-1-4673-5164-5
  • Type
    conf
  • DOI
    10.1109/ICDMW.2012.66
  • Filename
    6406532
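
The two-step approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the NER component is omitted, a simple token-overlap ranking stands in for Lucene, and the catalog entries, gazetteer, regular expression, and function names (`extract_mentions`, `link`) are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical toy catalog (product id -> description); not data from the paper.
CATALOG = {
    "p1": "Canon EOS 550D digital SLR camera",
    "p2": "Apple iPhone 4S 16GB smartphone",
    "p3": "Sony Bravia KDL-40EX520 LCD television",
}

# Hypothetical gazetteer of known product-name fragments (lowercase).
GAZETTEER = ["canon eos", "iphone 4s", "sony bravia"]

# Hypothetical rule: a token mixing letters and digits, e.g. "550D" or "16GB".
MODEL_RE = re.compile(r"\b(?:[A-Za-z]+\d+|\d+[A-Za-z]+)\w*\b")


def extract_mentions(text):
    """Step 1: collect spans from gazetteer matching and regex rules, then merge."""
    spans = set()
    lowered = text.lower()  # assumes lowercasing keeps string length (true for ASCII)
    for entry in GAZETTEER:
        start = lowered.find(entry)
        while start != -1:
            spans.add((start, start + len(entry)))
            start = lowered.find(entry, start + 1)
    for match in MODEL_RE.finditer(text):
        spans.add((match.start(), match.end()))

    # Merge spans that overlap or are separated by at most one character.
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e, text[s:e]) for s, e in merged]


def tokenize(s):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", s.lower())


def link(mention, catalog, top_k=1):
    """Step 2: treat the mention as a query and rank catalog entries by token overlap."""
    query = Counter(tokenize(mention))
    scored = []
    for pid, description in catalog.items():
        doc = Counter(tokenize(description))
        score = sum(min(query[t], doc[t]) for t in query)
        if score > 0:
            scored.append((score, pid))
    # Highest score first; ties broken by product id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pid for _, pid in scored[:top_k]]


if __name__ == "__main__":
    post = "Just got the Canon EOS 550D and it is great"
    for start, end, surface in extract_mentions(post):
        print(surface, "->", link(surface, CATALOG))
```

In a fuller system, the overlap score would likely be replaced by a proper search index, and mentions that retrieve no entry would be left unlinked, as `link` does here by returning an empty list.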