  • DocumentCode
    140817
  • Title
    Automatic generation of question answer pairs from noisy case logs

  • Author
    Ajmera, Jitendra ; Joshi, S. ; Verma, A. ; Mittal, Anish
  • Author_Institution
    IBM India Res. Lab., New Delhi, India
  • fYear
    2014
  • fDate
    March 31 2014-April 4 2014
  • Firstpage
    436
  • Lastpage
    447
  • Abstract
    In a customer support scenario, a lot of valuable information is recorded in the form of 'case logs'. Case logs are primarily written for future reference or manual inspection and are therefore written hastily and are very noisy. In this paper, we propose techniques that exploit these case logs to mine real customer concerns or problems and then map them to well-written knowledge articles for that enterprise. This mapping results in the generation of question-answer (QA) pairs. These QA pairs can be used for a variety of applications, such as dynamically updating the frequently-asked-questions (FAQs) or updating the knowledge repository. In this paper, we show the utility of these discovered QA pairs as training data for a question-answering system. Our approach for mining the case logs is based on a composite model consisting of two generative models, viz., a hidden Markov model (HMM) and a latent Dirichlet allocation (LDA) model. The LDA model explains the long-range dependencies across words due to their semantic similarity, and the HMM models the sequential patterns present in these case logs. Such processing results in crisp 'problem statement' segments which are indicative of the real customer concerns. Our experiments show that this approach finds crisp problem statements in 56% of the cases and outperforms alternative segmentation methods such as HMM, LDA and conditional random field (CRF). After finding these crisp problem statements, appropriate answers are looked up from an existing knowledge repository index, forming candidate QA pairs. We show that considering only the problem-statement segments for which answers can be found further improves the segmentation performance to 82%. Finally, we show that when these QA pairs are used as training data, the performance of a question-answering system can be improved significantly.
  • Keywords
    data mining; hidden Markov models; question answering (information retrieval); FAQ; HMM; LDA model; QA pairs discovery; case logs mining; conditional random field; frequently-asked-questions; hidden Markov model; knowledge repository; latent Dirichlet allocation model; problem statement segments; question answer pairs; question-answering system; segmentation performance; semantic similarity; Context; Hidden Markov models; Noise measurement; Semantics; Syntactics; Training; Viterbi algorithm;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering (ICDE), 2014 IEEE 30th International Conference on
  • Conference_Location
    Chicago, IL
  • Type
    conf
  • DOI
    10.1109/ICDE.2014.6816671
  • Filename
    6816671