DocumentCode :
140817
Title :
Automatic generation of question answer pairs from noisy case logs
Author :
Ajmera, Jitendra ; Joshi, S. ; Verma, A. ; Mittal, Anish
Author_Institution :
IBM India Res. Lab., New Delhi, India
fYear :
2014
fDate :
March 31 2014-April 4 2014
Firstpage :
436
Lastpage :
447
Abstract :
In a customer support scenario, a lot of valuable information is recorded in the form of 'case logs'. Case logs are primarily written for future reference or manual inspection; they are therefore written hastily and are very noisy. In this paper, we propose techniques that exploit these case logs to mine real customer concerns or problems and then map them to well-written knowledge articles for that enterprise. This mapping results in the generation of question-answer (QA) pairs. These QA pairs can be used for a variety of applications, such as dynamically updating the frequently asked questions (FAQs) or updating the knowledge repository. In this paper, we show the utility of these discovered QA pairs as training data for a question-answering system. Our approach for mining the case logs is based on a composite model consisting of two generative models, viz., a hidden Markov model (HMM) and a latent Dirichlet allocation (LDA) model. The LDA model explains long-range dependencies across words due to their semantic similarity, and the HMM models the sequential patterns present in these case logs. Such processing results in crisp 'problem statement' segments that are indicative of the real customer concerns. Our experiments show that this approach finds crisp problem statements in 56% of the cases and outperforms alternative segmentation methods such as HMM, LDA, and conditional random fields (CRF). After finding these crisp problem statements, appropriate answers are looked up from an existing knowledge repository index, forming candidate QA pairs. We show that considering only the problem-statement segments for which answers can be found further improves the segmentation performance to 82%. Finally, we show that when these QA pairs are used as training data, the performance of a question-answering system can be improved significantly.
Keywords :
data mining; hidden Markov models; question answering (information retrieval); FAQ; HMM; LDA model; QA pairs discovery; case logs mining; conditional random field; frequently-asked-questions; hidden Markov model; knowledge repository; latent Dirichlet allocation model; problem statement segments; question answer pairs; question-answering system; segmentation performance; semantic similarity; Context; Hidden Markov models; Noise measurement; Semantics; Syntactics; Training; Viterbi algorithm;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering (ICDE), 2014 IEEE 30th International Conference on
Conference_Location :
Chicago, IL
Type :
conf
DOI :
10.1109/ICDE.2014.6816671
Filename :
6816671
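The segmentation step summarized in the abstract ultimately labels each token of a case log as part of a problem statement or not, and the keywords list the Viterbi algorithm. The following is a minimal Python sketch, under loudly stated assumptions, of Viterbi decoding over a two-state HMM (`problem` vs `other`) that pulls out contiguous problem-statement segments. It is not the authors' composite HMM-LDA model: the state names, the toy probability tables, the `UNSEEN` floor, and the functions `viterbi` and `problem_segments` are all hypothetical. In the composite model described above, an LDA component accounts for semantically related words; the hand-written emission tables here merely stand in for that evidence.

```python
import math

# HYPOTHETICAL two-state HMM: each case-log token is labelled either
# "problem" (part of the customer's problem statement) or "other"
# (agent actions, closing notes, ...). All numbers are toy values for
# illustration only; they are not taken from the paper.
STATES = ("other", "problem")

START = {"other": 0.6, "problem": 0.4}
TRANS = {
    "other": {"other": 0.8, "problem": 0.2},
    "problem": {"other": 0.3, "problem": 0.7},
}
# Per-state word probabilities. These hand-written tables stand in for the
# semantic (topic) evidence a composite HMM-LDA model would provide.
EMIT = {
    "other": {"called": 0.2, "customer": 0.2, "closed": 0.2, "ticket": 0.2},
    "problem": {"vpn": 0.25, "not": 0.15, "connecting": 0.25, "error": 0.2},
}
UNSEEN = 1e-3  # hypothetical probability floor for words missing from a table


def emit_logp(state, word):
    """Log emission probability of `word` in `state`."""
    return math.log(EMIT[state].get(word, UNSEEN))


def viterbi(words):
    """Return the most probable state sequence for `words` (log-space Viterbi)."""
    if not words:
        return []
    # best[s]: log-prob of the best path that ends in state s at the current token
    best = {s: math.log(START[s]) + emit_logp(s, words[0]) for s in STATES}
    backptrs = []
    for word in words[1:]:
        ptr, new_best = {}, {}
        for s in STATES:
            # Best predecessor state for moving into s at this token.
            prev = max(STATES, key=lambda p: best[p] + math.log(TRANS[p][s]))
            ptr[s] = prev
            new_best[s] = best[prev] + math.log(TRANS[prev][s]) + emit_logp(s, word)
        backptrs.append(ptr)
        best = new_best
    # Trace back from the highest-scoring final state.
    state = max(STATES, key=best.get)
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def problem_segments(words):
    """Group consecutive 'problem' tokens into candidate problem statements."""
    segments, current = [], []
    for word, state in zip(words, viterbi(words)):
        if state == "problem":
            current.append(word)
        elif current:
            segments.append(" ".join(current))
            current = []
    if current:
        segments.append(" ".join(current))
    return segments


if __name__ == "__main__":
    log = "customer called vpn not connecting error ticket closed".split()
    print(problem_segments(log))  # ['vpn not connecting error']
```

With these toy tables, the noisy log line `customer called vpn not connecting error ticket closed` yields the single segment `vpn not connecting error`. In the pipeline the abstract describes, each such segment would then be used to query the knowledge repository index, and only the segments for which an answer is found would be kept as candidate QA pairs.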