  • DocumentCode
    50208
  • Title
    DeepQA Jeopardy! Gamification: A Machine-Learning Perspective
  • Author
    Baughman, Aaron K.; Chuang, Wesley; Dixon, Kevin R.; Benz, Zachary; Basilico, Justin
  • Author_Institution
    IBM Special Events, Research Triangle Park, NC, USA
  • Volume
    6
  • Issue
    1
  • fYear
    2014
  • fDate
    Mar-14
  • Firstpage
    55
  • Lastpage
    66
  • Abstract
    DeepQA is a large-scale natural language processing (NLP) question-and-answer system that responds across a breadth of structured and unstructured data, from hundreds of analytics that are combined with over 50 models, trained through machine learning. After the 2011 historic milestone of defeating the two best human players in the Jeopardy! game show, the technology behind IBM Watson, DeepQA, is undergoing gamification into real-world business problems. Gamifying a business domain for Watson is a composite of functional, content, and training adaptation for nongame play. During domain gamification for medical, financial, government, or any other business, each system change affects the machine-learning process. As opposed to the original Watson Jeopardy!, whose class distribution of positive-to-negative labels is 1:100, in adaptation the computed training instances, question-and-answer pairs transformed into true-false labels, result in a very low positive-to-negative ratio of 1:100 000. Such initial extreme class imbalance during domain gamification poses a big challenge for the Watson machine-learning pipelines. The combination of ingested corpus sets, question-and-answer pairs, configuration settings, and NLP algorithms contributes toward the challenging data state. We propose several data engineering techniques, such as answer key vetting and expansion, source ingestion, oversampling classes, and question set modifications to increase the computed true labels. In addition, algorithm engineering, such as an implementation of the Newton-Raphson logistic regression with a regularization term, relaxes the constraints of class imbalance during training adaptation. We conclude by empirically demonstrating that data and algorithm engineering are complementary and indispensable to overcome the challenges in this first Watson gamification for real-world business problems.
  • Keywords
    business data processing; computer games; learning (artificial intelligence); natural language processing; question answering (information retrieval); text analysis; DeepQA Jeopardy! gamification; NLP algorithms; NLP question-and-answer system; Newton-Raphson logistic regression; Watson gamification; Watson machine-learning pipelines; algorithm engineering; business domain; configuration settings; data engineering techniques; domain gamification; extreme class imbalance; ingested corpus sets; large-scale natural language processing question-and-answer system; machine-learning process; nongame play; positive-to-negative ratio; question-and-answer pairs; real-world business problems; regularization term; structured data; training instances; true-false labels; unstructured data; Accuracy; Games; Logistics; Machine learning algorithms; Pipelines; Training; Gamification; machine learning; natural language processing (NLP); pattern recognition;
  • fLanguage
    English
  • Journal_Title
    Computational Intelligence and AI in Games, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1943-068X
  • Type
    jour
  • DOI
    10.1109/TCIAIG.2013.2285651
  • Filename
    6632881