Title :
Inferencing in information extraction: Techniques and applications
Author :
Barbosa, Denilson ; Wang, Haixun ; Yu, Cong
Author_Institution :
Univ. of Alberta, Edmonton, AB, Canada
Abstract :
Information extraction at Web scale has become one of the most important research topics in data management since major commercial search engines began incorporating knowledge into their search results a couple of years ago [1]. Users increasingly expect structured knowledge as answers to their search needs. On Bing, for example, the result page for “Lionel Messi” displays many structured facts, such as his birthday and awards. Research efforts to improve the accuracy and coverage of such knowledge bases have led to significant advances in information extraction techniques [2], [3]. As the initial challenge of accurately extracting facts about popular entities is being addressed, more difficult challenges have emerged, such as extending knowledge coverage to long-tail entities and domains, understanding the interestingness and usefulness of facts within a given context, and addressing information-seeking needs more directly and accurately. In this tutorial, we will survey recent research efforts, introduce the techniques that address these challenges, and present the applications that benefit from adopting them. In particular, the tutorial focuses on a variety of techniques that can be broadly viewed as knowledge inferencing, i.e., combining multiple data sources and extraction techniques to verify existing knowledge and derive new knowledge. These techniques fall into four main categories: 1) deep natural language processing using machine learning techniques, 2) data cleaning using integrity constraints, 3) large-scale probabilistic reasoning, and 4) leveraging human expertise for domain knowledge extraction.
Keywords :
Internet; data integrity; inference mechanisms; information retrieval; learning (artificial intelligence); natural language processing; search engines; Bing; Lionel Messi; Web scale; commercial search engines; data cleaning; data management; deep natural language processing; domain knowledge extraction; fact extraction; human expertise leverage; information extraction techniques; information-seeking needs; integrity constraints; knowledge coverage; knowledge inferencing; large-scale probabilistic reasoning; machine learning techniques; multiple data sources; Cleaning; Data mining; Google; Information retrieval; Knowledge based systems; Knowledge engineering; Tutorials;
Conference_Title :
Data Engineering (ICDE), 2015 IEEE 31st International Conference on
Conference_Location :
Seoul
DOI :
10.1109/ICDE.2015.7113420