Inferencing in information extraction: Techniques and applications

Author

Barbosa, Denilson; Wang, Haixun; Yu, Cong

Author_Institution

Univ. of Alberta, Edmonton, AB, Canada

fYear

2015

fDate

13-17 April 2015

Firstpage

1534

Lastpage

1537

Abstract

Information extraction at Web scale has become one of the most important research topics in data management since major commercial search engines started incorporating knowledge into their search results a couple of years ago [1]. Users increasingly expect structured knowledge as answers to their search needs. In Bing, for example, the result page for “Lionel Messi” is full of structured facts, such as his birthday and awards. Research efforts to improve the accuracy and coverage of such knowledge bases have led to significant advances in information extraction techniques [2], [3]. As the initial challenge of accurately extracting facts for popular entities is being addressed, more difficult challenges have emerged, such as extending knowledge coverage to long-tail entities and domains, understanding the interestingness and usefulness of facts within a given context, and addressing information-seeking needs more directly and accurately. In this tutorial, we will survey recent research efforts, introduce the techniques that address these challenges, and present the applications that benefit from adopting them. In particular, this tutorial will focus on a variety of techniques that can be broadly viewed as knowledge inferencing, i.e., combining multiple data sources and extraction techniques to verify existing knowledge and derive new knowledge. More specifically, we focus on four main categories of inferencing techniques: 1) deep natural language processing using machine learning techniques, 2) data cleaning using integrity constraints, 3) large-scale probabilistic reasoning, and 4) leveraging human expertise for domain knowledge extraction.

Keywords

Internet; data integrity; inference mechanisms; information retrieval; learning (artificial intelligence); natural language processing; search engines; Bing; Lionel Messi; Web scale; commercial search engines; data cleaning; data management; deep natural language processing; domain knowledge extraction; fact extraction; human expertise leverage; information extraction techniques; information-seeking needs; integrity constraints; knowledge coverage; knowledge inferencing; large-scale probabilistic reasoning; machine learning techniques; multiple data sources; Cleaning; Data mining; Google; Information retrieval; Knowledge based systems; Knowledge engineering; Tutorials

fLanguage

English

Publisher

IEEE

Conference_Title

Data Engineering (ICDE), 2015 IEEE 31st International Conference on

Conference_Location

Seoul

Type

conf

DOI

10.1109/ICDE.2015.7113420

Filename

7113420