Author_Institution :
Reed Elsevier LexisNexis Risk Solutions, Alpharetta, GA, USA
Abstract :
Summary form only given. Large-scale entity extraction, disambiguation and linkage in Big Data can challenge the traditional methodologies developed over the last three decades. Entity linkage, in particular, is a cornerstone for a wide spectrum of applications, such as Master Data Management, Data Warehousing, Social Graph Analytics, Fraud Detection and Identity Management. Traditional rules-based heuristic methods usually don't scale properly, are language-specific and require significant maintenance over time. This presentation will introduce the audience to the use of probabilistic record linkage, also known as specificity-based linkage, on Big Data to perform language-independent, large-scale entity extraction, resolution and linkage across diverse sources. The presentation also includes a live demonstration that reviews the different steps required during the data integration process (ingestion, profiling, parsing, cleansing, standardization and normalization) and shows the basic concepts behind probabilistic record linkage in a real-world application using the open source Big Data platform HPCC Systems [1] from LexisNexis.
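For illustration only, the core specificity idea can be sketched outside of HPCC Systems: rare field values (an uncommon surname or city) contribute more linking weight than common ones, and two records are linked when the summed weights of their matching fields exceed a threshold. The Python sketch below is a minimal, hypothetical example of that idea; the toy records, the IDF-style weighting formula and the threshold are assumptions made for illustration and are not the ECL/SALT implementation used in the demonstration.

    from collections import Counter
    from math import log2

    # Hypothetical toy records; in practice these come from the ingested,
    # cleansed, standardized and normalized sources.
    records = [
        {"id": 1, "first": "JOHN",  "last": "SMITH",    "city": "ATLANTA"},
        {"id": 2, "first": "JOHN",  "last": "SMYTHE",   "city": "ATLANTA"},
        {"id": 3, "first": "MARIA", "last": "GONZALEZ", "city": "ALPHARETTA"},
        {"id": 4, "first": "JOHN",  "last": "SMITH",    "city": "MARIETTA"},
    ]
    fields = ["first", "last", "city"]

    # Specificity: weight(value) = log2(N / count(value)) per field, an
    # IDF-style formulation assumed here; rarer values score higher.
    counts = {f: Counter(r[f] for r in records) for f in fields}
    n = len(records)

    def weight(field, value):
        return log2(n / counts[field][value])

    def match_score(a, b):
        # Sum the specificity weights of the field values two records share.
        return sum(weight(f, a[f]) for f in fields if a[f] == b[f])

    # Score every candidate pair and link those above a hypothetical threshold.
    THRESHOLD = 1.2
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            s = match_score(records[i], records[j])
            if s >= THRESHOLD:
                print(f"link {records[i]['id']} <-> {records[j]['id']} (score {s:.2f})")

On this toy data the sketch links records 1 and 2 (same first name and city) and records 1 and 4 (same first and last name), while the single shared first name "JOHN" is too common, and therefore too low in specificity, to link records 2 and 4 on its own. A production system would additionally block candidate pairs and learn field weights from the data rather than scoring all pairs exhaustively.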
Keywords :
Big Data; data handling; information retrieval; probability; HPCC Systems; LexisNexis; data cleansing; data ingestion; data integration process; data normalization; data parsing; data profiling; data standardization; data warehousing; fraud detection; identity management; large-scale entity extraction; master data management; open source big data platform; probabilistic record linkage; rules-based heuristic methods; social graph analytics; specificity-based linkage; Abstracts; Couplings; Data mining; Maintenance engineering; Probabilistic logic; Warehousing; disambiguation; entity extraction; identity fraud; public data; record linking