DocumentCode :
1771287
Title :
Large-scale entity extraction and probabilistic record linkage
Author :
Villanustre, Flavio
Author_Institution :
Reed Elsevier LexisNexis Risk Solutions, Alpharetta, GA, USA
fYear :
2014
fDate :
19-23 May 2014
Firstpage :
85
Lastpage :
85
Abstract :
Summary form only given. Large-scale entity extraction, disambiguation and linkage in Big Data can challenge the traditional methodologies developed over the last three decades. Entity linkage, in particular, is a cornerstone for a wide spectrum of applications, such as Master Data Management, Data Warehousing, Social Graph Analytics, Fraud Detection and Identity Management. Traditional rule-based heuristic methods usually don't scale properly, are language specific and require significant maintenance over time. This presentation will introduce the audience to the use of probabilistic record linkage, also known as specificity-based linkage, on Big Data, to perform language-independent large-scale entity extraction, resolution and linkage across diverse sources. The presentation also includes a live demonstration reviewing the different steps required during the data integration process (ingestion, profiling, parsing, cleansing, standardization and normalization), and shows the basic concepts behind probabilistic record linkage on a real-world application using the open source big data platform, HPCC Systems [1] from LexisNexis.
Keywords :
Big Data; data handling; information retrieval; probability; HPCC systems; LexisNexis; data cleansing; data ingestion; data integration process; data normalization; data parsing; data profiling; data standardization; data warehousing; fraud detection; identity management; large-scale entity extraction; master data management; open source big data platform; probabilistic record linkage; rules based heuristic methods; social graph analytics; specificity based linkage; Abstracts; Couplings; Data mining; Maintenance engineering; Probabilistic logic; Warehousing; disambiguation; entity extraction; identity fraud; public data; record linking
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Collaboration Technologies and Systems (CTS), 2014 International Conference on
Conference_Location :
Minneapolis, MN
Print_ISBN :
978-1-4799-5157-4
Type :
conf
DOI :
10.1109/CTS.2014.6867546
Filename :
6867546