DocumentCode :
262946
Title :
Incremental entity fusion from linked documents
Author :
Malhotra, Pankaj ; Agarwal, Prabhakar ; Shroff, Gautam
Author_Institution :
TCS Res., Tata Consultancy Services Ltd., Noida, India
fYear :
2014
fDate :
7-10 July 2014
Firstpage :
1
Lastpage :
8
Abstract :
In many government applications, especially in intelligence and law enforcement, information about entities such as persons or companies is often available in disparate data sources. For example, information distributed across passports, driving licences, bank accounts, and income tax documents needs to be resolved and fused to reveal a consolidated profile of an individual. In this paper we describe an algorithm that fuses documents highly likely to belong to the same entity by exploiting inter-document references in addition to attribute similarity. Our technique combines iterative graph traversal, locality-sensitive hashing, iterative match-merge, and graph clustering to discover unique entities in a document corpus. Further, new sets of documents can be added incrementally, with only a small subset of a previously fused entity-document collection needing to be re-processed. We present performance and quality results using both Bayesian likelihood fusion and Support Vector Machines to demonstrate the benefit of inter-document references, both for improving accuracy and for detecting attempts at deliberate obfuscation.
Keywords :
Bayes methods; document handling; file organisation; graph theory; iterative methods; merging; pattern matching; sensor fusion; support vector machines; Bayesian likelihood fusion; attribute similarity; document corpus; document fusion; government applications; graph-clustering; incremental entity fusion; interdocument references; iterative graph-traversal; iterative match-merge; law-enforcement; locality-sensitive hashing; obfuscation; unique entities discover; Boolean functions; Databases; Fuses; Licenses; Silicon;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Fusion (FUSION), 2014 17th International Conference on
Conference_Location :
Salamanca
Type :
conf
Filename :
6916082