DocumentCode
3099541
Title
Entity resolution for high velocity streams using semantic measures
Author
Priya, P. Anu ; Prabhakar, S. ; Vasavi, S.
Author_Institution
Comput. Sci. & Eng., VR Siddhartha Eng. Coll., Vijayawada, India
fYear
2015
fDate
12-13 June 2015
Firstpage
35
Lastpage
40
Abstract
Nowadays, large amounts of data are generated by various stakeholders: sensors and satellites reporting on environment and climate, social networking sites carrying messages, tweets, photos, and videos, and telecommunications systems. If processed in real time, this big data helps decision makers act promptly when an event occurs. When source data sets are large (velocity, variety, veracity), traditional ETL (Extract, Transform, Load) is a time-consuming process. This motivates extending traditional data management techniques to extract business value from big data. This paper extends the Hadoop framework to perform entity resolution in two phases. In Phase 1, MapReduce generates rules for matching two real-world objects based on their similarity. The higher the similarity, the more likely the two objects refer to the same entity. Similarity is calculated using domain-dependent and domain-independent natural language processing measures. In Phase 2, these rules are used to match stream data. The proposed approach uses 13 semantic measures for resolving entities in stream data. Stream data such as tweets, messages, and e-catalogues are used to test the proposed system.
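The two-phase idea in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: it uses neither Hadoop nor MapReduce, and it stands in two generic string measures (token Jaccard and a `difflib` ratio) for the paper's 13 semantic measures. The function names `learn_threshold` and `resolve_stream`, the averaging of the two measures, and the midpoint threshold rule are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two strings (case-insensitive)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def edit_ratio(a: str, b: str) -> float:
    """Character-level similarity from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity(a: str, b: str) -> float:
    """Assumed combination: plain average of the two measures."""
    return (jaccard(a, b) + edit_ratio(a, b)) / 2


def learn_threshold(labelled_pairs):
    """Phase 1 (hypothetical rule): midpoint between the weakest
    labelled match and the strongest labelled non-match."""
    matches = [similarity(a, b) for a, b, same in labelled_pairs if same]
    others = [similarity(a, b) for a, b, same in labelled_pairs if not same]
    return (min(matches) + max(others)) / 2


def resolve_stream(records, threshold):
    """Phase 2: place each incoming record in the first group whose
    representative (its first member) is similar enough."""
    groups = []
    for rec in records:
        for group in groups:
            if similarity(rec, group[0]) >= threshold:
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups


labelled = [
    ("apple iphone 6", "Apple iPhone 6", True),
    ("apple iphone 6", "samsung galaxy s5", False),
]
threshold = learn_threshold(labelled)
groups = resolve_stream(
    ["Apple iPhone 6", "apple iphone 6", "samsung galaxy s5"], threshold
)
```

Comparing each record only against a group representative keeps the per-record cost proportional to the number of groups, which matters for streams; a real system would also need blocking or indexing to avoid scanning every group.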
Keywords
Big Data; Internet; natural language processing; Hadoop framework; MapReduce; data management techniques; domain dependent natural language processing; e-catalogues; entity resolution; high velocity streams; independent natural language processing; semantic measures; stream data matching; tweets; Accuracy; Erbium; Feeds; Satellites; stream processing; unstructured data
fLanguage
English
Publisher
IEEE
Conference_Titel
Advance Computing Conference (IACC), 2015 IEEE International
Conference_Location
Bangalore
Print_ISBN
978-1-4799-8046-8
Type
conf
DOI
10.1109/IADCC.2015.7154663
Filename
7154663