• DocumentCode
    3099541
  • Title

    Entity resolution for high velocity streams using semantic measures

  • Author

    Priya, P. Anu ; Prabhakar, S. ; Vasavi, S.

  • Author_Institution
    Comput. Sci. & Eng., VR Siddhartha Eng. Coll., Vijayawada, India
  • fYear
    2015
  • fDate
    12-13 June 2015
  • Firstpage
    35
  • Lastpage
    40
  • Abstract
    Nowadays, large amounts of data are generated by various stakeholders: sensor and satellite data about the environment and climate; messages, tweets, photos, and videos from social networking sites; and telecommunications data. This big data, if processed in real time, helps decision makers make timely decisions when an event occurs. When source data sets are large (velocity, variety, veracity), traditional ETL (Extract, Transform, Load) is a time-consuming process. This motivates extending traditional data management techniques to extract business value from big data. This paper extends the Hadoop framework to perform entity resolution in two phases. In Phase 1, MapReduce generates rules for matching two real-world objects based on their similarity: the higher the similarity, the more closely the objects match. Similarity is calculated using domain-dependent and domain-independent natural language processing measures. In Phase 2, these rules are used to match stream data. The proposed approach uses 13 semantic measures to resolve entities in stream data. Stream data such as tweets, messages, and e-catalogues are used to test the proposed system.
  • Keywords
    Big Data; Internet; natural language processing; Hadoop framework; MapReduce; data management techniques; domain dependent natural language processing; e-catalogues; entity resolution; high velocity streams; independent natural language processing; semantic measures; stream data matching; tweets; Accuracy; Erbium; Feeds; Satellites; stream processing; unstructured data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Advance Computing Conference (IACC), 2015 IEEE International
  • Conference_Location
    Bangalore
  • Print_ISBN
    978-1-4799-8046-8
  • Type

    conf

  • DOI
    10.1109/IADCC.2015.7154663
  • Filename
    7154663