  • DocumentCode
    2053394
  • Title
    An Algebraic Language for Semantic Data Integration on the Hidden Web
  • Author
    Hosain, Shazzad; Jamil, Hasan
  • Author_Institution
    Dept. of Comput. Sci., Wayne State Univ., MI, USA
  • fYear
    2009
  • fDate
    14-16 Sept. 2009
  • Firstpage
    237
  • Lastpage
    244
  • Abstract
    Semantic integration on the hidden Web is an emerging area of research in which traditional assumptions do not always hold. Frequent changes, conflicts, and the sheer size of the hidden Web demand vastly different integration techniques that rely on autonomous detection and resolution of heterogeneity, correspondence establishment, and information extraction strategies. In this paper, we present an algebraic language, called Integra, as a foundation for BioFlow, an SQL-like query language for the integration of Life Sciences data on the hidden Web. The algebra adopts the view that web forms can be treated as user-defined functions and that the responses they generate from back-end databases can be treated as traditional relations or tables. These assumptions allow us to extend traditional relational algebra with integration primitives such as schema matching, wrappers, form submission, and object identification, modeled as a family of database functions. These functions are then incorporated into the traditional relational algebra operators to extend them in the direction of semantic data integration. To support the well-known concepts of horizontal and vertical integration, we also propose two new operators, called link and combine. We show that this family of functions can be designed from the existing literature and that its implementation is completely orthogonal to our language, just as the implementations of many database technologies, such as the relational join operation, are. Finally, we show that for traditional relations without integration, our algebra reduces to classical relational algebra, establishing classical relational algebra as a special case of Integra.
  • Keywords
    Internet; query languages; relational algebra; BioFlow; Integra; World Wide Web; algebraic language; autonomous detection; database function; database technology; hidden Web; information extraction strategy; query language; semantic data integration; Algebra; Computer science; Data mining; Database languages; HTML; Relational databases; Search engines; USA Councils; Web sites; Data integration; Semantic heterogeneity
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2009 IEEE International Conference on Semantic Computing (ICSC '09)
  • Conference_Location
    Berkeley, CA
  • Print_ISBN
    978-1-4244-4962-0
  • Electronic_ISBN
    978-0-7695-3800-6
  • Type
    conf
  • DOI
    10.1109/ICSC.2009.94
  • Filename
    5298624