  • DocumentCode
    612158
  • Title
    Introducing shadows: Flexible document representation and annotation on the Web
  • Author
    Mota, M.S. ; Medeiros, C.B.
  • Author_Institution
    Inst. of Comput., Univ. of Campinas (UNICAMP), Campinas, Brazil
  • fYear
    2013
  • fDate
    8-12 April 2013
  • Firstpage
    13
  • Lastpage
    18
  • Abstract
    The Web is witnessing an exponential growth of increasingly complex, distributed and heterogeneous documents. This hampers the exchange, annotation and retrieval of documents. While information retrieval mechanisms concentrate on textual features (corpus analysis), annotation approaches either target specific formats or require that a document follow interoperable standards. This work presents our effort to handle these problems by providing a more flexible solution. Rather than trying to modify or convert the document itself, or to target only textual characteristics, the strategy described in this work is based on an intermediate descriptor: the document shadow. A shadow represents domain-relevant aspects and elements of both the structure and the content of a given document, as defined by a user group. Rather than annotating the documents themselves, it is the shadows that are annotated, which makes annotations independent of document formats. Our annotations take advantage of the Linked Open Data (LOD) initiative. Via annotations, users can derive correlations across shadows in a flexible way. Moreover, shadows and annotations are stored in databases, which allows uniform database treatment of heterogeneous documents.
  • Keywords
    Internet; content management; data structures; document handling; information retrieval; open systems; LOD initiative; Web annotation; annotations users; complex documents; database storage; distributed documents; document annotation; document conversion; document exchange; document shadow; domain-relevant aspects; flexible document representation; heterogeneous documents; information retrieval mechanisms; interoperable standards; textual characteristics; textual features; uniform database treatments; Biodiversity; Data mining; Databases; Feature extraction; Semantics; Standards; XML
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering Workshops (ICDEW), 2013 IEEE 29th International Conference on
  • Conference_Location
    Brisbane, QLD
  • Print_ISBN
    978-1-4673-5303-8
  • Electronic_ISBN
    978-1-4673-5302-1
  • Type
    conf
  • DOI
    10.1109/ICDEW.2013.6547416
  • Filename
    6547416
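
The abstract in this record describes annotating an intermediate descriptor (the shadow) instead of the document, and storing shadows and annotations in a database. The following is a minimal sketch of that idea, not the authors' implementation. The class names (`Shadow`, `Element`), the three-table schema, the helper functions and the `example.org` URIs are all hypothetical and chosen only for illustration; the record does not say how shadows are actually modelled or stored.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Element:
    """One domain-relevant piece of a document (hypothetical shape)."""
    element_id: str
    kind: str       # e.g. "section", "table", "image"
    content: str


@dataclass
class Shadow:
    """Intermediate descriptor; the source document is never modified."""
    shadow_id: str
    document_uri: str
    elements: list = field(default_factory=list)


def open_store():
    """Create an in-memory database with an assumed three-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE shadow (shadow_id TEXT PRIMARY KEY, document_uri TEXT);
        CREATE TABLE element (shadow_id TEXT, element_id TEXT, kind TEXT,
                              content TEXT,
                              PRIMARY KEY (shadow_id, element_id));
        CREATE TABLE annotation (shadow_id TEXT, element_id TEXT,
                                 concept_uri TEXT);
    """)
    return conn


def save_shadow(conn, shadow):
    """Persist a shadow and its elements, whatever the source format was."""
    conn.execute("INSERT INTO shadow VALUES (?, ?)",
                 (shadow.shadow_id, shadow.document_uri))
    conn.executemany(
        "INSERT INTO element VALUES (?, ?, ?, ?)",
        [(shadow.shadow_id, e.element_id, e.kind, e.content)
         for e in shadow.elements],
    )


def annotate(conn, shadow_id, element_id, concept_uri):
    """Attach a linked-data concept URI to a shadow element."""
    conn.execute("INSERT INTO annotation VALUES (?, ?, ?)",
                 (shadow_id, element_id, concept_uri))


def shadows_sharing(conn, concept_uri):
    """Return ids of all shadows annotated with the given concept."""
    rows = conn.execute(
        "SELECT DISTINCT shadow_id FROM annotation "
        "WHERE concept_uri = ? ORDER BY shadow_id",
        (concept_uri,),
    )
    return [row[0] for row in rows]
```

Under these assumptions, a PDF and a spreadsheet each get their own shadow, and annotating an element of each with the same concept URI makes `shadows_sharing` return both shadow ids. The query never touches the original file formats, which is the format independence the abstract refers to.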