  • DocumentCode
    67300
  • Title
    Facilitating Document Annotation Using Content and Querying Value
  • Author
    Ruiz, Eduardo J. ; Hristidis, Vagelis ; Ipeirotis, P.G.
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Univ. of California, Riverside, Riverside, CA, USA
  • Volume
    26
  • Issue
    2
  • fYear
    2014
  • fDate
    Feb. 2014
  • Firstpage
    336
  • Lastpage
    349
  • Abstract
    A large number of organizations today generate and share textual descriptions of their products, services, and actions. Such collections of textual data contain a significant amount of structured information, which remains buried in the unstructured text. While information extraction algorithms facilitate the extraction of structured relations, they are often expensive and inaccurate, especially when operating on top of text that does not contain any instances of the targeted structured information. We present a novel alternative approach that facilitates the generation of the structured metadata by identifying documents that are likely to contain information of interest, where that information is also likely to be useful later for querying the database. Our approach relies on the idea that humans are more likely to add the necessary metadata at creation time if prompted by the interface, and that it is much easier for humans (and/or algorithms) to identify the metadata when such information actually exists in the document, instead of naively prompting users to fill in forms with information that is not available in the document. As a major contribution of this paper, we present algorithms that identify structured attributes that are likely to appear within the document by jointly utilizing the content of the text and the query workload. Our experimental evaluation shows that our approach generates superior results compared to approaches that rely only on the textual content or only on the query workload to identify attributes of interest.
  • Keywords
    content management; document handling; meta data; text analysis; content value; document annotation; query workload; querying value; structured attributes identification; structured metadata generation; text content; Databases; Design automation; Equations; Mathematical model; Probabilistic logic; Document annotation; adaptive forms; collaborative platforms;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2012.224
  • Filename
    6353425