• DocumentCode
    658351
  • Title
    Automating Document Annotation Using Open Source Knowledge
  • Author
    Singhal, Achintya ; Kasturi, Rangachar ; Srivastava, Jaideep
  • Author_Institution
    Dept. of Comput. Sci. & Electr. Eng., Univ. of Minnesota, Minneapolis, MN, USA
  • Volume
    1
  • fYear
    2013
  • fDate
    17-20 Nov. 2013
  • Firstpage
    199
  • Lastpage
    204
  • Abstract
    Annotating documents with relevant and comprehensive keywords helps readers quickly get an overview of any document. The literature addresses document annotation with two broad classes of techniques: key phrase extraction and key phrase abstraction. In this paper, we propose a novel approach to generate summary phrases for research documents. Given the dynamic nature of scientific research, it has become important to incorporate new and popular scientific terminology in document annotations. For this purpose, we use crowd-sourced knowledge bases such as Wikipedia and WikiCFP (an open-source repository of calls for papers) to automate key phrase generation. We also account for the possible unavailability of a document's content (due to protective policies) and develop a global-context-based key phrase identification approach. We show that, given only the title of a document, the proposed approach generates its global context information using academic search engines such as Google Scholar. We quantitatively evaluated the proposed approach on a real-world dataset obtained from a computer science research document corpus and compared it with two baseline approaches.
  • Keywords
    Web sites; document handling; knowledge based systems; natural language processing; research and development; search engines; Google Scholar; WikiCFP; Wikipedia; academic search engines; computer science research document corpus; crowd-source knowledge bases; document annotation automation; global context based key-phrase identification; key phrase abstraction; key phrase extraction; key phrase generation; open source knowledge; scientific terminologies; summary phrases; Context; Databases; Electronic publishing; Encyclopedias; Google; Internet; document summarization; global context
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2013 IEEE/WIC/ACM International Joint Conferences on
  • Conference_Location
    Atlanta, GA
  • Print_ISBN
    978-1-4799-2902-3
  • Type
    conf
  • DOI
    10.1109/WI-IAT.2013.30
  • Filename
    6690015