Annotation based classification of the PDF document for semantic web

Author

Shukla, Archana

Author_Institution

Comput. Sci. & Eng. Dept., Motilal Nehru Nat. Inst. of Technol., Allahabad, India

Volume

1

fYear

2011

fDate

8-10 April 2011

Firstpage

370

Lastpage

376

Abstract

The main aim of research scholars is to produce and communicate new knowledge, and to find innovative applications of existing knowledge that make a significant impact at the national or international level. Arguably the most difficult part of a master's or doctoral degree course is understanding research papers while identifying a problem area. It is therefore vital that students start by brainstorming interesting research ideas and finding good research papers related to their work. Students in these programs are required to carry out research, but researchers often have difficulty identifying their problem area. Defending the research goes more smoothly if the paper is clear about the argument it aims to resolve. In this paper, I present an application that provides a user-friendly interface for research activity in the context of a research academic degree program. One motivation is that the viewpoints, summaries, notes, and observations that authors write on a PDF document are often helpful to readers. My application automatically extracts metadata (such as Title, Keywords, Date and Time, Author, and Summary) and annotations from a PDF document, and classifies the document on the basis of either the number of comments or the number of authors who have commented on it. The application can also classify a PDF document based on feedback, in the form of scores within a given range, that research students assign after reviewing the comments available on the document. This helps research students decide whether a PDF document is relevant to their problem area and judge its quality within the context of the domain, in the common situation where researchers or students have downloaded a number of PDF documents from the World Wide Web using software agents such as Google to identify their research problem area. This metadata defines the semantics of a document. I have developed my application using the PDFBox Java API. My work is motivated by the desire to have a knowledge base of metadata and annotations about PDF documents that research students can use when deciding on their problem area.
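The two annotation-based criteria described in the abstract (number of comments, and number of distinct authors who commented) can be illustrated with a minimal Java sketch. Everything here is an assumption made for illustration: the `Comment`, `AnnotatedDocument`, and `AnnotationRanker` types are hypothetical, "classification" is interpreted as ordering documents from most to least annotated, and the comments are assumed to have already been extracted from the PDF (for example with PDFBox). It is not the application's actual code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of one comment already extracted from a PDF document.
final class Comment {
    final String author;
    final String text;

    Comment(String author, String text) {
        this.author = author;
        this.text = text;
    }
}

// Hypothetical model of a PDF document together with its extracted comments.
final class AnnotatedDocument {
    final String title;
    final List<Comment> comments;

    AnnotatedDocument(String title, List<Comment> comments) {
        this.title = title;
        this.comments = comments;
    }

    // Criterion 1: total number of comments on the document.
    int commentCount() {
        return comments.size();
    }

    // Criterion 2: number of distinct authors who commented.
    int distinctAuthorCount() {
        Set<String> authors = new HashSet<>();
        for (Comment c : comments) {
            authors.add(c.author);
        }
        return authors.size();
    }
}

// Assumption: "classification" is modeled as a descending ranking.
final class AnnotationRanker {
    // Returns a new list ordered by comment count, most commented first.
    static List<AnnotatedDocument> byCommentCount(List<AnnotatedDocument> docs) {
        List<AnnotatedDocument> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingInt(AnnotatedDocument::commentCount).reversed());
        return sorted;
    }

    // Returns a new list ordered by distinct commenting authors, most first.
    static List<AnnotatedDocument> byAuthorCount(List<AnnotatedDocument> docs) {
        List<AnnotatedDocument> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingInt(AnnotatedDocument::distinctAuthorCount).reversed());
        return sorted;
    }
}
```

The two criteria can disagree: a document with many comments from a single reader ranks high by comment count but low by author count, which is one reason to offer both orderings.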

Keywords

Java; application program interfaces; document handling; pattern classification; search engines; semantic Web; user interfaces; Google; PDF Box Java API; PDF document classification; annotation based classification; application program interface; doctoral degree course; masters degree course; metadata; research academic degree program; semantic Web; software agents; user friendly interface; Arrays; Context; Data mining; Meteorology; Portals; Relational databases; Annotation; Classification; Metadata; Search Engines; Semantic Web; World Wide Web;

fLanguage

English

Publisher

ieee

Conference_Titel

Electronics Computer Technology (ICECT), 2011 3rd International Conference on

Conference_Location

Kanyakumari

Print_ISBN

978-1-4244-8678-6

Electronic_ISBN

978-1-4244-8679-3

Type

conf

DOI

10.1109/ICECTECH.2011.5941625

Filename

5941625