• DocumentCode
    1825911
  • Title
    Automatic categorization of figures in scientific documents
  • Author
    Lu, Xiaonan ; Mitra, Prasenjit ; Wang, James Z. ; Giles, C. Lee
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Pennsylvania State Univ., University Park, PA
  • fYear
    2006
  • fDate
    38869
  • Firstpage
    129
  • Lastpage
    138
  • Abstract
    Figures are very important non-textual information contained in scientific documents. Current digital libraries do not provide users with tools to retrieve documents based on the information available within the figures. We propose an architecture for retrieving documents by integrating figures and other information. The initial step in enabling integrated document search is to categorize figures into a set of pre-defined types. We propose several categories of figures based on their functionalities in scholarly articles. We have developed a machine-learning-based approach for automatic categorization of figures. Both global features, such as texture, and part features, such as lines, are utilized in the architecture for discriminating among figure categories. The proposed approach has been evaluated on a testbed document set collected from the CiteSeer scientific literature digital library. Experimental evaluation has demonstrated that our algorithms can produce acceptable results for real-world use. Our tools can be integrated into a scientific-document digital library.
  • Keywords
    classification; digital libraries; information retrieval; learning (artificial intelligence); automatic categorization; digital library; machine-learning; nontextual information; scientific document retrieval; Computer science; Databases; Design engineering; Educational institutions; Flowcharts; Information retrieval; Permission; Search engines; Software libraries; Testing; documents; feature extraction; figures; machine learning; scientific literature;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Digital Libraries, 2006. JCDL '06. Proceedings of the 6th ACM/IEEE-CS Joint Conference on
  • Conference_Location
    Chapel Hill, NC
  • Print_ISBN
    1-59593-354-9
  • Type
    conf
  • DOI
    10.1145/1141753.1141778
  • Filename
    4119109