• DocumentCode
    3252682
  • Title
    A new approach to short web document creation based on textual and visual information
  • Author
    Zachariasova, Martina ; Kamencay, Patrik ; Hudec, Robert ; Benco, Miroslav ; Matuska, Slavomir
  • Author_Institution
    Dept. of Telecommun. & Multimedia, Univ. of Zilina, Zilina, Slovakia
  • fYear
    2013
  • fDate
    2-4 July 2013
  • Firstpage
    788
  • Lastpage
    792
  • Abstract
    This paper addresses the automatic semantic inclusion of textual and non-textual information from Web documents. The main idea is to create a robust method for extracting images and textual segments in order to obtain a short web document. The developed method consists of two types of data extraction, image and text, both of which use the Document Object Model (DOM) tree. Extracted objects are saved in separate databases, followed by image analysis that defines and describes each image object from a semantic point of view. The semantic descriptions of all modal objects are then used to create the short web document. We implement our novel method using the Scale Invariant Feature Transform (SIFT) descriptor with a Support Vector Machine (SVM) classifier. SVM classification was applied to obtain a semantic description of objects in a static image. Finally, the semantic inclusion of textual and visual information was realized. The developed method has been tested on real and off-line web documents.
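The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Python's standard `html.parser` event stream as a stand-in for a DOM tree traversal, and the names `SegmentExtractor` and `extract_segments` and the two in-memory lists (standing in for the separate databases) are hypothetical. The SIFT and SVM image analysis stage is not covered here.

```python
from html.parser import HTMLParser


class SegmentExtractor(HTMLParser):
    """Collects image URLs and paragraph text from an HTML page (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.images = []   # hypothetical stand-in for the image database
        self.texts = []    # hypothetical stand-in for the text database
        self._in_paragraph = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # Record the image source so it could be passed to image analysis later.
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)
        elif tag == "p":
            # Start collecting the text of a new paragraph segment.
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            # Join the collected pieces and normalize whitespace.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.texts.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)


def extract_segments(html):
    """Return (image_urls, text_segments) found in the given HTML string."""
    parser = SegmentExtractor()
    parser.feed(html)
    parser.close()
    return parser.images, parser.texts


sample = (
    "<html><body><p>A red <b>car</b> on a road.</p>"
    "<img src='car.jpg' alt='car'><p>  </p></body></html>"
)
images, texts = extract_segments(sample)
print(images)
print(texts)
```

Keeping images and text in separate collections mirrors the separate databases mentioned in the abstract; empty paragraphs are skipped so that only meaningful textual segments remain.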
  • Keywords
    feature extraction; information retrieval; object recognition; support vector machines; text analysis; text detection; transforms; DOM tree; SIFT descriptor; SVM classifier; automatic semantic inclusion; document object model tree; image analysis; image extraction; object extraction; object semantic description; off-line web document; real web document; scale invariant feature transform; semantic inclusion textual information; semantic inclusion visual information; separated databases; short-Web document creation; static image; support vector machine; text data extraction; textual segments; Algorithm design and analysis; Feature extraction; Image analysis; Image segmentation; Semantics; Support vector machines; Training; DOM; Image analysis; SIFT; SVM; Semantic Inclusion of Images and Textual segments;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Telecommunications and Signal Processing (TSP), 2013 36th International Conference on
  • Conference_Location
    Rome
  • Print_ISBN
    978-1-4799-0402-0
  • Type
    conf
  • DOI
    10.1109/TSP.2013.6614046
  • Filename
    6614046