DocumentCode :
3252682
Title :
A new approach to short web document creation based on textual and visual information
Author :
Zachariasova, Martina ; Kamencay, Patrik ; Hudec, Robert ; Benco, Miroslav ; Matuska, Slavomir
Author_Institution :
Dept. of Telecommun. & Multimedia, Univ. of Zilina, Zilina, Slovakia
fYear :
2013
fDate :
2-4 July 2013
Firstpage :
788
Lastpage :
792
Abstract :
This paper addresses the automatic semantic inclusion of textual and non-textual information in Web documents. The main idea is to create a robust method for extracting images and textual segments in order to obtain a short web document. The developed method consists of two types of data extraction, image and text, both of which use the Document Object Model (DOM) tree. Extracted objects are saved in separate databases, followed by image analysis that defines and describes each image object from a semantic point of view. The semantic descriptions of all modal objects are then used to create the short web document. We implement our novel method using the Scale Invariant Feature Transform (SIFT) descriptor with a Support Vector Machine (SVM) classifier. SVM classification is applied to obtain a semantic description of objects in a static image. Finally, the semantic inclusion of textual and visual information is realized. The developed method has been tested on real and off-line web documents.
Keywords :
feature extraction; information retrieval; object recognition; support vector machines; text analysis; text detection; transforms; DOM tree; SIFT descriptor; SVM classifier; automatic semantic inclusion; document object model tree; image analysis; image extraction; object extraction; object semantic description; off-line web document; real web document; scale invariant feature transform; semantic inclusion textual information; semantic inclusion visual information; separated databases; short-Web document creation; static image; support vector machine; text data extraction; textual segments; Algorithm design and analysis; Feature extraction; Image analysis; Image segmentation; Semantics; Support vector machines; Training; DOM; Image analysis; SIFT; SVM; Semantic Inclusion of Images and Textual segments;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Telecommunications and Signal Processing (TSP), 2013 36th International Conference on
Conference_Location :
Rome
Print_ISBN :
978-1-4799-0402-0
Type :
conf
DOI :
10.1109/TSP.2013.6614046
Filename :
6614046