DocumentCode :
3619799
Title :
Information extraction from HTML product catalogues: from source code and images to RDF
Author :
M. Labsky;V. Svatek;O. Svab;P. Praks;M. Kratky;V. Snasel
Author_Institution :
Dept. of Inf. & Knowledge Eng., Univ. of Econ., Prague, Czech Republic
fYear :
2005
fDate :
2005
Firstpage :
401
Lastpage :
404
Abstract :
We describe an application of information extraction from company Web sites focusing on product offers. A statistical approach to text analysis is used in conjunction with different ways of image classification. Ontological knowledge is used to group the extracted items into structured objects. The results are stored in an RDF repository and made available for structured search.
Keywords :
"Data mining","HTML","Resource description framework","Hidden Markov models","Bicycles","Ontologies","Semantic Web","Web pages","Knowledge engineering","Mathematics"
Publisher :
IEEE
Conference_Titel :
Web Intelligence, 2005. Proceedings. The 2005 IEEE/WIC/ACM International Conference on
Print_ISBN :
0-7695-2415-X
Type :
conf
DOI :
10.1109/WI.2005.78
Filename :
1517879