DocumentCode
3619799
Title
Information extraction from HTML product catalogues: from source code and images to RDF
Author
M. Labsky;V. Svatek;O. Svab;P. Praks;M. Kratky;V. Snasel
Author_Institution
Dept. of Inf. & Knowledge Eng., Univ. of Econ., Prague, Czech Republic
fYear
2005
fDate
2005
Firstpage
401
Lastpage
404
Abstract
We describe an application of information extraction from company Web sites focusing on product offers. A statistical approach to text analysis is used in conjunction with different ways of image classification. Ontological knowledge is used to group the extracted items into structured objects. The results are stored in an RDF repository and made available for structured search.
Keywords
"Data mining","HTML","Resource description framework","Hidden Markov models","Bicycles","Ontologies","Semantic Web","Web pages","Knowledge engineering","Mathematics"
Publisher
IEEE
Conference_Titel
Web Intelligence, 2005. Proceedings. The 2005 IEEE/WIC/ACM International Conference on
Print_ISBN
0-7695-2415-X
Type
conf
DOI
10.1109/WI.2005.78
Filename
1517879
Link To Document