Title :
Information extraction from HTML product catalogues: from source code and images to RDF
Author :
M. Labsky;V. Svatek;O. Svab;P. Praks;M. Kratky;V. Snasel
Author_Institution :
Dept. of Inf. & Knowledge Eng., Univ. of Econ., Prague, Czech Republic
fDate :
2005
Abstract :
We describe an application of information extraction from company Web sites focusing on product offers. A statistical approach to text analysis is used in conjunction with different ways of image classification. Ontological knowledge is used to group the extracted items into structured objects. The results are stored in an RDF repository and made available for structured search.
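The abstract names two later stages: grouping extracted items into structured objects, and storing those objects as RDF. What follows is a minimal sketch of what those stages could look like. It is not the authors' ontology-driven method, and it assumes the extractor emits (attribute, value) pairs in page order. The grouping rule (start a new product when an attribute the current product already has appears again), the namespace http://example.org/catalogue#, and the names group_items, to_ntriples and Product are all hypothetical. The output format is N-Triples, one of the standard RDF serializations.

```python
from dataclasses import dataclass, field

# Hypothetical namespace for illustration only; not taken from the paper.
NS = "http://example.org/catalogue#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Product:
    # Attribute name -> extracted value for one product offer.
    attrs: dict = field(default_factory=dict)


def group_items(items):
    """Group (attribute, value) pairs, given in page order, into products.

    Assumed heuristic: a new product begins whenever an attribute that the
    current product already holds shows up again.
    """
    products = []
    current = Product()
    for attr, value in items:
        if attr in current.attrs:
            products.append(current)
            current = Product()
        current.attrs[attr] = value
    if current.attrs:
        products.append(current)
    return products


def escape_literal(s):
    # Escape the characters N-Triples requires to be escaped in string literals.
    return (s.replace("\\", "\\\\").replace('"', '\\"')
             .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(products):
    """Serialize products as N-Triples: one rdf:type triple plus one triple per attribute."""
    lines = []
    for i, p in enumerate(products, start=1):
        subj = f"<{NS}product{i}>"
        lines.append(f"{subj} <{RDF_TYPE}> <{NS}Product> .")
        for attr, value in sorted(p.attrs.items()):
            lines.append(f'{subj} <{NS}{attr}> "{escape_literal(value)}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    items = [("name", "Mountain bike A"), ("price", "499 USD"), ("picture", "bike_a.jpg"),
             ("name", "Mountain bike B"), ("price", "549 USD")]
    print(to_ntriples(group_items(items)))
```

The repeated-attribute rule is only a stand-in. The abstract says ontological knowledge drives the grouping, and a real system would use that knowledge instead. Emitting plain N-Triples lines keeps the sketch free of third-party RDF libraries.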
Keywords :
"Data mining","HTML","Resource description framework","Hidden Markov models","Bicycles","Ontologies","Semantic Web","Web pages","Knowledge engineering","Mathematics"
Conference_Title :
Web Intelligence, 2005. Proceedings. The 2005 IEEE/WIC/ACM International Conference on
Print_ISBN :
0-7695-2415-X