• DocumentCode
    3619799
  • Title
    Information extraction from HTML product catalogues: from source code and images to RDF
  • Author
    M. Labsky;V. Svatek;O. Svab;P. Praks;M. Kratky;V. Snasel
  • Author_Institution
    Dept. of Inf. & Knowledge Eng., Univ. of Econ., Prague, Czech Republic
  • fYear
    2005
  • fDate
    2005
  • Firstpage
    401
  • Lastpage
    404
  • Abstract
    We describe an application of information extraction from company Web sites focusing on product offers. A statistical approach to text analysis is used in conjunction with different ways of image classification. Ontological knowledge is used to group the extracted items into structured objects. The results are stored in an RDF repository and made available for structured search.
  • Keywords
    "Data mining","HTML","Resource description framework","Hidden Markov models","Bicycles","Ontologies","Semantic Web","Web pages","Knowledge engineering","Mathematics"
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2005. Proceedings. The 2005 IEEE/WIC/ACM International Conference on
  • Print_ISBN
    0-7695-2415-X
  • Type
    conf
  • DOI
    10.1109/WI.2005.78
  • Filename
    1517879