• DocumentCode
    476079
  • Title
    A hybrid approach for web information extraction
  • Author
    Xiao, Ji-yi ; Zhu, Dao-hui ; Zou, La-mei
  • Author_Institution
    Sch. of Comput. Sci. & Technol., South China Univ., Hengyang
  • Volume
    3
  • fYear
    2008
  • fDate
    12-15 July 2008
  • Firstpage
    1560
  • Lastpage
    1563
  • Abstract
    This paper presents a new approach to web information extraction based on maximum entropy and the maximum entropy Markov model. The approach not only overcomes the lower precision and recall of the hidden Markov model but also makes fuller use of the various kinds of contextual information available on the web. Experiments show that the hybrid approach achieves an average precision of 87.516%, while the hidden Markov model trained by the Baum-Welch algorithm achieves an average precision of 68.630%. This suggests that the hybrid approach outperforms the hidden Markov model trained by the Baum-Welch algorithm in terms of precision.
  • Keywords
    Internet; hidden Markov models; information retrieval; knowledge acquisition; Web information extraction; hidden Markov model; maximum entropy method; Computer science; Cybernetics; Data mining; Electronic mail; Entropy; Hidden Markov models; Iterative algorithms; Machine learning; Probability distribution; Training data; Generalized iterative scaling; Hidden Markov model; Information extraction; Maximum entropy; Maximum entropy Markov model;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2008 International Conference on
  • Conference_Location
    Kunming
  • Print_ISBN
    978-1-4244-2095-7
  • Electronic_ISBN
    978-1-4244-2096-4
  • Type
    conf
  • DOI
    10.1109/ICMLC.2008.4620654
  • Filename
    4620654