• DocumentCode
    1808474
  • Title

    A Web Page Segmentation Algorithm Based on Iterated Dividing and Shrinking

  • Author

    Cao, Jiuxin ; Mao, Bo ; Luo, Junzhou

  • Author_Institution
    Southeast Univ., Nanjing
  • fYear
    2007
  • fDate
    18-21 Sept. 2007
  • Firstpage
    701
  • Lastpage
    705
  • Abstract
    Drawing on image processing techniques and the particular characteristics of web pages, a new web page segmentation algorithm, the Iterated Dividing and Shrinking Algorithm, is proposed. Image dividing conditions are introduced, and the concept of a dividing zone is defined. On this basis, the web page is first converted into an image; then, by repeated shrinking and splitting, the image is divided into sub-images that are visually consistent. Experiments show that the algorithm is suitable for web page segmentation and performs well in terms of extensibility and performance.
  • Keywords
    Internet; document image processing; image segmentation; Web page segmentation algorithm; image dividing; iterated dividing and shrinking algorithm; Algorithm design and analysis; Computer networks; Computer science; HTML; Image processing; Image segmentation; Information security; Laboratories; Parallel processing; Web pages;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Network and Parallel Computing Workshops, 2007. NPC Workshops. IFIP International Conference on
  • Conference_Location
    Liaoning
  • Print_ISBN
    978-0-7695-2943-1
  • Type

    conf

  • DOI
    10.1109/NPC.2007.63
  • Filename
    4351566
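
The abstract describes the procedure only in outline: render the page as an image, then shrink and split repeatedly until the sub-images are visually consistent. The sketch below illustrates that general idea and is not the authors' implementation. The paper's actual dividing conditions and dividing zone definition are not given in this record, so the sketch substitutes a simple rule as an assumption: split at the widest fully blank band of rows or columns, provided it is at least `min_gap` pixels wide. The names `shrink`, `widest_gap`, `segment`, and `min_gap` are hypothetical, and the image is assumed to be already binarized (1 = content, 0 = background).

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive
Image = List[List[int]]          # assumed binarized: 1 = content, 0 = background


def shrink(img: Image, box: Box) -> Optional[Box]:
    """Tighten the box to the content it contains; return None if it is blank."""
    top, left, bottom, right = box
    rows = [r for r in range(top, bottom) if any(img[r][left:right])]
    if not rows:
        return None
    cols = [c for c in range(left, right)
            if any(img[r][c] for r in range(top, bottom))]
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)


def widest_gap(blank: List[bool]) -> Tuple[int, int]:
    """Return (start, length) of the longest run of True values."""
    best_start, best_len, run_start = 0, 0, None
    for i, is_blank in enumerate(blank + [False]):  # sentinel closes a trailing run
        if is_blank and run_start is None:
            run_start = i
        elif not is_blank and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    return best_start, best_len


def segment(img: Image, box: Box, min_gap: int = 2) -> List[Box]:
    """Alternate shrinking and splitting until no blank band of min_gap remains."""
    shrunk = shrink(img, box)
    if shrunk is None:
        return []
    top, left, bottom, right = shrunk
    blank_rows = [not any(img[r][left:right]) for r in range(top, bottom)]
    blank_cols = [not any(img[r][c] for r in range(top, bottom))
                  for c in range(left, right)]
    r_start, r_len = widest_gap(blank_rows)
    c_start, c_len = widest_gap(blank_cols)
    if max(r_len, c_len) < min_gap:
        return [shrunk]  # assumed stopping rule: no wide enough blank band
    if r_len >= c_len:   # horizontal cut through the blank rows
        cut = top + r_start
        first, second = (top, left, cut, right), (cut + r_len, left, bottom, right)
    else:                # vertical cut through the blank columns
        cut = left + c_start
        first, second = (top, left, bottom, cut), (top, cut + c_len, bottom, right)
    return segment(img, first, min_gap) + segment(img, second, min_gap)
```

Calling `segment(page, (0, 0, height, width))` on a binarized page image returns a list of content boxes. Because each box is shrunk before it is split, any blank band found lies strictly inside the box, so both halves contain content and the recursion terminates.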