  • DocumentCode
    2472126
  • Title
    Automatic content extraction and visualization of Thai websites for improved information representation
  • Author
    Thanadechteemapat, Wigrai; Fung, Chun Che
  • Author_Institution
    Sch. of Inf. Technol., Murdoch Univ., Murdoch, WA, Australia
  • fYear
    2012
  • fDate
    14-17 Oct. 2012
  • Firstpage
    2229
  • Lastpage
    2234
  • Abstract
    This paper presents an integrated approach that automatically provides an overview of the content of Thai websites in the form of a tag cloud. The approach is intended to address the information overload issue by presenting the overview to users so that they can assess whether the information meets their needs. It incorporates Web content extraction, Thai word segmentation, and information presentation to generate a tag cloud in the Thai language as an overview of the key content of a webpage. In the experimental study, the generated Thai tag clouds provide an overview of the tags that appear frequently in the title and body of the content. Moreover, the first few lines of the tag cloud offer improved readability.
  • Keywords
    Web sites; cloud computing; data visualisation; information retrieval; natural language processing; word processing; Thai Web sites; Thai language; Thai tag clouds; Thai word segmentation; Web page; automatic content extraction; automatic content visualization; content body; content title; information overload issue; information representation; readability improvement; Accuracy; Compounds; Data mining; Dictionaries; Feature extraction; Tag clouds; Visualization; Keyword Extraction; Maximum Term Frequency; Thai Word Segmentation; Thai tag cloud; Web Content Extraction;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man, and Cybernetics (SMC), 2012 IEEE International Conference on
  • Conference_Location
    Seoul
  • Print_ISBN
    978-1-4673-1713-9
  • Electronic_ISBN
    978-1-4673-1712-2
  • Type
    conf
  • DOI
    10.1109/ICSMC.2012.6378072
  • Filename
    6378072