Title :
Thai word segmentation for visualization of Thai Web sites
Author :
Thanadechteemapat, Wigrai ; Fung, Chun-che
Author_Institution :
Sch. of Inf. Technol., Murdoch Univ., Murdoch, WA, Australia
Abstract :
Information overload is a problem in the Information Age, and information visualization is one approach to providing an overview of the content of a Web site. A tag cloud is one way to represent information, as an image of a group of words. However, tag cloud generation has limitations, one of which arises from the characteristics of the language. In order to extract tags or words for a tag cloud, word segmentation is required. This paper proposes a Thai word segmentation approach for the visualization of Thai Web sites. The proposed technique is based on the longest matching technique together with a refined corpus. The results of Thai word segmentation are compatible with the results from previous BEST contests in Thailand.
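Illustrative note (not part of the original record): the abstract names longest matching over a corpus but gives no detail, so the following is only a minimal sketch of generic greedy longest-matching segmentation, not the authors' implementation. The function names `segment` and `tag_counts`, the toy lexicon, and the rule of emitting an unmatched character as a single-character token are all assumptions made for illustration.

```python
from collections import Counter


def segment(text, lexicon):
    """Greedy longest matching: at each position take the longest lexicon entry.

    Assumption: a character that starts no lexicon entry is emitted on its own.
    """
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for end in range(min(i + max_len, len(text)), i, -1):
            if text[i:end] in lexicon:
                tokens.append(text[i:end])
                i = end
                break
        else:
            # No lexicon entry starts here: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


def tag_counts(tokens):
    """Count token frequencies, the usual input for sizing words in a tag cloud."""
    return Counter(tokens)


# Toy lexicon (hypothetical; a real system would load a refined corpus).
LEXICON = {"ฉัน", "ไป", "โรง", "เรียน", "โรงเรียน"}

if __name__ == "__main__":
    words = segment("ฉันไปโรงเรียน", LEXICON)
    print(words)
    print(tag_counts(words).most_common())
```

In this sketch the compound "โรงเรียน" is preferred over its shorter parts "โรง" and "เรียน" because the longest candidate is tried first. That preference is the defining property of longest matching, and it is also why the quality of the lexicon matters to the result.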
Keywords :
Web sites; data visualisation; feature extraction; image segmentation; natural language processing; word processing; Thai Web site visualization; Thai word segmentation; Thailand; information overload; information visualization; matching technique; tag cloud generation; tag extraction; word extraction; Compounds; Data visualization; Internet; Tag clouds; Visualization; Web pages; Tag cloud; Thai Word Segmentation; Web Page Visualization;
Conference_Title :
Machine Learning and Cybernetics (ICMLC), 2011 International Conference on
Conference_Location :
Guilin
Print_ISBN :
978-1-4577-0305-8
DOI :
10.1109/ICMLC.2011.6016978