Title :
Topic word set-based text clustering
Author :
Ghazifard, Amir Mehdi ; Shamaee, Zeinab ; Shams, Maitham
Author_Institution :
E-Learning Dept., Univ. of Isfahan, Isfahan, Iran
Abstract :
Clustering is the task of grouping related and similar data without any prior knowledge of the labels. In some real-world applications, we face huge amounts of unstructured textual data with no organization. In these situations, clustering is a primitive operation that must be performed to support later e-commerce tasks. Clustering can be used to enhance e-commerce applications such as recommender systems, customer relationship management systems, and personal assistant agents. In this paper we propose a new method for text clustering that constructs a term correlation graph, extracts topic word sets from it, and finally assigns each document to its related topic with the help of a classification algorithm such as SVM. This method provides a natural and understandable description of the clusters through their topic word sets. It also lets us decide the cluster of a document only when needed and in parallel, thus significantly reducing the offline processing time. Our clustering method also outperforms the well-known k-means clustering algorithm according to clustering quality measures.
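The three stages named in the abstract (term correlation graph, topic word set extraction, per-document assignment) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: correlation is approximated by raw document co-occurrence counts with a hypothetical `min_cooccur` threshold, topic word sets are taken to be connected components of the graph, and the SVM classification step is replaced by a plain term-overlap score. The function names `build_term_graph`, `topic_word_sets`, and `assign` are likewise hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_term_graph(docs, min_cooccur=2):
    """Link two terms when they co-occur in at least `min_cooccur` documents.

    Assumption: co-occurrence counts stand in for the paper's term correlation.
    """
    pair_counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        pair_counts.update(combinations(terms, 2))
    graph = defaultdict(set)
    for (a, b), n in pair_counts.items():
        if n >= min_cooccur:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def topic_word_sets(graph):
    """Return connected components of the graph as candidate topic word sets.

    Assumption: the paper's extraction procedure may differ from this.
    """
    seen, topics = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            term = stack.pop()
            if term in component:
                continue
            component.add(term)
            stack.extend(graph[term] - component)
        seen |= component
        topics.append(component)
    return topics


def assign(doc, topics):
    """Pick the topic sharing the most terms with the document, or None.

    Assumption: overlap scoring is a stand-in for the SVM classifier.
    """
    terms = set(doc.lower().split())
    scores = [len(terms & topic) for topic in topics]
    best = max(range(len(topics)), key=scores.__getitem__, default=None)
    return best if best is not None and scores[best] > 0 else None


docs = [
    "cheap flight ticket",
    "flight ticket booking",
    "laptop battery charger",
    "battery charger cable",
]
topics = topic_word_sets(build_term_graph(docs))
labels = [assign(d, topics) for d in docs]
```

Because `assign` depends only on the extracted topic word sets and the single document passed in, each document can be labelled independently and on demand, which is the property the abstract credits for the reduced offline processing time.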
Keywords :
Internet; classification; customer relationship management; electronic commerce; graph theory; multi-agent systems; pattern clustering; set theory; support vector machines; text analysis; SVM; World Wide Web; classification algorithm; customer relationship management systems; document categorization; e-commerce tasks; k-means clustering algorithm; offline processing time reduction; personal assistant agents; recommender systems; related data grouping task; similar data grouping task; support vector machine; term correlation graph; topic word set extraction; topic word set-based text clustering; unstructured textual data; Classification algorithms; Clustering algorithms; Clustering methods; Correlation; Indexing; Organizations; Recommender systems; classification; clustering; e-commerce; term correlation graph; topic word set;
Conference_Title :
e-Commerce in Developing Countries: With Focus on e-Security (ECDC), 2013 7th International Conference on
Conference_Location :
Kish Island
Print_ISBN :
978-1-4799-0394-8
DOI :
10.1109/ECDC.2013.6556740