Conference record number:
3340
Article title:
Topic Word Set-Based Text Clustering
Authors:
Ghazifard, Amir Mehdi (E-Learning Department, University of Isfahan, Isfahan, Iran); Shams, Mohammadreza (ECE Department, University of Tehran, Tehran, Iran); Shamaee, Zeinab (ECE Department, Isfahan University of Technology, Isfahan, Iran)
Keywords:
e-commerce, clustering, classification, term correlation graph, topic word set
Conference title:
Seventh International Conference on E-Commerce in Developing Countries with a Focus on National Security
English abstract:
Clustering is the task of grouping related and similar data without any prior knowledge
about the labels. In some real-world applications, we face huge amounts of unstructured
textual data with no organization. In these situations, clustering is a basic operation
that must be performed to support later e-commerce tasks. Clustering can be used to enhance
e-commerce applications such as recommender systems, customer relationship
management systems, or personal assistant agents. In this paper, we propose a new method
for text clustering: we construct a term correlation graph, extract topic word
sets from it, and finally assign each document to its related topic with the help of a
classification algorithm such as SVM. This method provides a natural and understandable
description of clusters through their topic word sets, and it also lets us determine the cluster
of a document only when needed and in parallel, thus significantly reducing the
offline processing time. Our clustering method also outperforms the well-known k-means
clustering algorithm according to clustering quality measures.
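
The three stages named in the abstract (term correlation graph, topic word sets, per-document assignment) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' method: correlation is approximated by document-level co-occurrence counts, topic word sets are taken to be connected components of the graph, and a simple word-overlap rule stands in for the SVM classifier. The function names, the `min_cooccur` threshold, and the sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_term_graph(docs, min_cooccur=2):
    """Link two terms when they co-occur in at least min_cooccur documents.

    Assumption: co-occurrence counting is a stand-in for the paper's
    (unspecified here) term correlation measure.
    """
    pair_counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        pair_counts.update(combinations(terms, 2))
    graph = {}
    for (a, b), count in pair_counts.items():
        if count >= min_cooccur:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


def topic_word_sets(graph):
    """Return connected components of the graph as topic word sets.

    Assumption: the abstract does not say how topic word sets are
    extracted; connected components are the simplest choice.
    """
    seen, topics = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            term = stack.pop()
            if term in component:
                continue
            component.add(term)
            stack.extend(graph[term] - component)
        seen |= component
        topics.append(component)
    return topics


def assign(doc, topics):
    """Pick the topic sharing the most words with doc (stand-in for SVM)."""
    terms = set(doc.lower().split())
    overlaps = [len(terms & topic) for topic in topics]
    return max(range(len(topics)), key=overlaps.__getitem__)


# Hypothetical sample corpus.
docs = [
    "laptop battery charger",
    "laptop battery warranty",
    "pizza delivery coupon",
    "pizza delivery menu",
]
topics = topic_word_sets(build_term_graph(docs))
labels = [assign(doc, topics) for doc in docs]
```

Because `assign` looks at one document at a time and only needs the topic word sets, documents can be labeled lazily or in parallel, which is the property the abstract highlights.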