Conference record number:
4747
Paper title:
Topic Detection for the Persian Language News Using Signature Words and Support Vector Machines
Authors:
Salar Khayat Mirza Rasouli (University of Tabriz), Hamed Babaei Giglou (University of Tabriz), Jafar Razmara (University of Tabriz)
Number of pages:
7
Keywords:
news analysis, linear SVM, one-vs-rest, multi-class classification
Year of publication:
1398 (Solar Hijri calendar; 2019/2020 CE)
Conference title:
Media Technology Summit (اجلاس فناوری رسانه)
Document language:
English
Abstract:
Topic detection systems, which automatically determine the main topics of news articles, are an important research area within metadata-based approaches in machine learning and natural language processing. The main goal of this work is to build a system that learns previously extracted topics from news text by relating the features extracted from each news text to its main topic. We hypothesize that word-based features, such as signature words in the documents, can yield valuable information about news topics. We propose a word-based TF-IDF representation that ignores less valuable words in the documents, combined with a linear SVM classifier, to identify the main topic of each news item. The experimental results presented in this paper show that TF-IDF representation combined with linear SVM classification of the documents is very promising for identifying the main topic of news.
Country:
Iran
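The abstract describes the pipeline only at a high level: a word-based TF-IDF representation with less valuable words dropped, followed by a linear SVM that, per the keywords, is applied one-vs-rest across topics. The following is a minimal pure-Python sketch of that general technique under stated assumptions. It is not the authors' implementation. The stop list, the `min_df` cutoff, the smoothed IDF formula, the Pegasos-style subgradient training, all function names, and the toy English documents are hypothetical illustrative choices and do not come from the paper. Real Persian news would also need proper normalization and tokenization, which the whitespace split below does not attempt.

```python
import math
from collections import Counter

# Hypothetical stop list standing in for the "less valuable words" the
# abstract mentions; the paper's actual filtering criterion is not given.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "on"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words (simplified)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def fit_idf(docs, min_df=1):
    """Smoothed IDF per term; terms in fewer than min_df docs are ignored."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0
            for t, c in df.items() if c >= min_df}


def tfidf(doc, idf):
    """Sparse, L2-normalized TF-IDF vector (term -> weight) over the vocabulary."""
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def dot(w, x):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_binary_svm(xs, ys, lam=0.01, epochs=50):
    """Linear SVM (no bias term) trained with Pegasos-style subgradient steps.

    ys holds +1/-1 labels; samples are visited in a fixed order.
    """
    w = {}
    step = 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            step += 1
            eta = 1.0 / (lam * step)
            violated = y * dot(w, x) < 1.0  # hinge loss is active
            for t in w:  # shrink from the L2 regularization term
                w[t] *= 1.0 - eta * lam
            if violated:
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + eta * y * v
    return w


def train_one_vs_rest(docs, labels, min_df=1, lam=0.01, epochs=50):
    """One binary linear SVM per topic: that topic (+1) versus all others (-1)."""
    idf = fit_idf(docs, min_df)
    xs = [tfidf(d, idf) for d in docs]
    models = {}
    for topic in sorted(set(labels)):
        ys = [1 if lab == topic else -1 for lab in labels]
        models[topic] = train_binary_svm(xs, ys, lam, epochs)
    return idf, models


def predict(doc, idf, models):
    """Return the topic whose SVM gives the highest decision score."""
    x = tfidf(doc, idf)
    return max(models, key=lambda topic: dot(models[topic], x))


# Toy English data, invented for illustration only.
docs = [
    "football match final score goal",
    "team wins league goal striker",
    "stock market shares fall inflation",
    "central bank raises interest rate inflation",
    "election parliament vote minister",
    "president signs bill parliament",
]
labels = ["sport", "sport", "economy", "economy", "politics", "politics"]

idf, models = train_one_vs_rest(docs, labels)
print(predict("striker scores late goal", idf, models))  # sport
```

Raising `min_df` is one simple way to drop rare, low-value terms, and the stop list drops very frequent ones; which of these (if either) matches the paper's notion of "less valuable words" is not stated in the abstract. In practice, scikit-learn's `TfidfVectorizer` with `LinearSVC` (which trains one-vs-rest by default for multi-class problems) would replace the hand-written trainer.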