Keywords:
Natural language processing, sentiment analysis, autoencoder, singular value decomposition, non-negative matrix factorization
Persian Abstract:
Today, with widespread access to the Internet and especially to social networks, users can share their opinions and views. On the other hand, analyzing people's sentiments and opinions can play a significant role in the decision-making of organizations and producers. Hence, the task of sentiment analysis, or opinion mining, has become an important research area in natural language processing. One of the challenges of applying machine learning methods in natural language processing is selecting and extracting suitable features from among a large number of initial features in order to obtain a model with satisfactory accuracy. In this study, two compression methods based on the SVD and NMF matrix decompositions and one method based on neural networks are used to extract fewer but more effective features for sentiment analysis on a dataset of Persian-language opinions, and the effect of the compression level and the dataset size on the accuracy of the resulting models is evaluated. The results show that compression not only reduces the computational and time cost of building the model, but can also increase its accuracy. According to the experimental results, compressing the features from 7700 initial features to 2000 features using a neural network not only reduces the computational cost and storage space, but can also raise the model accuracy from 77.05% to 77.85%. On the other hand, on the small dataset the SVD method yields better results: with 2000 features an accuracy of 63.92% is obtained versus 63.57%. The experiments also indicate that, on a large dataset, neural-network-based compression performs much better than the other methods at low feature-set dimensions, so that with only one hundred features extracted from the 7700 initial features by the neural-network compressor, an acceptable accuracy of 74.46% can be reached, compared with the original accuracy of 77.05% using all 7700 features.
Baradaran R, Golpar-Raboki E. Feature Extraction and Efficiency Comparison Using Dimension Reduction Methods in Sentiment Analysis Context. JSDP. 2019; 16(3): 79-88.
URL: http://jsdp.rcisp.ac.ir/article-1-698-en.html
Feature Extraction and Efficiency Comparison Using Dimension Reduction Methods in Sentiment Analysis Context
Razieh Baradaran *, Effat Golpar-Raboki
Qom University
Abstract:
Nowadays, with widespread access to the Internet and especially to social networks, users can share their ideas and opinions. On the other hand, analyzing people's feelings and opinions can play a significant role in the decision-making of organizations and producers. Hence, sentiment analysis, or opinion mining, has become an important field in natural language processing. One of the most common ways to solve such problems is machine learning, which builds a model that maps features to the desired output. A key challenge in applying machine learning to NLP is selecting and extracting features from among a large number of initial features so as to achieve models with high accuracy. In fact, a high number of features not only causes computational and time problems but also has undesirable effects on model accuracy.
Studies show that various methods have been used for feature extraction or selection. Some of them select the most important features from the feature set, such as methods based on Principal Component Analysis (PCA). Others map the original features to new, lower-dimensional ones that preserve the same semantic relations, such as neural networks; for example, sparse feature vectors can be converted into dense embedding vectors using neural-network-based methods. Still others cluster the feature set and extract a lower-dimensional feature set, such as NMF-based methods. In this paper, we compare the performance of three methods, one from each of these classes, on datasets of different sizes.
In this study, we use two compression methods, one based on Singular Value Decomposition (SVD), which selects the most important components, and one based on Non-negative Matrix Factorization (NMF), which clusters the initial features, together with an auto-encoder-based method that maps the initial features to a new feature set preserving the same semantic relations. We compare the performance of these methods in extracting fewer but more effective features for the sentiment analysis task on a Persian dataset. We also evaluate the impact of the compression level and the dataset size on model accuracy. The results show that compression not only reduces computational and time costs but can also increase the accuracy of the model.
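As a rough illustration of the three compression approaches, the sketch below reduces a synthetic document-term matrix with SVD, NMF, and a small bottleneck network; the matrix X, the target dimension, and the use of scikit-learn are assumptions for illustration, not the authors' implementation.

    # Minimal sketch (not the authors' code): compress a bag-of-words matrix X
    # with SVD, NMF, and a bottleneck (auto-encoder-like) network.
    import numpy as np
    from sklearn.decomposition import TruncatedSVD, NMF
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.random((500, 7700))   # stand-in for 7700-dimensional bag-of-words vectors
    k = 100                       # assumed target number of features

    # SVD: keep the k directions with the largest singular values
    X_svd = TruncatedSVD(n_components=k, random_state=0).fit_transform(X)

    # NMF: factor X into non-negative parts, which acts like a soft clustering of features
    X_nmf = NMF(n_components=k, init="nndsvda", max_iter=200, random_state=0).fit_transform(X)

    # Auto-encoder-like bottleneck: train a network to reconstruct X and keep the hidden code
    # (MLPRegressor is used here only as a compact stand-in for a real auto-encoder).
    ae = MLPRegressor(hidden_layer_sizes=(k,), activation="relu", max_iter=50, random_state=0)
    ae.fit(X, X)
    X_ae = np.maximum(X @ ae.coefs_[0] + ae.intercepts_[0], 0.0)  # hidden-layer activations

    print(X_svd.shape, X_nmf.shape, X_ae.shape)  # each is (500, 100)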
For the experimental analysis, we use the SentiPers dataset, which contains more than 19,000 samples of user opinions about digital products; samples are represented as bag-of-words vectors. These feature vectors are very large because their dimension equals the vocabulary size. We set up the experiments on four sub-datasets of different sizes and show how the different compression methods perform at various compression levels (feature counts) depending on the dataset size.
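The bag-of-words construction described above can be sketched as follows; the sample texts, labels, and the use of scikit-learn's CountVectorizer are illustrative assumptions, since the paper does not specify its preprocessing code.

    # Minimal sketch (illustrative, not the paper's preprocessing): build bag-of-words
    # vectors whose dimension equals the vocabulary size.
    from sklearn.feature_extraction.text import CountVectorizer

    texts = ["the battery of this phone is excellent", "the screen quality was poor"]  # stand-in opinions
    labels = [1, 0]                                                                    # positive / negative

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(texts)    # sparse document-term matrix
    print(X.shape)                         # (n_samples, vocabulary_size); ~7700 columns on the full dataset

    # Sub-datasets of increasing size (e.g. a few thousand samples up to all ~19,000)
    # can then be drawn to study how dataset size interacts with the compression level.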
According to the results of classification with an SVM, compressing the features from 7700 to 2000 using the neural network not only speeds up processing and reduces storage costs but also increases the accuracy of the model from 77.05% to 77.85% on the largest dataset, which contains about 19,000 samples. On the small dataset, the SVD approach gives better results: with 2000 features extracted from the 7700 original ones, it reaches 63.92% accuracy compared with the initial 63.57%.
Furthermore, the results indicate that on the large dataset, neural-network-based compression is much better than the other approaches at low feature dimensions: with only 100 features extracted by the neural-network-based auto-encoder, the system achieves an acceptable accuracy of 74.46%, against 67.15% for SVD, 64.09% for NMF, and 77.05% for the base model with 7700 features.
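A hedged sketch of how such a comparison can be run: train the same linear SVM on the original features and on an SVD-compressed representation, then compare test accuracy. The synthetic data, the split ratio, and the choice of LinearSVC are assumptions for illustration only.

    # Minimal sketch (illustrative): accuracy of an SVM on original vs. compressed features.
    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.decomposition import TruncatedSVD
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    X = rng.random((600, 7700))                 # stand-in for bag-of-words features
    y = rng.integers(0, 2, size=600)            # stand-in sentiment labels

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    # Baseline: SVM on all original features
    base = LinearSVC().fit(X_tr, y_tr)
    acc_base = accuracy_score(y_te, base.predict(X_te))

    # Compressed: reduce to 100 features with SVD, then train the same classifier
    svd = TruncatedSVD(n_components=100, random_state=0).fit(X_tr)
    comp = LinearSVC().fit(svd.transform(X_tr), y_tr)
    acc_comp = accuracy_score(y_te, comp.predict(svd.transform(X_te)))

    print(f"original: {acc_base:.4f}  compressed: {acc_comp:.4f}")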