A Review of Parallel Computing on Large Datasets Based on the MapReduce Technique and Hadoop

Title in another language

The Review of Parallel Computing on a Large Dataset Based on Map reduce

Authors

Pakparvar, Shabnam (shabnampakparvar12345@gmail.com), Shahriar Institute of Higher Education, Astara; Amin Safaei Ardakani, Fatemeh (fatemehaminsafaei@gmail.com), Shahriar Institute of Higher Education, Astara; Hosseini, Farnaz (uni.shahriar.fh@gmail.com), Shahriar Institute of Higher Education, Astara

Number of pages

Keywords

MapReduce, Big Data, Parallel Computing, Hadoop

Publication year

1396 (Solar Hijri calendar; 2017-2018 CE)

Conference title

Fifth International Conference on Electrical and Computer Engineering with Emphasis on Indigenous Knowledge

Document language

Persian

Persian abstract (translated)

MapReduce is a parallel processing technique for distributed computing systems. It divides the data into smaller parts and breaks each process into smaller instructions, and the different nodes of the distributed system each handle part of the operation based on these parts. In the first phase of the technique, data splitting is used for reading the input data and for the intermediate nodes. The data are then tagged and distributed among the compute nodes using hash functions, and the nodes transfer their results to the central node. In the second phase, the final result is produced in the proper output format. The Hadoop framework also provides a simple programming model that performs well for computations on big data. In this case study, three algorithms in the MapReduce domain and four algorithms in the Hadoop domain are reviewed and compared. The results show that in both cases the MapReduce-based technique improved, as far as possible, the processing time and speed for high-dimensional data.

English abstract

MapReduce is a parallel processing technique for distributed computing systems. The technique divides the data into smaller parts and breaks each process into smaller tasks, and the nodes of the distributed system each handle a portion of the operation on these parts. In the first phase, data splitting is used for reading the input data and for the intermediate nodes. The data are then tagged and distributed among the computational nodes using hash functions, and the nodes transfer their results to the central node. In the second phase, the final result is generated in the correct output format. Hadoop also provides a simple programming model that performs well for computations on large data. In this case study, three algorithms in the MapReduce domain and four algorithms in the Hadoop domain are reviewed and compared. The results show that in both cases the MapReduce-based technique improved the processing time and speed for high-dimensional data as far as possible.
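
The phases summarized in the abstract (splitting, tagging, hash-based distribution, and a final reduction) can be illustrated with a minimal single-process sketch. It is not taken from the paper and does not use Hadoop's API. The word-count task, the function names, the round-robin split, and the choice of CRC32 as the hash function are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def split_input(records, n_splits):
    """Divide the input into n_splits smaller parts (round-robin; an assumption)."""
    return [records[i::n_splits] for i in range(n_splits)]


def map_phase(chunk):
    """Tag every word with a count of 1, producing (key, value) pairs."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def partition(key, n_reducers):
    """Choose a reducer for a key with a deterministic hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % n_reducers


def shuffle(mapped_chunks, n_reducers):
    """Distribute tagged pairs among reducers, grouping values by key."""
    buckets = [defaultdict(list) for _ in range(n_reducers)]
    for pairs in mapped_chunks:
        for key, value in pairs:
            buckets[partition(key, n_reducers)][key].append(value)
    return buckets


def reduce_phase(bucket):
    """Combine all values collected for each key into one result."""
    return {key: sum(values) for key, values in bucket.items()}


def map_reduce(records, n_splits=2, n_reducers=2):
    chunks = split_input(records, n_splits)
    # On a real cluster each chunk would be mapped on a different node;
    # here the chunks are processed one after another.
    mapped = [map_phase(chunk) for chunk in chunks]
    buckets = shuffle(mapped, n_reducers)
    result = {}
    for bucket in buckets:
        # Each key lives in exactly one bucket, so the partial results never clash.
        result.update(reduce_phase(bucket))
    return result


if __name__ == "__main__":
    print(map_reduce(["big data big", "data hadoop", "big"]))
```

Because every key is hashed to exactly one reducer, the partial results can be merged without conflicts. This is the property that lets the reduction run independently on separate nodes.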

Country

Iran

Link to this document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=36&DC=291413