Regional Information Center for Science and Technology - Scaling the iHMM: Parallelization versus Hadoop

DocumentCode :

3637781

Title :

Scaling the iHMM: Parallelization versus Hadoop

Author :

Sébastien Bratières; Jurgen van Gael; Andreas Vlachos; Zoubin Ghahramani

Author_Institution :

Dept. of Eng., Univ. of Cambridge, Cambridge, UK

fYear :

2010

Firstpage :

1235

Lastpage :

1240

Abstract :

This paper compares parallel and distributed implementations of an iterative machine learning algorithm based on Gibbs sampling. The distributed implementations run under Hadoop on facility computing clouds. The probabilistic model under study is the infinite HMM (iHMM), whose parameters are learnt using an instance of blocked Gibbs sampling in which one step consists of a dynamic program. We apply this model to learn part-of-speech tags from newswire text in an unsupervised fashion. However, our focus here is on runtime performance, measured by iteration duration, and on ease of development, deployment and debugging, rather than on NLP-relevant scores.
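The abstract does not spell out the dynamic-programming step or why the sampler parallelizes, so the following is a minimal sketch under stated assumptions, not the authors' code. It assumes the state space has already been truncated to K states (the iHMM itself has unboundedly many, and this record does not say how the paper bounds them) and uses forward-filtering backward-sampling (FFBS) to draw each sentence's tag sequence. Given fixed transition and emission parameters, sentences are conditionally independent, so the per-sentence draws form a natural "map" step and the summed transition counts a "reduce" step. The function names `ffbs`, `resample_corpus` and `transition_counts`, and the parameters `pi`, `A`, `B`, are hypothetical.

```python
import functools
import random
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def ffbs(obs: Sequence[int], pi: Sequence[float], A: Matrix, B: Matrix,
         rng: random.Random) -> List[int]:
    """Draw one hidden-state sequence from p(states | obs) for a K-state HMM.

    Assumes strictly positive pi and A, and that every observation has
    nonzero probability under at least one reachable state.
    """
    if not obs:
        return []
    K, T = len(pi), len(obs)
    # Forward filtering: alpha[t][k] is proportional to p(s_t = k, obs[0..t]),
    # normalized at each step to avoid underflow on long sentences.
    prev = [pi[k] * B[k][obs[0]] for k in range(K)]
    z = sum(prev)
    alpha = [[p / z for p in prev]]
    for t in range(1, T):
        cur = [B[k][obs[t]] * sum(alpha[-1][j] * A[j][k] for j in range(K))
               for k in range(K)]
        z = sum(cur)
        alpha.append([c / z for c in cur])
    # Backward sampling: draw the last state, then each earlier state
    # conditioned on the state already drawn after it.
    states = [0] * T
    states[T - 1] = rng.choices(range(K), weights=alpha[T - 1])[0]
    for t in range(T - 2, -1, -1):
        w = [alpha[t][j] * A[j][states[t + 1]] for j in range(K)]
        states[t] = rng.choices(range(K), weights=w)[0]
    return states


def resample_corpus(corpus: Sequence[Sequence[int]], pi: Sequence[float],
                    A: Matrix, B: Matrix, seed: int) -> List[List[int]]:
    """'Map' step: each sentence is resampled independently.

    Each sentence gets its own RNG derived from (seed, index), so the result
    does not depend on the order in which sentences are processed.
    """
    return [ffbs(s, pi, A, B, random.Random(seed * 1_000_003 + i))
            for i, s in enumerate(corpus)]


def _sentence_counts(states: Sequence[int], K: int) -> List[List[int]]:
    counts = [[0] * K for _ in range(K)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    return counts


def _add(x: List[List[int]], y: List[List[int]]) -> List[List[int]]:
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def transition_counts(all_states: Sequence[Sequence[int]], K: int) -> List[List[int]]:
    """'Reduce' step: element-wise sum of per-sentence transition counts."""
    zero = [[0] * K for _ in range(K)]
    return functools.reduce(_add, (_sentence_counts(s, K) for s in all_states), zero)
```

Because each sentence's random stream depends only on `seed` and its index, the list comprehension in `resample_corpus` could be replaced by threads, processes or Hadoop map tasks without changing the sampled output. The counts from `transition_counts` are the kind of sufficient statistic a subsequent parameter-resampling step would consume; that step is not shown here.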

Keywords :

"Hidden Markov models","Tagging","Data models","Markov processes","Computational modeling","Probabilistic logic","Machine learning"

Publisher :

IEEE

Conference_Title :

Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on

Print_ISBN :

978-1-4244-7547-6

Type :

conf

DOI :

10.1109/CIT.2010.223

Filename :

5577884

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3637781