DocumentCode :
2773610
Title :
HOCT: A Highly Scalable Algorithm for Training Linear CRF on Modern Hardware
Author :
Chen, Tianyuan ; Chang, Lei ; Ma, Jianqing ; Zhang, Wei ; Gao, Feng
Author_Institution :
Fudan Univ., Shanghai, China
fYear :
2009
fDate :
6-6 Dec. 2009
Firstpage :
276
Lastpage :
281
Abstract :
Conditional Random Fields (CRFs) are widely used in machine learning and natural language processing. A number of methods have been developed for CRF training. However, even with state-of-the-art algorithms, CRF training is still very time- and space-consuming, which makes it infeasible to use CRFs in large-scale data analysis tasks. This paper proposes an efficient algorithm, HOCT, for CRF training on modern computer architectures. First, software prefetching techniques are used to hide cache miss latency. Second, we exploit SIMD to process data in parallel. Third, when dealing with large data sets, we let HOCT, rather than the operating system, manage swapping operations. Our experiments on various real data sets show that HOCT yields a fourfold speedup when the data fits in memory, and over a 30-fold speedup when the memory requirement exceeds the physical memory.
Keywords :
data analysis; learning (artificial intelligence); natural language processing; parallel processing; storage management; CRF training; HOCT algorithm; SIMD process; conditional random fields; large-scale data analysis tasks; machine learning; natural language processing; parallel processing; software prefetching techniques; Computer architecture; Data analysis; Delay; Hardware; Large-scale systems; Machine learning; Machine learning algorithms; Management training; Natural language processing; Prefetching;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining Workshops, 2009. ICDMW '09. IEEE International Conference on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4244-5384-9
Electronic_ISBN :
978-0-7695-3902-7
Type :
conf
DOI :
10.1109/ICDMW.2009.69
Filename :
5360418