Title :
HOCT: A Highly Scalable Algorithm for Training Linear CRF on Modern Hardware
Author :
Chen, Tianyuan ; Chang, Lei ; Ma, Jianqing ; Zhang, Wei ; Gao, Feng
Author_Institution :
Fudan Univ., Shanghai, China
Abstract :
Conditional Random Fields (CRFs) are widely used in machine learning and natural language processing. A number of methods have been developed for CRF training. However, even with state-of-the-art algorithms, CRF training is still very time- and space-consuming, which makes it infeasible to use CRFs in large-scale data analysis tasks. This paper proposes an efficient algorithm, HOCT, for CRF training on modern computer architectures. First, software prefetching techniques are used to hide cache miss latency. Second, we exploit SIMD to process data in parallel. Third, when dealing with large data sets, we let HOCT, rather than the operating system, manage swapping operations. Our experiments on various real data sets show that HOCT yields a fourfold speedup when the data fit in memory, and over a 30-fold speedup when the memory requirement exceeds the physical memory.
Keywords :
data analysis; learning (artificial intelligence); natural language processing; parallel processing; storage management; CRF training; HOCT algorithm; SIMD process; conditional random fields; large-scale data analysis tasks; machine learning; natural language processing; parallel processing; software prefetching techniques; Computer architecture; Data analysis; Delay; Hardware; Large-scale systems; Machine learning; Machine learning algorithms; Management training; Natural language processing; Prefetching;
Conference_Title :
Data Mining Workshops, 2009. ICDMW '09. IEEE International Conference on
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4244-5384-9
Electronic_ISBN :
978-0-7695-3902-7
DOI :
10.1109/ICDMW.2009.69