DocumentCode :
3717158
Title :
Regular expression acceleration on the Micron Automata Processor: Brill tagging as a case study
Author :
Keira Zhou;Jack Wadden;Jeffrey J. Fox;Ke Wang;Donald E. Brown;Kevin Skadron
Author_Institution :
University of Virginia, Charlottesville, VA 22904, USA
fYear :
2015
Firstpage :
355
Lastpage :
360
Abstract :
Brill tagging is a classic rule-based algorithm for part-of-speech (POS) tagging that assigns tags, such as noun, verb, or adjective, to input tokens. Due to the intense memory requirements of rule matching, CPU implementations of the Brill tagging algorithm have been found to be slow. We show that Micron's Automata Processor (AP), a new computing architecture that can perform massively parallel pattern matching, can greatly accelerate the second stage of Brill tagging via rule template matching. The 218 contextual rules are first converted into regular expressions (regexes). Regexes are widely used in natural language processing (NLP) tasks; thus, this case study on Brill tagging also shows how the AP might accelerate other applications that can be framed as regexes. We compare single-threaded and multi-threaded versions of regex matching on an Intel i7 CPU, an Intel Xeon Phi co-processor, and the AP. The results show a 63.90x speed-up using the AP as a regex accelerator over the fastest multi-threaded CPU version. We also investigate how the performance of regex matching on both CPU architectures varies with the complexity of the regex. Taken together, these results demonstrate the potential for significant performance improvements from using accelerators for various NLP computational tasks, particularly those that involve rule-based or pattern-matching approaches.
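To illustrate what converting a contextual rule into a regex might look like, the following is a minimal sketch and not the authors' implementation. It assumes a `word/TAG` text encoding of the tagged input and a single rule template of the form "change tag A to B if the previous tag is C"; the function names `rule_to_regex` and `apply_rule` and the example sentence are hypothetical.

```python
import re

def rule_to_regex(from_tag, prev_tag):
    """Build a regex for the template: previous token tagged prev_tag,
    current token tagged from_tag (assumed word/TAG encoding)."""
    return re.compile(
        r"(\S+/" + re.escape(prev_tag) + r" \S+/)"  # keep context and current word
        + re.escape(from_tag)                        # the tag to be rewritten
        + r"(?=\s|$)"                                # tag must end at a token boundary
    )

def apply_rule(tagged, from_tag, to_tag, prev_tag):
    """Rewrite from_tag to to_tag wherever the contextual pattern matches."""
    pattern = rule_to_regex(from_tag, prev_tag)
    return pattern.sub(lambda m: m.group(1) + to_tag, tagged)

# Hypothetical first-stage output in which "run" was initially tagged as a noun.
tagged = "I/PRP want/VBP to/TO run/NN"
result = apply_rule(tagged, "NN", "VB", "TO")
print(result)
```

In this sketch each rule becomes one independent pattern, which is the property that lets many rules be matched in parallel against the same input stream.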
Keywords :
"Tagging","Automata","Natural language processing","Acceleration","Pattern matching","Training","Big data"
Publisher :
IEEE
Conference_Titel :
Big Data (Big Data), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/BigData.2015.7363776
Filename :
7363776