Title :
Hardware-Accelerated Parser for Extraction of Metadata in Semantic Network Content
Author :
Moscola, James ; Cho, Young H. ; Lockwood, John W.
Author_Institution :
Washington Univ. in St. Louis, St. Louis
Abstract :
We have implemented a new network information processing system on reconfigurable hardware that scans large volumes of data in real time. One of the key functions of the system is to extract semantic information. Before we can determine the meaning of text, we must identify its language. In a previous project, we implemented an N-gram-based language identifier that sustains a throughput of up to 1 Gbps. However, a large percentage of computer network traffic, such as email and Web data, consists of markup information such as tags and protocol-specific options. This additional data interferes with language identification and reduces its accuracy. We therefore developed a hardware architecture for configurable application-level processing. Our Application Level Processing System (ALPS) is a custom processor that is generated automatically from the syntactic structure of the content. The resulting circuit is mapped onto a reconfigurable device to efficiently extract only the data relevant to the language identifier. To illustrate the effectiveness of the architecture, we have implemented a system that processes electronic mail. Our experiments show that ALPS can improve the accuracy of the hardware language identifier by up to a factor of 200 compared with a system that does not decode the application-level protocol data.
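The idea of removing protocol and markup data before N-gram language identification can be illustrated in software. The following is a minimal Python sketch, not the ALPS hardware design: the function names, the character-trigram overlap score, the toy training sentences, and the single-part HTML message are all assumptions made for illustration, and a real system would use much larger language profiles.

```python
import re
from collections import Counter
from email import message_from_string


def trigram_profile(text):
    """Count character trigrams in lowercased text with whitespace collapsed."""
    cleaned = re.sub(r"\s+", " ", text.lower()).strip()
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))


def identify_language(text, profiles):
    """Return the language whose profile overlaps most with the text's trigrams."""
    sample = trigram_profile(text)

    def overlap(profile):
        # Counter returns 0 for trigrams the profile has never seen.
        return sum(min(count, profile[gram]) for gram, count in sample.items())

    return max(profiles, key=lambda lang: overlap(profiles[lang]))


def extract_body_text(raw_email):
    """Drop the mail headers and HTML tags, keeping only the message text.

    Assumes a single-part message whose payload is a plain string.
    """
    body = message_from_string(raw_email).get_payload()
    return re.sub(r"<[^>]+>", " ", body)


# Toy profiles (hypothetical training data, far too small for real use).
profiles = {
    "en": trigram_profile(
        "the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun"
    ),
    "es": trigram_profile(
        "el zorro salta sobre el perro perezoso y luego el perro duerme al sol"
    ),
}

# English headers and HTML tags wrapped around a Spanish message body.
raw_email = (
    "Subject: The quick brown fox\n"
    "From: dog@example.com\n"
    "Content-Type: text/html\n"
    "\n"
    "<html><body><p>El perro duerme al sol y el zorro salta sobre el perro</p></body></html>\n"
)

print(identify_language(extract_body_text(raw_email), profiles))
```

In this sketch, only the decoded body reaches the identifier, so the English header fields and the HTML tag names do not contribute trigrams to the score.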
Keywords :
XML; automatic programming; computer networks; field programmable gate arrays; meta data; program compilers; reconfigurable architectures; semantic networks; application level processing system; custom processor; electronic mail; hardware language identifier; hardware-accelerated parser; language identification process; metadata extraction; network information processing system; reconfigurable hardware; semantic network content; Application software; Computer networks; Data mining; Electronic mail; Hardware; Information processing; Protocols; Real time systems; Telecommunication traffic; Throughput;
Conference_Title :
Aerospace Conference, 2007 IEEE
Conference_Location :
Big Sky, MT
Print_ISBN :
1-4244-0524-6
Electronic_ISBN :
1095-323X
DOI :
10.1109/AERO.2007.352793