Title :
Mining Unstructured Text at Gigabyte per Second Speeds
Author_Institution :
Nat. Security Agency, Fort Meade, MD
Abstract :
Humans communicate with text in thousands of languages, in dozens of scripts, in a variety of binary codes, on millions of topics. There is a need, for both government and commercial applications, to identify these text characteristics to enable follow-on processing such as transcoding, translation, transliteration, routing, and prioritization. This paper deals with the implementation of real-time mining of unstructured text on high-speed hardware capable of processing network data streams at gigabyte-per-second speeds.
Keywords :
data mining; text analysis; gigabyte per second speed; network data streams; real-time mining; unstructured text mining; Binary codes; Conferences; Data mining; Encoding; Government; Humans; National security; Natural languages; Transcoding; USA Councils; Language; Natural; Processing
Conference_Title :
Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3503-6
Electronic_ISBN :
978-0-7695-3503-6
DOI :
10.1109/ICDMW.2008.9