Title :
Text Modeling for Real-Time Document Categorization
Author :
Byrnes, John ; Rohwer, Richard
Author_Institution :
HNC Software, LLC, Fair Isaac Corp., San Diego, CA
Abstract :
We report on experiments in adapting document categorization techniques for implementation in high-speed hardware. Because hardware resources are scarce, it is important to have a small set of robust and maximally informative variables over which learning can occur. We generate these variables using information-theoretic clustering. The resulting performance is on par with that of general-purpose computing implementations, which can take advantage of large amounts of time and memory. We conclude that custom high-speed hardware for document categorization can be made very accurate. We also believe that some of the strengths of information-theoretic data analysis techniques are brought out
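The abstract does not say which clustering algorithm was used, so the following is only a minimal sketch of one common form of information-theoretic clustering: words are merged greedily into a small number of clusters, and each step picks the merge that preserves the most mutual information between cluster and document category. The function names, the greedy strategy, and the toy counts are all assumptions made for illustration, not the authors' method or data.

```python
import math
from itertools import combinations


def mutual_information(counts):
    """Mutual information in bits between rows (clusters) and columns
    (categories) of a table of joint counts."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n > 0:
                mi += (n / total) * math.log2(n * total / (row_sums[i] * col_sums[j]))
    return mi


def cluster_words(word_counts, k):
    """Greedily merge words until k clusters remain (hypothetical helper).

    word_counts maps each word to its per-category counts. Every step tries
    all pairs of clusters and keeps the merge with the highest remaining
    mutual information, i.e. the smallest information loss.
    """
    clusters = {(word,): list(c) for word, c in word_counts.items()}
    while len(clusters) > k:
        best = None
        for a, b in combinations(list(clusters), 2):
            merged = [x + y for x, y in zip(clusters[a], clusters[b])]
            rest = [c for key, c in clusters.items() if key not in (a, b)]
            mi = mutual_information(rest + [merged])
            if best is None or mi > best[0]:
                best = (mi, a, b, merged)
        _, a, b, merged = best
        del clusters[a], clusters[b]
        clusters[a + b] = merged
    return clusters


# Toy data (invented): counts of each word in [sports, finance] documents.
toy_counts = {
    "goal": [9, 1],
    "match": [8, 2],
    "stock": [1, 9],
    "bond": [2, 8],
}
toy_clusters = cluster_words(toy_counts, k=2)
for members, per_category in toy_clusters.items():
    print(members, per_category)
```

Each resulting cluster can then serve as one input variable for a classifier, which is the sense in which a small variable set suits resource-constrained hardware. The exhaustive pairwise search shown here is only practical for small vocabularies.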
Keywords :
classification; document handling; high-speed hardware; information-theoretic clustering; information-theoretic data analysis; real-time document categorization; text modeling; Biographies; Data analysis; Hardware; Internet; Labeling; Mutual information; Protocols; Robustness; Routing; TCP/IP
Conference_Title :
Aerospace Conference, 2005 IEEE
Conference_Location :
Big Sky, MT
Print_ISBN :
0-7803-8870-4
DOI :
10.1109/AERO.2005.1559610