DocumentCode :
2792577
Title :
Text Modeling for Real-Time Document Categorization
Author :
Byrnes, John ; Rohwer, Richard
Author_Institution :
HNC Software, LLC, Fair Isaac Corp., San Diego, CA
fYear :
2005
fDate :
5-12 March 2005
Firstpage :
1
Lastpage :
11
Abstract :
We report on experiments in adapting document categorization techniques to provide for implementation in high-speed hardware. Because resources are scarce, it is important to have a small set of robust and maximally informative variables over which learning can occur. We generate variables using information-theoretic clustering. The resulting performance is on par with general-purpose computing implementations which are able to take advantage of large amounts of time and memory. We conclude that custom high-speed hardware for document categorization can be made very accurate. We also believe that some of the strengths of information-theoretic data analysis techniques are brought out
Keywords :
classification; document handling; high-speed hardware; information-theoretic clustering; information-theoretic data analysis; real-time document categorization; text modeling; Biographies; Data analysis; Hardware; Internet; Labeling; Mutual information; Protocols; Robustness; Routing; TCP/IP
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Aerospace Conference, 2005 IEEE
Conference_Location :
Big Sky, MT
Print_ISBN :
0-7803-8870-4
Type :
conf
DOI :
10.1109/AERO.2005.1559610
Filename :
1559610