DocumentCode :
3003750
Title :
Supervised learning in the wild: Text classification for critical technologies
Author :
Maiya, Arun S. ; Loaiza-Lemos, F. ; Rolfe, Robert M.
Author_Institution :
Inst. for Defense Anal., Alexandria, VA, USA
fYear :
2012
fDate :
Oct. 29 2012-Nov. 1 2012
Firstpage :
1
Lastpage :
6
Abstract :
We explore the problem of locating documents pertaining to critical technologies (e.g., restricted, proprietary, or sensitive technical information) within a massive and highly heterogeneous collection of largely unimportant files. We present a system that uses supervised machine learning (i.e., pattern recognition) to detect such critical documents. To address difficult or ambiguous instances, we supplement the text classifier with an automated keyword search: we automatically extract discriminative terms (i.e., keywords) from the training set and match them against documents during the classification process. We demonstrate the effectiveness of this hybrid approach through a series of validation tests and case studies.
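The hybrid idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: a tiny naive Bayes model stands in for the paper's text classifier, keywords are ranked by a smoothed log-odds of document frequency, and the keyword match is consulted only when the classifier score falls in an ambiguous band. The names `TinyNaiveBayes`, `discriminative_terms`, `hybrid_predict`, and the thresholds `low` and `high` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def doc_freq(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


def discriminative_terms(critical_docs, other_docs, top_k=5):
    """Rank terms by smoothed log-odds of occurring in critical vs. other docs.

    Assumption: the abstract does not say how keywords are scored; log-odds
    is one simple choice among many.
    """
    df_c, df_o = doc_freq(critical_docs), doc_freq(other_docs)
    n_c, n_o = len(critical_docs), len(other_docs)

    def log_odds(term):
        p_c = (df_c[term] + 1) / (n_c + 2)
        p_o = (df_o[term] + 1) / (n_o + 2)
        return math.log(p_c / p_o)

    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(df_c, key=lambda t: (-log_odds(t), t))
    return [t for t in ranked[:top_k] if log_odds(t) > 0]


class TinyNaiveBayes:
    """Stand-in classifier (hypothetical); both classes must appear in training."""

    def fit(self, docs, labels):
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = Counter(labels)
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def prob_critical(self, doc):
        """Return the estimated probability that the document is critical."""
        total_docs = sum(self.n_docs.values())
        log_p = {}
        for label in (True, False):
            total = sum(self.counts[label].values())
            lp = math.log(self.n_docs[label] / total_docs)
            for term in tokenize(doc):
                if term in self.vocab:  # ignore terms never seen in training
                    lp += math.log(
                        (self.counts[label][term] + 1) / (total + len(self.vocab))
                    )
            log_p[label] = lp
        # Normalize the two log scores into a probability.
        m = max(log_p.values())
        e_true = math.exp(log_p[True] - m)
        e_false = math.exp(log_p[False] - m)
        return e_true / (e_true + e_false)


def hybrid_predict(doc, score_fn, keywords, low=0.3, high=0.7):
    """Trust confident classifier scores; fall back to keyword matching otherwise.

    Assumption: the ambiguous band [low, high] and the fallback rule are
    illustrative; the abstract does not specify how the two signals combine.
    """
    score = score_fn(doc)
    if score >= high:
        return True
    if score <= low:
        return False
    return bool(set(tokenize(doc)) & set(keywords))
```

In this sketch, a document is flagged directly when the classifier is confident, and the extracted keywords decide only the borderline cases, which is one plausible reading of "supplement the text classifier with an automated keyword search".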
Keywords :
learning (artificial intelligence); text analysis; automated keyword search; case studies; classification process; critical documents; critical technologies; discriminative terms; pattern recognition; sensitive technical information; supervised learning; supervised machine learning; text classification; text classifier; validation tests; Keyword search; Machine learning; Machine learning algorithms; Servers; Standards; Support vector machines; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
MILITARY COMMUNICATIONS CONFERENCE, 2012 - MILCOM 2012
Conference_Location :
Orlando, FL
ISSN :
2155-7578
Print_ISBN :
978-1-4673-1729-0
Type :
conf
DOI :
10.1109/MILCOM.2012.6415660
Filename :
6415660