DocumentCode :
330299
Title :
Learning to extract and classify names from text
Author :
Fox, Heidi ; Schwartz, Richard ; Stone, Rebecca ; Weischedel, Ralph ; Gadz, Walter
Author_Institution :
GTE/BBN Technol., Cambridge, MA, USA
Volume :
2
fYear :
1998
fDate :
11-14 Oct 1998
Firstpage :
1668
Abstract :
A requirement of virtually all analytic tools, such as timeline and spatial analysis, is structured data; however, much data is in text, an unstructured form. This article presents a new technology to bridge the gap between data buried in text and the requirement of structured data for analysis. The outcome should be an easy-to-maintain information technology component to support DoD and law enforcement applications. Our new approach uses statistical pattern recognition to learn to find data that is locally identifiable, i.e., that is not highly dependent on context. Examples are person names, organization names, locations, dates, times, monetary amounts, phone numbers, addresses, and social security numbers. The paper describes the statistical model employed, compares and contrasts the approach with previous approaches, numerically evaluates the adequacy of the technology on Government-supplied data, and illustrates the kind of examples the system needs in order to learn to recognize the desired data in documents.
Keywords :
learning (artificial intelligence); pattern classification; public administration; statistical analysis; text analysis; DoD applications; Government-supplied data; addresses; dates; information technology component; law enforcement applications; learning; locations; monetary amounts; name classification; name extraction; organization names; personal names; social security numbers; statistical model; statistical pattern recognition; telephone numbers; text; times; Bridges; Data analysis; Data mining; Data security; Databases; Information analysis; Information technology; Law enforcement; Natural languages; Performance analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Systems, Man, and Cybernetics, 1998. 1998 IEEE International Conference on
Conference_Location :
San Diego, CA
ISSN :
1062-922X
Print_ISBN :
0-7803-4778-1
Type :
conf
DOI :
10.1109/ICSMC.1998.728133
Filename :
728133