DocumentCode :
1291658
Title :
Extracting statistical data from free-form text
Author :
Hill, L. Owen ; Zein, David A.
Author_Institution :
IBM Corp., East Fishkill, NY, USA
Volume :
2
Issue :
3
fYear :
1986
fDate :
5/1/1986
Firstpage :
18
Lastpage :
24
Abstract :
The authors describe a method for processing free-form text files. The method consists of segregating and separating four physically and logically identifiable regions. The four regions are postprocessed to update three history files that contain information about manufactured products over a period of time. The technique used in processing such files falls under the general category of data segregation and character recognition. It involves the use of logical and mathematical operations in recognizing region boundaries and types of data fields and establishing uniqueness in name recognition. Hashing methods are used, combined with logical matrix multiplication in updating the history files. Sparse formats are used to store multiple large arrays on disks, reducing storage requirements by more than a factor of two. The techniques are implemented using multiprogramming environments in an automated system.
Keywords :
data handling; manufacturing data processing; statistics; word processing; character recognition; data extraction; data fields; data segregation; free-form text; hashing; history files; logical operations; manufactured products; mathematical operations; matrix multiplication; multiple large arrays; multiprogramming environments; name recognition; region boundaries; sparse formats; statistical data; storage requirements; uniqueness; Arrays; Data mining; Graphics; History; Logic arrays; Matrix converters; Vectors;
fLanguage :
English
Journal_Title :
Circuits and Devices Magazine, IEEE
Publisher :
IEEE
ISSN :
8755-3996
Type :
jour
DOI :
10.1109/MCD.1986.6311822
Filename :
6311822