Title : 
A new algorithm for construction specific field terms using co-occurrence words information
         
        
            Author : 
Atlam, El-Sayed ; Ghada, Elmarhomy ; Fuketa, M. ; Aoe, Jun-Ichi
         
        
            Author_Institution : 
Dept. of Inf. Sci. & Intelligent Syst., Tokushima Univ., Japan
         
        
        
        
        
        
            Abstract : 
Readers can often identify the field of a document by reading only a few specific words, called field association (FA) terms. Constructing these FA terms is important for correctly deciding a document's field from the few words available in part of a file. The field can be decided efficiently when FA terms are numerous and their frequency rate is high. When the number of level I FA words (words that connect directly to terminal fields) is limited, existing methods cannot determine the document's field easily and quickly, especially when the corpus contains few documents. This paper proposes a new method for deciding FA terms that uses the weight of co-occurrence words and declinable words related to a narrow association category, while eliminating FA term ambiguity. Moreover, effective FA terms are difficult to extract from frequency information alone. The proposed method uses a new co-occurrence word weight and achieves higher precision and recall than an approach based on degree of frequency.
         
        
            Keywords : 
natural languages; text analysis; thesauri; word processing; co-occurrence words information; document processing; field association terms; field term construction; Clustering algorithms; Costs; Data compression; Data mining; Frequency; Information science; Intelligent systems; Natural language processing; Partitioning algorithms; Thesauri;
         
        
        
        
            Conference_Title : 
Circuits and Systems, 2003 IEEE 46th Midwest Symposium on
         
        
        
            Print_ISBN : 
0-7803-8294-3
         
        
        
            DOI : 
10.1109/MWSCAS.2003.1562453