Title : 
A Cascaded Approach to Mention Detection and Chaining in Arabic
         
        
            Author : 
Zitouni, Imed ; Luo, Xiaoqiang ; Florian, Radu
         
        
            Author_Institution : 
IBM T. J. Watson Res. Center, Yorktown Heights, NY
         
        
        
        
        
            fDate : 
7/1/2009
         
        
        
        
            Abstract : 
This paper presents a fully statistical approach to Arabic mention detection and chaining, built around the maximum entropy principle. The system processes an input document as a cascade: it first detects mentions in the document and then chains the identified mentions into entities. Both components use a common maximum entropy framework, which allows the integration of a large array of feature types, including lexical, morphological, syntactic, and semantic features. Arabic poses additional challenges for this task (when compared with English, for example): segmentation is a required processing step so that enclitic pronouns can be correctly identified and resolved. The system obtained very competitive performance in the Automatic Content Extraction (ACE) evaluation program.
         
        
            Keywords : 
maximum entropy methods; natural language processing; statistical analysis; text analysis; Arabic mention detection; automatic content extraction; cascade approach; chaining system; enclitic pronouns; lexical features; maximum entropy principle; segmentation; Cities and towns; Computational linguistics; Data mining; Entropy; Helium; Information retrieval; Morphology; Natural languages; Speech processing; Text processing; Arabic text processing; coreference resolution; maximum entropy; mention detection
         
        
        
            Journal_Title : 
Audio, Speech, and Language Processing, IEEE Transactions on
         
        
        
        
        
            DOI : 
10.1109/TASL.2009.2016732