Title : 
Building a free, general-domain paraphrase database for Japanese
         
        
            Author : 
Mizukami, Masahiro ; Neubig, Graham ; Sakti, Sakriani ; Toda, Tomoki ; Nakamura, Satoshi
         
        
            Author_Institution : 
Nara Inst. of Sci. & Technol., Ikoma, Japan
         
        
        
        
        
            Abstract : 
Previous work has used parallel corpora and alignment techniques from phrase-based statistical machine translation to extract and generate paraphrases. In Japanese, paraphrases for a number of paraphrase categories or domains have been extracted by this method. However, most of these resources focus on a particular phenomenon in Japanese, and there are still no freely available Japanese paraphrase resources that cover all varieties of phrases across several domains. In addition, because Japanese and English differ in grammar and word order, we perform syntax-based preprocessing to reduce this mismatch and extract paraphrases similar in quality to those extracted from more similar language pairs. The data used to create the Japanese paraphrases is either in the public domain or available under a Creative Commons license, and spans a variety of genres for wide coverage.
         
        
            Keywords : 
audio databases; computational linguistics; grammars; language translation; natural language processing; statistical analysis; Creative Commons license; English; Japanese; general-domain paraphrase database; grammar; parallel corpora; phrase-based statistical machine translation; syntax-based preprocessing; word ordering; Grammar; Licenses; Free Data; General-Domain; Paraphrasing;
         
        
        
        
            Conference_Title : 
2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA)
         
        
        
            DOI : 
10.1109/ICSDA.2014.7051433