Title :
Word segmentation refinement by Wikipedia for textual entailment
Author :
Chuan-Jie Lin ; Yu-Cheng Tu
Author_Institution :
Dept. of Comput. Sci. & Eng., Nat. Taiwan Ocean Univ., Keelung, Taiwan
Abstract :
Textual entailment recognition in Chinese differs from English because Chinese lacks word delimiters and capitalization. Information from word segmentation and Wikipedia often plays an important role in textual entailment recognition; however, inconsistencies between word segmentation boundaries and matched Wikipedia titles must be resolved first. This paper proposes four ways to incorporate Wikipedia title matching into word segmentation and experiments with several feature combinations. The best system redoes word segmentation after matching Wikipedia titles. The best feature combination for the BC task uses content words and Wikipedia titles only, achieving a macro-average F-measure of 67.33% and an accuracy of 68.9%. The best MC RITE system achieves a macro-average F-measure of 46.11% and an accuracy of 58.34%. Both outperform all runs in the NTCIR-10 RITE-2 CT tasks.
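The core idea of the best system, re-segmenting around matched Wikipedia titles, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the title set is a hypothetical stand-in for a real Wikipedia title list, and the per-character fallback stands in for a real Chinese word segmenter.

```python
def segment_with_titles(sentence, wiki_titles):
    """Greedy longest-match of Wikipedia titles; unmatched spans fall
    back to per-character segmentation (placeholder for a real
    Chinese word segmenter)."""
    max_len = max((len(t) for t in wiki_titles), default=0)
    tokens, i = [], 0
    while i < len(sentence):
        match = None
        # Try the longest candidate title starting at position i.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            if candidate in wiki_titles:
                match = candidate
                break
        if match:
            tokens.append(match)       # keep the title as one token
            i += len(match)
        else:
            tokens.append(sentence[i]) # fallback: single character
            i += 1
    return tokens

titles = {"台灣海洋大學", "海洋"}  # hypothetical title set
print(segment_with_titles("台灣海洋大學在基隆", titles))
```

Matching the longer title "台灣海洋大學" prevents the embedded title "海洋" from splitting it, which is the boundary-inconsistency problem the paper addresses.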
Keywords :
Web sites; natural language processing; pattern matching; text analysis; BC task; Chinese language; English language; MC RITE system; NTCIR-10 RITE-2 CT tasks; Wikipedia; Wikipedia title matching; accuracy analysis; content words; feature combinations; macro-average F-measure; textual entailment recognition; word capitalization; word delimiters; word segmentation boundary inconsistency; word segmentation refinement; Benchmark testing; Electronic publishing; Encyclopedias; Internet; Numerical models; Training; NTCIR RITE benchmarks; Textual entailment; Wikipedia title matching; Word segmentation
Conference_Titel :
2014 IEEE 15th International Conference on Information Reuse and Integration (IRI)
DOI :
10.1109/IRI.2014.7051944