DocumentCode :
3269340
Title :
Combining Corpus-Based Features for Selecting Best Natural Language Sentences
Author :
Khosmood, Foaad ; Levinson, Robert
Author_Institution :
Dept. of Comput. Sci., California Polytech. State Univ., San Luis Obispo, CA, USA
Volume :
2
fYear :
2011
fDate :
18-21 Dec. 2011
Firstpage :
362
Lastpage :
365
Abstract :
Automated paraphrasing of natural language text has many interesting applications, from aiding translation to generating language in a better and more appropriate style. In this paper, we are concerned with the problem of picking the best English sentence out of a set of machine-generated paraphrase sentences, each designed to express the same content as a human-generated original. We present a system for scoring sentences based on examples in large corpora. Specifically, we use the Microsoft Web N-Gram service and the text of the Brown Corpus to extract features from all candidate sentences and compare them against each other. We consider three feature combination methods: a handcrafted decision tree, linear regression, and linear powerset regression. We find that while each method has particular strengths, linear powerset regression performs best against our human-evaluated test data.
Keywords :
decision trees; natural language processing; regression analysis; English sentence; Microsoft Web N-Gram service; automated paraphrasing; best natural language sentences; corpus-based features; handcrafted decision tree; linear powerset regression; linear regression; machine generated paraphrase sentences; natural language text; Correlation; Decision trees; Educational institutions; Humans; Linear regression; Natural languages; Transforms; Computational Natural Language Processing; Linear power-set regression; Linear regression; Paraphrasing Computational Linguistics
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Applications and Workshops (ICMLA), 2011 10th International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
978-1-4577-2134-2
Type :
conf
DOI :
10.1109/ICMLA.2011.170
Filename :
6147706