Title :
SyntacticDiff: Operator-based transformation for comparative text mining
Author :
Sean Massung;ChengXiang Zhai
Author_Institution :
Department of Computer Science, College of Engineering, University of Illinois at Urbana-Champaign
Abstract :
We describe SyntacticDiff, a novel, general, and efficient edit-based method for transforming sequences of words given a reference text collection. These transformations can be used directly or employed as features to represent text data in a wide variety of text mining applications. As case studies, we apply SyntacticDiff to three quite different tasks: grammatical error correction, student essay clustering and analysis, and native language identification, showing its benefit in each case. SyntacticDiff is completely general and can thus potentially be applied to any text data in any natural language. It is highly efficient, customizable, and able to capture syntactic differences from a reference text collection at the sentence, document, and subcollection levels. This enables both a rich translation method and a feature representation for many text mining tasks that deal with word usage and syntax beyond bag-of-words.
Keywords :
"Text mining","Transforms","Syntactics","Natural language processing","Robustness","Writing"
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
DOI :
10.1109/BigData.2015.7363801