DocumentCode :
2963171
Title :
Simple Unsupervised Identification of Low-Level Constituents
Author :
Ponvert, Elias ; Baldridge, Jason ; Erk, Katrin
Author_Institution :
Dept. of Linguistics, Univ. of Texas at Austin, Austin, TX, USA
fYear :
2010
fDate :
22-24 Sept. 2010
Firstpage :
24
Lastpage :
31
Abstract :
We present an approach to unsupervised partial parsing: the identification of low-level constituents (which we dub clumps) in unannotated text. We begin by showing that CCLParser (Seginer 2007), an unsupervised parsing model, is particularly adept at identifying clumps, and that, surprisingly, building a simple right-branching structure above its clumps actually outperforms the full parser itself, indicating that much of the CCLParser's performance comes from good local predictions. Based on this observation, we define a simple bigram model that is competitive with CCLParser for clumping, which further illustrates how important this level of representation is for unsupervised parsing.
Keywords :
grammars; unsupervised learning; CCLParser; bigram model; clump identification; low level constituent; unannotated text; unsupervised identification; unsupervised partial parsing model; Analytical models; Biological system modeling; Gold; Joining processes; Pragmatics; Predictive models; Syntactics; partial parsing; text processing; unsupervised parsing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on
Conference_Location :
Pittsburgh, PA
Print_ISBN :
978-1-4244-7912-2
Electronic_ISBN :
978-0-7695-4154-9
Type :
conf
DOI :
10.1109/ICSC.2010.20
Filename :
5628814