DocumentCode
2963171
Title
Simple Unsupervised Identification of Low-Level Constituents
Author
Ponvert, Elias ; Baldridge, Jason ; Erk, Katrin
Author_Institution
Dept. of Linguistics, Univ. of Texas at Austin, Austin, TX, USA
fYear
2010
fDate
22-24 Sept. 2010
Firstpage
24
Lastpage
31
Abstract
We present an approach to unsupervised partial parsing: the identification of low-level constituents (which we dub clumps) in unannotated text. We begin by showing that CCLParser (Seginer 2007), an unsupervised parsing model, is particularly adept at identifying clumps, and that, surprisingly, building a simple right-branching structure above its clumps actually outperforms the full parser itself, indicating that much of the CCLParser's performance comes from good local predictions. Based on this observation, we define a simple bigram model that is competitive with CCLParser for clumping, which further illustrates how important this level of representation is for unsupervised parsing.
Keywords
grammars; unsupervised learning; CCLParser; bigram model; clump identification; low level constituent; unannotated text; unsupervised identification; unsupervised partial parsing model; Analytical models; Biological system modeling; Gold; Joining processes; Pragmatics; Predictive models; Syntactics; partial parsing; text processing; unsupervised parsing
fLanguage
English
Publisher
IEEE
Conference_Titel
Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on
Conference_Location
Pittsburgh, PA
Print_ISBN
978-1-4244-7912-2
Electronic_ISBN
978-0-7695-4154-9
Type
conf
DOI
10.1109/ICSC.2010.20
Filename
5628814