• DocumentCode
    2963171
  • Title
    Simple Unsupervised Identification of Low-Level Constituents
  • Author
    Ponvert, Elias ; Baldridge, Jason ; Erk, Katrin
  • Author_Institution
    Dept. of Linguistics, Univ. of Texas at Austin, Austin, TX, USA
  • fYear
    2010
  • fDate
    22-24 Sept. 2010
  • Firstpage
    24
  • Lastpage
    31
  • Abstract
    We present an approach to unsupervised partial parsing: the identification of low-level constituents (which we dub clumps) in unannotated text. We begin by showing that CCLParser (Seginer 2007), an unsupervised parsing model, is particularly adept at identifying clumps, and that, surprisingly, building a simple right-branching structure above its clumps actually outperforms the full parser itself, indicating that much of the CCLParser's performance comes from good local predictions. Based on this observation, we define a simple bigram model that is competitive with CCLParser for clumping, which further illustrates how important this level of representation is for unsupervised parsing.
  • Keywords
    grammars; unsupervised learning; CCLParser; bigram model; clump identification; low level constituent; unannotated text; unsupervised identification; unsupervised partial parsing model; Analytical models; Biological system modeling; Gold; Joining processes; Pragmatics; Predictive models; Syntactics; partial parsing; text processing; unsupervised parsing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on
  • Conference_Location
    Pittsburgh, PA
  • Print_ISBN
    978-1-4244-7912-2
  • Electronic_ISBN
    978-0-7695-4154-9
  • Type
    conf
  • DOI
    10.1109/ICSC.2010.20
  • Filename
    5628814