Title :
Deducing linguistic structure from the statistics of large corpora
Author :
Brill, Eric ; Magerman, David ; Marcus, Mitchell ; Santorini, Beatrice
Author_Institution :
Dept. of Comput. & Inf. Sci., Pennsylvania Univ., Philadelphia, PA, USA
Abstract :
Two experiments are described that strongly suggest that largely distributional techniques might be developed to automatically provide both a set of part-of-speech tags for English and a skeletal parsing of free English text. In one experiment, the authors developed a constituent boundary parsing algorithm that derives an (unlabeled) bracketing, given text annotated for part of speech as input. In the other experiment, the authors investigated whether a distributional analysis can discover a part-of-speech tag set that might prove adequate to support experiments. The state of a tagged natural language corpus to aid such experiments is summarized.
Keywords :
computational linguistics; grammars; linguistics; natural languages; English text; boundary parsing algorithm; distributional analysis; large corpora; linguistic structure; skeletal parsing; speech tags; tagged natural language corpus; Data mining; Distributed computing; Error analysis; Information analysis; Mutual information; Natural languages; Speech analysis; Statistical distributions; Statistics; Stochastic processes
Conference_Title :
Information Technology, 1990. 'Next Decade in Information Technology', Proceedings of the 5th Jerusalem Conference on (Cat. No.90TH0326-9)
Conference_Location :
Jerusalem
Print_ISBN :
0-8186-2078-1
DOI :
10.1109/JCIT.1990.128309