A method for cross-document narrative alignment of a two-hundred-sixty-million word corpus

Author

Ben Miller;Jennifer Olive;Shakthidhar Gopavaram;Yanjun Zhao;Ayush Shrestha;Cynthia Berger

Author_Institution

Departments of English and Communication, Georgia State University

fYear

2015

Firstpage

1673

Lastpage

1677

Abstract

Identifying similar narrative sections across longer documents would help identify key events within a corpus, enrich understanding of those events, provide a mechanism for organizing corpora according to their event content, and allow for bottom-up testing of theories of narrative. This paper proposes an automated method for narrative alignment across large textual corpora using techniques from natural language processing and similarity-based image segmentation. The method segments each document into a series of events, constructs sequences of abstracted representations of those events, compares pairs of sequences to generate image matrices, segments the images, identifies similar segments to discover commonly occurring narrative units, and, finally, returns the source sentences to make the clusters of narrative similarity readable. Preliminary tests of elements of this method were conducted on a small heterogeneous corpus (< 100 documents) and a moderate heterogeneous corpus (10k documents). Further implementation as described in this position paper is necessary to scale to the full 251k document corpus from which the moderate corpus was drawn.
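
The pairwise comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a similarity measure or a segmentation algorithm, so this example assumes Jaccard similarity over hand-built sets of event features and substitutes a simple diagonal-run search for the image segmentation stage. The function names, the threshold, and the toy event sequences are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of event features (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_matrix(seq_a, seq_b):
    """Compare every event in seq_a with every event in seq_b.

    The result is a rows-by-columns grid of scores in [0, 1], which can be
    read as a grayscale image of where the two documents resemble each other.
    """
    return [[jaccard(ea, eb) for eb in seq_b] for ea in seq_a]


def diagonal_runs(matrix, threshold=0.5, min_len=2):
    """Stand-in for segmentation: find diagonal runs of cells >= threshold.

    A diagonal run means consecutive events in one document match
    consecutive events in the other, i.e. a shared narrative unit.
    Returns a list of (start_row, start_col, length) tuples.
    """
    runs = []
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    for i in range(rows):
        for j in range(cols):
            if matrix[i][j] < threshold:
                continue
            # Skip cells that continue a run started further up the diagonal.
            if i > 0 and j > 0 and matrix[i - 1][j - 1] >= threshold:
                continue
            length = 0
            while (i + length < rows and j + length < cols
                   and matrix[i + length][j + length] >= threshold):
                length += 1
            if length >= min_len:
                runs.append((i, j, length))
    return runs


# Hypothetical event sequences: each event is reduced to a set of features.
doc_a = [
    {"soldier", "arrive", "village"},
    {"family", "flee", "forest"},
    {"family", "hide", "forest"},
    {"soldier", "burn", "house"},
]
doc_b = [
    {"family", "flee", "forest"},
    {"family", "hide", "forest", "night"},
    {"neighbor", "return", "village"},
]

matrix = similarity_matrix(doc_a, doc_b)
for start_a, start_b, length in diagonal_runs(matrix):
    print(f"events {start_a}..{start_a + length - 1} of A align with "
          f"events {start_b}..{start_b + length - 1} of B")
```

Mapping each run back to the source sentences of the matched events would correspond to the final step in the abstract, where the clusters of similarity are made readable.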

Keywords

"Image segmentation","Power cables","Image color analysis","Big data","Semantics","Arrays","Pattern matching"

Publisher

ieee

Conference_Titel

Big Data (Big Data), 2015 IEEE International Conference on

Type

conf

DOI

10.1109/BigData.2015.7363938

Filename

7363938