Title :
A method for cross-document narrative alignment of a two-hundred-sixty-million word corpus
Author :
Ben Miller; Jennifer Olive; Shakthidhar Gopavaram; Yanjun Zhao; Ayush Shrestha; Cynthia Berger
Author_Institution :
Departments of English and Communication, Georgia State University
Abstract :
Identifying similar narrative sections across longer documents would help locate key events within a corpus, enrich understanding of those events, provide a mechanism for organizing corpora according to their event content, and allow for bottom-up testing of theories of narrative. This paper proposes an automated method for narrative alignment across large textual corpora using techniques from natural language processing and similarity-based image segmentation. The method segments each document into a series of events, constructs sequences of abstracted representations of those events, compares pairs of sequences to generate image matrices, segments those images, identifies similar segments to discover commonly occurring narrative units, and, finally, returns the source sentences so that the clusters of narrative similarity are readable. Preliminary tests of elements of this method were conducted on a small heterogeneous corpus (fewer than 100 documents) and a moderate heterogeneous corpus (10k documents). Further implementation, as described in this position paper, is necessary to scale to the full 251k-document corpus from which the moderate corpus was drawn.
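The abstract does not specify the event representation, the similarity measure, or the segmentation algorithm. As a minimal sketch, assuming each event has already been abstracted to a set of feature strings (the example features below are hypothetical), the pairwise comparison step can be pictured as a similarity matrix in which cell (i, j) scores event i of one document against event j of another. Shared narrative stretches then show up as diagonal streaks of high similarity. The Jaccard measure and the diagonal-run scan used here are illustrative stand-ins, not the similarity-based image segmentation the authors describe.

```python
from typing import List, Sequence, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets of abstracted event features (stand-in measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_matrix(seq_a: Sequence[Set[str]],
                      seq_b: Sequence[Set[str]]) -> List[List[float]]:
    """Build the 'image' matrix: row i is event i of document A, column j is event j of B."""
    return [[jaccard(ea, eb) for eb in seq_b] for ea in seq_a]


def diagonal_runs(matrix: List[List[float]], threshold: float = 0.5,
                  min_len: int = 2) -> List[Tuple[int, int, int]]:
    """Scan every diagonal for runs of consecutive cells >= threshold.

    Returns (start_i, start_j, length) for each run of at least min_len cells;
    such a run is a candidate aligned narrative segment in both documents.
    """
    n = len(matrix)
    m = len(matrix[0]) if n else 0
    runs: List[Tuple[int, int, int]] = []
    for offset in range(-(n - 1), m):
        # Walk the diagonal on which j - i == offset.
        i = max(0, -offset)
        j = i + offset
        start, length = None, 0
        while i < n and j < m:
            if matrix[i][j] >= threshold:
                if start is None:
                    start = (i, j)
                length += 1
            else:
                if start is not None and length >= min_len:
                    runs.append((start[0], start[1], length))
                start, length = None, 0
            i += 1
            j += 1
        # Close a run that reaches the edge of the matrix.
        if start is not None and length >= min_len:
            runs.append((start[0], start[1], length))
    return runs


# Hypothetical abstracted event sequences for two documents.
doc_a = [{"arrest", "police"}, {"trial", "court"}, {"verdict", "jury"}, {"funeral"}]
doc_b = [{"storm"}, {"arrest", "police", "night"}, {"trial", "court"}, {"verdict", "jury"}]

if __name__ == "__main__":
    sim = similarity_matrix(doc_a, doc_b)
    print(diagonal_runs(sim))  # [(0, 1, 3)]
```

Here events 0 to 2 of the first document align with events 1 to 3 of the second, so the scan reports a single run of length 3 starting at cell (0, 1). A real system at the scale described would need a richer event abstraction and a segmentation method that tolerates gaps and noise along the diagonal.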
Keywords :
"Image segmentation","Power cables","Image color analysis","Big data","Semantics","Arrays","Pattern matching"
Conference_Title :
Big Data (Big Data), 2015 IEEE International Conference on
DOI :
10.1109/BigData.2015.7363938