DocumentCode
3717317
Title
A method for cross-document narrative alignment of a two-hundred-sixty-million word corpus
Author
Ben Miller;Jennifer Olive;Shakthidhar Gopavaram;Yanjun Zhao;Ayush Shrestha;Cynthia Berger
Author_Institution
Departments of English and Communication, Georgia State University
fYear
2015
Firstpage
1673
Lastpage
1677
Abstract
Finding similar narrative sections across longer documents would help identify key events within a corpus, enrich understanding of those events, provide a mechanism for organizing corpora according to their event content, and allow for bottom-up testing of theories of narrative. This paper proposes an automated method for narrative alignment across large textual corpora using techniques from natural language processing and similarity-based image segmentation. The method segments each document into a series of events, constructs sequences of abstracted representations of those events, compares pairs of sequences to generate image matrices, segments those images, identifies similar segments to discover commonly occurring narrative units, and, finally, returns the source sentences so that the clusters of narrative similarity are readable. Preliminary tests of elements of this method were conducted on a small heterogeneous corpus (fewer than 100 documents) and a moderate heterogeneous corpus (10k documents). Scaling to the full 251k-document corpus from which the moderate corpus was drawn requires further implementation, as described in this position paper.
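The pairwise-comparison step of the pipeline above can be illustrated with a minimal sketch. It assumes each event has already been abstracted into a set of terms, uses Jaccard similarity as a stand-in scoring function, and replaces the paper's similarity-based image segmentation with a simple search for diagonal runs of high-similarity cells. All function names, the threshold, and the minimum run length are hypothetical choices, not the authors' implementation.

```python
from typing import List, Set, Tuple

Event = Set[str]           # assumption: an event abstracted to a set of terms
Matrix = List[List[float]]


def jaccard(a: Event, b: Event) -> float:
    """Jaccard similarity between two event representations (stand-in metric)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_matrix(seq_a: List[Event], seq_b: List[Event]) -> Matrix:
    """Build the 'image': cell (i, j) scores event i of A against event j of B."""
    return [[jaccard(x, y) for y in seq_b] for x in seq_a]


def diagonal_runs(matrix: Matrix, threshold: float = 0.5,
                  min_len: int = 2) -> List[Tuple[int, int, int]]:
    """Return (start_i, start_j, length) for each run of consecutive cells on a
    diagonal whose score is >= threshold. Each run marks a stretch of events
    occurring in the same order in both documents, i.e. a candidate aligned
    narrative unit. (Hypothetical simplification of image segmentation.)"""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    runs: List[Tuple[int, int, int]] = []
    # offset = j - i identifies one diagonal of the matrix
    for offset in range(-(rows - 1), cols):
        i = max(0, -offset)
        j = i + offset
        start = None
        while i < rows and j < cols:
            if matrix[i][j] >= threshold:
                if start is None:
                    start = (i, j)
            elif start is not None:
                if i - start[0] >= min_len:
                    runs.append((start[0], start[1], i - start[0]))
                start = None
            i += 1
            j += 1
        # close a run that reaches the edge of the matrix
        if start is not None and i - start[0] >= min_len:
            runs.append((start[0], start[1], i - start[0]))
    return runs
```

For example, with `A = [{"storm", "ship"}, {"wreck", "island"}, {"rescue", "boat"}, {"wedding"}]` and `B = [{"market"}, {"storm", "ship", "night"}, {"wreck", "island"}, {"rescue", "boat"}]`, `diagonal_runs(similarity_matrix(A, B))` yields `[(0, 1, 3)]`: events 0 to 2 of A align with events 1 to 3 of B, and the source sentences behind those events are what would be returned to the reader.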
Keywords
"Image segmentation","Power cables","Image color analysis","Big data","Semantics","Arrays","Pattern matching"
Publisher
ieee
Conference_Titel
Big Data (Big Data), 2015 IEEE International Conference on
Type
conf
DOI
10.1109/BigData.2015.7363938
Filename
7363938