Identification of Ancient Greek Papyrus Fragments Using Genetic Sequence Alignment Algorithms

Author

Williams, Alex C. ; Carroll, Hyrum D. ; Wallin, John F. ; Brusuelas, James ; Fortson, Lucy ; Lamblin, Anne-Francoise ; Yu, Haoyu

Author_Institution

Middle Tennessee State Univ., Murfreesboro, TN, USA

Volume

2

fYear

2014

fDate

20-24 Oct. 2014

Firstpage

5

Lastpage

10

Abstract

Papyrologists analyze, transcribe, and edit papyrus fragments in order to enrich modern lives by better understanding the linguistics, culture, and literature of the ancient world. One of their common tasks is to match an unknown fragment to a known manuscript. This is especially challenging when the fragments are damaged and contain only limited information (e.g., due to deterioration). In the last 100 years, only about 10% of the more than 500,000 fragments recovered from the Egyptian village of Oxyrhynchus have been edited. We do not know what new ancient texts might be found and what can be learned from them, but using current methods of identification this process will take in excess of 1000 years. The identification of an anonymous string of characters with a collection of known text sequences is ubiquitous in computational biology. Genes are often represented by a sequence of continuous characters, each of which denotes an amino acid. Relationships are inferred by finding multi-letter patterns shared between the anonymous sequence and a known sequence. This process is commonly referred to as genetic sequence alignment. In this paper, we introduce a novel methodology that uses modern genetic sequence alignment algorithms as a method for identifying Ancient Greek text fragments. This application will offer papyrologists and other professionals in the humanities the ability to rapidly identify severely damaged texts. This approach leverages a new form of non-contextual, multi-line text identification for the Greek language that can greatly accelerate the tedious task of transcription and identification.
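The abstract does not say which alignment algorithm or scoring scheme the authors use. As an illustration only, the idea of matching an anonymous fragment against known texts through shared multi-letter patterns can be sketched with a plain Smith-Waterman local alignment score. The function names, the scoring values (match=2, mismatch=-1, gap=-2), and the two short unaccented sample passages are assumptions made for this sketch and are not taken from the paper.

```python
def local_alignment_score(query, reference, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score (no traceback).

    The scoring values are illustrative assumptions, not the paper's.
    """
    prev = [0] * (len(reference) + 1)
    best = 0
    for q in query:
        curr = [0]  # column 0 of every row is 0
        for j, r in enumerate(reference, start=1):
            diag = prev[j - 1] + (match if q == r else mismatch)
            # Local alignment: a cell never drops below 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def identify_fragment(fragment, known_texts):
    """Score a fragment against each known text and return the best title.

    `known_texts` is a hypothetical mapping of title -> transcribed text.
    """
    scores = {
        title: local_alignment_score(fragment, text)
        for title, text in known_texts.items()
    }
    best_title = max(scores, key=scores.get)
    return best_title, scores


# Hypothetical, unaccented sample passages used only for illustration.
KNOWN_TEXTS = {
    "Iliad": "μηνιν αειδε θεα πηληιαδεω αχιληος",
    "Odyssey": "ανδρα μοι εννεπε μουσα πολυτροπον ος μαλα πολλα",
}

if __name__ == "__main__":
    # One character of the fragment is replaced to mimic damage.
    title, scores = identify_fragment("αειδε θxα πηλ", KNOWN_TEXTS)
    print(title, scores)
```

Because the score is local, a short or partly damaged fragment can still rank the correct source first as long as enough contiguous characters survive. A realistic system would need a substitution matrix suited to Greek letters and a way to judge whether a score is significant, neither of which this sketch attempts.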

Keywords

bioinformatics; data mining; genetic algorithms; humanities; amino acid; ancient Greek papyrus fragment; ancient Greek text fragment; computational biology; culture; genetic sequence alignment algorithm; linguistics; multiline text identification; noncontextual identification; Amino acids; Computational biology; Databases; Educational institutions; Error analysis; Genetics; Matrices; Ancient Greek; genetic sequence alignment; identification; papyrus

fLanguage

English

Publisher

IEEE

Conference_Title

e-Science (e-Science), 2014 IEEE 10th International Conference on

Conference_Location

São Paulo

Print_ISBN

978-1-4799-4288-6

Type

conf

DOI

10.1109/eScience.2014.14

Filename

6972089