Title :
Invariants Extraction Method Applied in an Omni-language Old Document Navigating System
Author :
Bui, Quang Anh ; Visani, Muriel ; Mullot, Remy
Author_Institution :
Lab. L3i, Univ. of La Rochelle, La Rochelle, France
Abstract :
We are developing an omni-script, interactive word retrieval system for navigating ancient document collections, based on query composition and intended for non-expert users. To build a query, the user selects and composes writing pieces, which are invariants automatically extracted from the old document collection. To extract invariants from documents, strokes must first be extracted and clustered. Stroke extraction raises two main difficulties: detecting the ambiguous zones so as to extract primary strokes (writing pieces that do not contain any ambiguous zone), and grouping the primary strokes so as to form invariants. In this paper, we present existing methods for ambiguous zone detection and compare them on documents of different languages and periods to determine which is best suited to our context. Once ambiguous zones have been detected, some neighboring primary strokes are grouped to obtain strokes, and our clustering algorithm is applied to these strokes to find their representatives, i.e., the invariants. The user can then use these invariants to compose a query and retrieve words from the document collection.
Keywords :
document handling; history; pattern clustering; query processing; ambiguity zones detection; ancient document collection navigation; clustering algorithm; interactive word retrieval system; invariants extraction method; omni-language old document navigation system; omni-script language; primary strokes; query composition; stroke extraction; user composition; user selection; Algorithm design and analysis; Clustering algorithms; Databases; Feature extraction; Shape; Visualization; Writing; Ambiguous Zones Detection; Clustering; Invariant Extraction; Stroke Extraction; Word Retrieval;
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.268