Title :
Mother Fugger: Mining Historical Manuscripts with Local Color Patches
Author :
Zhu, Qiang ; Keogh, Eamonn
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of California, Riverside, CA, USA
Abstract :
Initiatives such as the Google Print Library Project and the Million Book Project have already archived more than ten million books in digital format, and within the next decade the majority of the world's books will be online. Although most of the data will naturally be text, there will also be tens of millions of pages of images, many in color. While there is an active research community pursuing data mining of text from historical manuscripts, there has been very little work that exploits the rich color information that is often present. In this work we introduce a simple color measure that both addresses and exploits typical features of historical manuscripts. To enable the efficient mining of massive archives, we propose a tight lower bound to the measure. Beyond fast similarity search, we show how this lower bound allows us to build several higher-level data mining tools, including motif discovery and link analyses. We demonstrate our ideas in several data mining tasks on manuscripts dating back to the fifteenth century.
Keywords :
data mining; history; image colour analysis; indexing; records management; search problems; Google Print Library Project; Million Book Project; archive; color image; historical manuscript; link analyses; motif discovery; color indexing; historical manuscripts
Conference_Title :
2010 IEEE 10th International Conference on Data Mining (ICDM)
Conference_Location :
Sydney, NSW
Print_ISBN :
978-1-4244-9131-5
ISSN :
1550-4786
DOI :
10.1109/ICDM.2010.11