Title :
An Information Extraction Method for Digitized Textbooks of Traditional Chinese Medicine
Author :
Zhu, Wenhao ; Bai, Shunlai ; Zhang, Bofeng ; Xu, Weimin ; Wei, Daming
Author_Institution :
Sch. of Comput. Eng. & Sci., Shanghai Univ., Shanghai, China
fDate :
June 29 2010-July 1 2010
Abstract :
Digital libraries have taken on the mission of preserving and spreading human culture in the information era. However, knowledge extraction for digital libraries has not been well studied, which holds back their progression from information collectors to knowledge providers. This paper presents an ontology-based approach that extracts detailed attributes of Traditional Chinese Medicine (TCM) from digitized textbooks. Based on the characteristics of digitized textbooks, we propose an extraction ontology that is compatible with both textbook extraction and TCM theory. To improve the tolerance of extraction to OCR errors, we extract features from several different aspects. Finally, a structured, pattern-based extraction method is adopted to minimize the supervision that extraction requires. The results show that our method is a practical and robust approach to information extraction from digitized TCM textbooks.
Keywords :
data mining; digital libraries; feature extraction; ontologies (artificial intelligence); optical character recognition; text analysis; OCR errors; digital libraries; digitized textbooks; extraction ontology; extraction supervision; extraction tolerance; features extract; human culture; information collector; information extraction; knowledge extraction; knowledge provider; pattern based extraction; traditional Chinese medicine; Books; Catalogs; Data mining; Feature extraction; Libraries; Ontologies; Support vector machines; Digital Libraries; Information Extraction; Traditional Chinese Medicine;
Conference_Title :
Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on
Conference_Location :
Bradford
Print_ISBN :
978-1-4244-7547-6
DOI :
10.1109/CIT.2010.291