DocumentCode :
583151
Title :
Cross Language Information Extraction for Digitized Textbooks of Specific Domains
Author :
Zhu, Wenhao ; Luo, Laihu ; Ju, Chaoyou ; Zhang, Bofeng
Author_Institution :
Sch. of Comput. Eng. & Sci., Shanghai Univ., Shanghai, China
fYear :
2012
fDate :
27-29 Oct. 2012
Firstpage :
1114
Lastpage :
1118
Abstract :
As the digitization movement spreads, more and more countries have initiated their own digital library projects to preserve their culture by digitizing millions of books. Together with digital resources of all kinds, such as video, audio, and images, a digital library can provide advanced services that go far beyond reading and browsing. Information extraction is one of the fundamental methods for obtaining structured information from digital books. Because of its importance for content integration and knowledge discovery, information extraction for different languages is becoming a key problem in the development of digital libraries. In this paper, we present a domain-related information extraction framework suited to digitized textbooks in different languages. To achieve cross-language adaptation, we introduce language-independent features and simple language-dependent features that are bound to domain characteristics to generate extractors. Finally, we present two preliminary experiments to show the feasibility of this framework.
Keywords :
data mining; digital libraries; electronic publishing; content integration; cross language information extraction; digital books; digital library projects; digital resources; digitization movement; digitized textbooks; domain-related information extraction framework; knowledge discovery; language independent features; simple language dependent features; specific domains; Data mining; Electronic publishing; Encyclopedias; Feature extraction; Information retrieval; Libraries; Cross Language; Digitized Textbook; Information Extraction;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer and Information Technology (CIT), 2012 IEEE 12th International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4673-4873-7
Type :
conf
DOI :
10.1109/CIT.2012.226
Filename :
6392063