DocumentCode
680753
Title
Semi-automatic dictionary curation for domain-specific ontologies
Author
Kulkarni, Ashish ; Gavankar, Chetana ; Ramakrishnan, Ganesh ; Raghavan, Sriram
Author_Institution
Indian Inst. of Technol., Bombay, Mumbai, India
fYear
2013
fDate
4-6 Nov. 2013
Firstpage
727
Lastpage
734
Abstract
Within the broad area of information extraction, we study the problem of effective dictionary curation in an enterprise setting. Equipped with an ontology representative of the domain of an enterprise, our approach populates the attributes of leaf nodes of the ontology with instances extracted from the enterprise corpus. For an attribute of interest, given a few seed examples or indicative features for the attribute, we first obtain a ranked list of 'list pages' potentially containing additional dictionary terms. Our ranking model ranks pages from the enterprise corpus based on their 'list' content using several visual and lexical features. We gather users' judgements of the result pages, and the model continuously learns from this feedback. We compare different techniques of dictionary curation using rule-based extractors and visual features of pages. Based on a rule-writing exercise, we show the benefit of dictionaries for leaf node attributes in writing rule-based extractors for higher-level nodes in an ontology. We have implemented a dictionary curation system based on these ideas. Experimental analysis using an academic domain ontology and university corpora reveals, in the context of enterprise analytics, (i) the merit of dictionary support in rule-based information extraction and (ii) the viability and effectiveness of an interactive approach for dictionary creation.
Keywords
dictionaries; information retrieval; knowledge based systems; ontologies (artificial intelligence); academic domain ontology; dictionary creation; domain-specific ontologies; enterprise analytics; enterprise corpus; enterprise setting; experimental analysis; information extraction; interactive approach; lexical features; list content; list pages; ontology leaf nodes attributes; ranking model; rule based extractors; rule based information extraction; rule writing exercise; semiautomatic dictionary curation; universities corpora; visual features; Dictionaries; Feature extraction; Ontologies; Sociology; Visualization; dictionary curation; information extraction; ontology population
fLanguage
English
Publisher
IEEE
Conference_Titel
Tools with Artificial Intelligence (ICTAI), 2013 IEEE 25th International Conference on
Conference_Location
Herndon, VA
ISSN
1082-3409
Print_ISBN
978-1-4799-2971-9
Type
conf
DOI
10.1109/ICTAI.2013.112
Filename
6735323