Anaphora Resolution in Hindi Documents

Author

Agarwal, Sachin ; Srivastava, Manaj ; Agarwal, Pallavi ; Sanyal, Ratna

Author_Institution

Indian Inst. of Inf. Technol., Allahabad

fYear

2007

fDate

Aug. 30 2007-Sept. 1 2007

Firstpage

452

Lastpage

458

Abstract

This paper presents anaphora resolution as a technique for the semantic analysis of text documents written in Hindi. The focus is on texts composed mainly of simple sentences, such as children's stories and short essays. The technique locates sentences in the text that are semantically related through anaphors, analyzes their semantics, and uses that analysis to resolve the referents of the respective anaphors. The approach is based on matching constraints on the grammatical attributes of different words. The anaphora resolution algorithm has been tested extensively: its accuracy is nearly 96% for simple sentences and of the order of 80% for compound and complex sentences. The causes of the errors are analyzed and possible techniques for improvement are discussed.
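The abstract states only that resolution works by matching constraints on grammatical attributes. As an illustration of that general idea (not the authors' algorithm), the sketch below assumes each word already carries hypothetical gender, number and person tags, and resolves each pronoun to the most recent preceding noun whose attributes agree with it. The Token fields, the "NOUN"/"PRON" tags, the None-means-unmarked convention and the two-sentence look-back window are all assumptions made for this example. Since the Hindi pronoun vah is not marked for gender, the sketch assumes its gender, when known, is taken from verb agreement (e.g. gaya vs. gayi).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    text: str
    pos: str               # hypothetical tag set: "NOUN" or "PRON"
    gender: Optional[str]  # "m", "f", or None if unmarked (e.g. Hindi "vah")
    number: Optional[str]  # "sg", "pl", or None if unmarked
    person: Optional[int]  # 1, 2, 3, or None if unmarked
    sent: int              # index of the sentence containing the token


def agrees(a: Optional[object], b: Optional[object]) -> bool:
    # An unmarked attribute (None) is treated as compatible with any value.
    return a is None or b is None or a == b


def resolve(tokens: List[Token], window: int = 2) -> Dict[int, Optional[int]]:
    """Map each pronoun's index to the index of its chosen antecedent, or None.

    Assumed heuristic: the most recent preceding noun, within `window`
    sentences, that agrees in gender, number and person.
    """
    result: Dict[int, Optional[int]] = {}
    for i, tok in enumerate(tokens):
        if tok.pos != "PRON":
            continue
        chosen = None
        # Scan backwards so that the nearest compatible noun wins.
        for j in range(i - 1, -1, -1):
            cand = tokens[j]
            if tok.sent - cand.sent > window:
                break  # candidate lies outside the assumed look-back window
            if cand.pos != "NOUN":
                continue
            if (agrees(tok.gender, cand.gender)
                    and agrees(tok.number, cand.number)
                    and agrees(tok.person, cand.person)):
                chosen = j
                break
        result[i] = chosen
    return result


if __name__ == "__main__":
    # Hypothetical two-sentence text: Ram and Sita are introduced, then "vah"
    # appears with a masculine verb, so its gender is assumed to be "m".
    text = [
        Token("Ram", "NOUN", "m", "sg", 3, 0),
        Token("Sita", "NOUN", "f", "sg", 3, 0),
        Token("vah", "PRON", "m", "sg", 3, 1),
    ]
    print(resolve(text))  # pronoun at index 2 resolves to "Ram" at index 0
```

Here Sita is rejected on gender and Ram is selected. With the pronoun's gender left unmarked (None), the same rule would pick the nearer noun, Sita, which shows why recency alone is a weak tie-breaker once attribute constraints stop discriminating between candidates.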

Keywords

grammars; knowledge representation; natural languages; pattern matching; text analysis; Hindi language; anaphora resolution; semantic text document analysis; Algorithm design and analysis; Data mining; Genetics; Information retrieval; Information technology; Performance analysis; Speech; Tellurium; Testing

fLanguage

English

Publisher

IEEE

Conference_Title

Natural Language Processing and Knowledge Engineering, 2007. NLP-KE 2007. International Conference on

Conference_Location

Beijing

Print_ISBN

978-1-4244-1611-0

Electronic_ISBN

978-1-4244-1611-0

Type

conf

DOI

10.1109/NLPKE.2007.4368070

Filename

4368070