Author/Authors :
Batura, T.V. Institute of Informatics Systems - Russian Academy of Sciences Siberian Branch - Novosibirsk State University, Russia , Murzin, F.A. Institute of Informatics Systems - Russian Academy of Sciences Siberian Branch - Novosibirsk State University, Russia , Semich, D.F. Institute of Informatics Systems - Russian Academy of Sciences Siberian Branch - Novosibirsk State University, Russia , Sagnayeva, S.K. Gumilyov Eurasian National University, Astana, Kazakhstan , Tazhibayeva, S.Zh. Gumilyov Eurasian National University, Astana, Kazakhstan , Bakiyev, M.N. Gumilyov Eurasian National University, Astana, Kazakhstan , Yerimbetova, A.S. Gumilyov Eurasian National University, Astana, Kazakhstan , Bakiyeva, A.M. Novosibirsk State University, Novosibirsk, Russia
Abstract :
The growing amount of information on the Internet and the rapid development of social
networks make the task of text processing increasingly relevant. In this paper we propose an
algorithm for the comparison of sentences and introduce certain measures of closeness
(similarity) between sentences. The estimation of the relevance of documents should
be based on the context of a search query and should not be limited to keywords,
their similarity or frequency. The proposed measures therefore take into account lexical, syntactic and
semantic relations between words. One of the problems we are currently solving is the
development of a parser, similar to Link Grammar Parser, for the Turkic languages most common on the
Internet, such as Kazakh, Uzbek (Cyrillic and Roman alphabets), and Turkish. The results
of our research are intended for use in various information retrieval systems.
Keywords :
natural language processing , syntactic analysis , Link Grammar Parser , relevance , Turkic languages