Title :
The method of synonyms extraction from unannotated corpus
Author :
Pak, Alexander Alexandrovich ; Narynov, Sergazy Sakenovich ; Zharmagambetov, Arman Serikuly ; Sagyndykova, Sholpan Nazarovna ; Kenzhebayeva, Zhanat Elubaevna ; Turemuratovich, Irbulat
Author_Institution :
LLC AlemResearch, Almaty, Kazakhstan
Abstract :
Structuring large volumes of e-documents presupposes organizing text at several levels: paragraphs, sentences, phrases and words. Methods for extracting lexical paradigms by statistical analysis were developed long ago. In this paper we attempt to move from lexical correlatives to lists of synonyms at various levels of generalization, based on statistics of local and global contexts.
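The abstract gives no algorithmic detail. As a rough, hypothetical illustration of ranking synonym candidates from context statistics, the Python sketch below compares words by the cosine similarity of their co-occurrence vectors. It uses two kinds of context: a local one (words within a fixed window in the same sentence) and a global one, assumed here to be all other words of the documents a word occurs in. The two similarities are mixed with a weight `alpha`. The function names, the window size, the document-level definition of global context and `alpha` are all assumptions; this is not the authors' method.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical sketch. A corpus is a list of documents;
# a document is a list of tokenized sentences (lists of strings).

def local_vectors(corpus, window=2):
    """Local context: words within +/- `window` positions in the same sentence."""
    vectors = defaultdict(Counter)
    for doc in corpus:
        for tokens in doc:
            for i, word in enumerate(tokens):
                for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                    if j != i:
                        vectors[word][tokens[j]] += 1
    return vectors

def global_vectors(corpus):
    """Global context (an assumption): all other words of each document containing the word."""
    vectors = defaultdict(Counter)
    for doc in corpus:
        counts = Counter(tok for tokens in doc for tok in tokens)
        for word in counts:
            for other, n in counts.items():
                if other != word:
                    vectors[word][other] += n
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def synonym_candidates(corpus, target, top_n=3, window=2, alpha=0.5):
    """Rank words by alpha * local similarity + (1 - alpha) * global similarity."""
    local, glob = local_vectors(corpus, window), global_vectors(corpus)
    if target not in local:
        return []
    scores = []
    for word in local:
        if word != target:
            score = (alpha * cosine(local[target], local[word])
                     + (1 - alpha) * cosine(glob[target], glob[word]))
            scores.append((word, score))
    # Highest score first; ties broken alphabetically for determinism.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]

corpus = [
    [["the", "car", "drives", "fast"], ["a", "red", "car", "stops"]],
    [["the", "automobile", "drives", "fast"], ["a", "red", "automobile", "stops"]],
    [["the", "cat", "sleeps", "quietly"]],
]
print(synonym_candidates(corpus, "car"))
```

In this toy corpus "car" and "automobile" occur in identical local and document-level contexts, so "automobile" is ranked first. On real data the window size, the weighting of raw counts (e.g. by PMI or tf-idf) and `alpha` would all need tuning.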
Keywords :
data mining; statistical analysis; text analysis; e-document structuring; generalization levels; global statistics; lexical correlatives; lexical paradigm extraction method; local statistics; paragraphs; phrases; sentences; synonym extraction method; synonym list; text organization; unannotated corpus; words; Clustering algorithms; Context; Educational institutions; Histograms; Information retrieval; Semantics; Extracting synonym algorithm; categorize the topics of texts; construction of a semantic map of concepts; e-documents
Conference_Title :
Digital Information, Networking, and Wireless Communications (DINWC), 2015 Third International Conference on
Conference_Location :
Moscow
Print_ISBN :
978-1-4799-6375-1
DOI :
10.1109/DINWC.2015.7054207