DocumentCode :
1934154
Title :
The method of synonyms extraction from unannotated corpus
Author :
Pak, Alexander Alexandrovich ; Narynov, Sergazy Sakenovich ; Zharmagambetov, Arman Serikuly ; Sagyndykova, Sholpan Nazarovna ; Kenzhebayeva, Zhanat Elubaevna ; Turemuratovich, Irbulat
Author_Institution :
LLC AlemResearch, Almaty, Kazakhstan
fYear :
2015
fDate :
3-5 Feb. 2015
Firstpage :
1
Lastpage :
5
Abstract :
The structuring of large volumes of e-documents assumes the organization of text on several levels, namely paragraphs, sentences, phrases, and words. Methods for extracting lexical paradigms using statistical analysis were developed long ago. In this paper we attempt to move from lexical correlatives to lists of synonyms at various levels of generalization, on the basis of the statistics of local and global contexts.
Keywords :
data mining; statistical analysis; text analysis; e-document structuring; generalization levels; global statistics; lexical correlatives; lexical paradigm extraction method; local statistics; paragraphs; phrases; sentences; synonym extraction method; synonym list; text organization; unannotated corpus; words; Clustering algorithms; Context; Educational institutions; Histograms; Information retrieval; Semantics; Extracting synonym algorithm; categorize the topics of texts; construction of a semantic map concepts; e-documents
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Digital Information, Networking, and Wireless Communications (DINWC), 2015 Third International Conference on
Conference_Location :
Moscow
Print_ISBN :
978-1-4799-6375-1
Type :
conf
DOI :
10.1109/DINWC.2015.7054207
Filename :
7054207