DocumentCode :
3066591
Title :
N-gram analysis of text documents in Serbian language
Author :
Marovac, U. ; Pljaskovic, A. ; Crnisanin, A. ; Kajan, Ejub
Author_Institution :
Drzavni Univ. u Novom Pazaru, Novi Pazar, Serbia
fYear :
2012
fDate :
20-22 Nov. 2012
Firstpage :
1385
Lastpage :
1388
Abstract :
Modern life, e-business, and the large amount of data available in electronic form have created a need to analyze textual documents written in different natural languages. Every natural language has many rules and variations, which makes document analysis more difficult. N-gram analysis can produce results without language-specific lexical resources. This paper presents an n-gram analysis of textual documents written in Serbian, together with an algorithm for extracting keywords (n-grams) from a document.
Keywords :
business data processing; natural language processing; text analysis; N-gram document analysis; Serbian language; e-business; electronic data; natural languages; text document analysis; textual document analysis; Electronic mail; HTML; Natural languages; Semantics; Servers; Telecommunications; Text analysis; keywords; n-grams; document normalization
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Telecommunications Forum (TELFOR), 2012 20th
Conference_Location :
Belgrade
Print_ISBN :
978-1-4673-2983-5
Type :
conf
DOI :
10.1109/TELFOR.2012.6419476
Filename :
6419476