DocumentCode :
175136
Title :
Construction of Scholarly n-Gram from Huge Text Data
Author :
Myunggwon Hwang ; Mi-Nyeong Hwang ; Ha-Neul Yeom ; Hanmin Jung
Author_Institution :
Korea Inst. of Sci. & Technol. Inf. (KISTI), Daejeon, South Korea
fYear :
2014
fDate :
2-4 July 2014
Firstpage :
31
Lastpage :
35
Abstract :
The ultimate goal of this research is to provide n-gram data specialized for scholarly use. To this end, this paper outlines the construction of a scholarly n-gram through the processing of large text documents. Many researchers, especially non-native English speakers, find it difficult to construct sentences and paragraphs with appropriate and disambiguated words. One way to assist them is to provide n-gram data. A representative n-gram, Web 1T 5-Gram Version 1, already exists; it was constructed by processing virtually all documents retrieved using Google. However, these data contain unfocused word recommendations and are therefore not suitable for this purpose. Consequently, we are constructing a scholarly n-gram. In this paper, we demonstrate the efficiency of n-grams using the Web 1T unigram data, and we introduce and discuss the specifics of our research plan for the scholarly n-gram.
Keywords :
Internet; information retrieval; natural language processing; recommender systems; text analysis; English language speakers; Google; document retrieval; n-gram data; text document processing; word disambiguation; word recommendations; Context; Reliability; Semantic Web; Semantics; Text categorization; Time-frequency analysis; context n-gram; personalized n-gram; scholarly n-gram; time-dependent n-gram
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS), 2014 Eighth International Conference on
Conference_Location :
Birmingham
Print_ISBN :
978-1-4799-4333-3
Type :
conf
DOI :
10.1109/IMIS.2014.4
Filename :
6975437