Title :
Recognizing the languages in WebPages — A framework for NLP
Author :
Rajesh, S. ; Vandana, L. ; Carie, C. Anil ; Marapelli, Bhaskar
Author_Institution :
Dept. of CSE, Nalla Narasimha Reddy Educ. Soc.'s Group of Instn., Hyderabad, India
Abstract :
In this paper we describe an experimental system, written in the Java programming language, that demonstrates a variety of application-level tradeoffs available to distributed NLP applications. We propose a language identification system that uses N-gram-based matching for document retrieval. Using a well-known N-gram-based algorithm for automatic language identification, we construct a system that dynamically adds language labels to whole documents or text fragments.
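The abstract does not say which N-gram algorithm the system uses. As an illustration only, the sketch below assumes a rank-order profile comparison in the style of Cavnar and Trenkle's "out-of-place" measure, one widely known approach to N-gram language identification. The class name NGramLanguageGuesser, its methods, and the constants MAX_N and PROFILE_SIZE are hypothetical and do not come from the paper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: names and constants are assumptions, not the paper's code.
class NGramLanguageGuesser {
    static final int MAX_N = 3;          // longest character N-gram considered (assumed)
    static final int PROFILE_SIZE = 300; // number of top-ranked N-grams kept (assumed)

    // Language label -> (N-gram -> rank), in training order.
    private final Map<String, Map<String, Integer>> profiles = new LinkedHashMap<>();

    // Builds a rank profile: the most frequent N-gram gets rank 0.
    static Map<String, Integer> profile(String text) {
        // Lowercase, and collapse every run of non-letters into one "_" word boundary.
        String padded = "_" + text.toLowerCase().replaceAll("[^\\p{L}]+", "_") + "_";
        Map<String, Integer> counts = new HashMap<>();
        for (int n = 1; n <= MAX_N; n++) {
            for (int i = 0; i + n <= padded.length(); i++) {
                counts.merge(padded.substring(i, i + n), 1, Integer::sum);
            }
        }
        // Sort by descending count; break ties alphabetically so ranks are deterministic.
        List<String> grams = new ArrayList<>(counts.keySet());
        grams.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        Map<String, Integer> ranks = new HashMap<>();
        for (int r = 0; r < grams.size() && r < PROFILE_SIZE; r++) {
            ranks.put(grams.get(r), r);
        }
        return ranks;
    }

    // Out-of-place distance: sum of rank differences, with a fixed
    // penalty for N-grams that the language profile does not contain.
    static int distance(Map<String, Integer> doc, Map<String, Integer> lang) {
        int total = 0;
        for (Map.Entry<String, Integer> e : doc.entrySet()) {
            Integer langRank = lang.get(e.getKey());
            total += (langRank == null) ? PROFILE_SIZE : Math.abs(langRank - e.getValue());
        }
        return total;
    }

    // Registers a language profile built from a sample text.
    void train(String label, String sample) {
        profiles.put(label, profile(sample));
    }

    // Returns the label of the closest trained profile, or null if none is trained.
    String identify(String text) {
        Map<String, Integer> doc = profile(text);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (Map.Entry<String, Map<String, Integer>> e : profiles.entrySet()) {
            int d = distance(doc, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        NGramLanguageGuesser guesser = new NGramLanguageGuesser();
        // Toy training samples; a usable system would need far larger corpora.
        guesser.train("en", "the quick brown fox jumps over the lazy dog");
        guesser.train("de", "der schnelle braune fuchs springt ueber den faulen hund");
        System.out.println(guesser.identify("the lazy dog sleeps"));
    }
}
```

Labeling a text fragment rather than a whole document would amount to calling identify on each fragment. Very short fragments yield sparse profiles, so their labels are less reliable.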
Keywords :
Internet; Java; document handling; information retrieval; natural language processing; Java programming language; N-gram-based matching algorithm; Web pages; automatic language identification system; distributed NLP applications; document retrieval; text fragments; Computational intelligence; Conferences; Internet; Search engines; Software; Training; Web pages; IRS; JAVA; NLP; Object Oriented; language Model;
Conference_Title :
Computational Intelligence and Computing Research (ICCIC), 2013 IEEE International Conference on
Conference_Location :
Enathi
Print_ISBN :
978-1-4799-1594-1
DOI :
10.1109/ICCIC.2013.6724269