Title :
Barq: distributed multilingual internet search engine with focus on Arabic language
Author :
Rachidi, T. ; Iraqi, O. ; Bouzoubaa, M. ; Khattab, A.B.E. ; Kourdi, M. El ; Zahi, A. ; Bensaid, A.
Author_Institution :
Alakhawayn Univ., Ifrane, Morocco
Abstract :
Barq is a distributed multilingual search engine with a focus on the Arabic language. Over a period of some two years, the Barq R&D project has involved work on Arabic language processing, Arabic word root extraction, indexing, information retrieval, automatic categorization, focused crawling, distributed computing, distributed database systems, and performance tuning. Barq indexes all documents of the web (and optionally of a particular site), including Word and XML documents, that contain at least a single word of Arabic in the CP1256, UTF-8, ISO8859_6, ASMO 449 or ASMO 708 code sets. The documents themselves can also contain Latin-based characters. This paper focuses on describing the architecture and design patterns of Barq, as well as the various types of search that Barq supports. Issues such as stemming/Arabic root extraction, indexing, ranking, precision and recall measurements, and automatic categorization are also presented, but their details are described in other works.
Keywords :
Internet; distributed databases; indexing; information retrieval; natural languages; online front-ends; search engines; software architecture; Arabic language processing; Barq R&D project; Barq architecture; Latin based characters; Stemming/Arabic root extraction; Web; automatic categorization; distributed computing; distributed database systems; distributed multilingual internet search engine; extensible markup language documents; focused crawling; indexing; information retrieval; ranking; research and development; tuning; word documents; Data mining; Database systems; Delay; Distributed computing; Indexing; Internet; Portals; Research and development; Search engines; TV;
Conference_Title :
Systems, Man and Cybernetics, 2003. IEEE International Conference on
Print_ISBN :
0-7803-7952-7
DOI :
10.1109/ICSMC.2003.1243853