DocumentCode :
264186
Title :
Efficient search of a sequence of words in a large text file
Author :
Kabir, M.N. ; Alginahi, Yasser M. ; Tayan, Omar
Author_Institution :
IT Res. Center for the Holy Quran & its Sci. (NOOR), Taibah Univ., Madinah, Saudi Arabia
fYear :
2014
fDate :
18-20 Jan. 2014
Firstpage :
1
Lastpage :
6
Abstract :
Searching for a specific word, a sequence of words, or a quotation in a large text document is generally performed by string-matching algorithms. A specific word can be searched more efficiently by applying a suitable hash function to a hash table in which the words of the large text file are organized in order. In this paper, we present a new technique for efficiently searching for a sequence of words in a large text file using the concepts of hashing and linked lists. To build the hash table, the words are organized in order. The number of occurrences and the positions of each word, together with the positions of its next words, are added to the record of each word in the hash table. Any word can be searched using a simple hash function, and the subsequent word can be located using the position of the next word; because this information is available in the record of the current word, the hash table acts as a linked list of words. Since the text file is large, there would be redundancy in the hash function values (several words mapping to the same value). Therefore, the hash function is used only to locate the upper and lower bounds of the first word, and an algorithm similar to binary search is then used to find the word in the table. We describe our methodology for building the hash table and the search techniques. Tests are carried out with simple examples to demonstrate how the algorithms work. Time and space complexities are also computed to show the efficiency of the method.
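The idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the hash function or the record layout, so the first-character hash, the `Record` class, and the names `build_table`, `find_word`, and `find_phrase` are all assumptions made for illustration. Each record keeps the text position of every occurrence of a word together with the table index of the word that follows it, so a phrase is matched by following those links.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    word: str
    # Assumed layout: text position of each occurrence -> table index of the
    # record for the word that follows it (-1 at the end of the text).
    next_at: dict[int, int] = field(default_factory=dict)


def build_table(text):
    """Build a word table sorted alphabetically, plus hash bounds."""
    words = text.lower().split()
    vocab = sorted(set(words))
    index_of = {w: i for i, w in enumerate(vocab)}  # build-time helper only
    table = [Record(w) for w in vocab]
    for pos, w in enumerate(words):
        nxt = index_of[words[pos + 1]] if pos + 1 < len(words) else -1
        table[index_of[w]].next_at[pos] = nxt
    # Assumed hash function: the first character of the word. Each hash value
    # maps to the [lo, hi) slice of the sorted table that shares that value.
    bounds = {}
    for i, rec in enumerate(table):
        lo, _ = bounds.get(rec.word[0], (i, i))
        bounds[rec.word[0]] = (lo, i + 1)
    return table, bounds


def find_word(table, bounds, word):
    """Return the table index of `word`, or -1 if it is absent."""
    lo, hi = bounds.get(word[:1], (0, 0))  # the hash narrows the range
    while lo < hi:                         # binary search inside the bounds
        mid = (lo + hi) // 2
        if table[mid].word < word:
            lo = mid + 1
        elif table[mid].word > word:
            hi = mid
        else:
            return mid
    return -1


def find_phrase(table, bounds, phrase):
    """Return the starting text positions of every occurrence of `phrase`."""
    query = phrase.lower().split()
    if not query:
        return []
    ids = [find_word(table, bounds, w) for w in query]
    if -1 in ids:
        return []
    hits = []
    for start in sorted(table[ids[0]].next_at):
        pos, cur = start, ids[0]
        for want in ids[1:]:
            # Follow the stored link instead of rescanning the text.
            if table[cur].next_at[pos] != want:
                break
            pos, cur = pos + 1, want
        else:
            hits.append(start)
    return hits
```

For the text "the quick brown fox jumps over the lazy dog and the quick cat", calling `find_phrase(table, bounds, "the quick")` on the table from `build_table` returns `[0, 10]`, the zero-based word positions where the phrase starts. Only the first word's occurrences are enumerated; every later word is checked through the stored next-word links.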
Keywords :
file organisation; information retrieval; search problems; string matching; text analysis; binary-search method; hash function; hash table; string matching algorithm; text file; Indexes; Memory management; Text recognition; Hash table; Large document; linked lists; string matching; text searching;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Applications & Research (WSCAR), 2014 World Symposium on
Conference_Location :
Sousse
Print_ISBN :
978-1-4799-2805-7
Type :
conf
DOI :
10.1109/WSCAR.2014.6916784
Filename :
6916784