• DocumentCode
    264186
  • Title

    Efficient search of a sequence of words in a large text file

  • Author

    Kabir, M.N. ; Alginahi, Yasser M. ; Tayan, Omar

  • Author_Institution
    IT Res. Center for the Holy Quran & its Sci. (NOOR), Taibah Univ., Madinah, Saudi Arabia
  • fYear
    2014
  • fDate
    18-20 Jan. 2014
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Searching for a specific word, a sequence of words, or a quotation in a large text document is generally performed by string matching algorithms. A specific word can be searched more efficiently by applying a suitable hash function to a hash table in which the words of the large text file are organized in order. In this paper, we present a new technique for efficiently searching for a sequence of words in a large text file using the concepts of hashing and linked lists. To build the hash table, the words are organized in order. The number of occurrences and the positions of each word, together with the positions of its next words, are added to the record of each word in the hash table. Any word can be searched using a simple hash function, and the subsequent word can be located using the position of the next word; because this information is stored in the record of the current word, the hash table acts as a linked list of words. Since the text file is large, many words share the same hash function value. Therefore, the hash function is used only to locate the upper and lower bounds of the first word. A binary-search-like algorithm is then used to find the word in the table. We describe our methodology for building the hash table and the search techniques. Tests with simple examples are carried out to demonstrate how the algorithms work. Time and space complexities are also computed to show the efficiency of the method.
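    The approach described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the hash function or the record layout, so the sketch assumes a first-letter hash that maps a word to its lower and upper bounds in the sorted word table, and a record of (position, next-word slot) pairs per word. All names (build_index, find_word, find_phrase) are hypothetical.

```python
from bisect import bisect_left

def build_index(text):
    """Build a sorted word table with per-word occurrence records."""
    tokens = text.lower().split()
    table = sorted(set(tokens))                  # words organized in order
    slot = {w: i for i, w in enumerate(table)}   # build-time helper only
    # records[i] holds (position, slot of next word) for each occurrence;
    # -1 marks the last word of the text.
    records = [[] for _ in table]
    for pos, w in enumerate(tokens):
        nxt = slot[tokens[pos + 1]] if pos + 1 < len(tokens) else -1
        records[slot[w]].append((pos, nxt))
    # Assumed hash: first letter -> [lo, hi) range of slots in the table.
    bounds = {}
    for i, w in enumerate(table):
        lo, _ = bounds.get(w[0], (i, i))
        bounds[w[0]] = (lo, i + 1)
    return table, records, bounds

def find_word(table, bounds, word):
    """Hash to the bounds, then binary-search inside them; -1 if absent."""
    if not word or word[0] not in bounds:
        return -1
    lo, hi = bounds[word[0]]
    i = bisect_left(table, word, lo, hi)
    return i if i < hi and table[i] == word else -1

def find_phrase(index, phrase):
    """Return the start positions (word offsets) of every phrase match."""
    table, records, bounds = index
    words = phrase.lower().split()
    if not words:
        return []
    first = find_word(table, bounds, words[0])
    if first < 0:
        return []
    hits = []
    for pos, nxt in records[first]:
        cur_pos, cur_next, ok = pos, nxt, True
        for w in words[1:]:
            if cur_next < 0 or table[cur_next] != w:
                ok = False
                break
            # Follow the link: find the occurrence at cur_pos + 1 in the
            # next word's record (records are sorted by position).
            cur_pos += 1
            occ = records[cur_next]
            cur_next = occ[bisect_left(occ, (cur_pos, -1))][1]
        if ok:
            hits.append(pos)
    return hits

index = build_index("the quick brown fox jumps over the lazy dog the quick red fox")
print(find_phrase(index, "the quick"))     # start offsets of each match
print(find_phrase(index, "the lazy dog"))
```

    In this sketch only the first word of the phrase goes through the hash and binary-search step; every later word is checked by following the next-word links stored in the records, which is the linked-list behaviour the abstract describes.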
  • Keywords
    file organisation; information retrieval; search problems; string matching; text analysis; binary-search method; hash function; hash table; string matching algorithm; text file; Indexes; Memory management; Text recognition; Hash table; Large document; linked lists; string matching; text searching;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Applications & Research (WSCAR), 2014 World Symposium on
  • Conference_Location
    Sousse
  • Print_ISBN
    978-1-4799-2805-7
  • Type

    conf

  • DOI
    10.1109/WSCAR.2014.6916784
  • Filename
    6916784