Abstract :
Search in consumer-oriented databases is becoming increasingly important as the computer becomes a commonly used tool. Such databases are at the heart of e-mail managers, flight booking, and other e-commerce systems. Key problems associated with such searches are the structure and interface of the search query. The traditional solution to these problems uses a separate text field for each element of the query structure. However, this solution does not meet the needs of the ever-increasing number of inexperienced users, who require an efficient and user-friendly interface. We present natural search queries (NSQ), a simple and intuitive approach to searching structured information. Our solution combines the ideas of natural language database interfaces and operator-based search: queries, in simplified and intuitive natural language, are entered into a single text field. It is a front-end search interface oriented towards the common user. Our aim is to allow as much freedom as possible in formulating queries while interpreting them as accurately as possible, so that the elements of the query structure can be extracted automatically. In our project, we address the problem of e-mail databases, but the results may be applicable to other consumer-oriented databases. The paper introduces the grammar of natural search queries and probabilistic methods for recognizing the query structure (i.e., parsing and a hidden Markov model). In addition, we demonstrate a complete implementation of a system for processing NSQs and presenting retrieved messages. A specific subproblem we address is the deterministic recognition of natural date constraints. Tests show promising results in processing a broad range of natural search queries.
Keywords :
customer services; grammars; natural language interfaces; probability; query processing; consumer-oriented database; grammar; natural language database interface; probabilistic method; query search; user-friendly interface; Business; Computer science; Databases; Heart; Hidden Markov models; Information technology; Markup languages; Natural languages; Testing; Web sites;