Title :
Integrating Web resources and lexicons into a natural language query system
Author :
Katz, Boris ; Yuret, Deniz ; Lin, Jimmy ; Felshin, Sue ; Schulman, Rebecca ; Ilik, Adnan ; Ibrahim, Ali ; Osafo-Kwaako, Philip
Author_Institution :
Artificial Intelligence Lab., MIT, Cambridge, MA, USA
Abstract :
The START system responds to natural language queries with answers in text, pictures, and other media. START's sentence-level natural language parsing relies on a number of mechanisms to help it process the huge, diverse resources available on the World Wide Web. Blitz, a hybrid heuristic- and corpus-based natural language preprocessor, enables START to integrate a large and ever-changing lexicon of proper names by using heuristic rules and precompiled tables of symbols to preprocess various highly regular and fixed expressions into lexical tokens. LaMeTH, a content-based system for extracting information from HTML documents, assists START by providing a uniform method of accessing information on the Web in real time. These mechanisms have considerably improved START's ability to analyze real-world sentences and answer queries through expansion of its lexicon and integration of Web resources.
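The abstract describes Blitz only at a high level. The following is a minimal sketch of the general idea of preprocessing fixed expressions into single lexical tokens before parsing. It rests on several assumptions that do not come from the source: a simple regex tokenizer, a hypothetical precompiled name table (`NAME_TABLE`), and hypothetical heuristic rules (`RULES`). It is not Blitz's actual implementation.

```python
import re
from typing import List, Tuple

# Hypothetical precompiled table of proper names -> token category.
# Illustrative entries only; not Blitz's actual lexicon.
NAME_TABLE = {
    ("boris", "katz"): "PERSON",
    ("new", "york", "city"): "PLACE",
    ("massachusetts", "institute", "of", "technology"): "ORGANIZATION",
}
MAX_NAME_LEN = max(len(key) for key in NAME_TABLE)

# Hypothetical heuristic rules for highly regular expressions.
RULES = [
    ("DATE", re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$")),
    ("MONEY", re.compile(r"^\$\d+(?:\.\d{2})?$")),
    ("NUMBER", re.compile(r"^\d+(?:\.\d+)?$")),
]

# Assumed tokenizer: keeps amounts/dates whole, splits words and punctuation.
TOKEN_RE = re.compile(r"\$?\d+(?:[./]\d+)*|\w+|[^\w\s]")


def preprocess(sentence: str) -> List[Tuple[str, str]]:
    """Collapse known multi-word names and regular expressions into tokens."""
    words = TOKEN_RE.findall(sentence)
    tokens: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        # 1. Table lookup: prefer the longest name starting at position i.
        for n in range(min(MAX_NAME_LEN, len(words) - i), 0, -1):
            key = tuple(w.lower() for w in words[i:i + n])
            if key in NAME_TABLE:
                tokens.append((" ".join(words[i:i + n]), NAME_TABLE[key]))
                i += n
                break
        else:
            # 2. No table match: apply the first heuristic rule that fits.
            word = words[i]
            category = next((cat for cat, rx in RULES if rx.match(word)), "WORD")
            tokens.append((word, category))
            i += 1
    return tokens


if __name__ == "__main__":
    query = "When did Boris Katz visit New York City on 12/25/1999 with $3.50?"
    for token in preprocess(query):
        print(token)
```

Trying the longest table entry first keeps a name such as "New York City" from being split into ordinary words, and running the table before the regex rules means a listed name always wins over a generic pattern. Both orderings are design choices made for this sketch.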
Keywords :
Internet; content-based retrieval; hypermedia markup languages; information resources; multimedia databases; natural language interfaces; real-time systems; Blitz; HTML documents; LaMeTH; START system; Web resources; World Wide Web; content-based system; heuristic rules; lexicon; natural language parsing; natural language query system; pictures; real time; symbols; text; Artificial intelligence; Data mining; HTML; Humans; Information analysis; Laboratories; Natural languages; Real time systems; Web sites;
Conference_Title :
Multimedia Computing and Systems, 1999. IEEE International Conference on
Conference_Location :
Florence
Print_ISBN :
0-7695-0253-9
DOI :
10.1109/MMCS.1999.778343