Title :
A global rule induction approach to information extraction
Author :
Xiao, Jing ; Chua, Tat-Seng ; Liu, Jimin
Author_Institution :
Sch. of Comput., National Univ. of Singapore, Singapore
Abstract :
Extracting desired pieces of information from natural language texts is an important task with a growing number of potential applications. This paper presents GRID, a pattern rule induction learning system that uses the global distribution of features across all training instances to make better rule induction decisions. GRID incorporates features at the lexical, syntactic, and semantic levels simultaneously. It induces rules by combining top-down and bottom-up approaches. The features chosen in GRID are general, and they were applied successfully to both semi-structured text and free text. Our experimental results on some publicly available Web page corpora and the MUC-4 test set indicate that our approach is effective.
Keywords :
Web sites; inductive logic programming; information retrieval; learning (artificial intelligence); natural languages; text analysis; GRID; MUC-4 test set; Webpage corpora; bottom-up approach; free text; global feature distribution; global rule induction; information extraction; lexical level features; natural language texts; pattern rule induction learning system; semantic level features; semistructured text; syntactical level features; top-down approach; Data mining; IEEE news; Induction generators; Information resources; Information retrieval; Learning systems; Natural languages; Seminars; Testing; Text categorization;
Conference_Title :
Tools with Artificial Intelligence, 2003. Proceedings. 15th IEEE International Conference on
Print_ISBN :
0-7695-2038-3
DOI :
10.1109/TAI.2003.1250236