Author_Institution :
GE Res. & Dev. Center, Schenectady, NY, USA
Abstract :
NLDB, a knowledge-based system that automatically categorizes news stories for dissemination, retrieval, and browsing, is discussed. The major knowledge-based component of NLDB is a lexicosemantic pattern matcher that identifies combinations of words and phrases, as well as more complex patterns. These include word roots, grammatical categories, and semantic structures, such as verbs describing classes of events. It is shown that this linguistic analysis outperforms statistical methods. Because building lexicosemantic patterns can be a laborious process, a set of statistical methods that automates pattern acquisition while preserving the benefits of a knowledge-based approach is developed.
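The kind of matching described above can be illustrated with a minimal Python sketch. It is not NLDB's implementation: the lexicon, the semantic class TAKEOVER_VERB, the category name, and every function name below are hypothetical, chosen only to show how literal words, word roots, grammatical categories, and semantic verb classes might be combined in one pattern.

```python
import re
from typing import Callable, Dict, List, NamedTuple, Set, Tuple


class Token(NamedTuple):
    word: str  # surface form, lowercased
    root: str  # word root (taken from the toy lexicon below)
    pos: str   # grammatical category, "UNK" when the word is not in the lexicon


# Hypothetical toy lexicon: surface form -> (root, grammatical category).
LEXICON: Dict[str, Tuple[str, str]] = {
    "acquired": ("acquire", "VERB"),
    "acquires": ("acquire", "VERB"),
    "bought": ("buy", "VERB"),
    "buys": ("buy", "VERB"),
    "shares": ("share", "NOUN"),
    "stake": ("stake", "NOUN"),
}

# Hypothetical semantic class: verb roots describing one class of event.
SEMANTIC_CLASSES: Dict[str, Set[str]] = {
    "TAKEOVER_VERB": {"acquire", "buy", "purchase"},
}


def analyze(text: str) -> List[Token]:
    """Split text into lowercase words and attach root and category."""
    tokens = []
    for surface in re.findall(r"[a-z]+", text.lower()):
        stem, tag = LEXICON.get(surface, (surface, "UNK"))
        tokens.append(Token(surface, stem, tag))
    return tokens


# A pattern element is a predicate over a single token.
Element = Callable[[Token], bool]


def word(w: str) -> Element:
    return lambda t: t.word == w


def root(r: str) -> Element:
    return lambda t: t.root == r


def pos(p: str) -> Element:
    return lambda t: t.pos == p


def sem(c: str) -> Element:
    return lambda t: t.root in SEMANTIC_CLASSES[c]


# Hypothetical category with two patterns; either one assigns the category.
PATTERNS: Dict[str, List[List[Element]]] = {
    "mergers-acquisitions": [
        [sem("TAKEOVER_VERB"), word("a"), root("stake")],
        [sem("TAKEOVER_VERB"), pos("NOUN")],
    ],
}


def matches(pattern: List[Element], tokens: List[Token]) -> bool:
    """True if the pattern matches some contiguous run of tokens."""
    n = len(pattern)
    return any(
        all(el(tok) for el, tok in zip(pattern, tokens[i:i + n]))
        for i in range(len(tokens) - n + 1)
    )


def categorize(text: str) -> List[str]:
    """Return every category that has at least one matching pattern."""
    tokens = analyze(text)
    return [
        category
        for category, patterns in PATTERNS.items()
        if any(matches(p, tokens) for p in patterns)
    ]
```

Under these assumptions, categorize("Acme bought a stake in Widgets") assigns the hypothetical category because "bought" reduces to a root in the verb class, while a story with no such verb receives no category. Writing patterns like these by hand is the laborious step that the statistical acquisition methods are meant to reduce.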
Keywords :
indexing; information dissemination; information retrieval; information services; knowledge based systems; linguistics; NLDB; browsing; grammatical categories; knowledge-based news categorization; knowledge-based system; lexicosemantic pattern matcher; linguistic analysis; pattern acquisition; semantic structures; statistical methods; word roots; Communication industry; Costs; Databases; Humans; Jacobian matrices; Natural language processing; Pattern matching; Research and development; Statistical analysis