Title :
Extraction of salient textual patterns: synergy between lexical cohesion and contextual coherence
Author :
Chan, Samuel W K
Author_Institution :
Dept. of Decision Sci., Chinese Univ. of Hong Kong, China
fDate :
3/1/2004
Abstract :
Most current information retrieval systems rely solely on lexical item repetition, an approach that is notoriously vulnerable. In this research, we propose a novel method for extracting salient textual patterns. One of our major objectives is to move away from keywords and their associated limitations in textual information retrieval. We identify how individual sentences in a text fit together to be perceived as a salient pattern. We describe a text network, arising from a connectionist model, that exhibits textual continuity. The network facilitates the dynamic extraction of salient textual segments by capturing semantics from two different categories of natural language, namely lexical cohesion and contextual coherence. We also present the results of an empirical study designed to compare the performance of our model with that of human judges in identifying salient textual patterns. The preliminary results show that our model has potential for the automatic discovery of salient patterns in text.
Keywords :
feature extraction; information retrieval systems; knowledge acquisition; natural languages; text analysis; connectionist model; contextual coherence; information retrieval system; knowledge extraction; lexical cohesion; natural language; pattern extraction; semantic relatedness; text continuity; textual information retrieval; textual patterns; Artificial intelligence; Coherence; Councils; Data mining; Humans; Information analysis; Information retrieval; Knowledge engineering; Natural languages; Psychology
Journal_Title :
Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on
DOI :
10.1109/TSMCA.2003.820570