DocumentCode :
2131461
Title :
Keyword Extraction Based on Lexical Chains and Word Co-occurrence for Chinese News Web Pages
Author :
Li, Xinghua ; Wu, Xindong ; Hu, Xuegang ; Xie, Fei ; Jiang, Zhaozhong
Author_Institution :
Sch. of Comput. Sci. & Inf. Eng., Hefei Univ. of Technol., Hefei
fYear :
2008
fDate :
15-19 Dec. 2008
Firstpage :
744
Lastpage :
751
Abstract :
This paper presents a new keyword extraction algorithm for Chinese news Web pages that uses lexical chains and word co-occurrence, combined with frequency features, cohesion features, and correlation features. A lexical chain is an external manifestation of the cohesion created by semantically related words in a text, and it represents the semantic content of a portion of that text. Word co-occurrence distribution is an important statistical model, widely used in natural language processing, that reflects the correlation between words. In the proposed algorithm, KELCC, lexical chains and word co-occurrence are combined to extract keywords from Chinese news Web pages. The algorithm is not domain-specific and can be applied to a single Web page without a corpus. Experiments on randomly selected Web pages were performed to demonstrate the quality of the keywords extracted by the proposed algorithm.
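The abstract does not give KELCC's scoring formulas, so the following is only a minimal Python sketch, not the authors' method, of how a frequency feature and a sentence-level word co-occurrence feature might be combined into a single keyword score. It assumes the page text has already been segmented into words (Chinese text needs a word segmenter first) and it omits lexical chains entirely, since building them requires a thesaurus. The function names `cooccurrence_counts` and `score_keywords`, the weight `alpha`, and the normalization scheme are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count, for each unordered word pair, how many sentences contain both words."""
    pairs = Counter()
    for sent in sentences:
        # A sorted set counts each pair once per sentence, in a canonical order.
        for a, b in combinations(sorted(set(sent)), 2):
            pairs[(a, b)] += 1
    return pairs


def score_keywords(sentences, alpha=0.5, top_k=3):
    """Hypothetical score: alpha * normalized frequency
    + (1 - alpha) * normalized co-occurrence strength."""
    freq = Counter(w for sent in sentences for w in sent)
    if not freq:
        return []
    # Co-occurrence strength of a word: total count of the pairs it belongs to.
    strength = Counter()
    for (a, b), n in cooccurrence_counts(sentences).items():
        strength[a] += n
        strength[b] += n
    max_f = max(freq.values())
    max_s = max(strength.values(), default=0) or 1  # avoid division by zero
    scores = {
        w: alpha * freq[w] / max_f + (1 - alpha) * strength[w] / max_s
        for w in freq
    }
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

For example, with the pre-segmented sentences `[["economy", "growth", "policy"], ["economy", "growth", "export"], ["policy", "bank"]]`, `score_keywords` ranks `economy` and `growth` first, because they are both frequent and strongly co-occurring. A real system would also filter stop words and, as the abstract describes, add cohesion features derived from lexical chains.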
Keywords :
natural language processing; statistical analysis; text analysis; Chinese news Web page; cohesion features; correlation features; frequency features; keyword extraction; lexical chain; statistical model; word co-occurrence; word correlation; Computer science; Conferences; Data engineering; Data mining; Frequency; Machine learning; Machine learning algorithms; Thesauri; USA Councils; Web pages; lexical chains; semantic similarity; word sense disambiguation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3503-6
Electronic_ISBN :
978-0-7695-3503-6
Type :
conf
DOI :
10.1109/ICDMW.2008.122
Filename :
4734002