Regional Information Center for Science and Technology (RICEST) - Improved algorithm for keywords extraction from documents without corpus

DocumentCode :

3001953

Title :

Improved algorithm for keywords extraction from documents without corpus

Author :

Chen, Jing ; Wu, Jianfeng

Author_Institution :

Inst. of Modern Ind. Design, Zhejiang Univ., Hangzhou, China

fYear :

2009

fDate :

26-29 Nov. 2009

Firstpage :

2339

Lastpage :

2341

Abstract :

In this paper, an algorithm for extracting keywords without a corpus is described. We use the co-occurrence information of words and the bias of its distribution to extract the more important words, measured against the most frequently appearing words, which are called reference words. First, the most frequent terms are chosen from the document. Then, because keywords have a non-linear relationship with the set of frequent terms, the bias between the words in the document and the reference terms is measured. Finally, we show that the algorithm is effective.
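
The abstract does not give the exact bias measure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes sentence-level co-occurrence and a chi-square-style deviation as the bias score, and it excludes the reference words themselves from the candidates as a simplification. The function name `extract_keywords` and the parameters `num_reference` and `top_k` are hypothetical.

```python
import re
from collections import Counter, defaultdict


def extract_keywords(text, num_reference=3, top_k=3):
    """Rank words by how unevenly they co-occur with the frequent terms."""
    # Split into sentences, then into lowercase word tokens.
    sentences = [re.findall(r"[a-z]+", s.lower())
                 for s in re.split(r"[.!?]+", text)]
    sentences = [s for s in sentences if s]

    # Step 1: the most frequent terms serve as reference words.
    freq = Counter(w for s in sentences for w in s)
    reference = [w for w, _ in freq.most_common(num_reference)]
    ref_set = set(reference)

    # Count sentence-level co-occurrences of each candidate word with
    # each reference word (assumption: a sentence is the co-occurrence window).
    cooc = defaultdict(Counter)
    for s in sentences:
        present = set(s)
        for w in present - ref_set:
            for g in reference:
                if g in present:
                    cooc[w][g] += 1

    # Expected share of each reference word if co-occurrence were unbiased.
    total_ref = sum(freq[g] for g in reference)
    p = {g: freq[g] / total_ref for g in reference}

    # Step 2: chi-square-style bias of each candidate's co-occurrence counts.
    # Words that never share a sentence with a reference word are not scored.
    scores = {}
    for w, row in cooc.items():
        n_w = sum(row.values())
        scores[w] = sum((row[g] - n_w * p[g]) ** 2 / (n_w * p[g])
                        for g in reference)

    # Highest bias first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return reference, ranked[:top_k]


if __name__ == "__main__":
    sample = "The cat sat. The cat ran. The cat slept. The dog ran."
    print(extract_keywords(sample, num_reference=2, top_k=3))
```

A word that appears with only some of the reference words receives a high score, while a word that follows the overall frequency pattern scores close to zero. No external corpus is needed because all counts come from the document itself.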

Keywords :

information retrieval; text analysis; bias distribution; co-occurrence information; documents keywords extraction; reference words; Algorithm design and analysis; Data mining; Distributed computing; Frequency; Indexing; Machine intelligence; Machinery; Mutual information; Probability distribution; Testing; Bias Distribution; Co-occurrence; Frequent Terms;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computer-Aided Industrial Design & Conceptual Design, 2009. CAID & CD 2009. IEEE 10th International Conference on

Conference_Location :

Wenzhou

Print_ISBN :

978-1-4244-5266-8

Electronic_ISBN :

978-1-4244-5268-2

Type :

conf

DOI :

10.1109/CAIDCD.2009.5375325

Filename :

5375325

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3001953