DocumentCode :
3001953
Title :
Improved algorithm for keywords extraction from documents without corpus
Author :
Chen, Jing ; Wu, Jianfeng
Author_Institution :
Inst. of Modern Ind. Design, Zhejiang Univ., Hangzhou, China
fYear :
2009
fDate :
26-29 Nov. 2009
Firstpage :
2339
Lastpage :
2341
Abstract :
In this paper, an algorithm for extracting keywords without a corpus is described. We use the co-occurrence information of words and the bias of their distribution to extract the more important words, taking the most frequently appearing words, the so-called reference words, as a basis. First, the most frequent terms are chosen from the document. Then, because keywords have a non-linear relationship with the set of frequent terms, the bias between the words in the document and the reference terms is measured. Finally, we show that the algorithm is effective.
Keywords :
information retrieval; text analysis; bias distribution; co-occurrence information; documents keywords extraction; reference words; Algorithm design and analysis; Data mining; Distributed computing; Frequency; Indexing; Machine intelligence; Machinery; Mutual information; Probability distribution; Testing; Bias Distribution; Co-occurrence; Frequent Terms;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer-Aided Industrial Design & Conceptual Design, 2009. CAID & CD 2009. IEEE 10th International Conference on
Conference_Location :
Wenzhou
Print_ISBN :
978-1-4244-5266-8
Electronic_ISBN :
978-1-4244-5268-2
Type :
conf
DOI :
10.1109/CAIDCD.2009.5375325
Filename :
5375325