Title :
Improved algorithm for keywords extraction from documents without corpus
Author :
Chen, Jing ; Wu, Jianfeng
Author_Institution :
Inst. of Modern Ind. Design, Zhejiang Univ., Hangzhou, China
Abstract :
In this paper, we describe an algorithm for extracting keywords from a document without a corpus. We use word co-occurrence information and the bias of its distribution to extract the more important words, measured against the most frequently appearing words, the so-called reference words. First, the most frequent terms are chosen from the document. Then, because keywords have a non-linear relationship with the set of frequent terms, the bias between the words in the document and the reference terms is measured. Finally, we demonstrate that the algorithm is effective.
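The two steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the bias is measured, so this sketch assumes a chi-square style statistic over sentence-level co-occurrence counts, which is one common choice for corpus-free keyword extraction and not necessarily the authors' measure. The function name `extract_keywords`, the parameters `num_reference` and `top_k`, the crude regular-expression tokenizer, and the absence of stop-word filtering are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def extract_keywords(text, num_reference=3, top_k=5):
    """Rank words by how unevenly they co-occur with frequent reference words.

    Illustrative sketch only: the chi-square style score is an assumed
    bias measure, not the one defined in the paper.
    """
    # Split into sentences, then into lowercase alphabetic tokens.
    sentences = [re.findall(r"[a-z]+", s.lower()) for s in re.split(r"[.!?]+", text)]
    sentences = [s for s in sentences if s]
    total_tokens = sum(len(s) for s in sentences)

    # Step 1: the most frequent words become the reference words.
    freq = Counter(w for s in sentences for w in s)
    reference = [w for w, _ in freq.most_common(num_reference)]

    # Count, per word, the sentences it shares with each reference word,
    # and the total length of the sentences it appears in.
    cooc = defaultdict(Counter)
    context_size = Counter()
    for s in sentences:
        unique = set(s)
        for w in unique:
            context_size[w] += len(s)
            for g in reference:
                if g != w and g in unique:
                    cooc[w][g] += 1

    # Step 2: score each word by the deviation of its observed co-occurrence
    # counts from the counts expected if it were unrelated to the reference word.
    scores = {}
    for w in freq:
        score = 0.0
        for g in reference:
            if g == w:
                continue
            expected = context_size[w] * context_size[g] / total_tokens
            score += (cooc[w][g] - expected) ** 2 / expected
        scores[w] = score

    # Highest-bias words first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

Calling `extract_keywords(document_text, num_reference=10, top_k=5)` returns up to five `(word, score)` pairs in descending score order. In practice, stop words would likely need to be removed before choosing reference words, since otherwise function words dominate the frequency ranking.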
Keywords :
information retrieval; text analysis; bias distribution; co-occurrence information; documents keywords extraction; reference words; Algorithm design and analysis; Data mining; Distributed computing; Frequency; Indexing; Machine intelligence; Machinery; Mutual information; Probability distribution; Testing; Bias Distribution; Co-occurrence; Frequent Terms;
Conference_Title :
Computer-Aided Industrial Design & Conceptual Design, 2009. CAID & CD 2009. IEEE 10th International Conference on
Conference_Location :
Wenzhou
Print_ISBN :
978-1-4244-5266-8
Electronic_ISBN :
978-1-4244-5268-2
DOI :
10.1109/CAIDCD.2009.5375325