DocumentCode :
1850051
Title :
Domain-specific keyphrase extraction and near-duplicate article detection based on ontology
Author :
Nhon Do ; LongVan Ho
Author_Institution :
Dept. of Comput. Sci., Univ. of Inf. Technol. - VNU HCMC, Ho Chi Minh City, Vietnam
fYear :
2015
fDate :
25-28 Jan. 2015
Firstpage :
123
Lastpage :
126
Abstract :
The significant increase in the number of online newspapers has given web users a vast source of information. However, it is difficult for users to manage this content and to check the correctness of articles. In this paper, we introduce algorithms for keyphrase extraction and signature matching for near-duplicate article detection. Based on an ontology, keyphrases are extracted automatically from articles, and the similarity of two articles is calculated from the extracted keyphrases. The algorithms are applied to Vietnamese online newspapers in the Labor & Employment domain. Experimental results show that the proposed methods are effective.
Keywords :
feature extraction; ontologies (artificial intelligence); text analysis; text detection; domain-specific keyphrase extraction; near-duplicate article detection; ontology; Algorithm design and analysis; Data mining; Educational institutions; Employment; Information technology; Ontologies; Semantics; document retrieval system; keyphrase extraction; near-duplicate detection; ontology;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computing & Communication Technologies - Research, Innovation, and Vision for the Future (RIVF), 2015 IEEE RIVF International Conference on
Conference_Location :
Can Tho
Print_ISBN :
978-1-4799-8043-7
Type :
conf
DOI :
10.1109/RIVF.2015.7049886
Filename :
7049886