DocumentCode :
3740087
Title :
ClRank: A Method for Keyword Extraction from Web Pages Using Clustering and Distribution of Nouns
Author :
Mohammad Rezaei; Najlah Gali; Pasi Fränti
Author_Institution :
Sch. of Comput., Univ. of Eastern Finland, Joensuu, Finland
Volume :
1
fYear :
2015
Firstpage :
79
Lastpage :
84
Abstract :
Analyzing the text of a web page is more difficult than analyzing a normal document because of additional content such as HTML structure, styling code, irrelevant text, and hyperlinks. In this paper, we propose an unsupervised method for extracting keywords from a web page. The method extracts unigram nouns by applying part-of-speech tagging to the text, clusters the nouns by semantic similarity, and selects a number of keywords from the highest-scoring clusters. Experimental results show that our method outperforms the state-of-the-art TextRank by 13% in precision, 6% in recall, and 10% in F-measure.
Keywords :
"Web pages","Semantics","HTML","Tagging","Clustering algorithms","Mice","Speech"
Publisher :
IEEE
Conference_Title :
Web Intelligence and Intelligent Agent Technology (WI-IAT), 2015 IEEE / WIC / ACM International Conference on
Type :
conf
DOI :
10.1109/WI-IAT.2015.64
Filename :
7396783