Regional Information Center for Science and Technology - ClRank: A Method for Keyword Extraction from Web Pages Using Clustering and Distribution of Nouns

DocumentCode :

3740087

Title :

ClRank: A Method for Keyword Extraction from Web Pages Using Clustering and Distribution of Nouns

Author :

Mohammad Rezaei; Najlah Gali; Fränti

Author_Institution :

Sch. of Comput., Univ. of Eastern Finland, Joensuu, Finland

Volume :

fYear :

2015

Firstpage :

Lastpage :

Abstract :

Text analysis of a web page is more difficult than analysis of a normal document because of additional information such as HTML structure, styling code, irrelevant text, and hyperlinks. In this paper, we propose an unsupervised method to extract keywords from a web page. The method extracts unigram nouns by applying part-of-speech tagging to the text. It then clusters the nouns based on their semantic similarity and selects a number of keywords from the highest-scoring clusters. Experimental results show that our method outperforms the state-of-the-art TextRank by 13% in precision, 6% in recall, and 10% in F-measure.
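
The pipeline the abstract describes (noun extraction, clustering, selection from the top clusters, then evaluation by precision, recall, and F-measure) can be illustrated with a minimal sketch. This is not the authors' implementation: the noun lexicon `NOUNS` stands in for a real part-of-speech tagger, the `TOPIC` table stands in for a semantic similarity measure, and scoring a cluster by the total frequency of its nouns is an assumption made only for illustration, since the abstract does not state the scoring rule.

```python
from collections import Counter

# Hypothetical stand-in for a part-of-speech tagger: a tiny noun lexicon.
NOUNS = {"hotel", "room", "breakfast", "beach", "sea", "sand"}

# Hypothetical stand-in for semantic similarity: nouns that share a
# hand-assigned topic label are treated as similar.
TOPIC = {"hotel": "lodging", "room": "lodging", "breakfast": "lodging",
         "beach": "coast", "sea": "coast", "sand": "coast"}

def extract_nouns(text):
    """Return the unigram nouns of the text, in order of appearance."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    return [t for t in tokens if t in NOUNS]

def cluster_nouns(nouns):
    """Group distinct nouns that share a topic label."""
    clusters = {}
    for noun in set(nouns):
        clusters.setdefault(TOPIC[noun], set()).add(noun)
    return list(clusters.values())

def select_keywords(text, k):
    """Pick up to k nouns, taking the highest-scored clusters first."""
    nouns = extract_nouns(text)
    freq = Counter(nouns)
    # Assumed cluster score: total frequency of the cluster's nouns.
    ranked = sorted(cluster_nouns(nouns),
                    key=lambda c: (-sum(freq[n] for n in c), sorted(c)))
    keywords = []
    for cluster in ranked:
        # Within a cluster, prefer frequent nouns; break ties alphabetically.
        for noun in sorted(cluster, key=lambda n: (-freq[n], n)):
            if len(keywords) < k:
                keywords.append(noun)
    return keywords

def precision_recall_f(extracted, reference):
    """Standard set-based precision, recall, and F-measure."""
    extracted, reference = set(extracted), set(reference)
    hits = len(extracted & reference)
    p = hits / len(extracted) if extracted else 0.0
    r = hits / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Calling `select_keywords(page_text, k)` and passing the result to `precision_recall_f` together with a manually chosen reference keyword set yields the three measures in which the abstract reports its comparison with TextRank.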

Keywords :

"Web pages","Semantics","HTML","Tagging","Clustering algorithms","Mice","Speech"

Publisher :

IEEE

Conference_Title :

Web Intelligence and Intelligent Agent Technology (WI-IAT), 2015 IEEE / WIC / ACM International Conference on

Type :

conf

DOI :

10.1109/WI-IAT.2015.64

Filename :

7396783

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3740087