Regional Information Center for Science and Technology - The Hot Keyphrase Extraction Based on TF*PDF

DocumentCode :

2901561

Title :

The Hot Keyphrase Extraction Based on TF*PDF

Author :

Gao, Yan ; Liu, Jin ; Ma, PeiXun

Author_Institution :

Coll. of Inf. Sci. & Eng., Central South Univ., Changsha, China

fYear :

2011

fDate :

16-18 Nov. 2011

Firstpage :

1524

Lastpage :

1528

Abstract :

A keyphrase, consisting of several words, is regarded as a phrase that represents the topic and content of the whole text. Extracting keyphrases is a good way to detect hot topics and to track topics in news reports. In this paper, a two-step keyphrase extraction method based on TF*PDF is proposed. In the first step, a position-weighted TF*PDF algorithm is used to obtain a candidate hot term list, and the bursty value of each term is used to filter noise from the list. In the second step, a phrase identification process combines hot terms into phrases using position information, frequency information, etc. Finally, the position-weighted TF*PDF algorithm is also used to weight the phrases, and the top k phrases are chosen as hot keyphrases. Experiments on real web data indicate that the extraction method provides solutions with improved quality.
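
For readers unfamiliar with the base weighting, the following is a minimal sketch of plain TF*PDF as it is commonly defined: for each news channel, a term's normalized frequency is multiplied by the exponential of the proportion of that channel's documents containing the term, and the results are summed over channels. This is not the paper's method: the position weighting, the bursty-value filter, and the phrase identification step are not reproduced, because the abstract does not specify them. The function name `tf_pdf` and the input layout (a list of channels, each a list of tokenized documents) are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_pdf(channels):
    """Plain TF*PDF term weights (illustrative sketch, not the paper's variant).

    channels: list of channels; each channel is a list of documents;
              each document is a list of string tokens (hypothetical layout).
    Returns a dict mapping term -> weight summed over all channels.
    """
    weights = Counter()
    for docs in channels:
        if not docs:
            continue  # an empty channel contributes nothing
        term_freq = Counter()  # occurrences of each term in the channel
        doc_freq = Counter()   # documents in the channel containing the term
        for doc in docs:
            term_freq.update(doc)
            doc_freq.update(set(doc))
        # Normalize frequencies by the Euclidean norm of the channel's counts.
        norm = math.sqrt(sum(f * f for f in term_freq.values()))
        if norm == 0:
            continue
        for term, f in term_freq.items():
            # Normalized frequency times exp(proportional document frequency).
            weights[term] += (f / norm) * math.exp(doc_freq[term] / len(docs))
    return dict(weights)


# Toy data: two channels with two short documents each.
channels = [
    [["quake", "hits", "city"], ["quake", "relief"]],
    [["quake", "market"], ["market", "falls"]],
]
w = tf_pdf(channels)
top_terms = sorted(w, key=w.get, reverse=True)[:2]
```

Sorting the returned weights and keeping the first k entries, as in `top_terms`, mirrors the "top k" selection mentioned in the abstract, although the paper applies it to phrases rather than single terms.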

Keywords :

Internet; information retrieval; text analysis; TF-PDF algorithm; hot keyphrase extraction; hot topic detection; inverse document frequency; news report; phrase identification process; position-weighted TF-PDF algorithm; term frequency; tracking topic detection; two-step keyphrase extraction method; Conferences; Data mining; Event detection; Feature extraction; Internet; Noise; Presses; TDT; TF*PDF; bursty value; keyphrase extraction;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Trust, Security and Privacy in Computing and Communications (TrustCom), 2011 IEEE 10th International Conference on

Conference_Location :

Changsha

Print_ISBN :

978-1-4577-2135-9

Type :

conf

DOI :

10.1109/TrustCom.2011.211

Filename :

6121007

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2901561