Building A Document Class Hierarchy for Obtaining More Proper Bibliographies from Web

Author

Wang, Daling ; Yu, Ge ; Hu, Minghan ; Bao, Yubin ; Zhang, Meng

Author_Institution

Sch. of Inf. Sci. & Eng., Northeastern Univ., Shenyang

fYear

2005

fDate

8-9 April 2005

Firstpage

214

Lastpage

219

Abstract

To help researchers in scientific and technological fields find more appropriate information resources on the Web, an auxiliary search structure is proposed: a class hierarchy of documents built from the documents' keywords. To cover the contents of each document properly, the keywords are extracted by mining maximal sequential frequent phrases. In this paper, the concept of a maximal sequential frequent phrase is defined, and the corresponding mining algorithm is designed and implemented. The experiments show that keyword extraction using maximal sequential frequent phrases achieves a better F-measure than extraction using traditional TFIDF weights. Moreover, compared with previous work, our extended class hierarchy tree represents relationships both among the keywords themselves and between keywords and documents, so that queries at different professional levels can be supported.
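
The abstract does not give the paper's formal definition of a maximal sequential frequent phrase, so the following is only a minimal sketch under stated assumptions: a phrase is taken to be a contiguous word sequence, it counts as frequent when it occurs in at least min_support sentences, and it counts as maximal when no longer frequent phrase contains it. The function names, the whitespace tokenizer, and the min_support and max_len parameters are hypothetical and not taken from the paper.

```python
from collections import Counter


def tokenize(sentence):
    # Assumption: lowercase whitespace tokenization is good enough for a sketch.
    return sentence.lower().split()


def ngrams(tokens, n):
    # All contiguous word sequences of length n, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_phrases(sentences, min_support, max_len=5):
    """Return {phrase: support} for phrases found in >= min_support sentences."""
    frequent = {}
    for n in range(1, max_len + 1):
        counts = Counter()
        for sentence in sentences:
            # Use a set so each sentence contributes at most once per phrase.
            counts.update(set(ngrams(tokenize(sentence), n)))
        level = {p: c for p, c in counts.items() if c >= min_support}
        if not level:
            # No frequent phrase of length n implies none of length n + 1.
            break
        frequent.update(level)
    return frequent


def is_subphrase(short, long):
    # True if `short` occurs as a contiguous run inside `long`.
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def maximal_phrases(frequent):
    """Keep only frequent phrases not contained in a longer frequent phrase."""
    return {
        p: c
        for p, c in frequent.items()
        if not any(len(q) > len(p) and is_subphrase(p, q) for q in frequent)
    }
```

Calling maximal_phrases(frequent_phrases(sentences, min_support)) on a list of sentence strings yields candidate keyword phrases with their support counts; how the paper ranks or filters these into final keywords is not stated in the abstract.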

Keywords

Internet; data mining; search engines; text analysis; TFIDF weight; World Wide Web; auxiliary search structure; bibliographies; document class hierarchy; document keywords; information resources; keyword extraction; maximal sequential frequent phrase mining; Algorithm design and analysis; Books; Information science; Proposals; Writing

fLanguage

English

Publisher

IEEE

Conference_Titel

Proceedings of the International Workshop on Challenges in Web Information Retrieval and Integration (WIRI '05), 2005

Conference_Location

Tokyo

Print_ISBN

0-7695-2414-1

Type

conf

DOI

10.1109/WIRI.2005.13

Filename

1553016