• DocumentCode
    2699414
  • Title
    Building A Document Class Hierarchy for Obtaining More Proper Bibliographies from Web
  • Author
    Wang, Daling ; Yu, Ge ; Hu, Minghan ; Bao, Yubin ; Zhang, Meng
  • Author_Institution
    Sch. of Inf. Sci. & Eng., Northeastern Univ., Shenyang
  • fYear
    2005
  • fDate
    8-9 April 2005
  • Firstpage
    214
  • Lastpage
    219
  • Abstract
    To help researchers in scientific and technological fields find more appropriate information resources on the Web, an auxiliary search structure is proposed: a class hierarchy of documents built from the keywords of the documents. To cover the contents of each document properly, the keywords are extracted by mining maximal sequential frequent phrases. In this paper, the concept of a maximal sequential frequent phrase is defined, and the corresponding mining algorithm is designed and implemented. The experiments show that keyword extraction using maximal sequential frequent phrases achieves a better F-measure than extraction using the traditional TFIDF weight. Moreover, compared with previous work, our extended class hierarchy tree represents hierarchical relationships among the keywords themselves as well as between keywords and documents, so that queries at different professional levels can be supported.
  • Keywords
    Internet; data mining; search engines; text analysis; TFIDF weight; World Wide Web; auxiliary search structure; bibliographies; document class hierarchy; document keywords; information resources; keyword extraction; maximal sequential frequent phrase mining; Algorithm design and analysis; Books; Information science; Proposals; Writing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Information Retrieval and Integration, 2005. WIRI '05. Proceedings. International Workshop on Challenges in
  • Conference_Location
    Tokyo
  • Print_ISBN
    0-7695-2414-1
  • Type
    conf
  • DOI
    10.1109/WIRI.2005.13
  • Filename
    1553016