• DocumentCode
    2632707
  • Title
    Mining the URLs: An Approach to Measure the Similarities between Named-Entities
  • Author
    Liu, Hui ; Zhao, Jinglei ; Lu, Ruzhan
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai
  • fYear
    2008
  • fDate
    18-20 June 2008
  • Firstpage
    75
  • Lastpage
    75
  • Abstract
    Measuring the similarity between named-entities is foundational to a number of practical applications, such as information extraction and query expansion. In this paper the authors study the similarity measure between two named-entities. In particular, the authors are interested in fine-grained similarity differences between named-entities within one class, such as "novelist". Unlike previous work on named-entity associations, this paper proposes a novel Web mining method that depends solely on the URLs returned by a search engine when named-entities are used as queries. The problem of similarity between two named-entities is converted to that of similarity between two URL sets. Evaluations show that this method achieves good results in two experiments.
  • Keywords
    Internet; data mining; query processing; search engines; URL; Web mining method; information extraction; named-entities; query expansion; search engine; similarity measure; Application software; Computer science; Data mining; Information analysis; Natural language processing; Pattern analysis; Search engines; Taxonomy; Uniform resource locators; Web mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Innovative Computing Information and Control, 2008. ICICIC '08. 3rd International Conference on
  • Conference_Location
    Dalian, Liaoning
  • Print_ISBN
    978-0-7695-3161-8
  • Electronic_ISBN
    978-0-7695-3161-8
  • Type
    conf
  • DOI
    10.1109/ICICIC.2008.362
  • Filename
    4603264