• DocumentCode
    2068406
  • Title
    A method for person name disambiguation based on Baidu Encyclopedia
  • Author
    Li, Xinfu ; Cao, Wenxue
  • Author_Institution
    Key Lab. of Machine Learning & Comput. Intell., Hebei Univ., Baoding, China
  • fYear
    2011
  • fDate
    16-18 Dec. 2011
  • Firstpage
    423
  • Lastpage
    426
  • Abstract
    Person name ambiguity is widespread on web pages, since one name may be shared by different people, so it is important to uniquely identify a given person on the web. This paper proposes Baidu-PND, an unsupervised name disambiguation method based on Baidu Encyclopedia. Three features are extracted from the online Baidu Encyclopedia: background knowledge, contextual features, and the Related-Set of the characters. The feature weights are learned with a logistic regression algorithm, and the features are then combined by linear fusion. The candidate with the maximum combined value is selected as the correct person for the web page. Experiments measuring the performance of Baidu-PND show that it is higher than the authors expected, supporting the method's feasibility and effectiveness for person name disambiguation on web pages. Baidu-PND is also a new method for knowledge mining based on Baidu Encyclopedia.
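The fusion and selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fuse` and `disambiguate`, the candidate labels, the similarity scores, and the weights are all hypothetical, and the weights are assumed to have been estimated beforehand (the abstract says by logistic regression, which is not reproduced here).

```python
from typing import Dict, Sequence


def fuse(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted linear combination of per-feature similarity scores."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores))


def disambiguate(candidates: Dict[str, Sequence[float]],
                 weights: Sequence[float]) -> str:
    """Return the candidate entry with the maximum fused score."""
    return max(candidates, key=lambda name: fuse(candidates[name], weights))


# Hypothetical weights for (background knowledge, context, Related-Set).
weights = (0.5, 0.3, 0.2)

# Hypothetical similarity scores between one web page and two
# encyclopedia entries that share the same person name.
candidates = {
    "entry_a": (0.9, 0.2, 0.1),
    "entry_b": (0.4, 0.8, 0.7),
}

best = disambiguate(candidates, weights)
print(best)
```

With these made-up numbers, `entry_b` wins because its fused score (0.58) exceeds that of `entry_a` (0.53), even though `entry_a` scores higher on the first feature alone.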
  • Keywords
    Internet; encyclopaedias; feature extraction; natural language processing; regression analysis; search engines; Baidu Encyclopedia; Baidu-PND method; Web pages; feature extraction; logistic regression algorithm; person name disambiguation; unsupervised name disambiguation method; Accuracy; Context; Educational institutions; Encyclopedias; Feature extraction; Physics; Web pages; Baidu Encyclopedia; person name disambiguation; unsupervised learning; web mining;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Transportation, Mechanical, and Electrical Engineering (TMEE), 2011 International Conference on
  • Conference_Location
    Changchun
  • Print_ISBN
    978-1-4577-1700-0
  • Type
    conf
  • DOI
    10.1109/TMEE.2011.6199232
  • Filename
    6199232