Author_Institution :
Dept. of Comput. Sci., Univ. of British Columbia, Vancouver, BC
Abstract :
Name variants are ubiquitous in the real world due to typographical errors (e.g., "Forschungszentrum Julich" vs. "Forschungszentrum Julich"), abbreviated, incomplete, or missing information (e.g., "R. E. Ellis" vs. "Randy E. Ellis"), the lack of a standard name formatting convention (e.g., "Spike Jonze" vs. "Jones, Spike"), and combinations of these. In this paper, we project the name disambiguation problem onto a graph representation and then analyze the resulting graphs using social network analysis. In particular, we use real duplicate name entities from the ACM digital library that we verified manually. Then, using various string similarity metrics and additional information (i.e., co-author names, titles, and venues), we analyze the effectiveness of both the string similarity metrics and the additional information by means of social network analysis. Our experimental validation shows that the name disambiguation problem can be analyzed in a graphical, visual manner.
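The idea of projecting name variants onto a graph can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not name the metrics or network measures used, so this example assumes `difflib.SequenceMatcher` as a stand-in string similarity metric, a hypothetical threshold for drawing edges, and connected components and density as example network measures. All function names are illustrative.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Stand-in string similarity in [0, 1] (assumption: not the paper's metric)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def build_graph(names, threshold=0.7):
    """Connect two names with an undirected edge when they are similar enough.

    The threshold value is a hypothetical choice for illustration.
    """
    graph = {name: set() for name in names}
    for a, b in combinations(names, 2):
        if similarity(a, b) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group names into candidate entities: one component per cluster of variants."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


def density(graph):
    """Fraction of possible undirected edges that are present."""
    n = len(graph)
    if n < 2:
        return 0.0
    edges = sum(len(neighbors) for neighbors in graph.values()) / 2
    return edges / (n * (n - 1) / 2)


if __name__ == "__main__":
    names = ["R. E. Ellis", "Randy E. Ellis", "Spike Jonze"]
    g = build_graph(names)
    for component in connected_components(g):
        print(sorted(component))
    print("density:", density(g))
```

In a sketch like this, additional information such as co-author names, titles, or venues could be folded into the edge score, and the resulting graphs compared across metrics by their component structure.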
Keywords :
digital libraries; graph theory; network theory (graphs); string matching; ACM digital library; graph representation; name disambiguation problem; name variants; social network analysis; string similarity metrics; Computer errors; Computer science; Databases; Erbium; Information analysis; Information technology; Portals; Search problems; Social network services; Software libraries; Name Disambiguation; Social Networks; String Similarity Metrics;