Title :
Mixed Group Discovery: Incorporating Group Linkage with Alternatively Consistent Social Network Analysis
Author_Institution :
Coll. of Inf. Sci. & Technol., Pennsylvania State Univ., State College, PA, USA
Abstract :
Poor-quality data is widespread in database applications. In a relational database, each entity is associated with a group of relational records. For two entities with similar identifiers, the records of one entity may be mistakenly combined into the group of the other. This is referred to as the mixed group problem. In this paper, we formulate the problem of discovering mixed groups and propose two unsupervised algorithms as solutions. From the relational records, we observe that a group in one database is unlikely to be mixed in the same pattern as the group of the same entity in another independent database. We also find that the collaborative relationships between entities tend to be alternatively consistent over time. By investigating and applying these properties, we propose two mixed group discovery algorithms, as well as a generic model that covers various situations. Empirical experiments on both synthetic and real datasets from the Citeseer and ACM digital libraries show that our algorithms can identify mixed groups with more than 70% precision and 80% recall, and that the overall performance is significantly better than that of existing methods.
Keywords :
digital libraries; relational databases; social networking (online); unsupervised learning; ACM digital libraries; Citeseer digital libraries; group linkage; independent database; mixed group discovery algorithms; relational database; social network analysis; unsupervised algorithms; Algorithm design and analysis; Clustering algorithms; Collaboration; Complexity theory; Couplings; Databases; Social network services
Conference_Title :
Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on
Conference_Location :
Pittsburgh, PA
Print_ISBN :
978-1-4244-7912-2
Electronic_ISBN :
978-0-7695-4154-9
DOI :
10.1109/ICSC.2010.26