Author :
Ubi, Jaan ; Ubi, Evald ; Liiv, Innar ; Vohandu, Leo
Author_Institution :
Dept. of Inf., Tallinn Univ. of Technol., Tallinn, Estonia
Abstract :
This paper studies how the strictly optimal solutions of community detection, based on similarity matrices, depend on the parameter of the distance threshold method applied beforehand. To detect communities, we use the widely used modularity metric and reach strict optimality by linear programming, thereby solving an NP-hard problem. The distance threshold method makes the matrix progressively sparser, and the best threshold value is determined by analyzing the number of clusters subsequently detected. We apply the method to educational coopetition data from the business school of TUT, which has four specializations; we sample 36 students at a time, each time drawing from a pair of specializations. Because the optimal number of clusters tends to be four for any such two-fold sample, we also detect a natural division within each specialization, the reason for which is a matter for further study. As a result, coopetition (simultaneous competition and cooperation) is measured between the departments of the business school. The average grade of the students serves as a proxy for a department's competitive score, and the traditional conductance serves as a proxy for its cooperative score. For our data, the optimal threshold value for community detection is 0.07: enough noise is removed from the data, but not so many values that vital information is lost. With this threshold we most often reach our goal of detecting four clusters in the two-fold sampling, which shows the usefulness of fine-tuning the distance threshold while evaluating it by strictly optimal community detection.
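The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy 4-node similarity matrix, the function names, and the rule "keep entries at or above the threshold" are all assumptions made for illustration, and the sketch only evaluates modularity and conductance for a given partition rather than solving the NP-hard optimization by linear programming.

```python
def threshold_matrix(sim, t):
    """Zero out the diagonal and every entry below t (assumed thresholding rule)."""
    n = len(sim)
    return [[sim[i][j] if i != j and sim[i][j] >= t else 0.0 for j in range(n)]
            for i in range(n)]


def modularity(adj, labels):
    """Newman modularity Q of a weighted undirected graph for a given partition."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # twice the total edge weight
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


def conductance(adj, labels, community):
    """Cut weight leaving `community`, divided by the smaller side's volume."""
    n = len(adj)
    inside = [i for i in range(n) if labels[i] == community]
    outside = [i for i in range(n) if labels[i] != community]
    cut = sum(adj[i][j] for i in inside for j in outside)
    vol_in = sum(sum(adj[i]) for i in inside)
    vol_out = sum(sum(adj[i]) for i in outside)
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else 0.0


# Hypothetical toy data: two tight pairs linked by weak, noisy similarities.
toy_sim = [
    [1.00, 0.90, 0.05, 0.02],
    [0.90, 1.00, 0.03, 0.04],
    [0.05, 0.03, 1.00, 0.80],
    [0.02, 0.04, 0.80, 1.00],
]
toy_labels = [0, 0, 1, 1]  # assumed partition, not an optimized one

dense = threshold_matrix(toy_sim, 0.0)    # only the diagonal is removed
sparse = threshold_matrix(toy_sim, 0.07)  # weak entries removed as noise

q_dense = modularity(dense, toy_labels)
q_sparse = modularity(sparse, toy_labels)
c_dense = conductance(dense, toy_labels, 0)
c_sparse = conductance(sparse, toy_labels, 0)
```

On this toy matrix, raising the threshold removes the weak cross-pair entries, so the modularity of the two-pair partition rises and the conductance of each pair drops to zero. The value 0.07 is borrowed from the abstract only as an example; nothing here suggests it is optimal for other data.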
Keywords :
educational institutions; linear programming; management education; network theory (graphs); sampling methods; sparse matrices; NP-hard problem; TUT; business school; community structure detection analysis; competitive score; distance threshold setting method; educational coopetition data; linear programming; oft-used modularity metric; optimal community detection; similarity matrices; sparse matrix; subsequent clusters; two-fold sampling; Business; Communities; Educational institutions; Finance; Noise; Optimization; Sparse matrices; coopetition; education; linear programming; metrics; social network analysis; threshold method;