Title :
Improved algorithms for exact and approximate boolean matrix decomposition
Author :
Yuan Sun;Shiwei Ye;Yi Sun;Tsunehiko Kameda
Author_Institution :
Information and Society Research Division, National Institute of Informatics, Tokyo, Japan
Abstract :
An arbitrary m×n Boolean matrix M can be decomposed exactly as M = U∘V, where U (resp. V) is an m×k (resp. k×n) Boolean matrix and ∘ denotes the Boolean matrix multiplication operator. We first prove an exact formula for the Boolean matrix J such that M = M∘J^T holds, where J is maximal in the sense that if any 0 element in J is changed to a 1 then this equality no longer holds. Since minimizing k is NP-hard, we propose two heuristic algorithms for finding suboptimal but good decompositions. We measure the performance (in minimizing k) of our algorithms on several real datasets in comparison with other representative heuristic algorithms for Boolean matrix decomposition (BMD). The results on some popular benchmark datasets demonstrate that one of our proposed algorithms performs as well as or better than the other algorithms on most of them. Our algorithms have a number of other advantages: They are based on an exact mathematical formula, which can be interpreted intuitively. They can be used for approximation as well, with competitive “coverage.” Last but not least, they also run very fast. Due to interpretability issues in data mining, we impose the condition, called the “column use condition,” that the columns of the factor matrix U must form a subset of the columns of M. In educational databases, the “ideal item response matrix” R, the “knowledge state matrix” A and the “Q-matrix” Q play important roles. We show that they are related exactly by R̅ = A̅∘Q^T, where the overbar denotes the elementwise complement. Thus, given R, we can find A and Q with a small number (k) of “knowledge states,” using our exact BMD heuristics.
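The Boolean product and the maximal matrix J described in the abstract can be illustrated with a small sketch. The abstract does not state the paper's formula for J, so the construction below is an assumption: it sets J[j][k] = 1 exactly when column k of M is contained in column j, which is one way to obtain a matrix with the stated maximality property. The helper names (`bool_product`, `transpose`, `complement`, `column_containment`) and the 3×3 example matrix are hypothetical and chosen only for illustration.

```python
def bool_product(U, V):
    """Boolean matrix product: result[i][j] = OR over t of (U[i][t] AND V[t][j])."""
    inner = len(V)
    cols = len(V[0])
    return [[int(any(row[t] and V[t][j] for t in range(inner)))
             for j in range(cols)]
            for row in U]


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def complement(A):
    """Elementwise complement of a 0/1 matrix."""
    return [[1 - x for x in row] for row in A]


def column_containment(M):
    """Hypothetical helper: J[j][k] = 1 iff column k of M is contained in column j.

    This is an assumed construction, not necessarily the paper's formula.
    """
    # violations[j][k] = 1 if some row i has M[i][k] = 1 but M[i][j] = 0.
    violations = bool_product(transpose(complement(M)), M)
    return complement(violations)


# Illustrative matrix (not taken from the paper).
M = [[1, 1, 0],
     [0, 1, 1],
     [1, 1, 1]]

J = column_containment(M)
recovered = bool_product(M, transpose(J))
print(recovered == M)
```

Under this construction, taking U = M and V = J^T gives a trivial exact decomposition with k = n that satisfies the column use condition; the heuristics in the paper aim at a much smaller k, which this sketch does not attempt.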
Keywords :
"Matrix decomposition","Approximation algorithms","Heuristic algorithms","Approximation methods","Data mining","Algorithm design and analysis"
Conference_Title :
2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
Print_ISBN :
978-1-4673-8272-4
DOI :
10.1109/DSAA.2015.7344813