Title :
Finding Conceptual Document Clusters with Improved Top-N Formal Concept Search
Author :
Okubo, Yoshiaki ; Haraguchi, Makoto
Author_Institution :
Div. of Comput. Sci., Hokkaido Univ., Sapporo
Abstract :
In this paper, we discuss a method for conceptual clustering of documents. Each cluster is defined using formal concept analysis, which provides a conceptual meaning for the cluster. Our clustering is formalized as a Top-N delta-valid formal concept problem. We improve our previous clique search-based algorithm for the problem so that it can be applied to larger-scale datasets. For more efficient computation, we present pruning rules based on theoretical properties of formal concepts, and we design a depth-first branch-and-bound algorithm that incorporates them. Our experimental results show that valuable clusters can be extracted from a collection of Web documents. Moreover, the algorithm outperforms several fast algorithms for mining closed itemsets, which are equivalent to formal concepts.
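To make the notion of a formal concept concrete, the following is a minimal brute-force sketch and not the branch-and-bound algorithm of the paper. The toy context `CONTEXT`, the helper names, the reading of "delta-valid" as "the intent contains at least delta terms", and the ranking by extent size are all assumptions made for illustration; the abstract does not spell out these definitions.

```python
from itertools import combinations

# Hypothetical toy context: each document maps to the set of terms it contains.
CONTEXT = {
    "d1": {"web", "mining", "cluster"},
    "d2": {"web", "mining"},
    "d3": {"web", "cluster"},
    "d4": {"graph", "clique"},
}


def common_terms(docs):
    """Terms shared by every document in docs (all terms if docs is empty)."""
    term_sets = [CONTEXT[d] for d in docs]
    if not term_sets:
        return set().union(*CONTEXT.values())
    return set.intersection(*term_sets)


def docs_having(terms):
    """Documents that contain every term in terms."""
    return {d for d, ts in CONTEXT.items() if terms <= ts}


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every document subset.

    Exponential in the number of documents; only suitable for tiny examples.
    """
    concepts = set()
    docs = sorted(CONTEXT)
    for size in range(len(docs) + 1):
        for subset in combinations(docs, size):
            intent = common_terms(subset)
            extent = docs_having(intent)  # closure of the subset
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


def top_n(n, delta):
    """Assumed reading: keep concepts with at least delta intent terms,
    then return the n concepts with the largest extents."""
    valid = [c for c in formal_concepts() if len(c[1]) >= delta]
    valid.sort(key=lambda c: (-len(c[0]), sorted(c[0])))
    return valid[:n]


if __name__ == "__main__":
    for extent, intent in top_n(2, delta=2):
        print(sorted(extent), sorted(intent))
```

Under these assumptions, each returned pair is a document cluster (the extent) together with the terms all of its documents share (the intent), which is the sense in which a formal concept gives a cluster a conceptual meaning. A pruning-based search would avoid enumerating every subset as this sketch does.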
Keywords :
data analysis; data mining; document handling; tree searching; Web document extraction; Web mining; clique search-based algorithm; conceptual document clustering; depth-first branch-and-bound algorithm; formal concept search analysis; Algorithm design and analysis; Clustering algorithms; Clustering methods; Computer science; Data mining; Information science; Internet; Itemsets; Web pages; Web sites
Conference_Title :
Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2747-7