DocumentCode :
1087001
Title :
A Clustering-Based Approach for Integrating Document-Category Hierarchies
Author :
Cheng, Tsang-Hsiang ; Wei, Chih-Ping
Author_Institution :
Southern Taiwan Univ., Tainan
Volume :
38
Issue :
2
fYear :
2008
fDate :
3/1/2008
Firstpage :
410
Lastpage :
424
Abstract :
E-commerce applications generate and consume a tremendous amount of online information, which is typically available as textual documents. Conceivably, organizations and individuals generally use category sets or hierarchies to organize, archive, and access their documents. Meanwhile, organizations and individuals constantly acquire relevant documents from various Internet sources, each of which may organize its documents in a category set or hierarchy different from that used by the acquiring organization or individual. Consequently, the integration of source documents organized in a category hierarchy into an existing category hierarchy deployed by the acquiring organization or individual becomes an important issue in the e-commerce era. Existing category-integration techniques are mainly designed to integrate document catalogs, each of which is organized nonhierarchically (i.e., in a flat set). In this paper, we propose a clustering-based category-hierarchy integration (CHI) technique, which is an extension of the clustering-based category-integration (CCI) technique. Our empirical evaluation results show that the proposed CHI technique appears to improve the effectiveness of category-hierarchy integration compared with that attained by nonhierarchical category-integration techniques, particularly in homogeneous and comparable scenarios.
Keywords :
electronic commerce; pattern clustering; text analysis; Internet; e-commerce application; textual document category hierarchy integration; Catalogs; Educational institutions; Finance; Helium; Radiofrequency interference; Taxonomy; Technology management; Text mining; Web sites; Category-hierarchy integration; document clustering; document management; document-category integration; taxonomy integration
fLanguage :
English
Journal_Title :
Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on
Publisher :
IEEE
ISSN :
1083-4427
Type :
jour
DOI :
10.1109/TSMCA.2007.914758
Filename :
4459764