DocumentCode
2764708
Title
Document vector compression and its application in document clustering
Author
Fox, T.W.
Author_Institution
Intelligent Engines, Calgary Univ., Alta.
fYear
2005
fDate
1-4 May 2005
Firstpage
2029
Lastpage
2032
Abstract
Document clustering organizes documents into groups such that each group contains documents with similar content. The majority of document clustering algorithms require a vector representation for each document. Each vector has well over 10,000 elements. Consequently, the memory required during clustering can be extremely high when clustering hundreds of thousands of documents. This paper introduces document vector compression, which is based on the discrete cosine transform (DCT). Document vector compression reduces the run-time memory requirements by as much as 60%. Unlike other document vector reduction techniques, document vector compression does not degrade the final cluster quality (total F-measure).
Keywords
data compression; discrete cosine transforms; document image processing; image coding; image representation; DCT; discrete cosine transform; document clustering algorithms; document vector compression; run-time memory requirements; vector representation; Arithmetic; Clustering algorithms; Compaction; Data mining; Degradation; Discrete cosine transforms; Engines; Frequency; Information retrieval; Runtime;
fLanguage
English
Publisher
IEEE
Conference_Titel
Electrical and Computer Engineering, 2005. Canadian Conference on
Conference_Location
Saskatoon, Sask.
ISSN
0840-7789
Print_ISBN
0-7803-8885-2
Type
conf
DOI
10.1109/CCECE.2005.1557384
Filename
1557384