Title :
High Speed Document Clustering in Reconfigurable Hardware
Author :
Covington, G. Adam ; Comstock, Charles L G ; Levine, Andrew A. ; Lockwood, John W. ; Cho, Young H.
Author_Institution :
Washington Univ., St. Louis, MO
Abstract :
High-performance document clustering systems enable similar documents to be automatically organized into groups. In the past, the large amount of computational time needed to cluster documents prevented practical use of such systems with a large number of documents. A full hardware implementation of the K-means clustering algorithm has been designed and implemented in reconfigurable hardware that clusters 512k documents rapidly. This implementation uses four parallel cosine distance metrics to cluster document vectors that each have 4000 dimensions. The synthesized hardware runs on the Field-programmable Port Extender (FPX) platform at a clock rate of 80 MHz. Although the clock rate on the Xilinx VirtexE 2000 is lower than that of a CPU, the implementation runs 26 times faster than an algorithmically equivalent software implementation running on an Intel 3.60 GHz Xeon. The same architecture was used to synthesize a faster and larger design for the Xilinx Virtex4 LX200. This larger implementation can contain up to 25 parallel cosine distance metrics. It synthesizes at a clock rate of 250 MHz and outperforms the equivalent software by a factor of 328.
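As a rough software illustration of the technique the abstract names (K-means clustering with a cosine distance metric), the following minimal Python sketch may help. It is not the authors' hardware design or their reference software: the function names `cosine_distance` and `kmeans_cosine`, the choice to seed centroids with the first k documents, the fixed iteration count, and the handling of all-zero vectors are all assumptions made for this example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b).

    Assumption: an all-zero vector is treated as maximally distant (1.0).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def kmeans_cosine(docs, k, iterations=10):
    """Cluster document vectors with K-means using cosine distance.

    Assumption: centroids are seeded with the first k documents so the
    result is deterministic; real systems may seed differently.
    """
    centroids = [list(d) for d in docs[:k]]
    assignment = [0] * len(docs)
    for _ in range(iterations):
        # Assignment step: each document goes to its nearest centroid.
        # Each distance is independent of the others, which is the kind of
        # work that can be evaluated in parallel.
        assignment = [
            min(range(k), key=lambda c: cosine_distance(d, centroids[c]))
            for d in docs
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [d for d, a in zip(docs, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids
```

Calling `kmeans_cosine(docs, k)` on a list of equal-length numeric vectors returns a cluster index per document together with the final centroids. In this sketch the distance evaluations are computed one at a time; the abstract describes evaluating four (or up to 25) cosine distance metrics in parallel.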
Keywords :
document handling; field programmable gate arrays; indexing; microprocessor chips; parallel algorithms; pattern clustering; reconfigurable architectures; 250 MHz; 80 MHz; CPU; Intel Xeon; K-means clustering algorithm; Xilinx Virtex4 LX200; Xilinx VirtexE 2000; document clustering systems; document vectors; field programmable port extender platform; parallel cosine distance metrics; reconfigurable hardware; software algorithm; Algorithm design and analysis; Application specific integrated circuits; Central Processing Unit; Clocks; Clustering algorithms; Computer architecture; Field programmable gate arrays; Hardware; Laboratories; Software algorithms;
Conference_Title :
Field Programmable Logic and Applications, 2006. FPL '06. International Conference on
Conference_Location :
Madrid
Print_ISBN :
1-4244-0312-X
DOI :
10.1109/FPL.2006.311245