• DocumentCode
    3507803
  • Title

    High Speed Document Clustering in Reconfigurable Hardware

  • Author

    Covington, G. Adam ; Comstock, Charles L G ; Levine, Andrew A. ; Lockwood, John W. ; Cho, Young H.

  • Author_Institution
    Washington Univ., St. Louis, MO
  • fYear
    2006
  • fDate
    28-30 Aug. 2006
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    High-performance document clustering systems enable similar documents to be automatically organized into groups. In the past, the large amount of computational time needed to cluster documents prevented practical use of such systems with a large number of documents. A full hardware implementation of the K-means clustering algorithm has been designed and implemented in reconfigurable hardware that clusters 512k documents rapidly. This implementation uses four parallel cosine distance metrics to cluster document vectors that each have 4000 dimensions. The synthesized hardware runs on the field programmable port extender (FPX) platform at a clock rate of 80 MHz. Although the clock rate on the Xilinx VirtexE 2000 is slower than that of a CPU, the implementation runs 26 times faster than an algorithmically equivalent software implementation running on an Intel 3.60 GHz Xeon. The same architecture was used to synthesize a faster and larger design for the Xilinx Virtex4 LX200. This larger implementation can contain up to 25 parallel cosine distance metrics. It was synthesized with a clock rate of 250 MHz and outperforms the equivalent software by a factor of 328.
  • Keywords
    document handling; field programmable gate arrays; indexing; microprocessor chips; parallel algorithms; pattern clustering; reconfigurable architectures; 250 MHz; 80 MHz; CPU; Intel Xeon; K-means clustering algorithm; Xilinx Virtex4 LX200; Xilinx VirtexE 2000; document clustering systems; document vectors; field programmable port extender platform; parallel cosine distance metrics; reconfigurable hardware; software algorithm; Algorithm design and analysis; Application specific integrated circuits; Central Processing Unit; Clocks; Clustering algorithms; Computer architecture; Field programmable gate arrays; Hardware; Laboratories; Software algorithms;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Field Programmable Logic and Applications, 2006. FPL '06. International Conference on
  • Conference_Location
    Madrid
  • Print_ISBN
    1-4244-0312-X
  • Type

    conf

  • DOI
    10.1109/FPL.2006.311245
  • Filename
    4101007
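
The abstract describes K-means clustering of document vectors under a cosine distance metric. As a rough software illustration only, and not the authors' hardware design, the two K-means steps can be sketched in plain Python as below. The function names `cosine_distance` and `kmeans_cosine`, the random seeding of centroids, and the fixed iteration count are all assumptions made for this sketch. The hardware evaluates several distance metrics in parallel, whereas this sketch computes them one at a time.

```python
import math
import random


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b); 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def kmeans_cosine(docs, k, iterations=10, seed=0):
    """Cluster document vectors with K-means using cosine distance.

    Hypothetical helper: centroids are seeded from k randomly chosen
    documents, and the loop runs a fixed number of iterations.
    Returns (assignments, centroids).
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(docs, k)]
    assignments = [0] * len(docs)
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        for i, doc in enumerate(docs):
            distances = [cosine_distance(doc, c) for c in centroids]
            assignments[i] = distances.index(min(distances))
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [docs[i] for i, a in enumerate(assignments) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                dim = len(members[0])
                centroids[j] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    return assignments, centroids


# Toy 3-dimensional "documents"; the paper's vectors have 4000 dimensions.
docs = [[1.0, 0.0, 0.0], [2.0, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 3.0, 0.1]]
assignments, centroids = kmeans_cosine(docs, k=2)
```

In this toy input the first two vectors point in nearly the same direction, as do the last two, so each pair should end up sharing a cluster label regardless of vector length, which is the property that makes cosine distance a common choice for term-frequency vectors.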