Title of article :
Document clustering using the LSI subspace signature model
Author/Authors :
W.Z. Zhu, R.B. Allen
Issue Information :
Monthly, with serial number, year 2013
Pages :
17
From page :
844
To page :
860
Abstract :
We describe the latent semantic indexing subspace signature model (LSISSM) for semantic content representation of unstructured text. Grounded in singular value decomposition, the model represents terms and documents by the distribution signatures of their statistical contribution across the top-ranking latent concept dimensions. LSISSM matches term signatures with document signatures according to their mapping coherence between the latent semantic indexing (LSI) term subspace and the LSI document subspace. LSISSM performs feature reduction and finds a low-rank approximation of scalable and sparse term-document matrices. Experiments demonstrate that this approach significantly improves the performance of major clustering algorithms such as standard K-means and self-organizing maps compared with the vector space model and the traditional LSI model. The unique contribution ranking mechanism in LSISSM also improves the initialization of standard K-means compared with the random seeding procedure, which sometimes causes low efficiency and effectiveness of clustering. A two-stage initialization strategy based on LSISSM significantly reduces the running time of standard K-means procedures.
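The abstract does not give the exact definition of a signature, so the following is only a minimal sketch of the general idea under stated assumptions: a truncated SVD of a toy term-document matrix, with each term and document described by the normalized share of its absolute contribution across the top k latent dimensions. The function name `lsi_signatures`, the normalization, and the toy matrix are hypothetical illustrations, not the authors' method.

```python
import numpy as np

def lsi_signatures(A, k):
    """Return (term_signatures, doc_signatures) for a terms x docs matrix A.

    Assumption: a "signature" is taken here to be the normalized absolute
    contribution of a term/document across the top-k latent dimensions.
    The paper's actual definition may differ.
    """
    # Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

    # Coordinates in the LSI term subspace and LSI document subspace.
    term_coords = U_k * s_k        # shape (n_terms, k)
    doc_coords = Vt_k.T * s_k      # shape (n_docs, k)

    def normalize(M):
        W = np.abs(M)
        totals = W.sum(axis=1, keepdims=True)
        totals[totals == 0] = 1.0  # avoid division by zero for empty rows
        return W / totals

    return normalize(term_coords), normalize(doc_coords)

# Hypothetical toy data: 4 terms x 4 documents with two obvious topics.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
], dtype=float)

term_sig, doc_sig = lsi_signatures(A, k=2)

# Dominant latent dimension per document; grouping by it is a crude
# stand-in for a signature-based seeding step before K-means.
dominant = doc_sig.argmax(axis=1)
```

In this toy case, documents that share vocabulary end up with the same dominant dimension, which hints at how a contribution ranking could supply initial groups for K-means in place of random seeds.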
Keywords :
text mining, knowledge representation, automatic classification
Journal title :
Journal of the American Society for Information Science and Technology
Serial Year :
2013
Record number :
994847