DocumentCode :
3686549
Title :
Document clustering based on time series
Author :
Liviu Sebastian Matei;Ştefan Trăuşan-Matu
Author_Institution :
University Politehnica of Bucharest, Faculty of Automatic Control and Computer Science, Bucharest, Romania
fYear :
2015
Firstpage :
128
Lastpage :
133
Abstract :
This paper presents a novel document clustering algorithm that represents each document as a time series of words. Document clustering matters because it lets us group documents by chosen criteria, which is especially useful now that large numbers of articles are available. Representing a document as a time series rather than with the vector model makes it possible to compute the distance between documents with dynamic time warping. This representation, together with the dynamic time warping algorithm, forms the foundation for computing document similarity and for clustering the documents. The clustering algorithm used is hierarchical clustering. The method is applied to the named entities and to the parts of speech of the words that compose the documents. The test data is the Reuters corpus of newspaper articles.
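The pipeline described in the abstract (sequence representation, dynamic time warping distance, hierarchical clustering) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 0/1 mismatch cost between tags, the average-linkage rule, the function names dtw_distance and agglomerative, and the toy part-of-speech sequences are all assumptions made for illustration, since the abstract does not specify them.

```python
from itertools import combinations


def dtw_distance(a, b, cost=lambda x, y: 0.0 if x == y else 1.0):
    """Dynamic time warping distance between two symbol sequences.

    The 0/1 mismatch cost is an assumption; the abstract does not
    state which local cost the paper uses.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(a[i - 1], b[j - 1])
            d[i][j] = step + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def agglomerative(items, distance, num_clusters):
    """Bottom-up hierarchical clustering with average linkage (assumed).

    Returns a list of clusters, each a list of indices into items.
    """
    clusters = [[i] for i in range(len(items))]
    # Precompute pairwise distances once, keyed by (smaller, larger) index.
    dist = {
        (i, j): distance(items[i], items[j])
        for i, j in combinations(range(len(items)), 2)
    }

    def linkage(c1, c2):
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(dist[p] for p in pairs) / len(pairs)

    while len(clusters) > num_clusters:
        # Merge the two clusters with the smallest average distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical toy documents, each reduced to its part-of-speech sequence.
docs = [
    ["DET", "NOUN", "VERB", "DET", "NOUN"],
    ["DET", "ADJ", "NOUN", "VERB", "DET", "NOUN"],
    ["VERB", "ADV", "VERB", "ADV"],
    ["VERB", "ADV", "ADV", "VERB", "ADV"],
]
groups = agglomerative(docs, dtw_distance, num_clusters=2)
print(groups)
```

Because dynamic time warping aligns sequences of different lengths, the two noun-phrase-heavy toy documents stay close even though one contains an extra adjective, which a position-by-position comparison would penalize more heavily.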
Keywords :
"Time series analysis","Clustering algorithms","Speech","Heuristic algorithms","Signal processing algorithms","Computational modeling","Algorithm design and analysis"
Publisher :
IEEE
Conference_Title :
System Theory, Control and Computing (ICSTCC), 2015 19th International Conference on
Type :
conf
DOI :
10.1109/ICSTCC.2015.7321281
Filename :
7321281