DocumentCode :
2546184
Title :
A Real-Time Categorization and Clustering Method for Text Data of Laws and Regulations
Author :
Su, Bianping ; Wang, Rong ; Wang, Yiping
fYear :
2010
fDate :
23-25 Sept. 2010
Firstpage :
1
Lastpage :
4
Abstract :
Taking into account the features of low- and high-frequency text data and the frequencies with which such features occur in a single text, the paper sets up a vector space model for part of the texts of a field. It then establishes a classification and clustering method by designing and constructing two-dimensional analytic indexes of the similarities and differences between field texts. The method is designed for field texts: it is well suited to effective machine learning and can extend both the text data and the textual categories dynamically in real time. It also handles single-label and multi-label classification at the same time, overcoming a defect of previous text classification methods, which can expand only the data and not the capacity. General text clustering methods have several defects: they are not suitable for high-dimensional or large data sets, they cannot expand the text data or the categories, and they do not handle outlier data well. The new method addresses these defects. A corresponding algorithm has been developed, and an experiment on data sets of laws and regulations of the construction industry in Shaanxi Province, China, demonstrates the effectiveness of the method.
Keywords :
classification; law administration; learning (artificial intelligence); legislation; pattern clustering; public information systems; text analysis; construction industry; high-dimensional data sets; large data sets; laws; machine learning; multilabel classification; real-time categorization; regulations; single-label classification; text classifying methods; text clustering methods; text data; textual category; two-dimensional analytic indexes; vector space model; Clustering algorithms; Clustering methods; Feature extraction; Real time systems; Support vector machine classification; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Wireless Communications Networking and Mobile Computing (WiCOM), 2010 6th International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4244-3708-5
Electronic_ISBN :
978-1-4244-3709-2
Type :
conf
DOI :
10.1109/WICOM.2010.5600178
Filename :
5600178