Regional Information Center for Science and Technology - Research on Data Cleaning in Text Clustering

DocumentCode :

3075524

Title :

Research on Data Cleaning in Text Clustering

Author :

Yuhang, Zhang ; Yue, Wang ; Wei, Yang

Author_Institution :

Coll. of Technol. & Econ., Liaoning Tech. Univ.(LNTU), Fuxin, China

Volume :

fYear :

2010

fDate :

16-18 July 2010

Firstpage :

305

Lastpage :

307

Abstract :

A more reasonable data cleaning method is proposed to address a problem in current text clustering preprocessing: data cleaning mistakenly removes words that have discriminative power. The method takes into account the appearance of words from new fields. For rare-word filtering, it considers both the importance of a word in the whole text collection (its word frequency) and its importance in the text in which it appears (its weighting). The method thereby avoids wrongly assigning such words to an existing category, so that filtering is comparatively accurate and the text clustering result is more precise. Finally, text clustering is performed with the C-means algorithm, which verifies that the method improves the accuracy of the clustering result.
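
The rare-word filtering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' formulation: the record does not give their equations, so the TF-IDF-style weight, the function name `filter_rare_words`, and the thresholds `min_collection_freq` and `min_weight` are all assumptions chosen for illustration. The sketch keeps a word if it is frequent across the collection, or if it is rare overall but carries a high weight in at least one text.

```python
import math
from collections import Counter


def filter_rare_words(docs, min_collection_freq=3, min_weight=0.5):
    """Return the set of words to keep after rare-word filtering.

    docs: list of token lists, one per text.
    min_collection_freq, min_weight: hypothetical thresholds, not from the paper.
    """
    n_docs = len(docs)
    collection_freq = Counter()  # total occurrences across the collection
    doc_freq = Counter()         # number of texts containing each word
    for tokens in docs:
        collection_freq.update(tokens)
        doc_freq.update(set(tokens))

    keep = set()
    for tokens in docs:
        if not tokens:
            continue
        counts = Counter(tokens)
        for word, count in counts.items():
            # Frequent in the whole collection: always kept.
            if collection_freq[word] >= min_collection_freq:
                keep.add(word)
                continue
            # Rare overall: keep only if it is important in this text.
            # Assumed weighting: term frequency times inverse document frequency.
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[word])
            if tf * idf >= min_weight:
                keep.add(word)
    return keep
```

Under these assumed thresholds, a word that occurs in only one text but dominates it (for example a term from a new field) would be retained, while an isolated one-off token would be dropped.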

Keywords :

pattern classification; pattern clustering; text analysis; word processing; C-means algorithm; data cleaning; text clustering; word filtering; word frequency; Cleaning; Clustering algorithms; Dispersion; Equations; Filtering; Mathematical model; Vocabulary; weighting

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information Technology and Applications (IFITA), 2010 International Forum on

Conference_Location :

Kunming

Print_ISBN :

978-1-4244-7621-3

Electronic_ISBN :

978-1-4244-7622-0

Type :

conf

DOI :

10.1109/IFITA.2010.73

Filename :

5635068

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3075524