Automatic and Adaptive Clusters for Information Extraction

Author

Charulatha, B.S. ; Rodrigues, Paul ; Chitralekha, T.

Author_Institution

JNTUK, Kakinada, India

fYear

2014

Firstpage

60

Lastpage

63

Abstract

Web pages are heterogeneous and unstructured. The heterogeneity is due to the hybrid nature of the documents; the lack of structure is due to multilingual or multimedia content in the web page. Mining should therefore be independent of language and software. The objective is that, whenever data or content mining is performed, a set of data is chosen to form the basis, as is done with keywords. If the base data is chosen arbitrarily, the clustering is automatic, whereas if some 'knowledge' or 'background' is put into the choice, it is adaptive. Statistical features are extracted from the pixel map of each image and presented to two clustering algorithms, Fuzzy C Means and subtractive clustering. Each algorithm classifies the given image as either a text representation or an image representation. The classification accuracies are compared and presented.
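
The pipeline outlined in the abstract (statistical features taken from a pixel map, then clustered with Fuzzy C Means) can be illustrated with a minimal sketch. The record does not say which statistical features the authors used, so the mean and standard deviation of pixel intensity are assumed here. The helper names `pixel_features` and `fuzzy_c_means`, the toy pixel maps, and the parameter defaults are all hypothetical. Subtractive clustering and the accuracy comparison are not shown.

```python
import random
from statistics import mean, pstdev


def pixel_features(pixels):
    """Return (mean, standard deviation) of a 2-D grayscale pixel map.

    Assumption: the record does not name the features, so these two
    simple statistics stand in for them.
    """
    flat = [value for row in pixels for value in row]
    return (mean(flat), pstdev(flat))


def fuzzy_c_means(points, c=2, m=2.0, iters=100, tol=1e-6, seed=0):
    """Plain Fuzzy C Means on a list of equal-length numeric tuples.

    Returns (centers, memberships); memberships[i][j] is the degree to
    which point i belongs to cluster j, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Centres are means of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ))

        # Memberships are proportional to distance ** (-2 / (m - 1)).
        new_u = []
        for p in points:
            dists = [
                max(sum((a - b) ** 2 for a, b in zip(p, ctr)) ** 0.5, 1e-12)
                for ctr in centers
            ]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            new_u.append([v / total for v in inv])

        shift = max(abs(new_u[i][j] - u[i][j])
                    for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Toy data (hypothetical): high-contrast "text-like" maps and smooth
# "picture-like" maps, each a tiny 2 x 4 grayscale pixel map.
text_like = [
    [[255, 0, 255, 0], [255, 255, 0, 255]],
    [[255, 255, 0, 255], [255, 0, 255, 255]],
    [[0, 255, 255, 0], [255, 0, 255, 255]],
]
picture_like = [
    [[100, 110, 120, 130], [105, 115, 125, 135]],
    [[90, 100, 110, 120], [95, 105, 115, 125]],
    [[120, 125, 130, 135], [122, 127, 132, 137]],
]

features = [pixel_features(px) for px in text_like + picture_like]
centers, memberships = fuzzy_c_means(features, c=2)

# Hard label for each sample: the cluster with the highest membership.
labels = [row.index(max(row)) for row in memberships]
```

Cluster indices are arbitrary, so in practice a cluster would still have to be mapped to "text" or "image" (for instance by inspecting which centre has the larger standard deviation) before any accuracy could be measured.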

Keywords

Internet; data mining; feature extraction; fuzzy set theory; image classification; image representation; pattern clustering; statistical analysis; Web pages; adaptive clusters; automatic clusters; content mining; fuzzy C means algorithm; image pixel map; image statistical feature extraction; information extraction; multilingual content; multimedia content; subtractive clustering algorithm; Accuracy; Classification algorithms; Clustering algorithms; Fuzzy c means; clustering; heterogeneous; multimedia; statistical features; subtractive clustering accuracy; unstructured

fLanguage

English

Publisher

IEEE

Conference_Titel

Soft Computing and Machine Intelligence (ISCMI), 2014 International Conference on

Type

conf

DOI

10.1109/ISCMI.2014.29

Filename

7079355