DocumentCode
3585194
Title
Automatic and Adaptive Clusters for Information Extraction
Author
Charulatha, B.S. ; Rodrigues, Paul ; Chitralekha, T.
Author_Institution
JNTUK, Kakinada, India
fYear
2014
Firstpage
60
Lastpage
63
Abstract
Web pages are heterogeneous and unstructured. The heterogeneity is due to the hybrid nature of the documents; the lack of structure is due to either multilingual or multimedia content in the web page. Mining should be independent of language and software. When data or content mining is performed, a set of data is chosen to form the basis, as is done with keywords. If the base data is chosen arbitrarily, the approach is automatic; if some 'knowledge' or 'background' informs the choice, it is adaptive. Statistical features are extracted from the pixel map of each image and presented to two clustering algorithms, Fuzzy C Means and Subtractive clustering. The algorithms classify the given image as a text or image representation. The accuracy of classification is compared and presented.
Keywords
Internet; data mining; feature extraction; fuzzy set theory; image classification; image representation; pattern clustering; statistical analysis; Web pages; adaptive clusters; automatic clusters; content mining; data mining; fuzzy C means algorithm; image classification; image pixel map; image representation; image statistical feature extraction; information extraction; multilingual content; multimedia content; subtractive clustering algorithm; Accuracy; Classification algorithms; Clustering algorithms; Data mining; Feature extraction; Image representation; Web pages; Fuzzy c means; clustering; heterogeneous; multimedia; statistical features; subtractive clustering accuracy; unstructured;
fLanguage
English
Publisher
IEEE
Conference_Titel
Soft Computing and Machine Intelligence (ISCMI), 2014 International Conference on
Type
conf
DOI
10.1109/ISCMI.2014.29
Filename
7079355