DocumentCode :
2754054
Title :
Fitting document representation to specific datasets by adjusting membership functions
Author :
Garcia-Plaza, A.P. ; Fresno, V. ; Martinez, Ricardo
Author_Institution :
NLP & IR Group, UNED, Madrid, Spain
fYear :
2012
fDate :
10-15 June 2012
Firstpage :
1
Lastpage :
8
Abstract :
In this work we deal with the problem of web page clustering from the point of view of document representation. Fuzzy rule-based systems have been successfully used to represent web documents by means of heuristic combinations of criteria. In these systems, rules were established based on the way humans read documents and have been analyzed in previous works. However, membership function parameters were fixed by default, assuming that any document would follow similar patterns regardless of the rest of the documents in the collection. In this work we analyze to what extent collection information can be used to adjust the membership functions in order to improve document representation and, therefore, clustering results. We compare our proposal with the original approach on which it is based, and with other similar or common approaches. We also perform statistical significance tests to ensure that our modifications have a real effect relative to the original representation. Results show that adjusting document representation parameters to specific collections leads to better clustering results.
Keywords :
Internet; document handling; fuzzy set theory; pattern clustering; statistical analysis; Web page clustering; concrete collections; document representation; fuzzy rule-based systems; membership functions; membership functions parameters; statistical significance tests; Accuracy; Concrete; Fuzzy systems; Knowledge based systems; Standards; Tuning; Web pages; Clustering; Fuzzy Logic; Representation; Web Page;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Fuzzy Systems (FUZZ-IEEE), 2012 IEEE International Conference on
Conference_Location :
Brisbane, QLD
ISSN :
1098-7584
Print_ISBN :
978-1-4673-1507-4
Electronic_ISBN :
1098-7584
Type :
conf
DOI :
10.1109/FUZZ-IEEE.2012.6251249
Filename :
6251249