DocumentCode :
1957508
Title :
Exposure of illegal Web sites using conceptual fuzzy sets-based information filtering system
Author :
Shinmura, A. ; Taniguchi, Kazuhiro ; Kawahara, Kenji ; Takagi, Toshiyuki
Author_Institution :
Dept. of Comput. Sci., Meiji Univ., Kawasaki
fYear :
2002
fDate :
2002
Firstpage :
327
Lastpage :
332
Abstract :
Currently, the Internet hosts many illegal Web sites that specialize in distributing commercial software and music. This paper proposes a method that distinguishes illegal Web sites from legal ones not only by using TF-IDF (term frequency-inverse document frequency) values but also by recognizing the purpose or meaning of each site. This is achieved by describing what is considered an illegal site and judging whether a target Web site matches that description of illegality. Conceptual fuzzy sets (CFSs) are used to describe the concept of an illegal Web site. First, we show how CFSs help overcome the limitations of purely term-based (TF-IDF) filtering, and we propose realizing CFSs with RBF (radial basis function)-like networks. In a CFS, the meaning of a concept is represented by the distribution of the activation values of the other nodes. Because this distribution changes depending on which labels are activated by the given conditions, the activations express a context-dependent meaning. Next, we propose the architecture of a filtering system. Finally, we compare the proposed method with the TF-IDF method combined with a support vector machine. The E-measure results, used as an overall evaluation, indicate that the proposed system performs better than the TF-IDF method with the support vector machine.
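The abstract reports its comparison in terms of the E-measure. Assuming this is van Rijsbergen's E-measure from information retrieval, E = 1 - (1 + b^2)PR / (b^2 P + R), which is 1 minus the F-measure and therefore lower is better, it can be computed from precision P and recall R as in the minimal sketch below. The weighting parameter beta (b) used in the paper is not stated in this record, so the default of 1.0 is an assumption, and the counts in the usage example are purely illustrative rather than results from the paper.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def e_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """van Rijsbergen's E-measure: E = 1 - (1 + beta^2) * P * R / (beta^2 * P + R).

    Equals 1 - F_beta, so 0.0 is perfect and 1.0 is worst.
    beta > 1 weights recall more heavily; beta = 1.0 here is an assumed default.
    """
    denom = beta ** 2 * precision + recall
    if denom == 0:
        # Neither precision nor recall is positive: worst possible score.
        return 1.0
    return 1.0 - (1 + beta ** 2) * precision * recall / denom


if __name__ == "__main__":
    # Illustrative counts only: 8 illegal sites correctly flagged,
    # 2 legal sites wrongly flagged, 4 illegal sites missed.
    p, r = precision_recall(tp=8, fp=2, fn=4)
    print(f"P={p:.3f} R={r:.3f} E={e_measure(p, r):.3f}")
```

With these counts P = 0.8 and R = 2/3, giving E = 3/11 (about 0.273). Because a lower E is better, a claim that one filter "shows better results" corresponds to it having the smaller E value.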
Keywords :
Internet; computer crime; fuzzy set theory; information resources; information retrieval system evaluation; learning automata; online front-ends; radial basis function networks; relevance feedback; software architecture; TF-IDF values; Web site purpose; commercial software distribution; concept meaning; conceptual fuzzy sets; context-dependent meaning; e-measures; illegal Web sites; illegality description; information filtering system architecture; information retrieval evaluation; label activation; music distribution; node activation value distribution changes; radial basis function neural nets; support vector machine; term frequency-inverse document frequency values; Computer science; Distributed computing; Fuzzy sets; Fuzzy systems; Information filtering; Information filters; Radial basis function networks; Support vector machines; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Fuzzy Information Processing Society, 2002. Proceedings. NAFIPS. 2002 Annual Meeting of the North American
Print_ISBN :
0-7803-7461-4
Type :
conf
DOI :
10.1109/NAFIPS.2002.1018079
Filename :
1018079