DocumentCode :
3601361
Title :
Discovering Latent Semantics in Web Documents Using Fuzzy Clustering
Author :
I-Jen Chiang ; Liu, Charles Chih-Ho ; Yi-Hsin Tsai ; Kumar, Ajit
Author_Institution :
Grad. Inst. of Biomed. Inf., Taipei Med. Univ., Taipei, Taiwan
Volume :
23
Issue :
6
fYear :
2015
Firstpage :
2122
Lastpage :
2134
Abstract :
Web documents are heterogeneous and complex. Complicated associations exist within a single web document and in its links to other documents. The strong interactions between terms in documents give rise to vague and ambiguous meanings. Efficient and effective clustering methods are therefore needed to discover latent and coherent meanings in context. This paper presents a fuzzy linguistic topological space along with a fuzzy clustering algorithm to discover the contextual meaning in web documents. The proposed algorithm extracts features from the web documents using conditional random field methods and builds a fuzzy linguistic topological space based on the associations of features. The associations of co-occurring features organize a hierarchy of connected semantic complexes called “CONCEPTS,” wherein a fuzzy linguistic measure is applied on each complex to evaluate 1) the relevance of a document belonging to a topic, and 2) the difference from the other topics. Web contents can be clustered into topics in the hierarchy depending on their fuzzy linguistic measures; web users can further explore the CONCEPTS of web contents accordingly. Besides its applicability in web text domains, the algorithm can be extended to other applications, such as data mining, bioinformatics, and content-based or collaborative information filtering.
Keywords :
computational linguistics; document handling; feature extraction; fuzzy set theory; random processes; semantic Web; CONCEPTS; Web documents; conditional random field methods; connected semantic complexes; fuzzy clustering; fuzzy linguistic topological space; latent semantics; Clustering algorithms; Context; Data mining; Neural networks; Pragmatics; Semantics; fuzzy aggregation algorithm; fuzzy semantic topology; fuzzy web hierarchical clustering; named entity recognition (NER)
fLanguage :
English
Journal_Title :
Fuzzy Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1063-6706
Type :
jour
DOI :
10.1109/TFUZZ.2015.2403878
Filename :
7042824