Regional Information Center for Science and Technology - Outlier detection from massive short documents using domain ontology

DocumentCode :

3442116

Title :

Outlier detection from massive short documents using domain ontology

Author :

Wang, Yongheng ; Yang, Shenghong

Author_Institution :

Sch. of Comput. & Commun., Hunan Univ., Changsha, China

Year :

2010

Date :

29-31 Oct. 2010

Firstpage :

558

Lastpage :

562

Abstract :

With the rapid development of information technology, huge amounts of data have accumulated. Much of this data takes the form of short documents, such as paper summaries or conversations in open chat rooms. Detecting outliers in such documents is useful in intelligence analysis applications. However, traditional outlier detection methods based on the vector space model cannot achieve acceptable accuracy because keywords appear at low frequency. Moreover, traditional outlier detection algorithms become very inefficient, or even unusable, when processing massive data. This paper presents a density-based outlier detection method that uses a domain ontology. The algorithm uses the domain ontology to calculate the semantic distance between short documents, which improves accuracy. A parallel method is also used to improve performance and scalability.

Keywords :

data analysis; document handling; information technology; ontologies (artificial intelligence); parallel processing; domain ontology; intelligence analysis application; massive data processing; open chatting room; outlier detection; parallel method; semantic distance; short document; density; massive

Language :

English

Publisher :

IEEE

Conference_Title :

2010 IEEE International Conference on Intelligent Computing and Intelligent Systems (ICIS)

Conference_Location :

Xiamen

Print_ISBN :

978-1-4244-6582-8

Type :

conf

DOI :

10.1109/ICICISYS.2010.5658426

Filename :

5658426

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3442116