DocumentCode :
168638
Title :
Next generation data classification and linkage: Role of probabilistic models and artificial intelligence
Author :
Hettiarachchi, Gayan Prasad ; Hettiarachchi, Nadeeka Nilmini ; Hettiarachchi, Dhammika Suresh ; Ebisuya, Azusa
Author_Institution :
Dept. of Phys., Osaka Univ., Toyonaka, Japan
fYear :
2014
fDate :
10-13 Oct. 2014
Firstpage :
569
Lastpage :
576
Abstract :
Data classification and linkage is the task of identifying information corresponding to the same entity from one or more data sources. Methods used to tackle data classification and linkage problems fall into two broad categories. One is the deterministic model, in which sets of often very complex rules are used to classify pairs of entities as links. The other is the probabilistic model, in which statistical or probabilistic approaches are used to classify pairs. However, these models fail to deliver when there are many missing values, typographical errors, non-standardized entities, etc. To this end, intelligent routines that make use of artificial neural networks, genetic algorithms and clustering algorithms can provide the next generation of models for data classification and linkage. An introduction to data linkage, its impact on humanity and community, current models and their associated pitfalls, and new directions and issues, both technical and social, for next generation data classification and linkage systems are discussed using an example prototype. A new model for linkage is proposed, which highlights that identifying relationships not only between the attributes of different entities but also within the attributes of a single entity is important for handling missing values and can provide better accuracy.
Keywords :
genetic algorithms; learning (artificial intelligence); neural nets; pattern classification; probability; artificial intelligence; artificial neural networks; clustering algorithms; complex rules; data sources; deterministic models; entity attributes; genetic algorithms; information identification; missing value handling; next generation data classification; next generation data linkage; probabilistic approach; probabilistic model; relationship identification; social issues; statistical approach; technical issues; Accuracy; Artificial neural networks; Couplings; Data models; Joining processes; Next generation networking; Probabilistic logic; Big data; classification; data linkage; machine learning; phonetic matching; probabilistic models; string comparison;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Global Humanitarian Technology Conference (GHTC), 2014 IEEE
Conference_Location :
San Jose, CA
Type :
conf
DOI :
10.1109/GHTC.2014.6970340
Filename :
6970340