DocumentCode :
653505
Title :
Author Name Disambiguation in Technology Trend Analysis Using SVM and Random Forests and Novel Topic Based Features
Author :
Kastner, Sebastian; Choi, Sung-Pil; Jung, Hanmin
fYear :
2013
fDate :
20-23 Aug. 2013
Firstpage :
2141
Lastpage :
2144
Abstract :
Technology trend analysis systems use data mining to process vast amounts of papers, patents, and news articles in order to analyze and predict the life cycles of technologies, products, and other kinds of entities. Some systems can also extract relations between entities such as technologies, authors, and products. Establishing precise relations between entities requires entity disambiguation. In this study, we focused on author disambiguation in the context of technology trend analysis. We used Random Forests and SVM to learn a pairwise similarity function that decides whether two articles were written by the same author. Besides comparing common features such as article titles and author affiliations, we also studied features built from the analyses produced by KISTI's InSciTe system. For training and evaluation, a corpus containing 24,750 pairwise article similarities was manually constructed using data from InSciTe. On this corpus, Random Forests outperformed SVM, reaching an accuracy of 98.31%. Using only the newly introduced features, an accuracy of 94.79% was achieved, demonstrating their usefulness.
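The pairwise formulation in the abstract turns disambiguation into binary classification: each pair of articles becomes one feature vector labeled "same author" or "different author". The sketch below shows only that feature-construction step and the accuracy metric, under loudly stated assumptions: the `Article` record, its fields, and the three features (title word overlap, exact affiliation match, and overlap of extracted topic terms standing in for the InSciTe-derived features) are hypothetical illustrations, not the authors' actual feature set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    """Hypothetical record for one author mention in one article."""
    author_name: str
    title: str
    affiliation: str
    topics: frozenset  # hypothetical: topic/technology terms extracted by an analysis system


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard overlap |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def tokens(text: str) -> frozenset:
    """Lowercased whitespace tokens of a string."""
    return frozenset(text.lower().split())


def pair_features(x: Article, y: Article) -> list:
    """Feature vector for one article pair (hypothetical feature set)."""
    same_affiliation = bool(x.affiliation) and x.affiliation.lower() == y.affiliation.lower()
    return [
        jaccard(tokens(x.title), tokens(y.title)),  # title word overlap
        1.0 if same_affiliation else 0.0,           # exact, non-empty affiliation match
        jaccard(x.topics, y.topics),                # topic-based overlap
    ]


def accuracy(predicted: list, gold: list) -> float:
    """Fraction of article pairs whose same-author label was predicted correctly."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

For example, two hypothetical records with titles "Author disambiguation with random forests" and "Random forests for author disambiguation", the same affiliation, and topic sets `{"random forests", "svm"}` and `{"random forests"}` yield the vector `[2/3, 1.0, 0.5]`. In practice, vectors like these, together with their manual same-author labels, would be passed to an off-the-shelf learner such as scikit-learn's `RandomForestClassifier` or `SVC`; this is one plausible setup, not the configuration reported in the paper.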
Keywords :
data mining; information analysis; learning (artificial intelligence); pattern classification; support vector machines; SVM; author name disambiguation; entities relation extraction; novel topic based features; pairwise similarity function; random forests; technology trend analysis systems; Accuracy; Feature extraction; Libraries; Market research; Supervised learning; Training; Author disambiguation; Link discovery; Technology Trend Analysis
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2013 IEEE Green Computing and Communications (GreenCom), IEEE International Conference on Internet of Things (iThings), and IEEE Cyber, Physical and Social Computing (CPSCom)
Conference_Location :
Beijing
Type :
conf
DOI :
10.1109/GreenCom-iThings-CPSCom.2013.403
Filename :
6682413