DocumentCode :
3739208
Title :
Citation Prediction Using Diverse Features
Author :
Harish S. Bhat;Li-Hsuan Huang;Sebastian Rodriguez;Rick Dale;Evan Heit
Author_Institution :
Univ. of California, Merced, Merced, CA, USA
fYear :
2015
Firstpage :
589
Lastpage :
596
Abstract :
Using a large database of nearly 8 million bibliographic entries spanning over 3 million unique authors, we build predictive models to classify a paper based on its citation count. Our approach involves considering a diverse array of features, including the interdisciplinarity of authors, which we quantify using Shannon entropy and Jensen-Shannon divergence. Rather than rely on subject codes, we model the disciplinary preferences of each author by estimating the author's journal distribution. We conduct an exploratory data analysis on the relationship between these interdisciplinarity variables and citation counts. In addition, we model the effects of (1) each author's influence in coauthorship graphs, and (2) words in the title of the paper. We then build classifiers for two- and three-class classification problems that correspond to predicting the interval in which a paper's citation count will lie. We use cross-validation and a true test set to tune model parameters and assess model performance. The best model we build, a classification tree, yields test set accuracies of 0.87 and 0.66 on the two- and three-class problems, respectively. Using this model, we also provide rankings of attribute importance; for the three-class problem, these rankings indicate the importance of our interdisciplinarity metrics in predicting citation counts.
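The two interdisciplinarity measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact estimation procedure, so the following assumes that an author's journal distribution is simply the empirical frequency of journals in that author's publication list and that logarithms are base 2; the function names and the sample journal labels are hypothetical.

```python
import math
from collections import Counter


def journal_distribution(journals):
    """Empirical distribution over journals (assumed estimator, not the paper's)."""
    counts = Counter(journals)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


def shannon_entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: H(M) - (H(P) + H(Q)) / 2, with M = (P + Q) / 2."""
    support = set(p) | set(q)
    mixture = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in support}
    return shannon_entropy(mixture) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))


# Hypothetical authors: one publishes evenly in two journals, the other in a single journal.
author_a = journal_distribution(["J1", "J1", "J2", "J2"])
author_b = journal_distribution(["J3"])

entropy_a = shannon_entropy(author_a)          # spread of author A across journals
entropy_b = shannon_entropy(author_b)          # zero for a single-journal author
divergence_ab = js_divergence(author_a, author_b)  # dissimilarity between the two authors
```

Under these assumptions, a higher entropy indicates an author who publishes across more journals, and a higher divergence indicates coauthors whose journal preferences overlap less.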
Keywords :
"Predictive models","Entropy","Databases","Feature extraction","Training","Data mining","Measurement"
Publisher :
IEEE
Conference_Title :
Data Mining Workshop (ICDMW), 2015 IEEE International Conference on
Electronic_ISBN :
2375-9259
Type :
conf
DOI :
10.1109/ICDMW.2015.131
Filename :
7395721