DocumentCode :
188961
Title :
Structural Analysis of Source Code Collected from Programming Contests
Author :
Bokuk Park ; Haesung Tak ; Hwan Gue Cho
Author_Institution :
Dept. of Comput. Sci. & Eng., Pusan Nat. Univ., Pusan, South Korea
fYear :
2014
fDate :
11-13 Sept. 2014
Firstpage :
571
Lastpage :
576
Abstract :
Programming contests such as the International Olympiad for Informatics (IOI) and the International Collegiate Programming Contest (ICPC) are effective for encouraging young and bright programmers. These contests require contestants to complete a few tasks (between three and nine) related to algorithmic problems within a limited time. For this study, we collected a set of 2,400 programming codes submitted to the Korea Olympiad for Informatics (KOI) in 2011 and 2012, as well as 2,300 programming codes submitted at the preliminary contest sessions of the ICPC East-Asia regional contest in 2009, 2011, and 2012. Because the submitted source codes were evaluated with blind test cases, we can define a criterion that separates high- and low-scoring students according to their scores. The main objective of this paper is to reveal the relationship among the task's proposed features, its difficulty, the school grade (elementary, middle, and high school), and the score. We do so with the data-mining tool WEKA. The ultimate goal of this study is to predict the score of a particular code using static analysis. We propose a simple and straightforward complexity measure based on the block-tree structure. We treated the high-scoring student group as the positive class and the low-scoring student group as the negative class. The performance of the Naïve Bayes classifier is evaluated with a 10-fold cross-validation test. We decided empirically that a classification is meaningful when the harmonic mean of sensitivity and specificity is larger than 0.6. Among the codes acquired through the KOI, we found a set of outlier codes that attempt to reply with the correct response to receive extra points. Among the codes acquired through the ICPC, we discovered that good collegiate programmers (i.e., those with high scores) attempt to keep their code more compact, both lexically and structurally. We used WEKA to analyze the code using the code features proposed in this study, and the results are detailed quantitatively.
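The acceptance criterion mentioned in the abstract (harmonic mean of sensitivity and specificity larger than 0.6) can be illustrated with a minimal sketch. The function names and the confusion-matrix counts below are hypothetical and chosen only for illustration; they are not taken from the paper, which reports using WEKA rather than custom code.

```python
def harmonic_mean_sens_spec(tp, fn, tn, fp):
    """Harmonic mean of sensitivity and specificity from confusion-matrix counts.

    tp/fn: high-scoring (positive) codes classified correctly/incorrectly.
    tn/fp: low-scoring (negative) codes classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if sensitivity + specificity == 0:
        return 0.0
    return 2 * sensitivity * specificity / (sensitivity + specificity)


def is_meaningful(tp, fn, tn, fp, threshold=0.6):
    """Apply the abstract's empirical cutoff (0.6) to a classification result."""
    return harmonic_mean_sens_spec(tp, fn, tn, fp) > threshold


# Hypothetical counts: sensitivity = 0.8, specificity = 0.6
print(is_meaningful(tp=80, fn=20, tn=60, fp=40))
```

The harmonic mean penalizes a classifier that does well on only one class, so a result counts as meaningful only when both high- and low-scoring groups are recognized reasonably well.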
Keywords :
data mining; pattern classification; program diagnostics; software metrics; source code (software); 10-fold cross validation test; East-Asia regional contest; ICPC; IOI; International Collegiate Programming Contest; International Olympiad for Informatics; KOI; Korea Olympiad for Informatics; Naïve Bayes; WEKA data-mining tool; algorithmic problems; blind test cases; block-tree structure; data mining classifier; outlier codes; programming codes; programming contests; source code score prediction; source code structural analysis; static analysis; Complexity theory; Data mining; Educational institutions; Harmonic analysis; Programming; Sensitivity;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer and Information Technology (CIT), 2014 IEEE International Conference on
Conference_Location :
Xi'an
Type :
conf
DOI :
10.1109/CIT.2014.171
Filename :
6984713