Investigating Samples Representativeness for an Online Experiment in Java Code Search

Abstract:

Context: The results of large-scale studies in software engineering can be significantly affected by the representativeness of their samples. Diverse population sources can be used to support sampling for such studies. Goal: To compare two samples, one from the crowdsourcing platform Mechanical Turk and another from the professional social network LinkedIn, in an online experiment evaluating the relevance of Java code snippets to programming tasks. Method: We compared the samples' characteristics (subjects' experience and programming habits) and the experimental results obtained in three experimental trials. Results: LinkedIn's subjects had significantly higher levels of experience in Java programming, and in programming in general, than Mechanical Turk's subjects. The experimental results revealed a significant difference between the samples and suggested that LinkedIn's subjects were more pessimistic than Mechanical Turk's subjects, despite a high level of consistency in the experimental results. Conclusion: Combining sampling sources can benefit large-scale studies in software engineering, especially when heterogeneity is desired in the population. It can therefore be useful to investigate and characterize alternative sampling sources for such studies.