Title : 
Using regular expressions for mining data in large software repositories
         
        
            Author : 
Awang Abu Bakar, Normi Sham
         
        
            Author_Institution : 
Dept. of Comput. Sci., Int. Islamic Univ. Malaysia, Kuala Lumpur, Malaysia
         
        
        
        
        
        
            Abstract : 
Applying data mining techniques to software repositories involves extracting both basic and value-added information from them. Regular expressions (Regex) provide a mechanism for selecting specific strings from a set of character strings. In this paper, we discuss how regular expressions are used to build a data mining tool known as OSSGrab. We developed the tool using Python scripting in combination with Regex; as a result, the time spent on data collection is reduced significantly.
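
To illustrate the kind of string selection the abstract describes, the following is a minimal Python sketch of regex-based field extraction. It is not OSSGrab's code: the sample text, the field names, the patterns, and the helpers `extract_fields` and `to_int` are all hypothetical, chosen only to show how a regular expression can pick specific values out of repository text.

```python
import re

# Hypothetical project-page text. The field names and layout are assumptions,
# not the format that OSSGrab actually parses.
SAMPLE_PAGE = """\
Project: example-tool
Registered: 2009-03-14
Developers: 12
Downloads: 48,210
License: GNU General Public License (GPL)
"""

# One pattern per field. re.MULTILINE makes ^ and $ match at every line
# boundary, so each pattern is anchored to a single line of the page.
FIELD_PATTERNS = {
    "name": re.compile(r"^Project:[ \t]*(?P<value>\S+)[ \t]*$", re.MULTILINE),
    "registered": re.compile(
        r"^Registered:[ \t]*(?P<value>\d{4}-\d{2}-\d{2})[ \t]*$", re.MULTILINE
    ),
    "developers": re.compile(r"^Developers:[ \t]*(?P<value>\d+)[ \t]*$", re.MULTILINE),
    "downloads": re.compile(r"^Downloads:[ \t]*(?P<value>[\d,]+)[ \t]*$", re.MULTILINE),
}


def extract_fields(text):
    """Return a dict of field name -> matched string, or None if not found."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[field] = match.group("value") if match else None
    return record


def to_int(value):
    """Convert a digit string with optional thousands separators to int."""
    return int(value.replace(",", "")) if value is not None else None


record = extract_fields(SAMPLE_PAGE)
record["developers"] = to_int(record["developers"])
record["downloads"] = to_int(record["downloads"])
print(record)
```

Keeping the patterns in a dictionary means that supporting another field only requires adding one entry, and a field missing from the text yields `None` instead of raising an error.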
         
        
            Keywords : 
authoring languages; data mining; formal languages; software engineering; OSSGrab; Python scripting; Regex; basic information extraction; character strings; data collection; data mining technique; large-software repositories; regular expressions; value-added information extraction; Decision support systems; empirical software engineering; open source; software repositories
         
        
        
        
            Conference_Title : 
Information and Communication Technology for The Muslim World (ICT4M), 2014 The 5th International Conference on
         
        
            Conference_Location : 
Kuching
         
        
        
            DOI : 
10.1109/ICT4M.2014.7020649