A key problem in genomics is the classification and annotation of sequences in a genome. A major challenge is identifying good sequence features. Evolutionary algorithms have the potential to search a large space of features and automatically generate useful ones. This paper proposes a two-stage method that generates features using multiple replicates of a genetic algorithm operating on an augmented finite state machine, called a side effect machine (SEM), and then selects a small diverse feature set using several methods, including a novel method called dissimilarity clustering. We apply our method to three problems related to transposable elements and compare the results to those using
-mer features. We are able to produce a small set of interesting and comprehensible features that create random forest classifiers more accurate and less prone to overfitting than those created using
-mer features. We analyze the SEM fitness landscapes and discuss the use of different fitness functions.