8.2: Genetic Algorithm

The genetic algorithm technique is an optimization technique. It is a population-based metaheuristics search algorithm, inspired by the natural phenomenon of evolution. Parents produce offspring, who inherit the strengths and abilities of both parents in the form of genes. This happens through a process called crossover. In the genetic algorithm, the crossover is carried out at one or more random cut-off points and the gene is swapped from both parents based on crossover probability. Figure 8.2.1 below illustrates the process of crossover. The cut-off point for the genes is kept in the 4th column for the parents and highlighted in red. Genes are swapped at the cut-off points and child 1 and child 2 have the swapped genes. Genes from the beginning until the cut-off point are the same as parents. From the 5th point onwards, genes are obtained from the other parents.

Figure 8.2.1: Crossover in genetic algorithm

There is also a likelihood of genetic mutation when a gene is passed from parents to offspring. In the case of the genetic algorithm, a mutation is introduced in the form of new patterns to randomly refresh the population. This encourages the search for unexplored feature combinations. A small probability (1% 2%) is selected for each possible feature as mutation probability. A random number is generated. If the random number is less than the mutation probability, then the feature is mutated and its status is changed. If it was originally supposed to be included, it is excluded and vice versa. Figure 8.2.2 explains the mutation process in the genetic algorithm.

Cells highlighted in red were mutated and values were changed. If the original value was 1, it was changed to 0, and vice versa.

Figure 8.2.2: Mutation in genetic algorithm

An initial set of solutions are created, called population. Each solution in the population is called a chromosome . Each chromosome has a length equal to the total number of features. Individual features are represented as binary 0/1. 0 stands for a specific feature being removed from feature space and 1 stand for the feature being included. As explained previously, through the process of crossover, a list of the best chromosomes is selected and passed through mutation. This process is repeated several times, based on how many generations are defined by the user. Imagine there are 10 features x1 to x10 and we want to try 7 solutions at a time. i.e., a population of 7. Figure 8.2.3 illustrates what the initial solution will look like for 10 features and a population of 7. Each row will be a chromosome and all chromosomes combined will be called population. Figure 8.2.4 illustrates the genetic algorithm flowchart.

Figure 8.2.3: Solution matrix for 10 features and population of 7

 

 

 

 

Fig 8.2.4: Genetic algorithm flowchart

M: Number of generations

N: Population size

pc: Probability of crossover

pm: Probability of mutation

k: Number of contestants

Number of generations, population size, crossover, and mutation probability are some of the hyperparameters of genetic algorithm.

This method tries to find the best possible feature combination by searching in a computationally efficient manner. This also means it does not search for all possible combinations. As a result, the best-performing features obtained from the genetic algorithm might not be the best combination of features. If the number of features are small, the genetic algorithm can be a really useful method. However, for high dimensional data in which the number of features is numerous, the genetic algorithm might not be computationally efficient, and the resultant features obtained from the genetic algorithm might not be useful either.

We have earlier initialized a feature selection object in this chapter as fsObj = FeatureSelection. We can use the object fsObj for performing the genetic algorithm by using the GeneticAlgorithm function in the object fsObj. Below is how the syntax will look like.

best_columns = fsObj.GeneticAlgorithm(generations=20, population=75, run_time=1200)

In this syntax example, we have used 20 generations and 75 population for the genetic algorithm. We will execute genetic algorithm feature selection for 1200 minutes, i.e., 20 hours and will stop the execution after that.