Chapter 8: Feature Selection Using Metaheuristic Algorithms

Each feature in a dataset can have a main effect and an interaction effect, so different combinations of features yield different model performance. This makes feature selection an inherently combinatorial problem: we need to find the combination of features that gives the best model performance. As the number of features grows, the number of possible combinations grows rapidly, and so does the computational cost of trying them all. Metaheuristic algorithms help by evaluating only a limited number of candidate solutions. They search for better solutions iteratively: the algorithm starts with randomly generated solutions and tries to improve them at each iteration. Metaheuristic algorithms are procedures that can find a good solution to optimization problems that are otherwise too difficult and complex to solve manually. These partial search algorithms may provide a good-enough solution, if not a perfect one. They are very useful for feature selection, as they can find better feature sets than would be possible by manually trying different combinations.

We will discuss four metaheuristic algorithms in this chapter: the genetic algorithm, simulated annealing, ant colony optimization, and particle swarm optimization. We have developed a companion Python library, MetaheuristicsFS, which implements feature selection with all four algorithms. Its FeatureSelection module helps us perform the desired feature selection.

Some parameters are common across all metaheuristic algorithms in this library, for example, the cross-validation datasets, the validation dataset, and the names of all input features. Imagine a scenario where you want to try multiple metaheuristic algorithms: you would need to enter these common parameters repeatedly for each algorithm. To avoid this, the MetaheuristicsFS library uses a singleton approach. In the first step, we create a feature selection object by providing the common input parameters. In the second step, we can initialize any of the four metaheuristic algorithms from it. We will discuss this in the subsequent sections.

First, we will import the FeatureSelection module from the MetaheuristicsFS Python library using the syntax below.

from MetaheuristicsFS import FeatureSelection

Let's understand all the required input fields for the FeatureSelection module.

columns_list: It is a Python list object containing the names of all the features as strings, separated by commas. These feature names are present in the training, test, validation, and external validation datasets. For example, if there are 3 features x1, x2, and x3, it will be represented as columns_list = ['x1', 'x2', 'x3']. Based on this input list of features, the search algorithms create different combinations to find the best possible feature combination.

data_dict: It is a Python dictionary object containing training and test data for multiple cross-validations.

Its keys represent unique cross-validation folds. For example, the keys 0, 1, 2, 3, and 4 represent 5 separate cross-validations. If a user wants to perform 5-fold cross-validation, data_dict should have 5 key-value pairs with 0, 1, 2, 3, and 4 as keys. Each pair is created by shuffling the dataframe and splitting it into train and test sets.

The value against each cross-validation key is a nested dictionary containing the features and the dependent variable for the training and test data. The keys 'x_train' and 'x_test' hold the feature dataframes for the training and test data. Similarly, the keys 'y_train' and 'y_test' hold the dependent variable for the training and test data, respectively.

Below is what the dictionary structure looks like for 2-fold cross-validation.

{
    0: {
        "x_train": x_train_dataframe,
        "y_train": y_train_array,
        "x_test": x_test_dataframe,
        "y_test": y_test_array
    },
    1: {
        "x_train": x_train_dataframe,
        "y_train": y_train_array,
        "x_test": x_test_dataframe,
        "y_test": y_test_array
    }
}
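For reference, a dictionary of this shape can be assembled with scikit-learn's KFold. The sketch below is only one possible way to build it and assumes a pandas DataFrame named df that contains the features listed in columns_list and a dependent-variable column named 'target'; both names are placeholders for illustration.

from sklearn.model_selection import KFold

# shuffle and split the dataframe into 2 folds
kf = KFold(n_splits=2, shuffle=True, random_state=42)
data_dict = {}
for fold, (train_idx, test_idx) in enumerate(kf.split(df)):
    data_dict[fold] = {
        "x_train": df.iloc[train_idx][columns_list],
        "y_train": df.iloc[train_idx]["target"].values,
        "x_test": df.iloc[test_idx][columns_list],
        "y_test": df.iloc[test_idx]["target"].values,
    }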

x_validation_dataframe: It is the feature dataframe for the validation dataset.

y_validation_dataframe: It is the dependent variable for the validation dataset, stored as a dataframe.

model: It is the initialized model, stored as an object. For example, for linear regression in the scikit-learn Python library, the model object can be initialized as model = LinearRegression().

This model object will then be used for training on the training data and predicting on the test and validation data. It should have a .fit method for training and a .predict method for predicting. scikit-learn models, as well as XGBoost and other major modeling libraries, provide these two methods. It does not support deep learning models, however.
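To make the required interface concrete, here is a toy estimator, purely illustrative and not part of the library, that satisfies the .fit/.predict contract:

import numpy as np

class MeanBaseline:
    """Toy estimator illustrating the minimal fit/predict interface."""
    def fit(self, X, y):
        # learn a single statistic from the training targets
        self.mean_ = float(np.mean(y))
        return self
    def predict(self, X):
        # predict the same value for every row
        return np.full(len(X), self.mean_)

model = MeanBaseline()  # any object with this interface can be passed as model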

cost_function_improvement: There are 2 possible values for this parameter, depending on the goal of the optimization. We can pass either 'increase' or 'decrease' as a string value.

Setting the value to 'increase' makes the feature selection algorithm look for solutions where the model metric increases across iterations. One example is the F1 score for a classification model: we would like to obtain the model that gives the highest F1 score.

Setting the value to 'decrease' makes the feature selection algorithm search for solutions where the cost is lowest. For example, RMSE is a commonly used cost function for regression models, and it is desirable to obtain a model with the lowest RMSE. Setting 'decrease' for this parameter makes the algorithm search for the feature set that gives the lowest RMSE.

cost_function: The cost function measures the cost between actual and predicted values. For a regression problem, some examples are root mean square error and mean absolute error. For a classification problem, some examples are F1 score, precision, and recall. The cost function should take 2 input parameters, 'actual' and 'predicted', as arrays, and return the cost between them. It supports all the cost functions available in scikit-learn.

It also supports custom-made cost functions, as long as the function takes 2 parameters, the actual values followed by the predicted values, and returns a cost value.
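As an illustration, here is a hypothetical custom cost function, root mean square error, written to follow that signature:

import numpy as np

def rmse(actual, predicted):
    # actual values first, predicted values second, returning a single cost
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    return np.sqrt(np.mean((actual - predicted) ** 2))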

average: For multi-class classification problems, cost functions such as precision, recall, and F1 score in scikit-learn have a parameter 'average'. It specifies the type of averaging to be applied when the dependent variable has multiple classes. We can pass the averaging scheme for scikit-learn cost functions through this parameter.
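As an illustration with made-up values, the snippet below shows how the averaging scheme changes the score returned by scikit-learn's f1_score; the scheme chosen is what would be passed through the average parameter.

from sklearn.metrics import f1_score

actual = [0, 1, 2, 2, 1, 0]
predicted = [0, 2, 2, 2, 1, 0]
print(f1_score(actual, predicted, average='macro'))     # unweighted mean of per-class F1
print(f1_score(actual, predicted, average='weighted'))  # weighted by class support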

Now let's initialize a feature selection object for a regression problem, where we will use a linear regression model with 3 features and mean squared error as the cost function.

from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LinearRegression
from MetaheuristicsFS import FeatureSelection


columns_list = ['feature1', 'feature2', 'feature3']

data_dict = {
    0: {
        "x_train": x_train_dataframe,
        "y_train": y_train_array,
        "x_test": x_test_dataframe,
        "y_test": y_test_array
    },
    1: {
        "x_train": x_train_dataframe,
        "y_train": y_train_array,
        "x_test": x_test_dataframe,
        "y_test": y_test_array
    }
}

model = LinearRegression()

fsObj = FeatureSelection(columns_list=columns_list, data_dict=data_dict,
                         x_validation_dataframe=x_validation_dataframe,
                         y_validation_dataframe=y_validation_dataframe,
                         model=model, cost_function_improvement='decrease',
                         cost_function=mean_squared_error)

After the feature selection object has been initialized, it can be used to execute a specified metaheuristic algorithm. Before we get into the metaheuristic algorithms, let's understand why we need them in the first place by going through the first section of the chapter, 'exhaustive feature selection'.