6.4: Wrapper Method
This method trains the model on subsets of features drawn from the original feature set. It is a greedy search: rather than exhaustively evaluating every possible combination of features, it iteratively adds or removes features based on inferences drawn from a model trained on the current subset. Each candidate subset is evaluated against a specified model metric, such as the F1 score for classification or R-squared for regression, and the method returns the feature set that gives the best results. Wrapper methods are computationally more expensive than filter methods and take more time to perform feature selection. Even after this search, the results might not always be desirable, as the wrapper method often leads to overfitting.
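To make the core wrapper step concrete, the sketch below scores one candidate feature subset by training a model on just those columns and measuring a cross-validated metric; a wrapper method repeats this over many candidate subsets and keeps the best one. The dataset, the logistic regression model, and the particular subsets scored are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

def score_subset(cols):
    """Evaluate one candidate subset of feature indices: train the model
    on just those columns and return the mean cross-validated F1 score."""
    model = LogisticRegression(max_iter=5000)
    return cross_val_score(model, X[:, cols], y, scoring="f1").mean()

# A wrapper method compares many such subset scores (subsets here are arbitrary).
print(score_subset([0, 1, 2]), score_subset([0, 3, 7]))
```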
Four methods fall under the wrapper approach: forward selection, backward selection, stepwise selection, and recursive feature elimination.
6.4.1 Forward Selection
Forward selection starts with an empty set of features. First, a separate model is fit for each individual feature, and the single feature giving the highest R-squared is selected. In subsequent iterations, features are added to the model one at a time, as long as R-squared keeps increasing and the F-statistic remains significant.
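The following is a minimal sketch of this procedure, assuming statsmodels, a scikit-learn example dataset, and an illustrative stopping rule (an R-squared improvement of at least 1e-4); the F-statistic check described above could be added via the fitted model's f_pvalue attribute.

```python
import pandas as pd
import statsmodels.api as sm
from sklearn.datasets import load_diabetes

data = load_diabetes()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

selected, remaining = [], list(X.columns)
best_r2 = 0.0
while remaining:
    # Fit one model per candidate feature added to the current set.
    r2 = {f: sm.OLS(y, sm.add_constant(X[selected + [f]])).fit().rsquared
          for f in remaining}
    best = max(r2, key=r2.get)
    # Stop once R-squared no longer improves (threshold is an assumption).
    if r2[best] <= best_r2 + 1e-4:
        break
    selected.append(best)
    remaining.remove(best)
    best_r2 = r2[best]

print("Selected features:", selected)
```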
6.4.2 Backward Selection
This method starts with all the features. Features whose contribution to R-squared is not statistically significant, as judged by the F-statistic, are removed one at a time.
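scikit-learn's SequentialFeatureSelector implements this greedy backward search; note that it scores candidates by a cross-validated metric (here R-squared) rather than F-statistic significance, and the target of five features is an illustrative assumption.

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)

# Start from all 10 features and greedily drop the one whose removal
# hurts the cross-validated R-squared the least, until 5 remain.
sfs = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=5, direction="backward", scoring="r2"
)
sfs.fit(X, y)
print("Kept feature mask:", sfs.get_support())
```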
6.4.3 Stepwise Selection
Stepwise feature selection is done by creating a regression model and ranking features based on their p-values. Features with a p-value above 0.05 are dropped, but a removed feature can still re-enter the model at a later step. The process continues until a convergence criterion is met. However, it can lead to overfitting and an increase in false positives.
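A minimal sketch of p-value-driven stepwise selection with statsmodels is shown below. A slightly looser removal threshold (0.10) than the entry threshold (0.05) is used to keep the loop from oscillating, a common convention; the threshold values, dataset, and function name are illustrative assumptions.

```python
import pandas as pd
import statsmodels.api as sm
from sklearn.datasets import load_diabetes

def stepwise_select(X, y, p_enter=0.05, p_remove=0.10):
    """Illustrative p-value-driven stepwise selection on a DataFrame X."""
    selected = []
    while True:
        changed = False
        # Forward step: add the remaining feature with the lowest p-value,
        # provided it clears the entry threshold.
        remaining = [c for c in X.columns if c not in selected]
        entry = pd.Series(
            {f: sm.OLS(y, sm.add_constant(X[selected + [f]])).fit().pvalues[f]
             for f in remaining},
            dtype=float,
        )
        if not entry.empty and entry.min() < p_enter:
            selected.append(entry.idxmin())
            changed = True
        # Backward step: drop the worst selected feature if its p-value
        # exceeds the removal threshold; dropped features may re-enter later.
        if selected:
            pvals = sm.OLS(y, sm.add_constant(X[selected])).fit().pvalues.drop("const")
            if pvals.max() > p_remove:
                selected.remove(pvals.idxmax())
                changed = True
        if not changed:
            return selected

data = load_diabetes()
X = pd.DataFrame(data.data, columns=data.feature_names)
print(stepwise_select(X, data.target))
```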
6.4.4 Recursive Feature Elimination
Recursive feature elimination (RFE) is an iterative procedure and an instance of backward feature elimination. The first model is trained on all the features, and features are then removed one by one based on a scoring function, typically the magnitude of the model's coefficients or feature importances. This is repeated until the desired number of features remains. RFE is among the most commonly used wrapper methods.
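A minimal sketch using scikit-learn's RFE with a logistic regression; the choice of estimator, the example dataset, and the target of ten features are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Fit on all 30 features, then repeatedly drop the feature with the
# smallest coefficient magnitude until 10 remain.
rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10, step=1)
rfe.fit(X, y)
print("Kept feature mask:", rfe.support_)
print("Ranking (1 = kept):", rfe.ranking_)
```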
Now let's look at different datasets and the impact of each method on model performance.