11.2: Hyperparameter Tuning

Many machine learning algorithms have hyperparameters: settings chosen before training that control how the algorithm learns from the data. Increasing or decreasing the value of a hyperparameter can change the performance of the model for better or worse. For example, random forest and many other tree-based algorithms have a hyperparameter that sets the number of trees used in the model, and changing that number can change model performance.

If the number of trees is low, say 1, and we increase it to 5, the performance of the model might improve. If we further increase it from 5 to 20, the model might perform even better. However, adding more trees does not always lead to a better-performing model: beyond a certain number, performance may plateau or even decline, while training cost keeps growing. Hence it is important to identify a good number of trees in this case.
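A toy sketch of this pattern is below. The scoring function is hypothetical, a stand-in for a real measurement such as the cross-validated accuracy of a random forest; its formula is made up only to mimic the typical improve-then-plateau-then-decline shape.

```python
# Hypothetical stand-in for a real evaluation (e.g. cross-validated
# accuracy of a random forest with n_trees trees). The formula is
# invented purely to illustrate the plateau-then-decline pattern.
def toy_score(n_trees):
    return round(0.70 + 0.25 * (1 - 1 / n_trees) - 0.0005 * n_trees, 4)

for n in [1, 5, 20, 100, 500]:
    print(n, toy_score(n))
```

With this toy curve, moving from 1 to 5 to 20 trees improves the score, while pushing to 500 trees makes it worse than at 20, which is exactly the behavior described above.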

Some machine learning techniques have many hyperparameters, and not all of them are equally important for improving model performance. Depending on the computing power and time available, we should tune the hyperparameters that matter most. There are many techniques available for hyperparameter tuning.

The most basic method of hyperparameter optimization is called manual search. In this method, we try candidate values for each hyperparameter ourselves: nested for loops can fit the model for every combination of hyperparameter values and identify the combination at which the model performs best. Grid search is an automated version of manual search. Its main advantage is that it saves us from writing the nested loops ourselves, but it still evaluates every combination in the grid.
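A minimal sketch of manual/grid search over two hyperparameters is below. The `score` function is hypothetical, standing in for fitting a model and measuring cross-validated accuracy; the candidate values are likewise illustrative.

```python
from itertools import product

# Hypothetical score: stands in for fitting a model and measuring
# cross-validated accuracy. It is constructed to peak at
# n_trees=50, max_depth=5.
def score(n_trees, max_depth):
    return 1.0 - abs(n_trees - 50) / 100 - abs(max_depth - 5) / 10

n_trees_grid = [10, 50, 100]
max_depth_grid = [3, 5, 8]

best_params, best_score = None, float("-inf")
# Manual/grid search: evaluate every combination of the candidate values.
for n, d in product(n_trees_grid, max_depth_grid):
    s = score(n, d)
    if s > best_score:
        best_params, best_score = (n, d), s

print(best_params, best_score)  # (50, 5) 1.0
```

With scikit-learn, `GridSearchCV(estimator, param_grid, cv=...)` performs this same exhaustive loop internally against a real estimator.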

Both manual and grid search are computationally expensive, because the number of combinations grows multiplicatively with each additional hyperparameter. The randomized search method, on the other hand, trains the model on randomly sampled hyperparameter combinations, saving us from trying every combination of hyperparameter values. One disadvantage of this method is that it may not identify the optimal values of the hyperparameters.
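Randomized search can be sketched the same way, sampling a fixed number of combinations instead of enumerating them all. The `score` function is again a hypothetical stand-in for real model evaluation, and the ranges are illustrative.

```python
import random

# Hypothetical score standing in for real model evaluation;
# peaks at n_trees=50, max_depth=5.
def score(n_trees, max_depth):
    return 1.0 - abs(n_trees - 50) / 100 - abs(max_depth - 5) / 10

random.seed(0)  # fixed seed so the sketch is reproducible
n_iter = 10     # number of random combinations to try

best_params, best_score = None, float("-inf")
for _ in range(n_iter):
    # Sample each hyperparameter from its range instead of a fixed grid.
    n = random.randint(1, 200)
    d = random.randint(1, 20)
    s = score(n, d)
    if s > best_score:
        best_params, best_score = (n, d), s

print(best_params, best_score)
```

Only `n_iter` evaluations are performed regardless of how large the search space is, which is the cost saving; the best combination found may fall short of the true optimum. scikit-learn's `RandomizedSearchCV(estimator, param_distributions, n_iter=...)` applies the same idea to real models.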

Bayesian search uses Bayesian optimization to choose promising hyperparameter values based on the results of previous trials. It can be done with the help of the Python library skopt, for example through its BayesSearchCV class. There is another Python library, Optuna, that uses the define-by-run principle to let the user construct the search space dynamically, inside the objective function itself, in ways that were not possible with previous hyperparameter tuning frameworks. By combining this with efficient sampling and pruning algorithms, it greatly improves the cost-effectiveness of optimization.