11.2: Hyperparameter Tuning
Many machine learning algorithms have hyperparameters: settings chosen before training that control how the algorithm learns from the data. Increasing or decreasing the value of a hyperparameter can alter the performance of the model for better or worse. For example, random forest and many other tree-based ensemble methods have a parameter that defines the number of trees used in the model. By increasing or decreasing the number of trees, model performance might change.
If the number of trees is low, say 1, and we increase it to 5, the performance of the model might improve. If we further increase it from 5 to 20, the model might perform even better. However, continuing to increase the number of trees does not always lead to a better-performing model. Beyond a certain number of trees, performance may plateau, or worse, it might decrease. Hence it is important to identify a suitable number of trees in this case.
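As a minimal sketch of this effect, assuming scikit-learn and using its bundled breast cancer dataset purely for illustration, we can sweep over a few values of n_estimators and compare cross-validated accuracy:

```python
# Illustrative sketch: how the number of trees can affect performance.
# The dataset and the tree counts tried here are assumptions, not a recipe.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

for n_trees in [1, 5, 20, 100, 500]:
    model = RandomForestClassifier(n_estimators=n_trees, random_state=42)
    # Mean accuracy over 5 cross-validation folds
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"n_estimators={n_trees}: mean CV accuracy = {score:.3f}")
```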
Many machine learning techniques have multiple hyperparameters, and not all of them are equally important for improving model performance. Depending on the computing power and time available, we should attempt hyperparameter tuning, focusing on the hyperparameters that matter most. There are many techniques available for hyperparameter tuning.
The most basic method of hyperparameter optimization is called manual search. In this method, we try all possible combinations of hyperparameter values ourselves. A for loop (or nested loops when there are several hyperparameters) lets us fit the model with each combination of hyperparameter values and identify the combination at which the model gives the best performance.
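A minimal manual-search sketch might look like the following, assuming scikit-learn and an illustrative two-hyperparameter grid for a random forest:

```python
# Manual search: nested loops over an assumed, illustrative grid.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

best_score, best_params = -1.0, None
for n_estimators in [50, 100, 200]:
    for max_depth in [3, 5, None]:
        model = RandomForestClassifier(
            n_estimators=n_estimators, max_depth=max_depth, random_state=42
        )
        score = cross_val_score(model, X, y, cv=5).mean()
        # Keep the combination with the best cross-validated score
        if score > best_score:
            best_score, best_params = score, (n_estimators, max_depth)

print("Best (n_estimators, max_depth):", best_params, "score:", round(best_score, 3))
```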
There is another method for hyperparameter search known as grid search. This method is an automated version of manual search hyperparameter optimization. Its main advantage is that it saves us from writing multiple nested loops.
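With scikit-learn, the same search can be expressed through GridSearchCV, so the nested loops above collapse into a parameter grid (the grid values are again illustrative):

```python
# Grid search: GridSearchCV tries every combination in param_grid.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, None],
}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```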
Both manual and grid search methods are computationally expensive. The randomized search method, on the other hand, trains the model on a fixed number of randomly sampled hyperparameter combinations, saving us from trying every combination of hyperparameter values. One disadvantage of this method is that it may not identify the optimal values of the hyperparameters.
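A randomized-search sketch using scikit-learn's RandomizedSearchCV might look like this; n_iter caps how many random combinations are tried, and the distributions below are illustrative assumptions:

```python
# Randomized search: sample n_iter combinations instead of trying them all.
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

param_distributions = {
    "n_estimators": randint(10, 500),  # sample integers in [10, 500)
    "max_depth": randint(2, 20),
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=20,  # only 20 random combinations are tried
    cv=5,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```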
Bayesian search uses Bayesian optimization to identify optimal values of hyperparameters, spending each new trial where the results of previous trials suggest improvement is most likely. It can be done with the help of the Python library skopt.
There is another Python library, Optuna, that uses the define-by-run principle to let the user construct the search space dynamically inside the objective function rather than declaring it up front, a flexibility that fixed, declare-ahead frameworks do not offer. By combining efficient searching and pruning algorithms, it greatly improves the cost-effectiveness of optimization.
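A minimal define-by-run sketch with Optuna, again with illustrative bounds and dataset, could look like this; note that the search space is declared inside the objective function itself:

```python
# Define-by-run: the search space is built while each trial executes.
import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

def objective(trial):
    # Each suggest_* call adds a dimension to the space at run time
    n_estimators = trial.suggest_int("n_estimators", 10, 500)
    max_depth = trial.suggest_int("max_depth", 2, 20)
    model = RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=42
    )
    return cross_val_score(model, X, y, cv=5).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params, study.best_value)
```

Because the space is built as the trial runs, a suggested value can even depend on values suggested earlier in the same trial, which is what a fixed, pre-declared grid cannot express.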