8.3: Simulated Annealing
Simulated annealing is an optimization algorithm designed to find global optima. It is inspired by annealing, a process used to toughen metals: a piece of steel, for example, is heated to a very high temperature and then cooled slowly. This cycle of heating and slow cooling lets the molecules in the steel reorganize into a more stable arrangement, thereby strengthening it. Simulated annealing improves on the hill climbing algorithm, which is good at finding local optima but often fails to reach the global optimum.
The algorithm is executed for a number of iterations specified by the user, and the annealing temperature is set by the user before the algorithm starts. In the first iteration, a random subset of features is drawn and its performance on the dataset is measured. This feature set is then perturbed by randomly removing some features and adding new ones; we can specify the percentage of features to perturb, which is usually kept between 1 and 5 percent. After perturbation, the new feature set is evaluated on the dataset, and its model performance is compared with that of the feature set drawn at the beginning of the iteration. If the new subset is better than the initial subset, it replaces the initial subset. If, on the other hand, the perturbed subset performs worse, we calculate an acceptance or rejection probability, which uses both the model metric and the current temperature, and compare it against a randomly drawn probability. For regression problems, we keep the worse-performing perturbed feature set if the random probability is less than the acceptance probability; for classification problems, we keep it if the random probability is greater than the acceptance probability. (The direction of the comparison flips because regression metrics, such as error, are typically minimized, while classification metrics, such as accuracy, are maximized.) The feature set that survives the iteration is used in the next one, and the temperature is cooled according to a predefined alpha value. One thing to note about perturbing is that within a single iteration, we can perturb more than once.
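To make these steps concrete, below is a minimal Python sketch of the perturb-and-accept loop. It is an illustration, not the library's implementation: the function name anneal_features, the evaluate callback, the maximize flag, the choice of starting from half the features, and the standard Metropolis acceptance form exp(-|delta| / temperature) are all our own assumptions.

import math
import random

def anneal_features(all_features, evaluate, temperature=1500, iterations=25,
                    n_perturb=75, percent_perturb=1, alpha=0.9, maximize=True):
    # Draw a random starting subset (here: half of all features).
    current = set(random.sample(all_features, k=max(1, len(all_features) // 2)))
    current_score = evaluate(current)
    n_swap = max(1, len(all_features) * percent_perturb // 100)

    for _ in range(iterations):
        for _ in range(n_perturb):          # perturb several times per iteration
            candidate = set(current)
            # Remove a few features at random (always keep at least one)...
            for f in random.sample(sorted(candidate), k=min(n_swap, len(candidate) - 1)):
                candidate.discard(f)
            # ...and add a few features currently outside the subset.
            outside = [f for f in all_features if f not in candidate]
            for f in random.sample(outside, k=min(n_swap, len(outside))):
                candidate.add(f)

            candidate_score = evaluate(candidate)
            improved = (candidate_score > current_score) if maximize \
                       else (candidate_score < current_score)
            if improved:
                current, current_score = candidate, candidate_score
            else:
                # Metropolis criterion: accept a worse subset with probability
                # exp(-|delta| / T), which shrinks as the temperature cools.
                delta = abs(candidate_score - current_score)
                if random.random() < math.exp(-delta / temperature):
                    current, current_score = candidate, candidate_score
        temperature *= alpha                # cool after each iteration
    return sorted(current)

Here evaluate stands in for whatever routine trains the model on a candidate subset and returns its metric.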
Simulated annealing tries to improve the feature set randomly generated in the first iteration. In some cases, that random feature set may already lie very close to the global minimum or maximum, so for the same dataset the extent of improvement obtained through simulated annealing can vary considerably from one run to another. In other words, besides the hyperparameters of the algorithm, the random feature set generated in the first iteration also has an impact on the outcome. This is one of the limitations of simulated annealing, as the initial draw is beyond the control of the user.
After simulated annealing completes, we should test the resulting feature set on external test and validation data. Figure 8.3.1 below shows a flowchart of the simulated annealing algorithm [1].
Figure 8.3.1: Simulated Annealing algorithm flowchart
We can use the function SimulatedAnnealing in the feature selection object fsObj to perform simulated annealing. The syntax looks like this:
best_columns = fsObj.SimulatedAnnealing(temperature=1500,
                                        iterations=25,
                                        n_perturb=75,
                                        run_time=1200,
                                        n_features_percent_perturb=1,
                                        alpha=0.9)
In this syntax example, we have set a high temperature of 1500 and run simulated annealing for 25 iterations. Within each iteration, 1 percent of the features are perturbed, and perturbation is allowed to happen 75 times per iteration. The execution time limit is set to 1200 minutes, and alpha is set to 0.9.
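For intuition about how alpha = 0.9 cools the temperature, the short snippet below (purely illustrative, not part of the library) prints the temperature in effect at each of the 25 iterations under this geometric schedule:

temperature = 1500
for i in range(25):
    print(f"iteration {i + 1:2d}: temperature = {temperature:7.1f}")
    temperature *= 0.9   # cool after each iteration

After 24 cooling steps the temperature has fallen to about 1500 * 0.9**24, roughly 120, so the probability of accepting a worse-performing subset shrinks steadily as the run progresses.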