5.2: SHAP
The Shapley additive explanations (SHAP) [1]
method can help us detect feature interactions. SHAP is a model explanation
technique: for a specific data point, it explains how much importance each
feature has on the model's prediction for that point, using game theory to
quantify the contribution of each feature. Positive and negative SHAP values
represent a feature's impact in increasing or decreasing the prediction
outcome. SHAP can be extended to find interactions amongst features by
separating the main effect of each feature from its interaction effects.
There are three steps to finding
the interaction effect using SHAP. The first step is to build a model with the
available features. The second step is to calculate, for each record, the
interaction values using the SHAP Python package and the trained model. SHAP
returns the result as a matrix of pairwise interactions amongst all features,
one matrix per record. The third step is to aggregate the interaction values
across all records to get the overall interaction between features. If there
are only a handful of features, we can visually inspect the result to
understand the overall feature interactions. If there are many features, we
can instead aggregate mathematically, for example taking the mean value to
quantify the extent of each interaction.
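The second and third steps can be sketched as follows. This is a minimal sketch: in practice the per-record interaction matrices would come from `shap.TreeExplainer(model).shap_interaction_values(X)`, but here they are simulated with random numbers so that the aggregation logic in step 3 stands on its own, and the feature names are placeholders.

```python
import numpy as np

# Simulated output of step 2: for each record, SHAP's
# shap_interaction_values returns an (n_features x n_features) matrix.
# Here: 100 records, 4 features, random values standing in for real ones.
rng = np.random.default_rng(0)
n_records, n_features = 100, 4
interaction_values = rng.normal(size=(n_records, n_features, n_features))

# Step 3: aggregate across records. Taking the mean of absolute values
# prevents positive and negative contributions from cancelling out.
mean_interaction = np.abs(interaction_values).mean(axis=0)

# The diagonal holds main effects; off-diagonal cells hold pairwise
# interaction strengths (the matrix is symmetric about the diagonal).
feature_names = ["f0", "f1", "f2", "f3"]
pairs = {}
for i in range(n_features):
    for j in range(i + 1, n_features):
        pairs[(feature_names[i], feature_names[j])] = mean_interaction[i, j]

# Rank pairs from strongest to weakest interaction.
for pair, score in sorted(pairs.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

The same loop structure works regardless of how many features the model has; only the visual-inspection shortcut stops being practical as the feature count grows.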
There are a few caveats to using
SHAP for feature interactions.
1) The interactions identified by
SHAP are only as good as the model. If the model does not perform well, the
interaction effects detected by SHAP will not be reliable.
2) Calculating SHAP interaction
values is computationally expensive.
3) There is no fixed cutoff for the
SHAP interaction value. For some interactions, it can be 0 or very close to
0. If a cutoff does need to be established, it is at the discretion of the
analyst. For convenience, we use a cutoff of 0.1 and remove interactions
whose mean SHAP score is less than 0.1.
4) The method shap_interaction_values is
available for TreeExplainer
in the SHAP Python library. It can therefore only be used to obtain
interactions for tree-based bagging and boosting techniques. It cannot be
used for models that do not have a decision tree at their core, such as
linear regression or logistic regression.
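The cutoff from caveat 3 is a simple filter over the aggregated scores. In the sketch below the pair names and scores are hypothetical, chosen only to illustrate the thresholding:

```python
# Hypothetical mean |SHAP interaction| scores per feature pair;
# these values are illustrative, not from the real datasets.
mean_scores = {
    ("torque", "Above90"): 0.42,
    ("km_driven", "year"): 0.18,
    ("fuel", "owner"): 0.05,
    ("seats", "fuel"): 0.02,
}

CUTOFF = 0.1  # chosen at the analyst's discretion, as noted above

# Keep only interactions at or above the cutoff.
selected = {pair: s for pair, s in mean_scores.items() if s >= CUTOFF}
print(selected)
```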
One advantage of using SHAP for
interaction effects is that it can be used for both regression and
classification, unlike the interaction plot method discussed earlier, which
is useful only for regression.
Let's use this method to find the
interaction effects in the coupon recommendation dataset and the car sales
dataset.
5.2.1 Car Sales
For the car sales dataset, we ran
the experiment across all cross-validation training samples. Each
cross-validation fold yields a set of interactions discovered by SHAP. To
avoid overfitting, we consider only those interactions that are present
across all folds. We shortlisted 374 interactions that were present in all
5 cross-validation samples and whose SHAP value was greater than or equal
to 0.1. The SHAP values for these 374 interactions were obtained by averaging
across the 5 folds. Looking at the interactions discovered by SHAP,
we can infer that it was able to detect interactions only amongst higher-order
features. Figure 5.2.1a plots the bottom 5 interactions.
Figure 5.2.1a Bottom 5 interactions
The higher-order features for fuel and owner
have the lowest degree of interaction effect. For fuel, the count encoding of
its categories is a numerical variable; it has a weak interaction with
the mean encoding of the dependent variable for the owner feature. The rest of
the bottom 5 interaction effects are amongst higher-order features, and none of
them involve original features.
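The fold-intersection shortlisting used above (keeping only interactions present in all 5 cross-validation samples) can be sketched with plain Python sets. The pair names per fold below are hypothetical:

```python
# Hypothetical interaction pairs discovered by SHAP in each of 5
# cross-validation folds (names are placeholders, not the real 374 pairs).
fold_interactions = [
    {("torque", "Above90"), ("fuel", "owner"), ("year", "km_driven")},
    {("torque", "Above90"), ("fuel", "owner")},
    {("torque", "Above90"), ("fuel", "owner"), ("seats", "fuel")},
    {("torque", "Above90"), ("fuel", "owner")},
    {("torque", "Above90"), ("fuel", "owner"), ("year", "km_driven")},
]

# Keep only interactions present in every fold, so the shortlist does not
# overfit to any single training sample.
stable = set.intersection(*fold_interactions)
print(stable)
```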
Now let's understand the top 5
interaction effects in figure 5.2.1b.
Figure 5.2.1b Top 5 interactions
The label encodings of two features, namely
torque and Above90, have the highest degree of interaction effect. Since both
features are categorical, we can do a simple concatenation between the two
features to obtain an interaction-effect feature. For the rest of the top 5
interactions, every feature involved is a label-encoded version of a
categorical feature. For such interactions between categorically encoded
features and other higher-order features, we can again simply concatenate the
two features to obtain an interaction-effect feature.
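The concatenation itself is a one-liner in pandas. The values below are made up for illustration; only the feature names torque and Above90 come from the dataset:

```python
import pandas as pd

# Toy frame with the two categorical features; category values are
# illustrative, not taken from the real car sales data.
df = pd.DataFrame({
    "torque": ["low", "high", "high", "low"],
    "Above90": ["yes", "yes", "no", "no"],
})

# Concatenate the two categoricals into a single interaction feature.
df["torque_x_Above90"] = df["torque"].astype(str) + "_" + df["Above90"].astype(str)
print(df["torque_x_Above90"].tolist())
```

The new column is itself categorical, so it can then be count-, mean-, or label-encoded like any other categorical feature.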
5.2.2 Coupon Recommendation
Now let's look at the coupon
recommendation dataset. Interactions were detected amongst higher-order
encoded versions of the categorical features. Although the magnitude of the
interaction effects is small, 6 interaction effects were detected in this
dataset, each present across all cross-validation samples. These are shown in
figure 5.2.2.
Figure 5.2.2 Interactions detected
for coupon recommendation dataset
The smallest degree of interaction
was detected between the label-encoded feature for bar and the label encoding
for the type of coupon; both features are numerical. For the rest of the
detected interactions, at least one feature is a label-encoded version of a
categorical feature, while the other is a count- or mean-encoded feature. The
only way to create an interaction feature between these two types of features
is to create dummy variables from the label-encoded feature and multiply them
with the numerical feature.
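That dummy-times-numeric construction looks like this in pandas. The column names and values are assumptions for illustration: "bar" stands in for the label-encoded bar feature, and "coupon_mean_enc" for a mean-encoded numerical feature:

```python
import pandas as pd

# Illustrative data: "bar" is a label-encoded categorical, "coupon_mean_enc"
# a mean-encoded numerical feature (both names are placeholders).
df = pd.DataFrame({
    "bar": [0, 1, 2, 1],
    "coupon_mean_enc": [0.2, 0.5, 0.3, 0.8],
})

# One-hot the label-encoded feature, then multiply each dummy column by
# the numerical feature to get one interaction feature per category.
dummies = pd.get_dummies(df["bar"], prefix="bar")
interactions = dummies.mul(df["coupon_mean_enc"], axis=0)
print(interactions)
```

Each resulting column is nonzero only for rows belonging to that category, carrying the numerical feature's value there and 0 elsewhere.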
The next step will be to create new
categorical or numerical features from these interactions. Newly created
categorical features can't be used directly; to use them, we will need to
further create higher-order features from these categorical
interaction-effect features.
For the rest of the datasets, the
feature matrix is huge, and detecting interaction effects through SHAP is
computationally very expensive. Also, most of the features are higher-order
features derived from the date of check-in, revenue, or booking patterns.
Hence, there is a chance that very few of the detectable interaction effects
would be meaningful and explainable to a layperson.