Chapter 5: Interaction Effect Feature Engineering
The relationship between the independent
variable and the dependent variable is called the main effect. For example, the
impact of calorie intake on cholesterol levels could be referred to as the main
effect. Some features might interact with each other and can affect each
other's values. For example, body weight can impact calorie intake. People with
higher body weights tend to consume more calories. Body weight and calorie
intake, both can impact cholesterol levels in the body. Interaction effects
indicate that a third variable influences the relationship between the
dependent variable and the independent variable. The effect of one independent
variable on the dependent variable is dependent on the value of another
independent variable.
Interaction effects can explain
additional variation in the dependent variable, more than what individual
features could do alone. Interaction effect exists amongst some independent
variables. It is important to identify these interaction effects and represent
them as new features. In the next step, we need to test the validity of the
interaction effect. If we keep adding interaction effects mechanically amongst
all existing features in a dataset, without any backing of domain knowledge or
established principles, it will lead to overfitting. In this chapter, we will
discuss methods other than domain knowledge for identifying interaction
effects.