7.4: Using Tree-Based Feature Importance for a Linear Model
It might seem useful to train a random forest, select the features it ranks as most important, and then use those features in a linear model. However, this can be problematic for two reasons.
First, feature importance in a random forest is not comparable to importance as measured by a linear model's beta coefficients. A feature can have high importance in a random forest while the same feature has a relatively low beta coefficient in a linear model, and vice versa. Worse, a feature with high importance in the random forest can turn out to be statistically insignificant in the linear model.
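The sketch below illustrates this divergence on synthetic data (the feature names x_linear and x_nonlin are hypothetical, chosen for this example; scikit-learn and statsmodels are assumed to be available). The target depends on x_nonlin only through its square, so the forest ranks it highly while its linear beta is near zero and insignificant:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
x_linear = rng.normal(size=n)   # drives y linearly
x_nonlin = rng.normal(size=n)   # drives y only through its square
y = 2.0 * x_linear + 3.0 * x_nonlin**2 + rng.normal(size=n)

X = pd.DataFrame({"x_linear": x_linear, "x_nonlin": x_nonlin})

# Random forest: x_nonlin receives high importance because the trees
# can split on it to capture the quadratic effect.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print(dict(zip(X.columns, rf.feature_importances_.round(3))))

# Linear model: the beta for x_nonlin is near zero and statistically
# insignificant, since its effect on y is symmetric around zero.
ols = sm.OLS(y, sm.add_constant(X)).fit()
print(ols.params.round(3))
print(ols.pvalues.round(3))
```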
Second, a random forest is good at finding nonlinear relationships between the features and the dependent variable. Linear models, on the other hand, assume a linearly additive relationship among the independent variables; that is, each feature's relationship with the dependent variable is independent of any other feature. The same is not true of a random forest, which freely exploits nonlinear relations and interactions, so a feature may owe its importance entirely to effects that a linear model cannot represent.
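To make this concrete, here is a minimal sketch under the assumption of a purely synthetic target driven only by the interaction between two features (the setup is illustrative, not from the original text). The forest captures the interaction; the additive linear model explains essentially nothing:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 2))
# The target is a pure interaction: neither feature has a marginal
# linear effect on its own.
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

lin = LinearRegression().fit(X_tr, y_tr)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# The additive linear model scores near zero; the forest does far better.
print("linear R^2:", round(lin.score(X_te, y_te), 3))
print("forest R^2:", round(rf.score(X_te, y_te), 3))
```

Both features would earn high random-forest importance here, yet either one would look useless inside a linear model.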