7.4: Using Tree-Based Feature Importance for a Linear Model
It might seem useful to train a random forest, select the features it ranks as most important, and then use those features in a linear model. However, this can be problematic for two reasons.
First, feature importance in a random forest is not comparable to importance as measured by a linear model's beta coefficients. A feature can have high importance in a random forest while the same feature has a relatively low beta coefficient in a linear model, and vice versa. Worse, a feature with high importance in the random forest can turn out to be statistically insignificant in the linear model.
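The sketch below illustrates this divergence on synthetic data (the feature names x_linear and x_nonlin are hypothetical, chosen for this example; scikit-learn and statsmodels are assumed to be available). The target depends on x_nonlin only through its square, so the forest ranks it highly while its linear beta is near zero and insignificant:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
x_linear = rng.normal(size=n)   # drives y linearly
x_nonlin = rng.normal(size=n)   # drives y only through its square
y = 2.0 * x_linear + 3.0 * x_nonlin**2 + rng.normal(size=n)

X = pd.DataFrame({"x_linear": x_linear, "x_nonlin": x_nonlin})

# Random forest: x_nonlin receives high importance because the trees
# can split on it to capture the quadratic effect.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print(dict(zip(X.columns, rf.feature_importances_.round(3))))

# Linear model: the beta for x_nonlin is near zero and statistically
# insignificant, since its effect on y is symmetric around zero.
ols = sm.OLS(y, sm.add_constant(X)).fit()
print(ols.params.round(3))
print(ols.pvalues.round(3))
```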
Second, a random forest is good at finding nonlinear relationships between the features and the dependent variable. Linear models, on the other hand, assume a linearly additive relationship among the independent variables; that is, each feature's relationship with the dependent variable is independent of any other feature. The same is not true of a random forest, which freely exploits nonlinear relations and interactions, so a feature may owe its importance entirely to effects that a linear model cannot represent.
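To make this concrete, here is a minimal sketch under the assumption of a purely synthetic target driven only by the interaction between two features (the setup is illustrative, not from the original text). The forest captures the interaction; the additive linear model explains essentially nothing:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 2))
# The target is a pure interaction: neither feature has a marginal
# linear effect on its own.
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

lin = LinearRegression().fit(X_tr, y_tr)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# The additive linear model scores near zero; the forest does far better.
print("linear R^2:", round(lin.score(X_te, y_te), 3))
print("forest R^2:", round(rf.score(X_te, y_te), 3))
```

Both features would earn high random-forest importance here, yet either one would look useless inside a linear model.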