3.4: Conclusion
Before training a machine learning model, we should spend time
understanding the data and its features. We can use EDA to identify interesting
patterns in the data, and some of these patterns can be turned into useful
features for the model. In some cases, the original dataset has a limited number
of features, and little can be done with them as they are. EDA-assisted feature
engineering is particularly useful in these situations for creating a dataset
with information-rich features.
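As a small illustration of turning an observed pattern into a feature, the sketch below assumes that EDA on a hypothetical sales dataset suggested higher sales on weekends. The records, the `add_weekend_flag` helper, and the `is_weekend` feature name are all assumptions made for this example, not part of any dataset discussed in this chapter.

```python
from datetime import date

# Hypothetical records: (order date, units sold). Values are illustrative only.
sales = [
    (date(2023, 1, 6), 40),  # Friday
    (date(2023, 1, 7), 95),  # Saturday
    (date(2023, 1, 8), 90),  # Sunday
    (date(2023, 1, 9), 42),  # Monday
]


def add_weekend_flag(rows):
    """Append is_weekend (1 for Saturday/Sunday, else 0) to each (date, units) row."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    return [(d, units, int(d.weekday() >= 5)) for d, units in rows]


features = add_weekend_flag(sales)
for order_date, units, is_weekend in features:
    print(order_date.isoformat(), units, is_weekend)
```

The new column adds no information that was not already in the date, but it makes the suspected pattern directly available to a model that cannot infer the day of the week on its own.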
A secondary benefit of EDA is
that it often uncovers anomalies in the dataset. Verifying these anomalies with
domain experts can improve our understanding of the domain.
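One simple way to surface candidate anomalies for such a review is the interquartile range (IQR) rule. The sketch below is a minimal example under stated assumptions: the `daily_orders` values are invented, `flag_iqr_outliers` is a hypothetical helper, and the multiplier `k=1.5` is a common convention rather than a recommendation from this chapter.

```python
import statistics


def flag_iqr_outliers(values, k=1.5):
    """Return (index, value) pairs lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical daily order counts; the last entry is a planted anomaly.
daily_orders = [52, 48, 50, 47, 55, 51, 49, 53, 50, 410]

candidates = flag_iqr_outliers(daily_orders)
print(candidates)
```

The flagged points are only candidates. Whether a value such as 410 is a data entry error or a genuine event (a promotion, for example) is exactly the kind of question to take to a domain expert.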
One caveat of EDA-assisted feature
engineering is that we need to be careful when drawing conclusions from EDA and
using those insights in feature engineering. If the person who derived the
insights is a subject matter expert (SME), we have a better chance of success.
However, if the machine learning engineer does not have expertise in the
domain, an SME should be asked to check and confirm the derived insights before
any conclusions from EDA are finalized. Even then, we should examine more than
one visualization before relying on an insight to create a new feature.