3.4: Conclusion

Before training a machine learning model, we should spend time understanding the data and its features. EDA helps identify interesting patterns in the data, and some of these patterns can be turned into useful features for the model. When the original dataset has a limited number of features and little can be done with them directly, EDA-assisted feature engineering is especially useful for creating a dataset with information-rich features.

A secondary benefit of EDA is that it often uncovers anomalies in the dataset. Verifying these anomalies with domain experts improves our understanding of the domain.

One caveat of EDA-assisted feature engineering is that we need to be careful when drawing conclusions from EDA and using those insights in feature engineering. If the person who derived the insights is a subject matter expert (SME), we have a better chance of success. If the machine learning engineer lacks expertise in the domain, an SME should check and confirm the derived insights before the conclusions are finalized. In either case, we should examine more than one visualization before relying on an insight to create a new feature.
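As a rough illustration of checking an insight from more than one angle, the sketch below tests a candidate feature against two independent summaries before accepting it. Everything in it is an assumption made for illustration: the toy data, the `is_peak` feature, and both thresholds are hypothetical, and in practice each summary would usually be inspected as a plot and reviewed with an SME.

```python
from statistics import mean

# Hypothetical toy data: (hour_of_day, target) pairs, made up for illustration.
rows = [
    (1, 12.0), (2, 11.0), (3, 13.0), (8, 30.0), (9, 32.0),
    (10, 29.0), (14, 21.0), (15, 20.0), (18, 31.0), (19, 33.0),
]


def is_peak(hour):
    """Candidate feature suggested by a first plot: 1 during assumed peak hours."""
    return 1 if hour in (8, 9, 10, 18, 19) else 0


def group_means(data):
    """View 1: mean target for each value of the candidate feature."""
    groups = {0: [], 1: []}
    for hour, y in data:
        groups[is_peak(hour)].append(y)
    return {key: mean(values) for key, values in groups.items()}


def pearson(xs, ys):
    """View 2: Pearson correlation between the candidate feature and the target."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


means = group_means(rows)
corr = pearson([is_peak(h) for h, _ in rows], [y for _, y in rows])

# Keep the feature only if both views agree; both thresholds are arbitrary assumptions.
keep_feature = (means[1] - means[0]) > 5.0 and corr > 0.5
print(means, round(corr, 3), keep_feature)
```

Agreement between the two views does not replace SME confirmation. It only reduces the chance that a single misleading plot drives the creation of a new feature.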