How I Approach Feature Engineering
Philosophy
Feature engineering is where domain knowledge meets statistics. A great feature can be worth 10x more than a fancier model. I focus on creating features that capture real-world relationships in the data.
Principles
1. Start with domain intuition
What would a human expert look at? Encode that knowledge into features.
Example: for churn prediction, use account age, usage trends, and support ticket patterns.
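As a minimal sketch of those churn features, assuming hypothetical raw tables (the column names `signup_date`, `opened`, and the 90-day window are illustrative, not a prescribed schema):

```python
import pandas as pd

# Hypothetical raw tables; columns and values are illustrative assumptions.
accounts = pd.DataFrame({
    "user_id": [1, 2],
    "signup_date": pd.to_datetime(["2022-01-01", "2023-06-01"]),
})
tickets = pd.DataFrame({
    "user_id": [1, 1, 2],
    "opened": pd.to_datetime(["2024-01-05", "2024-02-10", "2024-01-20"]),
})

as_of = pd.Timestamp("2024-03-01")  # snapshot date the features are computed at

features = accounts.copy()
# Account age in days: how long has this user been a customer?
features["account_age_days"] = (as_of - features["signup_date"]).dt.days
# Support ticket pattern: number of tickets opened in the last 90 days.
recent = tickets[tickets["opened"] >= as_of - pd.Timedelta(days=90)]
features["tickets_90d"] = (
    features["user_id"].map(recent.groupby("user_id").size()).fillna(0).astype(int)
)
```

Each feature here is something a support or account manager would actually look at when judging churn risk, which is the point: encode the expert's checklist, not arbitrary aggregates.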
2. Create interactions
Relationships between features often matter more than individual features.
Example: revenue_per_user = total_revenue / num_users
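The `revenue_per_user` ratio above can be computed as follows; the divide-by-zero guard for empty accounts is my addition, and the sample values are made up:

```python
import numpy as np
import pandas as pd

# Illustrative data; the third row is an account with no users yet.
df = pd.DataFrame({
    "total_revenue": [1000.0, 250.0, 0.0],
    "num_users": [10, 5, 0],
})

# Ratio interaction: revenue per user. Replacing 0 users with NaN avoids
# a divide-by-zero; fillna(0.0) then assigns empty accounts a ratio of 0.
df["revenue_per_user"] = (
    df["total_revenue"] / df["num_users"].replace(0, np.nan)
).fillna(0.0)
```

Ratios like this often separate "large account, low engagement" from "small account, high engagement" in a way neither raw column can.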
3. Test feature importance
Use permutation importance or SHAP to validate that new features actually help.
Example: Run model with and without feature, compare CV scores
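A minimal permutation-importance check with scikit-learn, on synthetic data where one feature drives the label and the other is pure noise (the data and model choice are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Synthetic data: feature 0 determines the label, feature 1 is noise.
X = rng.normal(size=(n, 2))
y = (X[:, 0] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Shuffle each feature on held-out data and measure the score drop:
# a useful feature's importance should be clearly above the noise feature's.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
```

If a new feature's permutation importance is indistinguishable from noise, drop it; the same with/without comparison works with cross-validation scores.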
Anti-Patterns I Avoid
✕ Creating hundreds of features blindly
Automated feature generation without domain context creates noise and overfitting.
Instead: Start with 10-20 thoughtful features, add more based on analysis
✕ Data leakage through features
Features computed with information from the future or the target variable.
Instead: Check feature creation dates, validate against holdout set
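One way to sketch that check, assuming an explicit as-of cutoff (table and column names are hypothetical): compute features only from rows visible at the snapshot date, and assert that nothing from the future slipped in.

```python
import pandas as pd

# Illustrative event log; the 2024-04-02 row postdates the label snapshot.
events = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event_time": pd.to_datetime(["2024-01-10", "2024-04-02", "2024-02-15"]),
})

as_of = pd.Timestamp("2024-03-01")  # date the churn label is defined at

# Only events strictly before the snapshot may feed the features;
# the future row is dropped rather than leaked into training.
visible = events[events["event_time"] < as_of]
assert (visible["event_time"] < as_of).all(), "leakage: future rows in features"

event_count = visible.groupby("user_id").size().rename("event_count")
```

A hard assertion like this turns a silent leakage bug into a loud pipeline failure, which is far cheaper than discovering it from inflated offline metrics.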
Resources That Shaped This
- Feature Engineering for Machine Learning (O'Reilly)
- Kaggle feature engineering micro-course
- SHAP documentation for feature importance