How I Organize ML Projects
Philosophy
Every ML project should be runnable by someone else. I structure projects so that code is modular, data flows are clear, and experiments are reproducible.
Principles
1. Use consistent project structure
A standardized layout makes it easy for anyone to navigate the codebase.
Example: data/, notebooks/, src/, models/, configs/, tests/
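The layout can be scaffolded with a few lines of standard-library Python. This is a minimal sketch: PROJECT_DIRS and scaffold_project are illustrative names, not part of any tool, and the directory list simply mirrors the example above.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Top-level directories from the example layout; adjust per project.
PROJECT_DIRS = ["data", "notebooks", "src", "models", "configs", "tests"]


def scaffold_project(root: str | Path) -> list[Path]:
    """Create the standard directory layout under `root` and return the paths."""
    root = Path(root)
    created = []
    for name in PROJECT_DIRS:
        path = root / name
        # exist_ok=True makes re-running the scaffold safe (idempotent).
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is written to the real project.
    with tempfile.TemporaryDirectory() as tmp:
        for p in scaffold_project(Path(tmp) / "my-ml-project"):
            print(p.name)
```

Because the function is idempotent, it can also be rerun on an existing project to fill in any directories that are missing.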
2. Separate exploration from production
Use notebooks for exploration and Python modules for production-ready code.
Example: notebooks/01_eda.ipynb, src/features.py, src/model.py
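A sketch of what belongs in a module such as src/features.py: small, pure functions that depend only on their inputs, never on notebook state. The function log1p_scale is hypothetical and only illustrates the shape of such a module.

```python
"""Sketch of a src/features.py module: pure, importable feature functions."""
from __future__ import annotations

import math
from collections.abc import Iterable


def log1p_scale(values: Iterable[float]) -> list[float]:
    """Apply log(1 + x) to non-negative values, e.g. heavily skewed counts."""
    out = []
    for v in values:
        # Fail loudly on bad input instead of silently producing NaN.
        if v < 0:
            raise ValueError(f"log1p_scale expects non-negative values, got {v}")
        out.append(math.log1p(v))
    return out
```

In a notebook this becomes `from src.features import log1p_scale`, assuming the project root is importable (for example, because the project is installed as a package). The same function can then be covered by a unit test in tests/.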
3. Write tests for data assumptions
Test that data matches expectations: types, ranges, uniqueness.
Example: test_data_shape(), test_column_types(), test_no_nulls_in_critical()
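A sketch of such tests in pytest style. To stay dependency-free it checks a plain list of dicts; load_training_rows, EXPECTED_COLUMNS, and CRITICAL_COLUMNS are hypothetical placeholders for the project's real loader and schema. With pandas, the same checks would typically run against a DataFrame instead.

```python
"""Sketch of tests/test_data.py: assertions about the data itself."""
from __future__ import annotations

# Hypothetical schema: column name -> expected Python type.
EXPECTED_COLUMNS = {"user_id": int, "age": int, "label": int}
# Hypothetical columns that must never be null.
CRITICAL_COLUMNS = ["user_id", "label"]


def load_training_rows() -> list[dict]:
    # Hypothetical stand-in: replace with the project's real data loader.
    return [
        {"user_id": 1, "age": 34, "label": 0},
        {"user_id": 2, "age": 51, "label": 1},
        {"user_id": 3, "age": 27, "label": 1},
    ]


def test_data_shape():
    rows = load_training_rows()
    assert len(rows) > 0, "dataset is empty"
    for row in rows:
        assert set(row) == set(EXPECTED_COLUMNS), f"unexpected columns: {sorted(row)}"


def test_column_types():
    for row in load_training_rows():
        for col, expected_type in EXPECTED_COLUMNS.items():
            assert isinstance(row[col], expected_type), f"{col} has type {type(row[col]).__name__}"


def test_no_nulls_in_critical():
    for row in load_training_rows():
        for col in CRITICAL_COLUMNS:
            assert row[col] is not None, f"null in critical column {col}"


def test_value_ranges():
    for row in load_training_rows():
        assert 0 <= row["age"] <= 120, f"implausible age {row['age']}"
        assert row["label"] in (0, 1), f"unexpected label {row['label']}"


def test_ids_unique():
    ids = [row["user_id"] for row in load_training_rows()]
    assert len(ids) == len(set(ids)), "duplicate user_id values"
```

pytest collects any `test_*` function in tests/, so these data checks run alongside the ordinary code tests and fail as soon as an upstream data change breaks an assumption.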
Anti-Patterns I Avoid
Everything in one notebook
Monolithic notebooks are impossible to test, version, or collaborate on.
Instead: extract functions into modules and import them into notebooks
Magic numbers everywhere
Hardcoded thresholds and parameters make experiments hard to reproduce.
Instead: move parameters into config files and document every magic number that remains
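A minimal sketch of moving parameters into a config file, assuming JSON configs under configs/ and a frozen dataclass whose field comments document each value. TrainConfig, load_config, and the field names are hypothetical. Unknown keys are rejected so that a typo in the config fails loudly instead of silently falling back to a default.

```python
"""Sketch: load experiment parameters from configs/ instead of hardcoding them."""
from __future__ import annotations

import json
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float = 1e-3       # optimizer step size
    batch_size: int = 32              # examples per gradient step
    decision_threshold: float = 0.5   # probability cutoff for the positive class
    seed: int = 42                    # fixed seed so runs are reproducible


def load_config(path: str | Path) -> TrainConfig:
    """Read a JSON config; missing keys use the documented defaults above."""
    raw = json.loads(Path(path).read_text())
    known = {f.name for f in fields(TrainConfig)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return TrainConfig(**raw)
```

Code then reads `cfg.decision_threshold` instead of a bare `0.5`, and the config file can be committed next to the results it produced.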
Resources That Shaped This
- Cookiecutter Data Science
- The Twelve-Factor App principles
- Python packaging best practices