How I Organize ML Projects

Philosophy

Every ML project should be runnable by someone else. I structure projects so that code is modular, data flows are clear, and experiments are reproducible.

Principles

Use consistent project structure

A standardized layout makes it easy for anyone to navigate the codebase.

Example: data/, notebooks/, src/, models/, configs/, tests/
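As a minimal sketch, the layout listed above could be scaffolded with a few lines of Python. The helper name `scaffold` and the idea of scripting this step are assumptions for illustration; a template tool does the same job.

```python
from pathlib import Path

# Directory names taken from the example layout above.
LAYOUT = ("data", "notebooks", "src", "models", "configs", "tests")


def scaffold(root: Path) -> list[Path]:
    """Create the standard directories under `root` and return their paths."""
    created = []
    for name in LAYOUT:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # idempotent: safe to re-run
        created.append(path)
    return created
```

Calling `scaffold(Path("my_project"))` (a placeholder project name) creates the six directories and leaves any that already exist untouched.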

Separate exploration from production

Use notebooks for exploration and Python modules for production-ready code.

Example: notebooks/01_eda.ipynb, src/features.py, src/model.py
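One way this split might look in practice: the transformation lives in a module, and the notebook only imports it. The function `add_log_feature` and the `price` column are hypothetical, chosen just to make the sketch concrete.

```python
# Sketch of what src/features.py might contain (function name is hypothetical).
import math


def add_log_feature(rows: list[dict], column: str) -> list[dict]:
    """Return new rows with a `log_<column>` field; inputs are not mutated."""
    return [{**row, f"log_{column}": math.log1p(row[column])} for row in rows]


# In notebooks/01_eda.ipynb the same function would be imported, not copied:
#   from src.features import add_log_feature
rows = [{"price": 0.0}, {"price": 9.0}]
features = add_log_feature(rows, "price")
```

Because the logic sits in a plain function, it can be unit-tested and reused by the training code without opening the notebook.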

Write tests for data assumptions

Test that data matches expectations: types, ranges, uniqueness.

Example: test_data_shape(), test_column_types(), test_no_nulls_in_critical()
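A rough sketch of such tests, written as pytest-style functions over plain Python rows so it runs without extra dependencies. The schema, the `load_rows` loader, and the age range are assumptions; in a real project the loader would read from data/ and the checks would likely run against a DataFrame.

```python
# pytest-style checks; with pytest installed, `pytest tests/` would collect them.
EXPECTED_TYPES = {"user_id": int, "age": int, "email": str}  # assumed schema
CRITICAL_COLUMNS = ("user_id", "email")  # assumed: must never be null


def load_rows() -> list[dict]:
    """Stand-in for the project's real data loader (hypothetical)."""
    return [
        {"user_id": 1, "age": 34, "email": "a@example.com"},
        {"user_id": 2, "age": 27, "email": "b@example.com"},
    ]


def test_data_shape():
    rows = load_rows()
    assert len(rows) > 0
    # Every row has exactly the expected columns.
    assert all(set(row) == set(EXPECTED_TYPES) for row in rows)


def test_column_types():
    for row in load_rows():
        for column, expected in EXPECTED_TYPES.items():
            assert isinstance(row[column], expected), column


def test_no_nulls_in_critical():
    for row in load_rows():
        for column in CRITICAL_COLUMNS:
            assert row[column] is not None, column


def test_ranges_and_uniqueness():
    rows = load_rows()
    assert all(0 <= row["age"] <= 120 for row in rows)  # assumed valid range
    ids = [row["user_id"] for row in rows]
    assert len(ids) == len(set(ids))  # user_id must be unique
```

The last test covers the range and uniqueness expectations, which the three example names do not mention explicitly.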

Anti-Patterns I Avoid

Everything in one notebook

Monolithic notebooks are hard to test, hard to diff in version control, and hard to collaborate on.

Instead: Extract functions into modules and import them into notebooks

Magic numbers everywhere

Hardcoded thresholds and parameters make experiments hard to reproduce.

Instead: Move thresholds and parameters into config files and document each value
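A small sketch of the config-file approach using only the standard library. The `TrainConfig` fields, their values, and the file name configs/train.json are all hypothetical; YAML or TOML would work just as well.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float   # optimizer step size
    threshold: float       # decision cutoff for the positive class
    random_seed: int       # fixed seed so runs can be repeated


def load_config(text: str) -> TrainConfig:
    """Parse JSON text into a TrainConfig; unknown or missing keys raise TypeError."""
    return TrainConfig(**json.loads(text))


# In a project this text would live in a file such as configs/train.json (hypothetical).
CONFIG_TEXT = '{"learning_rate": 0.01, "threshold": 0.5, "random_seed": 42}'
config = load_config(CONFIG_TEXT)
```

Each parameter is named and commented in one place, and a typo in the config file fails loudly at load time instead of silently falling back to a default.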

Resources That Shaped This

→ Cookiecutter Data Science
→ The Twelve-Factor App principles
→ Python packaging best practices