How I Organize ML Projects

Philosophy

Every ML project should be runnable by someone else. I structure projects so that code is modular, data flows are clear, and experiments are reproducible.

Principles

1. Use a consistent project structure

A standardized layout makes it easy for anyone to navigate the codebase.

Example: data/, notebooks/, src/, models/, configs/, tests/
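
The layout above could be scaffolded with a short script. This is a minimal sketch that assumes the six directory names from the example; the scaffold() helper is hypothetical and not part of any template tool.

```python
from pathlib import Path

# Directory names taken from the example layout above.
PROJECT_DIRS = ["data", "notebooks", "src", "models", "configs", "tests"]


def scaffold(root: Path) -> list[Path]:
    """Create the standard directories under `root` and return their paths."""
    created = []
    for name in PROJECT_DIRS:
        path = root / name
        # exist_ok=True makes the script safe to re-run on an existing project.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling scaffold(Path("my_project")) creates any missing directories and leaves existing ones untouched.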

2. Separate exploration from production

Use notebooks for exploration and Python modules for production-ready code.

Example: notebooks/01_eda.ipynb, src/features.py, src/model.py

3. Write tests for data assumptions

Test that data matches expectations: types, ranges, uniqueness.

Example: test_data_shape(), test_column_types(), test_no_nulls_in_critical()
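
The test names above could look roughly like this in a pytest-style module. This is a sketch under stated assumptions: the schema, the column names, the age range, and the load_rows() stand-in are all hypothetical, and the last two tests are added only to illustrate the range and uniqueness checks.

```python
# Hypothetical schema: column names and types are illustrative only.
EXPECTED_TYPES = {"user_id": int, "age": int, "income": float}
CRITICAL_COLUMNS = ["user_id", "age"]


def load_rows():
    """Stand-in for a real data loader; returns a tiny in-memory sample."""
    return [
        {"user_id": 1, "age": 34, "income": 52000.0},
        {"user_id": 2, "age": 27, "income": None},
    ]


def test_data_shape():
    rows = load_rows()
    assert len(rows) > 0
    # Every row must have exactly the expected columns.
    assert all(set(row) == set(EXPECTED_TYPES) for row in rows)


def test_column_types():
    for row in load_rows():
        for column, expected in EXPECTED_TYPES.items():
            value = row[column]
            # Nulls are checked separately, so they are skipped here.
            assert value is None or isinstance(value, expected)


def test_no_nulls_in_critical():
    for row in load_rows():
        for column in CRITICAL_COLUMNS:
            assert row[column] is not None


def test_age_in_range():
    # Assumed plausible range; adjust to the real data.
    assert all(0 <= row["age"] <= 120 for row in load_rows())


def test_user_id_unique():
    ids = [row["user_id"] for row in load_rows()]
    assert len(ids) == len(set(ids))
```

In a real project load_rows() would read from data/, and the same assertions would run against the actual dataset on every change.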

Anti-Patterns I Avoid

Everything in one notebook

Monolithic notebooks are hard to test, hard to version, and hard to collaborate on.

Instead: Extract functions to modules, import into notebooks
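
As an illustration, logic that starts life in a notebook cell could be moved into a module such as src/features.py. This is a minimal sketch; the add_log_income() function and the "income" column are hypothetical.

```python
# src/features.py (file name from the earlier example; the function is hypothetical)
import math


def add_log_income(rows, column="income"):
    """Return copies of `rows` with a log1p-transformed version of `column` added."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's data is not mutated
        new_row[f"log_{column}"] = math.log1p(row[column])
        out.append(new_row)
    return out
```

A notebook would then contain only `from src.features import add_log_income` plus the exploratory calls, while the function itself can be unit tested and reviewed like any other code.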

Magic numbers everywhere

Hardcoded thresholds and parameters make experiments hard to reproduce.

Instead: Move parameters into config files, and document any constant that stays in the code
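
One possible way to do this with only the standard library is a typed config object loaded from JSON. The parameter names, the values, and the configs/train.json file name are assumptions made for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Hypothetical parameters, each documented where it is defined.
    learning_rate: float       # optimizer step size
    decision_threshold: float  # probability cutoff for the positive class
    random_seed: int           # fixed so runs can be repeated


def load_config(text: str) -> TrainConfig:
    """Parse a JSON document into a TrainConfig; unknown or missing keys raise TypeError."""
    return TrainConfig(**json.loads(text))


# In a project this text would live in a file such as configs/train.json (hypothetical name).
EXAMPLE = '{"learning_rate": 0.001, "decision_threshold": 0.5, "random_seed": 42}'
config = load_config(EXAMPLE)
```

Because the dataclass is frozen, code cannot silently change a parameter mid-run, and each experiment can be tied to the exact config file that produced it.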

Resources That Shaped This

  • Cookiecutter Data Science
  • The Twelve-Factor App principles
  • Python packaging best practices