1.1. System design

Designing analytical systems thoughtfully ensures they are reliable, maintainable, and scalable over time. Good system design helps keep your work transparent, efficient, and easier to debug.

Keep it simple

It is very difficult to write transparent, easy-to-read code if the overall process is too complicated.

Look for the simplest approach to your problem. A simpler approach usually needs much simpler code as well.

Break down complex problems into small, manageable components
Write clear, concise code with a single responsibility per module or function
Simple systems are easier to maintain, test, and extend

This can also be an iterative process. Just because the code works does not mean it cannot be simplified further.
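As an illustration of the points above, the sketch below splits a small analysis into functions that each do one thing. The function names, the outlier bounds, and the sample input are hypothetical and chosen only for this example.

```python
from statistics import mean


def parse_values(raw_lines):
    """Convert raw text lines to floats, skipping blank lines."""
    return [float(line) for line in raw_lines if line.strip()]


def remove_outliers(values, lower, upper):
    """Keep only values within the inclusive range [lower, upper]."""
    return [v for v in values if lower <= v <= upper]


def summarise(values):
    """Return the count and mean of the values."""
    return {"count": len(values), "mean": mean(values)}


def run_analysis(raw_lines, lower=0.0, upper=100.0):
    """Compose the small steps into the full process."""
    values = parse_values(raw_lines)
    cleaned = remove_outliers(values, lower, upper)
    return summarise(cleaned)


# Hypothetical input: one blank line and one out-of-range value.
result = run_analysis(["10", "", "20", "500"])
```

Because each step is a separate function, each one can be tested on its own, and a step can be replaced without touching the others.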

Directed Acyclic Graphs (DAGs)

A core principle in system design is to model workflows as Directed Acyclic Graphs (DAGs). Each node represents a distinct task or computation, and edges define dependencies.

DAGs ensure no circular dependencies, making execution order clear
Tasks run only when dependencies are met, avoiding redundant work
This clarity helps maintain reproducibility and simplifies debugging
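One way to make this concrete is to describe a workflow as a mapping from each task to the tasks it depends on, then derive a valid execution order from it. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The task names and the helper functions are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "load_data": set(),
    "clean_data": {"load_data"},
    "build_features": {"clean_data"},
    "fit_model": {"build_features"},
    "make_report": {"fit_model", "clean_data"},
}


def execution_order(deps):
    """Return one valid run order; raises CycleError if deps contain a cycle."""
    return list(TopologicalSorter(deps).static_order())


def has_cycle(deps):
    """Report whether the dependency mapping contains a circular dependency."""
    try:
        execution_order(deps)
    except CycleError:
        return True
    return False


order = execution_order(dependencies)
```

If a circular dependency is introduced, for example `load_data` depending on `make_report`, the sorter raises `CycleError` instead of returning an order, which surfaces the design problem before any task runs.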

Do Not Repeat Yourself (DRY)

Avoid duplicating code or logic unless there is a very good reason.

Create reusable functions or classes for common tasks
Share components across different projects or pipelines
Reduces the risk of inconsistencies and errors
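For example, rather than repeating the same clean-up logic in every pipeline, it can live in one function that each pipeline calls. This is a minimal sketch: the naming rule and the column names are hypothetical.

```python
def standardise_column_name(name):
    """Lowercase a column name and replace spaces and hyphens with underscores."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def standardise_columns(names):
    """Apply the same naming rule to every column name in a list."""
    return [standardise_column_name(n) for n in names]


# Two hypothetical pipelines reuse the same function instead of copying the rule.
survey_columns = standardise_columns(["Respondent ID", "Age-Group"])
sales_columns = standardise_columns([" Region ", "Total Sales"])
```

If the naming rule changes, it changes in one place and every pipeline picks up the same behaviour.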

Configuration Management

Separating configuration from code is crucial.

Store parameters like file paths, thresholds, and environment settings outside your codebase (e.g., config files, environment variables)
This allows easy adjustments without modifying core code
Supports multiple environments (development, testing, production) without code changes
Improves reproducibility and collaboration by clearly defining runtime settings
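The sketch below shows one possible way to keep settings out of the code: defaults come from a JSON config, and environment variables override them. The setting names, the `ANALYSIS_*` variable names, and the example values are assumptions made for this illustration, not an established convention.

```python
import json
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    input_path: str
    threshold: float
    environment: str


def load_settings(config_text, env=None):
    """Build settings from JSON text, letting environment variables override it."""
    env = os.environ if env is None else env
    raw = json.loads(config_text)
    return Settings(
        input_path=env.get("ANALYSIS_INPUT_PATH", raw["input_path"]),
        threshold=float(env.get("ANALYSIS_THRESHOLD", raw["threshold"])),
        environment=env.get("ANALYSIS_ENV", raw.get("environment", "development")),
    )


# In practice the JSON text would be read from a config file kept outside the code.
example_config = '{"input_path": "data/input.csv", "threshold": 0.5}'
settings = load_settings(example_config, env={"ANALYSIS_ENV": "testing"})
```

Passing `env` explicitly keeps the function easy to test. When it is omitted, the function reads from the real process environment, so the same code runs unchanged in development, testing, and production.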