1.1. System design
Designing analytical systems thoughtfully ensures they are reliable, maintainable, and scalable over time. Good system design helps keep your work transparent, efficient, and easier to debug.
Keep it simple
It is very difficult to write transparent, easy-to-read code if the overall process is too complicated.
Look for the simplest approach to your problem; the code it requires is usually much simpler too.
- Break down complex problems into small, manageable components
- Write clear, concise code with a single responsibility per module or function
- Simple systems are easier to maintain, test, and extend
This can also be an iterative approach: just because the code works does not mean that it cannot be simplified further.
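As a rough illustration of small components with a single responsibility each, the sketch below splits a toy process into three steps and composes them. The function names and the data are hypothetical and chosen only for illustration.

```python
from statistics import mean


def parse_values(lines):
    """Convert raw text lines to floats, skipping blank lines."""
    return [float(line) for line in lines if line.strip()]


def remove_negatives(values):
    """Keep only the non-negative values."""
    return [v for v in values if v >= 0]


def summarise(values):
    """Return the mean of the values."""
    return mean(values)


def run(lines):
    """Compose the small steps into the full process."""
    return summarise(remove_negatives(parse_values(lines)))


# Hypothetical input: one blank line and one negative value are dropped.
result = run(["1.0", "", "-2.0", "3.0"])
print(result)  # 2.0
```

Because each step does one thing, it can be tested on its own and swapped out without touching the others.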
Directed Acyclic Graphs (DAGs)
A core principle in system design is to model workflows as Directed Acyclic Graphs (DAGs). Each node represents a distinct task or computation, and edges define dependencies.
- DAGs ensure no circular dependencies, making execution order clear
- Tasks run only when dependencies are met, avoiding redundant work
- This clarity helps maintain reproducibility and simplifies debugging
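One possible way to express such a workflow is with `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). This is a minimal sketch, not a full workflow engine: the task names are hypothetical, and a real pipeline would attach a function to each node.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "load": set(),
    "clean": {"load"},
    "model": {"clean"},
    "report": {"clean", "model"},
}

# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['load', 'clean', 'model', 'report']


def has_cycle(graph):
    """Return True if the dependency graph contains a circular dependency."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError:
        return True
    return False


# Two tasks that depend on each other cannot be ordered.
cyclic = {"a": {"b"}, "b": {"a"}}
print(has_cycle(cyclic))  # True
```

Checking for cycles before running anything makes a circular dependency a clear, early error rather than a confusing failure part-way through a run.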
Do Not Repeat Yourself (DRY)
Avoid duplicating code or logic unless there is a very good reason.
- Create reusable functions or classes for common tasks
- Share components across different projects or pipelines
- Reduces the risk of inconsistencies and errors
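As a small sketch of the idea, a calculation that would otherwise be copied into several places can live in one function. The function name and the figures below are hypothetical.

```python
def percentage_change(old, new):
    """Return the percentage change from old to new."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100


# The same logic is reused rather than retyped for each measure.
sales_change = percentage_change(200, 250)
costs_change = percentage_change(80, 60)
print(sales_change, costs_change)  # 25.0 -25.0
```

If the definition of the calculation changes, for example in how a zero baseline is handled, it only needs to change in one place.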
Configuration Management
Separating configuration from code is crucial.
- Store parameters like file paths, thresholds, and environment settings outside your codebase (e.g., config files, environment variables)
- This allows easy adjustments without modifying core code
- Supports multiple environments (development, testing, production) without code changes
- Improves reproducibility and collaboration by clearly defining runtime settings
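The sketch below shows one way this separation could look, assuming settings come from a JSON config file with an optional environment variable override. The key names, the default values, and the `APP_THRESHOLD` variable are hypothetical and chosen only for illustration.

```python
import json
import os

# Hypothetical defaults, used when no other value is supplied.
DEFAULTS = {"input_path": "data/input.csv", "threshold": 0.5}


def load_config(config_text=None, environ=None):
    """Build settings from defaults, then JSON text, then environment."""
    if environ is None:
        environ = os.environ
    config = dict(DEFAULTS)
    if config_text:
        # Values from the config file override the defaults.
        config.update(json.loads(config_text))
    if "APP_THRESHOLD" in environ:
        # An environment variable overrides both file and defaults.
        config["threshold"] = float(environ["APP_THRESHOLD"])
    return config


# In practice config_text would be read from a file kept outside the code.
config = load_config(
    '{"input_path": "data/2024.csv"}',
    environ={"APP_THRESHOLD": "0.7"},
)
print(config)  # {'input_path': 'data/2024.csv', 'threshold': 0.7}
```

The analysis code then reads `config["threshold"]` rather than a hard-coded number, so development, testing, and production runs can differ only in their configuration.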