End-to-end machine learning pipelines, from data ingestion and feature engineering to deployment-ready code. A systems administration background means I think about infrastructure, environment isolation, and failure modes before I think about model architecture.
I treat ML problems as systems problems first. A model that can't be reproduced or re-deployed is a liability. My workflow enforces clean separation between ingestion, feature computation, training, and evaluation through structure, not convention.
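A minimal sketch of what that separation can look like in code; the file path, column names, and target are hypothetical, and a real pipeline would add validation and persistence at each boundary.

```python
# Each stage is a pure function with an explicit input/output contract,
# so no step can reach into another's internals. Names are hypothetical.
import polars as pl
from sklearn.ensemble import HistGradientBoostingClassifier


def ingest(path: str) -> pl.DataFrame:
    """Read raw data; no transformation beyond parsing."""
    return pl.read_csv(path)


def compute_features(raw: pl.DataFrame) -> pl.DataFrame:
    """Derive model inputs from raw columns; never touches disk."""
    return raw.with_columns(
        (pl.col("debt") / pl.col("income")).alias("debt_to_income")
    )


def train(features: pl.DataFrame, target: str) -> HistGradientBoostingClassifier:
    """Fit on numeric features only; evaluation lives in its own stage."""
    X = features.drop(target).to_numpy()
    y = features[target].to_numpy()
    return HistGradientBoostingClassifier().fit(X, y)
```

Because each stage only sees the previous stage's output, a schema change fails loudly at one boundary instead of silently corrupting features downstream.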
I trained and worked as a Systems Technician (ASIR). What drew me most to the field was databases: everything from schema design and query writing to index tuning and execution plan analysis. I find the full range compelling, from modelling and SQL craft to squeezing performance out of a slow query.
I design pipelines that don't break when schemas change, models that return calibrated probabilities rather than raw scores, and evaluation setups that catch leakage before it reaches production.
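On the calibration point, a minimal sketch with scikit-learn; the synthetic data stands in for a real feature matrix, and schema and leakage concerns are handled elsewhere in the pipeline.

```python
# Wrap a classifier so predict_proba returns calibrated probabilities
# rather than raw scores. Synthetic data for illustration only.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Isotonic calibration on internal folds; the resulting probabilities can
# be thresholded against real business costs instead of an arbitrary 0.5.
model = CalibratedClassifierCV(
    HistGradientBoostingClassifier(), method="isotonic", cv=5
).fit(X_train, y_train)
probs = model.predict_proba(X_test)[:, 1]
```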
My stack is context-dependent. I default to uv for environment management, Polars for large tabular data, and scikit-learn for well-understood problems. I don't use deep learning when gradient boosting will do.
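As one example of why Polars is the default for large tabular data, a sketch of its lazy API; the file and columns are hypothetical, and the tiny Parquet file written here only makes the snippet self-contained.

```python
import polars as pl

# Hypothetical file and columns; in practice this would be a large dataset.
pl.DataFrame(
    {"customer_id": [1, 1, 2], "amount": [10.0, -5.0, 3.0]}
).write_parquet("events.parquet")

lazy = (
    pl.scan_parquet("events.parquet")  # nothing is read yet
    .filter(pl.col("amount") > 0)      # predicate pushed down to the scan
    .group_by("customer_id")
    .agg(pl.col("amount").sum().alias("total_spend"))
)
print(lazy.collect())  # executes the whole optimized plan in one pass
```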
uv · Copier · Sphinx · Polars/Pandas dual support · structured layout (data/raw/, features/, models/, pipelines/). Pinned lockfiles. Auto-generated docs.

Predict loan default probability from applicant data, with interpretability as a first-class constraint, not an afterthought.
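Illustrative only, not the project's actual code: one way interpretability stays first-class is starting from a linear baseline whose coefficients map one-to-one to applicant features. The feature names below are hypothetical.

```python
# Interpretable baseline: after standardization, each coefficient's sign and
# magnitude read directly as a risk direction. Data and names are synthetic.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=2_000, n_features=5, random_state=0)
feature_names = ["income", "debt_ratio", "age", "tenure", "open_lines"]

pipe = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
coefs = pipe.named_steps["logisticregression"].coef_[0]
for name, w in sorted(zip(feature_names, coefs), key=lambda t: -abs(t[1])):
    print(f"{name:12s} {w:+.3f}")
```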
Predict customer churn probability and output ranked intervention lists; the distinction between a model result and an operationally usable product.
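A sketch of that last step, under hypothetical column names: scores become a ranked work queue capped by the team's outreach capacity.

```python
import polars as pl


def intervention_list(scores: pl.DataFrame, capacity: int) -> pl.DataFrame:
    """Turn per-customer churn probabilities into a ranked work queue."""
    return (
        scores.sort("churn_prob", descending=True)
        .head(capacity)                        # only as many rows as can be acted on
        .with_row_index("priority", offset=1)  # explicit rank for the ops team
    )


scores = pl.DataFrame(
    {"customer_id": [1, 2, 3, 4], "churn_prob": [0.91, 0.12, 0.77, 0.45]}
)
print(intervention_list(scores, capacity=2))
```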
Identify latent risk patterns across country-level socioeconomic indicators, without a predefined label or ground truth.
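A sketch of what model selection can look like without labels, with synthetic data in place of real indicators: internal criteria such as the silhouette score guide the choice of cluster count.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))  # stand-in for 120 countries x 6 indicators
X_std = StandardScaler().fit_transform(X)

# No ground truth, so candidate cluster counts are compared on an
# internal criterion rather than accuracy against labels.
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_std)
    print(k, round(silhouette_score(X_std, labels), 3))
```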
Predict directional price movements in a non-stationary, low signal-to-noise domain where most published results fail out-of-sample.
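Since most published results fail out-of-sample, evaluation is the hard part; a sketch of walk-forward validation, where training data always strictly precedes test data in time. Synthetic noise stands in for market data.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 8))      # synthetic features
y = rng.integers(0, 2, size=1_000)   # synthetic up/down labels

# Each fold trains on the past and tests on the future; on pure noise
# like this, fold accuracy should hover near 0.5.
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = HistGradientBoostingClassifier().fit(X[train_idx], y[train_idx])
    print(f"{accuracy_score(y[test_idx], model.predict(X[test_idx])):.3f}")
```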
Detect and exploit statistical regularities in human behavioural sequences: a controlled environment for studying online frequency estimation and adaptive decision-making.
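A toy sketch of the kind of online frequency estimator this setting exercises; the decay constant and the move stream are hypothetical.

```python
from collections import defaultdict


class DecayedFrequency:
    """Symbol frequencies with exponential forgetting, for online prediction."""

    def __init__(self, decay: float = 0.95):
        self.decay = decay
        self.counts: defaultdict[str, float] = defaultdict(float)

    def update(self, symbol: str) -> None:
        for s in self.counts:
            self.counts[s] *= self.decay  # older evidence fades geometrically
        self.counts[symbol] += 1.0

    def predict(self) -> str:
        return max(self.counts, key=self.counts.get)  # dominant recent symbol


est = DecayedFrequency()
for move in ["rock", "rock", "paper", "rock"]:  # hypothetical behaviour stream
    est.update(move)
print(est.predict())  # "rock"
```

The decay keeps the estimate adaptive: if the opponent shifts strategy, old counts fade and the prediction tracks the new regime.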