ML Engineer · Data Systems

Building ML systems that
work beyond the notebook.

End-to-end machine learning pipelines — from data ingestion and feature engineering to deployment-ready code. Systems administration background means I think about infrastructure, environment isolation, and failure modes before I think about model architecture.

Python · PyTorch · Scikit-learn · Polars · Docker · Linux · uv · FastAPI · Azure Certified · Power BI · ASIR background
github.com/cacelass
01 · about · Engineering identity
Approach

I treat ML problems as systems problems first. A model that can't be reproduced or re-deployed is a liability. My workflow enforces clean separation between ingestion, feature computation, training, and evaluation — by structure, not convention.

Systems background

Trained and worked as a Systems Technician (ASIR). What drew me most to the field was databases — from schema design and query writing to index tuning and execution plan analysis. I find the full range compelling: modelling, SQL craft, and squeezing performance out of a slow query.

Production awareness

I design pipelines that don't break when schemas change, models that return calibrated probabilities rather than raw scores, and evaluation setups that catch leakage before it reaches production.

Tooling philosophy

My stack is context-dependent. I default to uv for environment management, Polars for large tabular data, and scikit-learn for well-understood problems. I don't use deep learning when gradient boosting will do.

02 · projects · Selected systems
Featured project
A production-grade Data Science project scaffold built to eliminate environment drift, enforce reproducibility, and standardise the structure of every ML project from day one.
Problem solved
Repetitive setup and inconsistent structure are the primary cause of reproducibility failure — not the models.
Stack
uv · Copier · Sphinx · Polars/Pandas dual support · structured layout
What it enforces
Separation of data/raw, features/, models/, pipelines/. Pinned lockfiles. Auto-generated docs.
Why it's built this way
uv over pip/conda — faster resolution, reproducible lockfiles, no hidden global state that silently breaks other projects.
Convention over configuration — opinionated defaults reduce decision fatigue at project start without blocking customisation.
Production layout, not Kaggle layout — raw vs processed data separation enforced structurally, not by honour system.
Sphinx by default — documentation is a first-class deliverable. Auto-generated from docstrings, zero extra setup.
credit-risk-classifier — credit approval system
credit risk · classification

Predict loan default probability from applicant data, with interpretability as a first-class constraint — not an afterthought.

Approach
Logistic Regression + Random Forest with feature importance analysis, post-training probability calibration, and business-aligned threshold tuning via F-beta optimisation.
Key constraint
Decision threshold is a business parameter, not a model constant — decoupled into config so decisioning teams can adjust without triggering a retraining cycle.
Engineering decisions
  • Chose LR + RF over GBTs — feature importances must be explainable to non-technical stakeholders in credit contexts.
  • Probability calibration applied explicitly — raw classifier scores are not probabilities; treating them as such breaks downstream scoring.
  • Stratified k-fold CV to handle class imbalance without oversampling artefacts.
Result: AUC 0.81 with interpretable models. Stakeholder trust, not benchmark performance, is the success criterion here.
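A minimal sketch of the two decisions above, on synthetic data: raw classifier scores are calibrated into probabilities, and the decision threshold is swept as a standalone parameter via F-beta rather than hard-coded into the model. The dataset, `beta` value, and `best_threshold` helper are illustrative, not the project's actual code.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import fbeta_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for applicant features.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Calibrate raw RF scores into usable probabilities (Platt scaling).
clf = CalibratedClassifierCV(RandomForestClassifier(random_state=0), method="sigmoid", cv=5)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]

def best_threshold(y_true, scores, beta=2.0):
    # Threshold is a business parameter: sweep candidates and score with
    # F-beta (beta > 1 weights recall, i.e. missed defaults, more heavily).
    candidates = np.linspace(0.05, 0.95, 19)
    f_scores = [fbeta_score(y_true, scores >= t, beta=beta) for t in candidates]
    return candidates[int(np.argmax(f_scores))]

threshold = best_threshold(y_test, proba)
```

Because `threshold` lives outside the fitted model, a decisioning team can change it in config without a retraining cycle.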
retainml — churn prediction system
retention · supervised ML

Predict customer churn probability and output ranked intervention lists — the distinction between a model result and an operationally usable product.

Approach
Supervised classification with recency/frequency feature engineering over rolling time windows. Output is a calibrated risk score, not a binary label — downstream decisioning is decoupled from the model layer.
Architecture decision
Business logic (who to contact, what offer to assign) lives in a separate rule layer. Model retraining and rule changes have different cadences — they should not be coupled.
Engineering decisions
  • Rolling window features with Polars — avoids materialising wide intermediate DataFrames on large customer bases.
  • AUC-ROC for model selection; Precision@K for operational utility — these answer different questions and both matter.
  • Score output, not label: operations teams need to prioritise a finite call list, not act on a binary flag.
global-exclusion-risk-ml — socioeconomic clustering
unsupervised ML · clustering

Identify latent risk patterns across socioeconomic indicators for countries — without a predefined label or ground truth.

Approach
K-means + hierarchical clustering with dimensionality reduction for visualisation. Cluster validation via silhouette analysis — k selection is principled, not arbitrary.
Evaluation challenge
No ground truth labels. Internal metrics (silhouette, inertia) plus domain plausibility checks. A mathematically valid cluster that is semantically nonsensical has no utility.
Engineering decisions
  • Features normalised before clustering — Euclidean distance is scale-sensitive; unnormalised indicators break the distance metric.
  • Silhouette scores documented to show k selection is principled, not arbitrary.
  • Results cross-referenced against domain knowledge — cluster validity is necessary but not sufficient for usefulness.
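The normalise-then-validate loop above can be sketched like this; the blob data is a synthetic stand-in for country indicators, and the candidate k range is illustrative. Each candidate k gets a documented silhouette score before one is chosen.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic indicators with latent groups (stand-in for country-level data).
X_raw, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

# Normalise first: Euclidean distance is scale-sensitive, so unnormalised
# indicators would let the widest-range feature dominate the clustering.
X = StandardScaler().fit_transform(X_raw)

# Record the silhouette score for every candidate k, then select the best.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
```

Keeping the full `scores` dict, not just `best_k`, is what makes the k selection auditable rather than arbitrary.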
Stock-Market-Prediction — financial time series
time series · classification

Predict directional price movements in a non-stationary, low signal-to-noise domain where most published results fail out-of-sample.

Approach
Binary classification with strict temporal validation — walk-forward CV to simulate realistic deployment conditions. No random splits that cause future data to leak into training.
Honest framing
Marginal beat over naïve baseline is the expected outcome in efficient markets, not a failure. The value is the rigorous evaluation setup, not the model performance.
Engineering decisions
  • Temporal validation is non-negotiable — random k-fold on time series is data leakage that inflates every single metric.
  • Baseline comparison required — results that don't beat a naïve majority-class predictor are not results.
Result: Marginal improvement over baseline. The project demonstrates evaluation rigour in a domain where most practitioners overfit silently.
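The temporal-validation constraint can be illustrated with scikit-learn's `TimeSeriesSplit`, which produces walk-forward folds where every training index strictly precedes every test index. The array here is a toy stand-in for ordered price features.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Stand-in for time-ordered features: 20 observations, oldest first.
X = np.arange(20).reshape(-1, 1)

# Walk-forward splits: the training window always ends before the test
# window begins, so no future observations leak into training
# (unlike shuffled k-fold, which mixes future and past freely).
tscv = TimeSeriesSplit(n_splits=4)
for train_idx, test_idx in tscv.split(X):
    assert train_idx.max() < test_idx.min()
```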
rps-predictive-agent — behavioural pattern detection
online learning · agents

Detect and exploit statistical regularities in human behavioural sequences — a controlled environment to study online frequency estimation and adaptive decision-making.

Approach
Agent that builds frequency models of opponent move history, uses them to bias action selection, and updates online as new observations arrive — no batch retraining.
Engineering value
Core pattern — online frequency estimation → decision under uncertainty → adaptive update — is structurally identical to recommendation engines and multi-armed bandit systems.
Engineering decisions
  • Intentionally no ML libraries — forces explicit implementation of the probabilistic mechanism rather than wrapping a library call.
  • Architecture is the signal: demonstrates sequential decision-making under uncertainty, not game-playing ability.
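The agent's core loop, observe, update counts online, counter the predicted move, can be sketched in plain Python with no ML libraries; the class and move names are illustrative, not the repository's actual identifiers.

```python
import random
from collections import Counter

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}  # value beats key

class FrequencyAgent:
    """Online frequency model of opponent moves; no batch retraining."""

    def __init__(self):
        self.counts = Counter()

    def act(self):
        if not self.counts:
            return random.choice(MOVES)  # no evidence yet: play uniformly
        # Predict the opponent's most frequent move and counter it.
        predicted = self.counts.most_common(1)[0][0]
        return BEATS[predicted]

    def observe(self, opponent_move):
        # Online update: one increment per observation, O(1) per step.
        self.counts[opponent_move] += 1

agent = FrequencyAgent()
for move in ["rock", "rock", "rock", "paper"]:
    agent.observe(move)
```

After those observations the agent predicts "rock" and plays "paper", the same estimate-then-act structure a bandit-style recommender uses.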
03 · stack · Capability model
Core engineering
Python · SQL · PL/SQL · Hive · Linux · Docker · Git · Bash · uv · FastAPI · Nginx
ML / Data pipeline
Scikit-learn · PyTorch · Polars · Pandas · NumPy · Cookiecutter · Sphinx · Matplotlib · Seaborn
Cloud / Infrastructure
Azure · MySQL · SQLAlchemy · Apache · NoSQL · Power BI
Azure Data Fundamentals · Power BI / DAX · ASIR
Domain knowledge
Credit risk · Churn modelling · Time series · Clustering · Calibration · Leakage detection · Climate / social AI
What I can build
  • End-to-end supervised pipelines (classification + regression)
  • Reproducible project environments with versioned dependencies via uv
  • Feature engineering for tabular financial and behavioural data
  • Evaluation frameworks with explicit data leakage controls
  • Batch scoring systems designed for operational consumption
  • Containerised ML services (Docker + FastAPI) for cloud or on-premise
  • Business outputs — calibrated probabilities, ranked scores, configurable thresholds
  • Unsupervised pattern discovery with principled cluster validation