ML Engineer · Data Systems

Building ML systems that
work beyond the notebook.

End-to-end machine learning pipelines — from data ingestion and feature engineering to deployment-ready code. Systems administration background means I think about infrastructure, environment isolation, and failure modes before I think about model architecture.

Python · PyTorch · Scikit-learn · Polars · Docker · Linux · uv · FastAPI · Azure Certified · Power BI · ASIR background
github.com/cacelass
01 · about · Engineering identity
Approach

I treat ML problems as systems problems first. A model that can't be reproduced or re-deployed is a liability. My workflow enforces clean separation between ingestion, feature computation, training, and evaluation — by structure, not convention.

Systems background

Trained and worked as a Systems Technician (ASIR). What drew me most to the field was databases — from schema design and query writing to index tuning and execution plan analysis. I find the full range compelling: modelling, SQL craft, and squeezing performance out of a slow query.

Production awareness

I design pipelines that don't break when schemas change, models that return calibrated probabilities rather than raw scores, and evaluation setups that catch leakage before it reaches production.

Tooling philosophy

My stack is context-dependent. I default to uv for environment management, Polars for large tabular data, and scikit-learn for well-understood problems. I don't use deep learning when gradient boosting will do.

02 · projects · Selected systems
Featured project
A production-grade Data Science project scaffold built to eliminate environment drift, enforce reproducibility, and standardise the structure of every ML project from day one.
Problem solved
Repetitive setup and inconsistent structure are the primary cause of reproducibility failure — not the models.
Stack
uv · Copier · Sphinx · Polars/Pandas dual support · structured layout
What it enforces
Separation of data/raw, features/, models/, pipelines/. Pinned lockfiles. Auto-generated docs.
Why it's built this way
uv over pip/conda — faster resolution, reproducible lockfiles, no hidden global state that silently breaks other projects.
Convention over configuration — opinionated defaults reduce decision fatigue at project start without blocking customisation.
Production layout, not Kaggle layout — raw vs processed data separation enforced structurally, not by honour system.
Sphinx by default — documentation is a first-class deliverable. Auto-generated from docstrings, zero extra setup.
credit-risk-classifier — credit approval system
credit risk · classification

Predict loan default probability from applicant data, with interpretability as a first-class constraint — not an afterthought.

Approach
Logistic Regression + Random Forest with feature importance analysis, post-training probability calibration, and business-aligned threshold tuning via F-beta optimisation.
Key constraint
Decision threshold is a business parameter, not a model constant — decoupled into config so decisioning teams can adjust without triggering a retraining cycle.
Engineering decisions
  • Chose LR + RF over GBTs — feature importances must be explainable to non-technical stakeholders in credit contexts.
  • Probability calibration applied explicitly — raw classifier scores are not probabilities; treating them as such breaks downstream scoring.
  • Stratified k-fold CV to handle class imbalance without oversampling artefacts.
Result: AUC 0.81 with interpretable models. Stakeholder trust, not benchmark performance, is the success criterion here.
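A minimal sketch of the two decisions above, on synthetic data: raw classifier scores are calibrated into probabilities, and the decision threshold is swept as a standalone parameter via F-beta rather than hard-coded into the model. The dataset, `beta` value, and `best_threshold` helper are illustrative, not the project's actual code.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import fbeta_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for applicant features.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Calibrate raw RF scores into usable probabilities (Platt scaling).
clf = CalibratedClassifierCV(RandomForestClassifier(random_state=0), method="sigmoid", cv=5)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]

def best_threshold(y_true, scores, beta=2.0):
    # Threshold is a business parameter: sweep candidates and score with
    # F-beta (beta > 1 weights recall, i.e. missed defaults, more heavily).
    candidates = np.linspace(0.05, 0.95, 19)
    f_scores = [fbeta_score(y_true, scores >= t, beta=beta) for t in candidates]
    return candidates[int(np.argmax(f_scores))]

threshold = best_threshold(y_test, proba)
```

Because `threshold` lives outside the fitted model, a decisioning team can change it in config without a retraining cycle.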
retainml — churn prediction system
retention · supervised ML

Predict customer churn probability and output ranked intervention lists — the distinction between a model result and an operationally usable product.

Approach
Supervised classification with recency/frequency feature engineering over rolling time windows. Output is a calibrated risk score, not a binary label — downstream decisioning is decoupled from the model layer.
Architecture decision
Business logic (who to contact, what offer to assign) lives in a separate rule layer. Model retraining and rule changes have different cadences — they should not be coupled.
Engineering decisions
  • Rolling window features with Polars — avoids materialising wide intermediate DataFrames on large customer bases.
  • AUC-ROC for model selection; Precision@K for operational utility — these answer different questions and both matter.
  • Score output, not label: operations teams need to prioritise a finite call list, not act on a binary flag.
global-exclusion-risk-ml — socioeconomic clustering
unsupervised ML · clustering

Identify latent risk patterns across socioeconomic indicators for countries — without a predefined label or ground truth.

Approach
K-means + hierarchical clustering with dimensionality reduction for visualisation. Cluster validation via silhouette analysis — k selection is principled, not arbitrary.
Evaluation challenge
No ground truth labels. Internal metrics (silhouette, inertia) plus domain plausibility checks. A mathematically valid cluster that is semantically nonsensical has no utility.
Engineering decisions
  • Features normalised before clustering — Euclidean distance is scale-sensitive; unnormalised indicators break the distance metric.
  • Silhouette scores documented to show k selection is principled, not arbitrary.
  • Results cross-referenced against domain knowledge — cluster validity is necessary but not sufficient for usefulness.
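The normalise-then-validate loop above can be sketched like this; the blob data is a synthetic stand-in for country indicators, and the candidate k range is illustrative. Each candidate k gets a documented silhouette score before one is chosen.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic indicators with latent groups (stand-in for country-level data).
X_raw, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

# Normalise first: Euclidean distance is scale-sensitive, so unnormalised
# indicators would let the widest-range feature dominate the clustering.
X = StandardScaler().fit_transform(X_raw)

# Record the silhouette score for every candidate k, then select the best.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
```

Keeping the full `scores` dict, not just `best_k`, is what makes the k selection auditable rather than arbitrary.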
Stock-Market-Prediction — financial time series
time series · classification

Predict directional price movements in a non-stationary, low signal-to-noise domain where most published results fail out-of-sample.

Approach
Binary classification with strict temporal validation — walk-forward CV to simulate realistic deployment conditions. No random splits that cause future data to leak into training.
Honest framing
Marginal beat over naïve baseline is the expected outcome in efficient markets, not a failure. The value is the rigorous evaluation setup, not the model performance.
Engineering decisions
  • Temporal validation is non-negotiable — random k-fold on time series is data leakage that inflates every single metric.
  • Baseline comparison required — results that don't beat a naïve majority-class predictor are not results.
Result: Marginal improvement over baseline. The project demonstrates evaluation rigour in a domain where most practitioners overfit silently.
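The temporal-validation constraint can be illustrated with scikit-learn's `TimeSeriesSplit`, which produces walk-forward folds where every training index strictly precedes every test index. The array here is a toy stand-in for ordered price features.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Stand-in for time-ordered features: 20 observations, oldest first.
X = np.arange(20).reshape(-1, 1)

# Walk-forward splits: the training window always ends before the test
# window begins, so no future observations leak into training
# (unlike shuffled k-fold, which mixes future and past freely).
tscv = TimeSeriesSplit(n_splits=4)
for train_idx, test_idx in tscv.split(X):
    assert train_idx.max() < test_idx.min()
```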
rps-predictive-agent — behavioural pattern detection
online learning · agents

Detect and exploit statistical regularities in human behavioural sequences — a controlled environment to study online frequency estimation and adaptive decision-making.

Approach
Agent that builds frequency models of opponent move history, uses them to bias action selection, and updates online as new observations arrive — no batch retraining.
Engineering value
Core pattern — online frequency estimation → decision under uncertainty → adaptive update — is structurally identical to recommendation engines and multi-armed bandit systems.
Engineering decisions
  • Intentionally no ML libraries — forces explicit implementation of the probabilistic mechanism rather than wrapping a library call.
  • Architecture is the signal: demonstrates sequential decision-making under uncertainty, not game-playing ability.
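The agent's core loop, observe, update counts online, counter the predicted move, can be sketched in plain Python with no ML libraries; the class and move names are illustrative, not the repository's actual identifiers.

```python
import random
from collections import Counter

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}  # value beats key

class FrequencyAgent:
    """Online frequency model of opponent moves; no batch retraining."""

    def __init__(self):
        self.counts = Counter()

    def act(self):
        if not self.counts:
            return random.choice(MOVES)  # no evidence yet: play uniformly
        # Predict the opponent's most frequent move and counter it.
        predicted = self.counts.most_common(1)[0][0]
        return BEATS[predicted]

    def observe(self, opponent_move):
        # Online update: one increment per observation, O(1) per step.
        self.counts[opponent_move] += 1

agent = FrequencyAgent()
for move in ["rock", "rock", "rock", "paper"]:
    agent.observe(move)
```

After those observations the agent predicts "rock" and plays "paper", the same estimate-then-act structure a bandit-style recommender uses.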
03 · stack · Capability model
Core engineering
Python · SQL · PL/SQL · Hive · Linux · Docker · Git · Bash · uv · FastAPI · Nginx
ML / Data pipeline
Scikit-learn · PyTorch · Polars · Pandas · NumPy · Cookiecutter · Sphinx · Matplotlib · Seaborn
Cloud / Infrastructure
Azure · MySQL · SQLAlchemy · Apache · NoSQL · Power BI
Azure Data Fundamentals · Power BI / DAX · ASIR
Domain knowledge
Credit risk · Churn modelling · Time series · Clustering · Calibration · Leakage detection · Climate / social AI
What I can build
  • End-to-end supervised pipelines (classification + regression)
  • Reproducible project environments with versioned dependencies via uv
  • Feature engineering for tabular financial and behavioural data
  • Evaluation frameworks with explicit data leakage controls
  • Batch scoring systems designed for operational consumption
  • Containerised ML services (Docker + FastAPI) for cloud or on-premise
  • Business outputs — calibrated probabilities, ranked scores, configurable thresholds
  • Unsupervised pattern discovery with principled cluster validation