Rafael COELHO

JANUARY 2026

COELHO RealTime

Production-grade Real-Time MLOps platform on Kubernetes combining incremental ML and batch ML across fraud detection, ETA prediction, and customer segmentation.

Outcome
39 FastAPI endpoints across three versioned routers · sub-millisecond Redis-cached inference · end-to-end GitOps (Terraform → Helm → ArgoCD → GitLab CI) · full observability stack (Prometheus + Grafana + Alertmanager + Karma).
Kubernetes · k3d · Kafka · Spark · Delta Lake · MinIO · MLflow · River ML · CatBoost · scikit-learn · FastAPI · SvelteKit · Redis · PostgreSQL · Prometheus · Grafana · Alertmanager · Terraform · Helm · ArgoCD · GitLab CI · DuckDB

Executive summary

COELHO RealTime is a production-grade Real-Time MLOps platform running on Kubernetes that combines incremental machine learning with batch learning to solve three concurrent ML use cases:

  • Transaction Fraud Detection (TFD) — binary classification
  • Estimated Time of Arrival (ETA) — regression
  • E-Commerce Customer Interactions (ECCI) — clustering

The platform implements a dual ML paradigm where River ML handles real-time incremental training directly from Kafka streams, while CatBoost and scikit-learn handle batch training on data accumulated in a Delta Lake. All experiments are tracked with MLflow, models are cached in Redis for sub-millisecond inference, and the entire system is monitored through Prometheus, Grafana, Alertmanager, and Karma.

See it deployed

The platform needs at least 32 GB of RAM to run, so it cannot be spun up casually. These 53 slides are the verifiable record of the system running in production: live dashboards, alert routing, MLflow tracking, and CI/CD flows.


Platform architecture

COELHO RealTime platform architecture — data flow from Kafka producers through dual incremental (River) and batch (CatBoost / sklearn) ML paths, with MLflow tracking, Redis serving, and full observability.

Data flow

  1. Generation — Kafka producers emit realistic synthetic data for all three use cases via Faker
  2. Streaming — data lands in Kafka topics (KRaft mode, 3 partitions each, 1-week retention)
  3. Incremental processing — FastAPI consumers read Kafka streams and train River models in real time
  4. Batch processing — Spark Structured Streaming writes to Delta Lake on MinIO; DuckDB preprocesses for CatBoost / sklearn training
  5. Inference — trained models cached in Redis for sub-millisecond predictions
  6. Tracking — every experiment logged to MLflow with S3 artifacts on MinIO
  7. Monitoring — Prometheus scrapes metrics from all services; Grafana renders dashboards; Alertmanager fires alerts
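A common way to train and evaluate a model on a stream is test-then-train (progressive validation): each event is scored before the model learns from it, so the running error is an out-of-sample estimate. The stdlib-only sketch below illustrates that loop under stated assumptions: `synthetic_eta_stream` is a hypothetical stand-in for the Faker producer and Kafka consumer, and `OnlineLinearRegressor` is a toy SGD model that only borrows River's `predict_one` / `learn_one` method names. The platform itself uses River's Adaptive Random Forest, not this linear model.

```python
import random


def synthetic_eta_stream(n, seed=42):
    """Hypothetical stand-in for the Faker producer + Kafka consumer.

    Yields (features, target) pairs where ETA grows linearly with distance.
    """
    rng = random.Random(seed)
    for _ in range(n):
        distance_km = rng.uniform(1.0, 10.0)
        eta_min = 2.0 * distance_km + 5.0  # noiseless toy relationship
        yield {"distance_km": distance_km}, eta_min


class OnlineLinearRegressor:
    """Toy SGD linear model; borrows River's predict_one / learn_one names only."""

    def __init__(self, lr=0.01):
        self.lr = lr
        self.weights = {}
        self.bias = 0.0

    def predict_one(self, x):
        return self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())

    def learn_one(self, x, y):
        # One squared-error gradient step on a single example.
        error = y - self.predict_one(x)
        self.bias += self.lr * error
        for k, v in x.items():
            self.weights[k] = self.weights.get(k, 0.0) + self.lr * error * v


def progressive_abs_errors(stream, model):
    """Test-then-train: score each event before the model learns from it."""
    abs_errors = []
    for x, y in stream:
        abs_errors.append(abs(y - model.predict_one(x)))  # test first...
        model.learn_one(x, y)                             # ...then train
    return abs_errors


errors = progressive_abs_errors(synthetic_eta_stream(2000), OnlineLinearRegressor())
print(f"MAE, first 100 events: {sum(errors[:100]) / 100:.3f}")
print(f"MAE, last 100 events:  {sum(errors[-100:]) / 100:.3f}")
```

On this noiseless toy stream the error over the last 100 events ends up far below the error over the first 100, because the model converges on the linear pattern; with real, drifting data the curve would be noisier.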

ML use cases at a glance

Use case | Task | Incremental model | Batch model | Metric
TFD (Transaction Fraud Detection) | Binary classification | Adaptive Random Forest (River) | CatBoostClassifier | F-beta (β = 2.0)
ETA (Estimated Time of Arrival) | Regression | Adaptive Random Forest (River) | CatBoostRegressor | MAE
ECCI (E-Commerce Customer Interactions) | Clustering | DBSTREAM (River) | KMeans (scikit-learn) | Silhouette
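For reference, the three metrics in the table can be computed from scratch as below. This is an illustrative sketch, not the platform's code (which presumably relies on River's and scikit-learn's own metric implementations). F-beta is written in terms of confusion counts, F_β = (1 + β²)·TP / ((1 + β²)·TP + β²·FN + FP); with β = 2.0 recall weighs more than precision, which suits fraud detection because a missed fraud usually costs more than a false alarm.

```python
import math


def f_beta(tp, fp, fn, beta=2.0):
    """F-beta from confusion counts; beta > 1 favours recall over precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean); needs >= 2 clusters.

    Points in singleton clusters score 0, following scikit-learn's convention.
    """
    clusters = {}
    for p, c in zip(points, labels):
        clusters.setdefault(c, []).append(p)
    scores = []
    for p, c in zip(points, labels):
        own = clusters[c]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of p's own cluster (self adds 0)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != c
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


print(f_beta(tp=8, fp=2, fn=4))                                 # ≈ 0.6897 (40 / 58)
print(mae([3, 5, 7], [2, 5, 9]))                                # 1.0
print(silhouette([(0,), (1,), (10,), (11,)], [0, 0, 1, 1]))     # ≈ 0.8997
```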

Key components

Unified FastAPI backend

A single service consolidates all ML functionality into 39 endpoints across three versioned routers:

Router | Purpose | Endpoints
/api/v1/incremental | River ML real-time training, predictions, metrics | 16
/api/v1/batch | CatBoost / sklearn batch training + YellowBrick / Scikit-Plot visualizations | 20
/api/v1/sql | DuckDB SQL queries against Delta Lake tables | 3

Includes MLflow model selection (best model by metric), Redis caching, Prometheus instrumentation, and visualization generation on demand.
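The model selection and caching described above might look roughly like this. It is a sketch under assumptions: run records are plain dicts standing in for MLflow run data, `TTLCache` is an in-memory stand-in for Redis GET and SET with an expiry, and the key name `tfd:best` and all metric values are hypothetical. Which direction counts as "best" depends on the metric: higher for F-beta and silhouette, lower for MAE.

```python
import time


def select_best_run(runs, metric, higher_is_better=True):
    """Return the run with the best value of `metric`, skipping runs that lack it."""
    scored = [r for r in runs if metric in r["metrics"]]
    if not scored:
        raise ValueError(f"no run logged metric {metric!r}")
    pick = max if higher_is_better else min
    return pick(scored, key=lambda r: r["metrics"][metric])


class TTLCache:
    """In-memory stand-in for Redis GET plus SET with an expiry (hypothetical)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:  # expired: behave like a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)


def get_model(cache, key, load_fn):
    """Cache-aside lookup: serve from cache, else load (e.g. from MLflow) and store."""
    model = cache.get(key)
    if model is None:
        model = load_fn()
        cache.set(key, model)
    return model


# Illustrative run records with made-up values, not project results.
tfd_runs = [
    {"run_id": "a", "metrics": {"fbeta": 0.71}},
    {"run_id": "b", "metrics": {"fbeta": 0.78}},
]
eta_runs = [
    {"run_id": "c", "metrics": {"mae": 4.2}},
    {"run_id": "d", "metrics": {"mae": 3.8}},
]
print(select_best_run(tfd_runs, "fbeta")["run_id"])                        # b
print(select_best_run(eta_runs, "mae", higher_is_better=False)["run_id"])  # d
```

Cache-aside keeps the slow path (fetching and deserializing a model) off the request path after the first lookup; the TTL bounds how long a newly promoted model can be shadowed by a stale cached one.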

SvelteKit frontend

Interactive dashboard for training, predictions, and model diagnostics via YellowBrick. Project pages for TFD, ETA, and ECCI with nested tabs (Incremental ML / Batch ML / SQL).

Infrastructure & deployment

The entire platform is deployed on a k3d Kubernetes cluster provisioned with Terraform and packaged as a Helm umbrella chart with seven dependencies:

Dependency | Chart version | Source
MLflow | 1.8.1 | community-charts
Redis | 24.0.8 | Bitnami
MinIO | 5.4.0 | MinIO Official
PostgreSQL | 18.1.14 | Bitnami
kube-prometheus-stack | 80.6.0 | prometheus-community
Kafka | 32.4.3 | Bitnami
Spark | 10.0.3 | Bitnami

CI/CD & GitOps

Developer push → GitLab CI builds images → Push to registry
→ ArgoCD Image Updater detects new tags → Git commit [skip ci] → ArgoCD auto-sync
→ Deploy to cluster → Prometheus monitors → Grafana dashboards → Alertmanager

  • GitLab CI — automated container image builds on commit
  • ArgoCD — GitOps continuous delivery with automated Kubernetes sync
  • ArgoCD Image Updater — automatic detection & deployment of new image versions
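ArgoCD Image Updater supports several update strategies; assuming a semver-style one, its core decision resembles the sketch below: parse the registry's tags, ignore anything that is not MAJOR.MINOR.PATCH (such as `latest` or commit-SHA tags), and pick the highest version numerically. The function name and tag values are hypothetical. The chosen tag is then written back to Git in a commit marked `[skip ci]`, so GitLab CI does not rebuild images in a loop while ArgoCD syncs the change.

```python
import re

# Plain MAJOR.MINOR.PATCH with an optional "v" prefix; pre-release tags are ignored.
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def newest_semver_tag(tags):
    """Return the highest semantic-version tag, or None if none qualify."""
    parsed = []
    for tag in tags:
        m = SEMVER.match(tag)
        if m:
            parsed.append((tuple(int(g) for g in m.groups()), tag))
    return max(parsed)[1] if parsed else None


print(newest_semver_tag(["v1.2.3", "v1.10.0", "latest", "v1.9.9", "sha-3f2a1c"]))
# v1.10.0  (a plain string sort would wrongly pick v1.9.9)
```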

Observability

Prometheus — 50+ custom metrics

Instrumented across all services:

  • FastAPI — training status, prediction count / latency / errors, cache hits/misses, model load duration, MLflow operation duration, SQL query duration, visualization generation time
  • Kafka producers — messages sent, errors, send duration, connection status, retries, fraud ratio (TFD), active sessions (ECCI)
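Metrics like these reach Prometheus through its text exposition format. A client library such as `prometheus_client` normally generates that output; the stdlib sketch below renders it by hand to show what a labelled counter and a latency histogram look like when scraped. Metric names, label values, and bucket bounds are hypothetical, not the platform's actual ones.

```python
from bisect import bisect_left


def render_counter(name, help_text, label, values):
    """Render a counter with one label in the Prometheus text format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for label_value, v in sorted(values.items()):
        lines.append(f'{name}{{{label}="{label_value}"}} {v}')
    return "\n".join(lines)


class Histogram:
    """Minimal histogram rendered in the Prometheus text format."""

    def __init__(self, name, help_text, buckets):
        self.name, self.help_text = name, help_text
        self.buckets = sorted(buckets)
        self.counts = [0] * len(self.buckets)  # per-bucket, non-cumulative
        self.total, self.count = 0.0, 0

    def observe(self, value):
        i = bisect_left(self.buckets, value)  # first bound with le >= value
        if i < len(self.buckets):
            self.counts[i] += 1
        self.total += value
        self.count += 1

    def render(self):
        lines = [f"# HELP {self.name} {self.help_text}", f"# TYPE {self.name} histogram"]
        cumulative = 0
        for le, c in zip(self.buckets, self.counts):
            cumulative += c  # exposed buckets are cumulative
            lines.append(f'{self.name}_bucket{{le="{le}"}} {cumulative}')
        lines.append(f'{self.name}_bucket{{le="+Inf"}} {self.count}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {self.count}")
        return "\n".join(lines)


cache_text = render_counter(
    "model_cache_requests_total", "Model cache lookups.", "result", {"hit": 41, "miss": 3}
)
latency = Histogram("prediction_latency_seconds", "Prediction latency.", [0.005, 0.05, 0.5])
for v in (0.003, 0.02, 0.2):
    latency.observe(v)
print(cache_text)
print(latency.render())
```

Because each `le` line counts every observation at or below its bound, PromQL's `histogram_quantile()` can estimate latency percentiles from these series.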

Grafana — 11 dashboards

All provisioned via ConfigMaps with sidecar auto-discovery:

  1. COELHORealTime Overview — service health, total CPU/RAM aggregate panels with sparklines
  2. ML Pipeline — training metrics, predictions, model performance
  3. FastAPI Detailed — latency, error rates, throughput per endpoint
  4. Kafka Producers — message rates, send latency, errors, connections
  5. Kafka — consumer lag, throughput, partitions
  6. PostgreSQL — connections, queries, replication
  7. Redis — memory, connections, ops/sec
  8. MinIO — S3 operations, storage, buckets
  9. Spark — performance metrics
  10. Spark Streaming — structured streaming metrics
  11. SvelteKit — frontend performance

Alerting

  • 30+ Prometheus alerting rules across 10 rule groups (FastAPI, Kafka, Kafka Producers, MLflow, PostgreSQL, Redis, MinIO, SvelteKit, Spark, Application General)
  • Alertmanager with routing and inhibition rules
  • Karma UI for alert visualization
  • Pre-configured receivers for Slack, Discord, Email, and PagerDuty
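Inhibition rules suppress noisy follow-on alerts while a root-cause alert is firing. The sketch below is a simplified model of that behaviour: it mirrors the fields of Alertmanager's `inhibit_rules` (`source_matchers`, `target_matchers`, `equal`) but represents matchers as plain equality dicts and only blocks an alert from inhibiting itself. The rule, alert names, and namespaces are hypothetical.

```python
def matches(labels, matchers):
    """True if every matcher label has exactly the given value."""
    return all(labels.get(k) == v for k, v in matchers.items())


def is_inhibited(target, firing_alerts, rule):
    """A target is muted if another firing alert matches the source side and
    agrees with it on every label listed in rule["equal"]."""
    if not matches(target, rule["target_matchers"]):
        return False
    return any(
        source is not target
        and matches(source, rule["source_matchers"])
        and all(source.get(name) == target.get(name) for name in rule["equal"])
        for source in firing_alerts
    )


rule = {
    "source_matchers": {"severity": "critical"},
    "target_matchers": {"severity": "warning"},
    "equal": ["namespace"],
}
broker_down = {"alertname": "KafkaBrokerDown", "severity": "critical", "namespace": "kafka"}
consumer_lag = {"alertname": "KafkaConsumerLagHigh", "severity": "warning", "namespace": "kafka"}
redis_memory = {"alertname": "RedisMemoryHigh", "severity": "warning", "namespace": "redis"}
firing = [broker_down, consumer_lag, redis_memory]

print([a["alertname"] for a in firing if not is_inhibited(a, firing, rule)])
# ['KafkaBrokerDown', 'RedisMemoryHigh']
```

The consumer-lag warning is muted because the broker outage in the same namespace already explains it, while the unrelated Redis warning still notifies.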

Hardware footprint

Component | Specification
RAM | 64 GB
CPU | 8 cores (modern x86_64)
Storage | 2 TB NVMe SSD
Orchestration | k3d (local Kubernetes)

Minimum 32 GB RAM required. The platform runs multiple memory-intensive services concurrently (FastAPI ~8 GB, Spark Worker ~5 GB, Kafka ~2 GB, Prometheus ~2 GB, etc.). Systems with less than 32 GB will experience OOMKills and pod restart loops. 64 GB recommended for comfortable headroom.
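The arithmetic behind the 32 GB floor can be checked from the figures quoted above: the four itemized services alone add up to about 17 GB, so a 16 GB host is short before MLflow, MinIO, PostgreSQL, Redis, Grafana, and the other services are counted. This sketch uses only the quoted approximations; the constant and function names are illustrative.

```python
# Per-service figures quoted above (approximate). The remaining services
# behind "etc." are not itemized, so this is a lower bound.
LISTED_GB = {"FastAPI": 8, "Spark Worker": 5, "Kafka": 2, "Prometheus": 2}


def headroom_gb(host_ram_gb, listed=LISTED_GB):
    """RAM left after the itemized services alone (OS, k3d, and others ignored)."""
    return host_ram_gb - sum(listed.values())


for ram in (16, 32, 64):
    print(f"{ram} GB host -> {headroom_gb(ram)} GB left for everything else")
# 16 GB host -> -1 GB left for everything else
# 32 GB host -> 15 GB left for everything else
# 64 GB host -> 47 GB left for everything else
```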

What this project proves

  • End-to-end MLOps lifecycle ownership — data engineering → training → serving → observability, all owned by one engineer in one repo
  • Real-time and batch ML co-existing in a single platform — not theoretical; tested under continuous synthetic load
  • Production rigor: tracing, alert rules and routing, GitOps, IaC, dashboards; not a notebook demo

Source on GitHub →