JANUARY 2026

COELHO RealTime

Production-grade Real-Time MLOps platform on Kubernetes combining incremental ML and batch ML across fraud detection, ETA prediction, and customer segmentation.

Outcome

39 FastAPI endpoints across three versioned routers · sub-millisecond Redis-cached inference · end-to-end GitOps (Terraform → Helm → ArgoCD → GitLab CI) · full observability stack (Prometheus + Grafana + Alertmanager + Karma).

Kubernetes · k3d · Kafka · Spark · Delta Lake · MinIO · MLflow · River ML · CatBoost · scikit-learn · FastAPI · SvelteKit · Redis · PostgreSQL · Prometheus · Grafana · Alertmanager · Terraform · Helm · ArgoCD · GitLab CI · DuckDB

Source ↗ Presentation ↗

Executive summary

COELHO RealTime is a production-grade Real-Time MLOps platform running on Kubernetes that combines incremental machine learning with batch learning to solve three concurrent ML use cases:

Transaction Fraud Detection (TFD) — binary classification
Estimated Time of Arrival (ETA) — regression
E-Commerce Customer Interactions (ECCI) — clustering

The platform implements a dual ML paradigm where River ML handles real-time incremental training directly from Kafka streams, while CatBoost and scikit-learn handle batch training on data accumulated in a Delta Lake. All experiments are tracked with MLflow, models are cached in Redis for sub-millisecond inference, and the entire system is monitored through Prometheus, Grafana, Alertmanager, and Karma.

See it deployed

The platform needs at least 32 GB of RAM to run (see Hardware footprint), so it can't be spun up casually. These 53 slides are the verifiable record of the system running in production: live dashboards, alert routing, MLflow tracking, and CI/CD flows. Navigate with arrows or open fullscreen for the full read.


Open PDF in new tab ↗

Platform architecture

Data flow

Generation — Kafka producers emit realistic synthetic data for all three use cases via Faker
Streaming — data lands in Kafka topics (KRaft mode, 3 partitions each, 1-week retention)
Incremental processing — FastAPI consumers read Kafka streams and train River models in real time
Batch processing — Spark Structured Streaming writes to Delta Lake on MinIO; DuckDB preprocesses for CatBoost / sklearn training
Inference — trained models cached in Redis for sub-millisecond predictions
Tracking — every experiment logged to MLflow with S3 artifacts on MinIO
Monitoring — Prometheus scrapes metrics from all services and evaluates alerting rules; Grafana renders dashboards; Alertmanager routes the resulting alerts
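The incremental-processing step can be pictured as a test-then-train loop, where each event is scored by the current model before the model learns from it. The sketch below is a stdlib-only illustration under stated assumptions, not the platform's code: `RunningMeanRegressor` is a hypothetical placeholder that only mimics the `predict_one` / `learn_one` call shape of River models, and the `stream` list stands in for messages consumed from a Kafka topic.

```python
from dataclasses import dataclass


@dataclass
class RunningMeanRegressor:
    """Hypothetical stand-in for an incremental regressor.

    Predicts the mean of all targets seen so far; it only mimics the
    predict_one / learn_one call shape used by River models.
    """
    n: int = 0
    mean: float = 0.0

    def predict_one(self, x: dict) -> float:
        return self.mean

    def learn_one(self, x: dict, y: float) -> None:
        self.n += 1
        self.mean += (y - self.mean) / self.n  # incremental mean update


def prequential_mae(stream, model) -> float:
    """Score each event before learning from it and return the running MAE."""
    abs_error_sum = 0.0
    count = 0
    for x, y in stream:
        y_pred = model.predict_one(x)   # 1) predict with the current model
        abs_error_sum += abs(y - y_pred)
        count += 1
        model.learn_one(x, y)           # 2) then update the model with the label
    return abs_error_sum / count if count else 0.0


# Made-up ETA events (features, minutes); a real consumer would read from Kafka.
stream = [
    ({"distance_km": 2.0}, 10.0),
    ({"distance_km": 5.0}, 20.0),
    ({"distance_km": 3.5}, 15.0),
]
model = RunningMeanRegressor()
mae = prequential_mae(stream, model)
```

Because every prediction is made before the corresponding label is seen, the running metric reflects performance on unseen data without needing a separate hold-out set.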

ML use cases at a glance

Use case	Task	Incremental model	Batch model	Metric
TFD Transaction Fraud Detection	Binary classification	Adaptive Random Forest (River)	CatBoostClassifier	F-Beta (β = 2.0)
ETA Estimated Time of Arrival	Regression	Adaptive Random Forest (River)	CatBoostRegressor	MAE
ECCI E-Commerce Customer Interactions	Clustering	DBSTREAM (River)	KMeans (scikit-learn)	Silhouette
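For TFD, F-Beta with β = 2 weights recall more heavily than precision, which suits fraud detection where a missed fraud usually costs more than a false alarm. The sketch below computes the score from confusion-matrix counts; the function name and the counts are illustrative assumptions, not results from the platform.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion-matrix counts; beta > 1 favours recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative counts only (not results from the platform).
# High recall, lower precision: flags most fraud, with some false alarms.
recall_heavy = f_beta(tp=80, fp=40, fn=20)
# High precision, lower recall: few false alarms, but misses more fraud.
precision_heavy = f_beta(tp=60, fp=10, fn=40)
```

With these counts the recall-heavy classifier scores about 0.77 and the precision-heavy one about 0.64, so the metric prefers the model that catches more fraud.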

Key components

Unified FastAPI backend

A single service consolidates all ML functionality into 39 endpoints across three versioned routers:

Router	Purpose	Endpoints
`/api/v1/incremental`	River ML real-time training, predictions, metrics	16
`/api/v1/batch`	CatBoost / sklearn batch training + YellowBrick / Scikit-Plot visualizations	20
`/api/v1/sql`	DuckDB SQL queries against Delta Lake tables	3

Includes MLflow model selection (best model by metric), Redis caching, Prometheus instrumentation, and visualization generation on demand.
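One way to picture how best-model selection and caching fit together is a cache-aside lookup: serve the model from the cache when present, otherwise pick the best run, load it once, and store it. This is a minimal sketch under stated assumptions rather than the service's implementation: the run records are hypothetical dicts standing in for tracking-server results, `TTLCache` is an in-process stand-in for Redis, and the key name `eta:best` is invented for illustration. The metric directions follow the use-case table (F-Beta and Silhouette are maximized, MAE is minimized).

```python
import time

# Whether a larger value is better for each metric in the use-case table.
HIGHER_IS_BETTER = {"fbeta": True, "mae": False, "silhouette": True}


def select_best_run(runs: list[dict], metric: str) -> dict:
    """Return the run with the best value for `metric`."""
    pick = max if HIGHER_IS_BETTER[metric] else min
    return pick(runs, key=lambda run: run["metrics"][metric])


class TTLCache:
    """In-process stand-in for Redis: key -> (value, expiry timestamp)."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None or entry[1] <= self.clock():
            self._store.pop(key, None)  # drop missing-or-expired entry
            return None
        return entry[0]

    def set(self, key, value) -> None:
        self._store[key] = (value, self.clock() + self.ttl)


def get_model(cache: TTLCache, key: str, loader):
    """Cache-aside lookup: serve from cache, else load once and store."""
    model = cache.get(key)
    if model is None:
        model = loader()
        cache.set(key, model)
    return model


# Hypothetical run records; a real service would query the tracking server.
eta_runs = [
    {"run_id": "a", "metrics": {"mae": 4.2}},
    {"run_id": "b", "metrics": {"mae": 3.1}},
]
loads = []


def load_best_eta_model():
    best = select_best_run(eta_runs, "mae")
    loads.append(best["run_id"])  # record each (slow) load
    return f"model-from-run-{best['run_id']}"


cache = TTLCache(ttl_seconds=60)
first = get_model(cache, "eta:best", load_best_eta_model)
second = get_model(cache, "eta:best", load_best_eta_model)  # cache hit
```

The second call never reaches the loader, which is the property that keeps repeated predictions off the slow model-loading path.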

SvelteKit frontend

Interactive dashboard for training, predictions, and model diagnostics via YellowBrick. Project pages for TFD, ETA, and ECCI with nested tabs (Incremental ML / Batch ML / SQL).

Infrastructure & deployment

The entire platform is deployed on a k3d Kubernetes cluster provisioned with Terraform and packaged as a Helm umbrella chart with seven dependencies:

Dependency	Version	Source
MLflow	1.8.1	community-charts
Redis	24.0.8	Bitnami
MinIO	5.4.0	MinIO Official
PostgreSQL	18.1.14	Bitnami
kube-prometheus-stack	80.6.0	prometheus-community
Kafka	32.4.3	Bitnami
Spark	10.0.3	Bitnami

CI/CD & GitOps

Developer push → GitLab CI builds images → Push to registry
                                                ↓
ArgoCD auto-sync ← Git commit [skip ci] ← ArgoCD Image Updater detects new tags
       ↓
Deploy to cluster → Prometheus monitors → Grafana dashboards → Alertmanager

GitLab CI — automated container image builds on commit
ArgoCD — GitOps continuous delivery with automated Kubernetes sync
ArgoCD Image Updater — automatic detection & deployment of new image versions
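The "detect new tags" step amounts to choosing the newest candidate from a registry's tag list. The sketch below illustrates that idea for semantic-version tags only; it assumes a semver-style tagging scheme, which the source does not state, and `newest_tag` is a hypothetical helper rather than anything from ArgoCD Image Updater itself.

```python
import re

# Accepts tags such as "1.10.0" or "v1.2.3"; anything else is ignored.
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def newest_tag(tags):
    """Return the tag with the highest (major, minor, patch), or None."""
    versions = []
    for tag in tags:
        match = SEMVER.match(tag)
        if match:
            numeric = tuple(int(part) for part in match.groups())
            versions.append((numeric, tag))
    return max(versions)[1] if versions else None


# Made-up registry tags; "latest" is skipped because it is not a version.
candidate = newest_tag(["1.9.0", "1.10.0", "latest", "v1.2.3"])
```

Comparing numeric tuples rather than strings matters here: a plain string sort would rank "1.9.0" above "1.10.0".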

Observability

Prometheus — 50+ custom metrics

Instrumented across all services:

FastAPI — training status, prediction count / latency / errors, cache hits/misses, model load duration, MLflow operation duration, SQL query duration, visualization generation time
Kafka producers — messages sent, errors, send duration, connection status, retries, fraud ratio (TFD), active sessions (ECCI)

Grafana — 11 dashboards

All provisioned via ConfigMaps with sidecar auto-discovery:

COELHORealTime Overview — service health, total CPU/RAM aggregate panels with sparklines
ML Pipeline — training metrics, predictions, model performance
FastAPI Detailed — latency, error rates, throughput per endpoint
Kafka Producers — message rates, send latency, errors, connections
Kafka — consumer lag, throughput, partitions
PostgreSQL — connections, queries, replication
Redis — memory, connections, ops/sec
MinIO — S3 operations, storage, buckets
Spark — performance metrics
Spark Streaming — structured streaming metrics
SvelteKit — frontend performance

Alerting

30+ Prometheus alerting rules across 10 rule groups (FastAPI, Kafka, Kafka Producers, MLflow, PostgreSQL, Redis, MinIO, SvelteKit, Spark, Application General)
Alertmanager with routing and inhibition rules
Karma UI for alert visualization
Pre-configured receivers for Slack, Discord, Email, and PagerDuty
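Inhibition means a firing alert can mute related, less important alerts so that one outage does not produce a flood of notifications. The sketch below models a single inhibition rule in plain Python to show the matching logic. It is a simplified illustration, and the rule, alert names, and `service` label are hypothetical because the source does not list the actual rules.

```python
def matches(alert: dict, matchers: dict) -> bool:
    """True if every matcher label has the given value on the alert."""
    return all(alert.get(label) == value for label, value in matchers.items())


def is_inhibited(target: dict, firing: list[dict], rule: dict) -> bool:
    """Apply one inhibition rule to `target` given the currently firing alerts."""
    if not matches(target, rule["target"]):
        return False
    for source in firing:
        # Simplification: an alert never inhibits itself.
        if source is target or not matches(source, rule["source"]):
            continue
        # Source and target must agree on every label listed under "equal".
        if all(source.get(l) == target.get(l) for l in rule["equal"]):
            return True
    return False


# Hypothetical rule: a critical alert mutes warnings for the same service.
rule = {
    "source": {"severity": "critical"},
    "target": {"severity": "warning"},
    "equal": ["service"],
}
firing = [
    {"alertname": "KafkaDown", "severity": "critical", "service": "kafka"},
    {"alertname": "KafkaLagHigh", "severity": "warning", "service": "kafka"},
    {"alertname": "RedisMemoryHigh", "severity": "warning", "service": "redis"},
]
notified = [a["alertname"] for a in firing if not is_inhibited(a, firing, rule)]
```

Here the Kafka warning is muted by the Kafka critical alert, while the Redis warning still goes out because its `service` label differs.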

Hardware footprint

Component	Specification
RAM	64 GB
CPU	8 cores (modern x86_64)
Storage	2 TB NVMe SSD
Orchestration	k3d (local Kubernetes)

Minimum 32 GB RAM required. The platform runs multiple memory-intensive services concurrently (FastAPI ~8 GB, Spark Worker ~5 GB, Kafka ~2 GB, Prometheus ~2 GB, etc.). Systems with less than 32 GB will experience OOMKills and pod restart loops. 64 GB recommended for comfortable headroom.

What this project proves

End-to-end MLOps lifecycle ownership — data engineering → training → serving → observability, all owned by one engineer in one repo
Real-time and batch ML co-existing in a single platform — not theoretical; tested under continuous synthetic load
Production rigor — tracing, alerting with routing, GitOps, IaC, dashboards; not a notebook demo

Source on GitHub →