Rafael COELHO

NOVEMBER 2023

Formula 1 Analytics

F1 analytics platform combining historical narrative, statistical insights, full season reviews, and an AI Space applying ML to race data — running live on Streamlit with end-to-end data pipelines.

Outcome
Four-section live platform · public Streamlit app · ML on race outcomes + pit strategy modeling · sandbox that doubled as my testbed for sports-analytics tooling.
Python · Streamlit · pandas · NumPy · scikit-learn · Plotly · matplotlib · Ergast F1 API

Executive summary

Formula 1 Analytics is an end-to-end data platform for the world of F1 racing — historical narrative, statistical pattern analysis, full season reviews, and an AI Space applying ML to race-outcome prediction and pit strategy modeling. The platform is publicly live at f1analytics.streamlit.app — anyone can open it and explore.
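
The first stage of that pipeline, pulling race results from Ergast into pandas, can be sketched as follows. This is a minimal sketch, not the platform's actual loader. It assumes the Ergast JSON layout (`MRData.RaceTable.Races[].Results[]`) and the season-results endpoint pattern in `ERGAST_URL`. The names `fetch_season` and `results_to_frame` are hypothetical, and the URL may need to point at an Ergast-compatible mirror if the original service is unavailable.

```python
import json
from urllib.request import urlopen

import pandas as pd

# Assumed endpoint pattern for one season of results in JSON (hypothetical constant).
# limit raises Ergast's small default page size so a full season fits in one call.
ERGAST_URL = "https://ergast.com/api/f1/{season}/results.json?limit=1000"


def fetch_season(season: int) -> dict:
    """Download one season of race results as raw Ergast-style JSON."""
    with urlopen(ERGAST_URL.format(season=season)) as resp:
        return json.load(resp)


def results_to_frame(payload: dict) -> pd.DataFrame:
    """Flatten the nested payload into one row per driver per race."""
    rows = []
    for race in payload["MRData"]["RaceTable"]["Races"]:
        for res in race["Results"]:
            rows.append(
                {
                    "season": int(race["season"]),
                    "round": int(race["round"]),
                    "race": race["raceName"],
                    "driver": res["Driver"]["driverId"],
                    "constructor": res["Constructor"]["constructorId"],
                    "grid": int(res["grid"]),          # Ergast sends numbers as strings
                    "position": int(res["position"]),
                    "points": float(res["points"]),
                    "status": res["status"],
                }
            )
    return pd.DataFrame(rows)
```

Keeping the network call separate from the flattening step lets the parser be tested on a saved payload and lets the Streamlit layer cache the raw JSON independently of the tabular view.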

The project also served as a deliberate sandbox: every new data-science library or ML framework I wanted to test got prototyped here against real F1 race data before going into production work.

See it deployed

The Streamlit app is the primary live verification — just open it. These 19 slides are the deeper deployment record: the four sections in operation, the AI Space delivering predictions on real seasons, and the visualization patterns used throughout.


Four sections, one data substrate

| Section | What it delivers | Why it earns the slot |
| --- | --- | --- |
| Overview | F1 history, iconic circuits, season legacy storytelling | Hook for non-stat-heads; narrative makes the data accessible |
| Insights | Statistical pattern analysis on track and driver performance | Where the quantitative fans land: distributions, correlations, anomalies |
| Seasons | Per-season recaps with race strategies, driver arcs, context | Combines narrative and stats: the season as both story and dataset |
| AI Space | ML applied to race-outcome prediction and pit-stop strategy modeling | The differentiator: most F1 sites surface stats; few apply ML on top |
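
To make the AI Space row concrete, here is a minimal sketch of race-outcome prediction framed as podium classification with scikit-learn. The framing, the features (`grid` plus a hypothetical recent-form average `form`), the model choice and the helper names are all assumptions. `make_synthetic_results` generates placeholder data, not real F1 results, so the score it returns says nothing about the platform's actual accuracy.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_results(n: int = 2000, seed: int = 0) -> pd.DataFrame:
    """Synthetic stand-in for a per-driver-per-race table (NOT real F1 data)."""
    rng = np.random.default_rng(seed)
    grid = rng.integers(1, 21, size=n)  # starting positions 1..20
    # Hypothetical feature: average finishing position over recent races.
    form = np.clip(grid + rng.normal(0, 3, size=n), 1, 20)
    # Finishing position loosely tracks grid and form, plus race-day noise.
    finish = 0.6 * grid + 0.4 * form + rng.normal(0, 2.5, size=n)
    podium = (finish <= 3.5).astype(int)
    return pd.DataFrame({"grid": grid, "form": form, "podium": podium})


def train_podium_model(df: pd.DataFrame):
    """Fit a scaled logistic regression; return (model, held-out accuracy)."""
    X = df[["grid", "form"]]
    y = df["podium"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y
    )
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)
```

With real data, the synthetic frame would be replaced by the flattened Ergast results, with `form` computed only from earlier rounds so that nothing from the race being predicted leaks into its features. `model.predict_proba` then yields a podium probability per driver for the race page.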

What this project really was

A production sandbox for testing tooling. Every time I wanted to evaluate a new library (a different plotting backend, a new tabular-data framework, a fresh ML algorithm), I prototyped it here against real F1 data before deciding whether it belonged in client or production work. The cost was low (race data is clean and publicly available via Ergast), iteration was fast, and the public Streamlit app meant others could see and use what I built.

Several stack choices for later projects (Plotly over matplotlib for interactive charts, Streamlit for fast prototyping of analytical UIs) originated here.

Stack

  • Python — implementation
  • Streamlit — public-facing live interface
  • pandas + NumPy — data wrangling
  • scikit-learn — ML for race-outcome prediction and pit strategy modeling
  • Plotly + matplotlib — interactive and static visualizations
  • Ergast F1 API — open historical race data source
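
The pit-strategy side of the stack can be illustrated with a deliberately simple model: fit a per-lap tyre-degradation rate with scikit-learn's `LinearRegression`, then compare candidate stint plans by total race time. The linear-degradation assumption, the fixed pit-loss penalty and every function name here are hypothetical. The sketch ignores fuel burn-off, compound differences and safety cars, and is not the platform's own strategy model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fit_degradation(tyre_age: np.ndarray, lap_time: np.ndarray) -> tuple[float, float]:
    """Fit lap_time ~ base + deg * tyre_age; return (base, deg) in seconds."""
    model = LinearRegression().fit(tyre_age.reshape(-1, 1), lap_time)
    return float(model.intercept_), float(model.coef_[0])


def race_time(stints: list[int], base: float, deg: float, pit_loss: float) -> float:
    """Total time for consecutive stints of the given lengths (in laps).

    Tyre age restarts at 0 for every stint; each stop between stints
    adds a fixed pit_loss in seconds.
    """
    total = 0.0
    for laps in stints:
        ages = np.arange(laps)  # tyre age 0 .. laps-1
        total += float(np.sum(base + deg * ages))
    return total + pit_loss * (len(stints) - 1)


def best_strategy(candidates: dict[str, list[int]], base: float, deg: float, pit_loss: float):
    """Return (name of fastest plan, {name: total time}) for the candidate plans."""
    times = {name: race_time(s, base, deg, pit_loss) for name, s in candidates.items()}
    return min(times, key=times.get), times
```

A stint of n laps costs n*base + deg*n*(n-1)/2 seconds. With a hypothetical 90 s base lap, 0.1 s/lap degradation, a 22 s pit loss and 58 laps, `best_strategy` comparing `[58]`, `[29, 29]` and `[20, 19, 19]` picks the two-stop plan (5317.2 s) over the one-stop (5323.2 s): the extra 22 s stop is cheaper than the 28 s of extra degradation it avoids.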

What this project proves

  • End-to-end ownership of a live public product: from data ingestion through ML to a live Streamlit UI, all maintained by one engineer
  • Sandbox discipline: using a personal project as a deliberate testbed for tooling decisions that later landed in production work
  • Sports analytics as a domain: adds another vertical to the range of domains I have shipped in (Logistics, Finance, Real Estate, CV, Cybersecurity, Sports)
