NOVEMBER 2023
Formula 1 Analytics
F1 analytics platform combining historical narrative, statistical insights, full season reviews, and an AI Space applying ML to race data — running live on Streamlit with end-to-end data pipelines.
Executive summary
Formula 1 Analytics is an end-to-end data platform for F1 racing. It combines historical narrative, statistical pattern analysis, full season reviews, and an AI Space that applies ML to race-outcome prediction and pit strategy modeling. The platform is live at f1analytics.streamlit.app, where anyone can open it and explore.
The project also served as a deliberate sandbox: every new data-science library or ML framework I wanted to test got prototyped here against real F1 race data before going into production work.
See it deployed
The live Streamlit app is the primary proof that the platform works: just open it. These 19 slides are the deeper deployment record, covering the four sections in operation, the AI Space delivering predictions on real seasons, and the visualization patterns used throughout.
Four sections, one data substrate
| Section | What it delivers | Why it earns the slot |
|---|---|---|
| Overview | F1 history, iconic circuits, season legacy storytelling | Hook for non-stat-heads — narrative makes the data accessible |
| Insights | Statistical pattern analysis on track and driver performance | Where the quantitative fans land — distributions, correlations, anomalies |
| Seasons | Per-season recaps with race strategies, driver arcs, context | Combines narrative + stats — the season as both story and dataset |
| AI Space | ML applied to race-outcome prediction and pit-stop strategy modeling | The differentiator — most F1 sites surface stats; few apply ML on top |
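To make the AI Space row concrete, here is a minimal sketch of the kind of baseline a race-outcome model is usually measured against. It is not the platform's actual model. The row layout, the driver names, and the `pole_wins_baseline` helper are all hypothetical, and the values are made up for illustration. The idea is simply that a model predicting winners only earns its place if it beats "the pole sitter wins".

```python
from collections import defaultdict

# Toy rows with made-up values, for illustration only:
# (race_id, driver, grid_position, finish_position)
TOY_RESULTS = [
    ("r1", "driver_a", 1, 1),
    ("r1", "driver_b", 2, 2),
    ("r1", "driver_c", 3, 3),
    ("r2", "driver_a", 2, 1),
    ("r2", "driver_b", 1, 3),
    ("r2", "driver_c", 3, 2),
    ("r3", "driver_c", 1, 1),
    ("r3", "driver_a", 3, 2),
    ("r3", "driver_b", 2, 3),
    ("r4", "driver_b", 1, 1),
    ("r4", "driver_c", 2, 2),
    ("r4", "driver_a", 3, 3),
]


def pole_wins_baseline(rows):
    """Return the fraction of races won by the driver who started from grid 1."""
    by_race = defaultdict(list)
    for race_id, driver, grid, finish in rows:
        by_race[race_id].append((driver, grid, finish))

    hits = 0
    scored = 0
    for entries in by_race.values():
        pole = [d for d, g, _ in entries if g == 1]
        winner = [d for d, _, f in entries if f == 1]
        if not pole or not winner:
            continue  # skip races with missing pole or winner data
        scored += 1
        hits += pole[0] == winner[0]
    return hits / scored if scored else 0.0


baseline_accuracy = pole_wins_baseline(TOY_RESULTS)
print(f"Pole-sitter baseline accuracy on toy data: {baseline_accuracy:.2f}")
```

On real seasons the same function would run over rows derived from historical results, and the resulting number becomes the bar that any trained classifier has to clear.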
What this project really was
This project was a sandbox for testing tooling before it reached production. Every time I wanted to evaluate a new library (a different plotting backend, a new tabular-data framework, a fresh ML algorithm), I prototyped it here against real F1 data before deciding whether it belonged in client or production work. The cost was low (race data is clean and publicly available through the Ergast API), iteration was fast, and the public Streamlit app meant others could see and use what I built.
Several stack choices for later projects (Plotly over matplotlib for interactive charts, Streamlit for fast prototyping of analytical UIs) originated here.
Stack
- Python — implementation
- Streamlit — public-facing live interface
- pandas + NumPy — data wrangling
- scikit-learn — ML for race-outcome prediction and pit strategy modeling
- Plotly + matplotlib — interactive and static visualizations
- Ergast F1 API — open historical race data source
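As a rough illustration of the ingestion step in this stack, the sketch below flattens an Ergast-style race-results payload into flat rows ready for pandas. It is an assumption-laden example, not the platform's pipeline: the payload is hand-written with placeholder values, the field names follow the Ergast results JSON layout as commonly documented (worth verifying against a live response), and `flatten_results` is a hypothetical helper. Only the standard library is used so the snippet runs anywhere.

```python
# Hand-written payload mimicking the Ergast results JSON layout.
# All values are placeholders; Ergast returns numeric fields as strings.
SAMPLE_PAYLOAD = {
    "MRData": {
        "RaceTable": {
            "season": "2023",
            "Races": [
                {
                    "season": "2023",
                    "round": "1",
                    "raceName": "Example Grand Prix",
                    "Results": [
                        {
                            "position": "1",
                            "grid": "1",
                            "status": "Finished",
                            "Driver": {"driverId": "driver_a"},
                            "Constructor": {"constructorId": "team_x"},
                        },
                        {
                            "position": "2",
                            "grid": "3",
                            "status": "Finished",
                            "Driver": {"driverId": "driver_b"},
                            "Constructor": {"constructorId": "team_y"},
                        },
                    ],
                }
            ],
        }
    }
}


def flatten_results(payload):
    """Turn a nested results payload into one flat dict per driver result."""
    rows = []
    for race in payload["MRData"]["RaceTable"]["Races"]:
        for result in race.get("Results", []):
            rows.append(
                {
                    "season": int(race["season"]),
                    "round": int(race["round"]),
                    "race_name": race["raceName"],
                    "driver_id": result["Driver"]["driverId"],
                    "constructor_id": result["Constructor"]["constructorId"],
                    "grid": int(result["grid"]),  # cast string to int
                    "position": int(result["position"]),
                    "status": result["status"],
                }
            )
    return rows


rows = flatten_results(SAMPLE_PAYLOAD)
for row in rows:
    print(row)
```

The resulting list of dicts can be passed straight to `pandas.DataFrame(rows)`, which is where the wrangling and scikit-learn feature work would start.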
What this project proves
- End-to-end ownership of a live public product — data ingestion through ML through a live Streamlit UI, all maintained by one engineer
- Sandbox discipline — using a personal project as a deliberate testbed for tooling decisions that later landed in production work
- Sports analytics as a domain — adds another vertical to the breadth of domains shipped (Logistics, Finance, Real Estate, CV, Cybersecurity, Sports)