SEPTEMBER 2024

COELHO GenAI

Privacy-first Generative AI platform connecting users to open-source LLMs (Llama 3.1, Gemma 2) via Ollama or Groq — with five purpose-built tools spanning chat, retrieval, autonomous data science, document analysis, and agentic plan-and-solve.

Outcome

Five distinct GenAI tools shipped end-to-end · runs fully local on Ollama or low-latency via Groq · autonomous data-science agents and document analysis via Docling.

LangChain · Ollama · Groq · Llama 3.1 · Gemma 2 · Docling · DuckDuckGo · Wikipedia · PubMed · Streamlit · Python

Source ↗ Presentation ↗

Executive summary

COELHO GenAI connects users to open-source Large Language Models — Llama 3.1 (Meta), Gemma 2 (Google), and others — through Ollama for fully local inference or Groq for low-latency hosted serving. Five purpose-built tools sit on top of that LLM substrate:

Assistant — general-purpose chatbot
Information Retrieval — LLM-grounded online tool calling (DuckDuckGo, Wikipedia, PubMed)
Data Science — autonomous agents for data exploration and modeling
Document Assistant — document RAG via Docling + LangChain document loaders
Plan & Solve — agentic decomposition of complex requests into executable strategy plans

The platform’s design principle is LLM choice + privacy: pick where your tokens go (local on Ollama, hosted on Groq) and which model runs them.
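The Ollama/Groq choice can be sketched as a small factory. This is a minimal illustration, not the project's actual code: `ProviderChoice`, `choose_provider`, `build_chat_model`, and the default model tags are assumptions. `ChatOllama` and `ChatGroq` come from the `langchain-ollama` and `langchain-groq` packages and are imported lazily, so only the chosen provider needs to be installed.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical defaults; the model tags are assumptions, not taken from the project.
DEFAULT_MODELS = {
    "ollama": "llama3.1",
    "groq": "llama-3.1-8b-instant",
}


@dataclass(frozen=True)
class ProviderChoice:
    """Where the tokens go (provider) and which model runs them."""
    provider: str  # "ollama" (local) or "groq" (hosted)
    model: str

    @property
    def is_local(self) -> bool:
        return self.provider == "ollama"


def choose_provider(provider: str, model: Optional[str] = None) -> ProviderChoice:
    """Validate the provider name and fill in a default model tag."""
    key = provider.strip().lower()
    if key not in DEFAULT_MODELS:
        raise ValueError(f"unknown provider: {provider!r}")
    return ProviderChoice(provider=key, model=model or DEFAULT_MODELS[key])


def build_chat_model(choice: ProviderChoice, temperature: float = 0.0):
    """Instantiate a LangChain chat model for the chosen provider."""
    if choice.is_local:
        # Requires a running Ollama server with the model already pulled.
        from langchain_ollama import ChatOllama
        return ChatOllama(model=choice.model, temperature=temperature)
    # Reads the GROQ_API_KEY environment variable.
    from langchain_groq import ChatGroq
    return ChatGroq(model=choice.model, temperature=temperature)
```

Under these assumptions, `build_chat_model(choose_provider("ollama"))` keeps inference on the local machine, while `choose_provider("groq")` sends prompts to the hosted endpoint. The rest of a tool can treat the returned chat model the same way in both cases.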

See it deployed

The platform needs Ollama running locally OR Groq API access plus a Python environment with the full LangChain stack — not trivial to spin up casually. These 13 slides are the verifiable record of the five tools running with real workflows: live chat, web-grounded retrieval, autonomous data-science notebook generation, document RAG, and agentic strategy planning. Navigate with arrows or open fullscreen for the full read.

Open PDF in new tab ↗

Five tools, one LLM substrate

Tool	What it does	Distinctive design
Assistant	General chatbot for brainstorming, problem-solving, ideation	Runs against your chosen local or hosted LLM — no vendor lock-in
Information Retrieval	LLM-grounded online tool calling: DuckDuckGo (web), Wikipedia (encyclopedia), PubMed (medical literature)	On Ollama, LLM reasoning stays on the local machine; only the search queries go out
Data Science	Autonomous AI agents explore and analyze user-supplied data — EDA, plots, modeling suggestions	LLM-driven notebook generation grounded in the actual dataset
Document Assistant	Document RAG powered by Docling + a dozen LangChain document loaders (PDFs, web pages, Wikipedia, more)	Docling handles the hard part: extracting structure from messy PDFs into clean Markdown for retrieval
Plan & Solve	Agentic decomposition: turns a user request into an executable, step-by-step strategy plan	Acts as a meta-tool: produces a plan that other tools or the user can execute
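
The Plan & Solve row can be illustrated with a minimal sketch: ask the model for a numbered plan, then parse the reply into steps that another tool or the user can execute. The prompt wording, `PlanStep`, `parse_plan`, and `make_plan` are hypothetical names, not the project's code. `llm` is any callable that maps a prompt string to a text reply; with a LangChain chat model that could be something like `lambda p: model.invoke(p).content`.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical prompt; the project's real prompt is not shown in the source.
PLAN_PROMPT = (
    "Break the request below into a short numbered plan.\n"
    "Write one step per line as '<number>. <action>'.\n\n"
    "Request: {request}"
)

# Accepts "1. action" or "1) action", ignoring surrounding whitespace.
STEP_PATTERN = re.compile(r"^\s*(\d+)[.)]\s+(.+?)\s*$")


@dataclass(frozen=True)
class PlanStep:
    index: int
    action: str


def parse_plan(text: str) -> List[PlanStep]:
    """Keep only lines shaped like a numbered step and renumber them sequentially."""
    steps: List[PlanStep] = []
    for line in text.splitlines():
        match = STEP_PATTERN.match(line)
        if match:
            steps.append(PlanStep(index=len(steps) + 1, action=match.group(2)))
    return steps


def make_plan(request: str, llm: Callable[[str], str]) -> List[PlanStep]:
    """Ask the model for a plan and return it as structured steps."""
    return parse_plan(llm(PLAN_PROMPT.format(request=request)))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline."""
    return "Here is a plan:\n1. Load the data\n2) Profile missing values\n3. Fit a baseline"


plan = make_plan("Analyse churn", fake_llm)
for step in plan:
    print(step.index, step.action)
```

Parsing the reply into `PlanStep` objects, rather than passing raw text along, is one way to make the plan executable by other tools. Chatter around the numbered lines is discarded.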

Why “privacy-first” matters here

Most GenAI tools in 2024 required sending every prompt to a hosted provider such as OpenAI or Anthropic. That worked for individuals; it failed for organizations handling sensitive documents (legal, medical, internal R&D). COELHO GenAI lets the user keep inference fully local on Ollama when needed, swap to Groq for low-latency public-data use cases, or mix providers per tool. The substrate is the same; the trust boundary moves with the task.

This is the same multi-provider routing principle that became the production rotator in COELHO Nexus — just an earlier, narrower form.
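
The moving trust boundary described in this section can be sketched as a per-tool routing rule. Everything here is an assumption made for illustration: the tool keys, the default assignments in `TOOL_DEFAULTS`, and the rule in `route` (sensitive input always forces local inference) are one reading of the paragraph above, not the project's documented policy.

```python
from typing import Dict, Optional

LOCAL, HOSTED = "ollama", "groq"

# Hypothetical per-tool defaults; keys mirror the five tools, values are assumptions.
TOOL_DEFAULTS: Dict[str, str] = {
    "assistant": HOSTED,
    "information_retrieval": HOSTED,
    "data_science": LOCAL,
    "document_assistant": LOCAL,
    "plan_and_solve": HOSTED,
}


def route(tool: str, sensitive: bool, overrides: Optional[Dict[str, str]] = None) -> str:
    """Return the provider for one request.

    Sensitive input always stays local; otherwise a user override wins,
    and the per-tool default applies last.
    """
    if tool not in TOOL_DEFAULTS:
        raise KeyError(f"unknown tool: {tool!r}")
    if sensitive:
        return LOCAL
    if overrides and tool in overrides:
        return overrides[tool]
    return TOOL_DEFAULTS[tool]


print(route("document_assistant", sensitive=True))
print(route("assistant", sensitive=False))
```

Checking the sensitivity flag before any override means a misconfigured override cannot send sensitive text to a hosted endpoint.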

Stack

LangChain — agent and tool composition primitives, document loaders
Ollama — fully local LLM inference
Groq — low-latency hosted inference for open models
Llama 3.1 (Meta) and Gemma 2 (Google) — open foundation models
Docling — structure-aware document parsing for RAG
Streamlit — interactive interface
Python — implementation
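
For the Docling entry in the stack, a document-RAG ingestion step might look like the sketch below: Docling converts a file to Markdown, and the Markdown is split on headings into retrieval chunks. `DocumentConverter` and `export_to_markdown` are Docling's API. `split_markdown_by_heading` and `load_chunks` are hypothetical helpers, and heading-based chunking is an assumption; the project may well use a LangChain text splitter instead.

```python
import re
from typing import List, Tuple

# Matches ATX headings such as "# Title" or "### Sub-section".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown_by_heading(markdown: str) -> List[Tuple[str, str]]:
    """Split Markdown into (heading, body) chunks.

    Text that appears before the first heading is returned with an empty heading.
    Simplification: a '#' line inside a fenced code block is treated as a heading.
    """
    chunks: List[Tuple[str, str]] = []
    title = ""
    body: List[str] = []

    def flush() -> None:
        # Emit the current chunk unless it has neither a heading nor any text.
        if title or any(line.strip() for line in body):
            chunks.append((title, "\n".join(body).strip()))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, body = match.group(2), []
        else:
            body.append(line)
    flush()
    return chunks


def load_chunks(source: str) -> List[Tuple[str, str]]:
    """Convert a local path or URL with Docling, then chunk the Markdown."""
    # Lazy import: Docling is a heavy dependency and is only needed here.
    from docling.document_converter import DocumentConverter
    markdown = DocumentConverter().convert(source).document.export_to_markdown()
    return split_markdown_by_heading(markdown)
```

Each `(heading, body)` pair could then be embedded and indexed. Keeping the heading with its text preserves some of the structure Docling recovered from the original document.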

What this project proves

Five orthogonal GenAI patterns in one codebase — chat, retrieval, autonomous data science, document RAG, and agentic planning — running on the same LLM substrate
Local-first LLMOps was viable in 2024: Ollama plus open models offered a real privacy story while most GenAI tooling still assumed hosted APIs
Multi-provider routing started here — the Ollama/Groq toggle pattern became the production rotator powering COELHO Nexus

Source on GitHub →