SEPTEMBER 2024
COELHO GenAI
Privacy-first Generative AI platform connecting users to open-source LLMs (Llama 3.1, Gemma 2) via Ollama or Groq — with five purpose-built tools spanning chat, retrieval, autonomous data science, document analysis, and agentic plan-and-solve.
Executive summary
COELHO GenAI connects users to open-source Large Language Models — Llama 3.1 (Meta), Gemma 2 (Google), and others — through Ollama for fully local inference or Groq for low-latency hosted serving. Five purpose-built tools sit on top of that LLM substrate:
- Assistant — general-purpose chatbot
- Information Retrieval — LLM-grounded online tool calling (DuckDuckGo, Wikipedia, PubMed)
- Data Science — autonomous agents for data exploration and modeling
- Document Assistant — document RAG via Docling + LangChain document loaders
- Plan & Solve — agentic decomposition of complex requests into executable strategy plans
The platform’s design principle is LLM choice + privacy: pick where your tokens go (local on Ollama, hosted on Groq) and which model runs them.
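As a minimal sketch of that choice (not taken from the project's code), a provider and model pair can be resolved into a backend descriptor that records where the tokens go. The `Backend` class and `resolve_backend` function are hypothetical names. The base URLs are assumptions: Ollama's default local port and Groq's OpenAI-compatible endpoint.

```python
from dataclasses import dataclass

# Assumed defaults: Ollama's local server on port 11434 and Groq's
# OpenAI-compatible endpoint. Adjust to match the actual deployment.
OLLAMA_BASE_URL = "http://localhost:11434"
GROQ_BASE_URL = "https://api.groq.com/openai/v1"


@dataclass(frozen=True)
class Backend:
    provider: str
    model: str
    base_url: str
    local: bool  # True when prompts never leave the machine


def resolve_backend(provider: str, model: str) -> Backend:
    """Map a provider name and model name to a backend descriptor."""
    provider = provider.lower()
    if provider == "ollama":
        return Backend("ollama", model, OLLAMA_BASE_URL, local=True)
    if provider == "groq":
        return Backend("groq", model, GROQ_BASE_URL, local=False)
    raise ValueError(f"unknown provider: {provider!r}")
```

A caller would hand `base_url` and `model` to whichever client library it uses. The `local` flag makes the privacy consequence of the choice explicit.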
See it deployed
The platform needs either Ollama running locally or Groq API access, plus a Python environment with the full LangChain stack, so it is not trivial to spin up casually. These 13 slides are the verifiable record of the five tools running with real workflows: live chat, web-grounded retrieval, autonomous data-science notebook generation, document RAG, and agentic strategy planning.
Five tools, one LLM substrate
| Tool | What it does | Distinctive design |
|---|---|---|
| Assistant | General chatbot for brainstorming, problem-solving, ideation | Runs against your chosen local or hosted LLM — no vendor lock-in |
| Information Retrieval | LLM-grounded online tool-calling: DuckDuckGo (web), Wikipedia (encyclopedia), PubMed (medical literature) | With a local model on Ollama, LLM reasoning stays on the machine and only the search queries go out |
| Data Science | Autonomous AI agents explore and analyze user-supplied data — EDA, plots, modeling suggestions | LLM-driven notebook generation grounded in the actual dataset |
| Document Assistant | Document RAG powered by Docling + a dozen LangChain document loaders (PDFs, web pages, Wikipedia, more) | Docling handles the hard part: extracting structure from messy PDFs into clean Markdown for retrieval |
| Plan & Solve | Agentic decomposition: turns a user request into an executable, step-by-step strategy plan | Acts as a meta-tool: produces a plan that other tools or the user can execute |
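The Plan & Solve row can be illustrated with a small sketch: prompt a model for a numbered plan, then parse the reply into discrete steps. This is an assumption about the general pattern, not the project's implementation. `PLAN_PROMPT`, `parse_plan`, `plan_request`, and `fake_llm` are hypothetical names, and the model is replaced by a stub so the example runs without Ollama or Groq.

```python
import re
from typing import Callable, List

# Hypothetical prompt template; the project's real prompt is not shown in the source.
PLAN_PROMPT = (
    "Break the request below into a numbered list of concrete steps.\n"
    "Request: {request}\n"
    "Plan:"
)


def parse_plan(text: str) -> List[str]:
    """Keep only lines shaped like '1. step' or '2) step' and return the step text."""
    steps = []
    for line in text.splitlines():
        match = re.match(r"^\s*\d+[.)]\s+(.*\S)\s*$", line)
        if match:
            steps.append(match.group(1))
    return steps


def plan_request(request: str, llm: Callable[[str], str]) -> List[str]:
    """Ask the model for a plan and return it as a list of steps."""
    return parse_plan(llm(PLAN_PROMPT.format(request=request)))


def fake_llm(prompt: str) -> str:
    """Stub standing in for a real Ollama or Groq call."""
    return (
        "Here is a plan:\n"
        "1. Load the CSV\n"
        "2) Plot the distributions\n"
        "3. Fit a baseline model\n"
    )


steps = plan_request("Analyze my sales data", fake_llm)
```

Because `llm` is just a callable from prompt to text, the same function works with a local or hosted model, and the returned list can be handed to another tool or shown to the user.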
Why “privacy-first” matters here
Most GenAI tools in 2024 required sending every prompt to a hosted provider such as OpenAI or Anthropic. That worked for individuals; it failed for organizations handling sensitive documents (legal, medical, internal R&D). COELHO GenAI lets the user keep inference fully local on Ollama when needed, swap to Groq for low-latency public-data use cases, or mix providers per tool. The substrate is the same; the trust boundary moves with the task.
This is the same multi-provider routing principle that became the production rotator in COELHO Nexus — just an earlier, narrower form.
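One way to picture a trust boundary that moves with the task is a per-tool routing policy with a sensitivity override. The sketch below is illustrative only: the tool keys, the default assignments, and `choose_provider` are hypothetical and are not taken from COELHO GenAI or COELHO Nexus.

```python
from typing import Dict

# Hypothetical per-tool defaults; a real deployment would choose its own.
DEFAULT_PROVIDER: Dict[str, str] = {
    "assistant": "groq",
    "information_retrieval": "groq",
    "data_science": "ollama",
    "document_assistant": "ollama",
    "plan_and_solve": "groq",
}


def choose_provider(tool: str, sensitive: bool) -> str:
    """Pick a provider for a tool; sensitive work always stays local."""
    if sensitive:
        return "ollama"
    # Unknown tools fall back to local inference (fail closed).
    return DEFAULT_PROVIDER.get(tool, "ollama")
```

Defaulting unknown tools to the local provider means a misconfigured tool name cannot send data off the machine by accident.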
Stack
- LangChain — agent and tool composition primitives, document loaders
- Ollama — fully local LLM inference
- Groq — low-latency hosted inference for open models
- Llama 3.1 (Meta) and Gemma 2 (Google) — open foundation models
- Docling — structure-aware document parsing for RAG
- Streamlit — interactive interface
- Python — implementation
What this project proves
- Five orthogonal GenAI patterns in one codebase — chat, retrieval, autonomous data science, document RAG, and agentic planning — running on the same LLM substrate
- Local-first LLMOps was viable in 2024 — Ollama + open models gave a real privacy story before the LLMOps category had a name
- Multi-provider routing started here — the Ollama/Groq toggle pattern became the production rotator powering COELHO Nexus