Rafael COELHO
← Work

SEPTEMBER 2024

COELHO GenAI

Privacy-first Generative AI platform connecting users to open-source LLMs (Llama 3.1, Gemma 2) via Ollama or Groq — with five purpose-built tools spanning chat, retrieval, autonomous data science, document analysis, and agentic plan-and-solve.

Outcome
Five distinct GenAI tools shipped end-to-end · runs fully local on Ollama or low-latency via Groq · autonomous data-science agents and document analysis via Docling.
LangChain Ollama Groq Llama 3.1 Gemma 2 Docling DuckDuckGo Wikipedia PubMed Streamlit Python

Executive summary

COELHO GenAI connects users to open-source Large Language Models — Llama 3.1 (Meta), Gemma 2 (Google), and others — through Ollama for fully local inference or Groq for low-latency hosted serving. Five purpose-built tools sit on top of that LLM substrate:

  1. Assistant — general-purpose chatbot
  2. Information Retrieval — LLM-grounded online tool calling (DuckDuckGo, Wikipedia, PubMed)
  3. Data Science — autonomous agents for data exploration and modeling
  4. Document Assistant — document RAG via Docling + LangChain document loaders
  5. Plan & Solve — agentic decomposition of complex requests into executable strategy plans

The platform’s design principle is LLM choice + privacy: pick where your tokens go (local on Ollama, hosted on Groq) and which model runs them.

See it deployed

The platform needs Ollama running locally OR Groq API access plus a Python environment with the full LangChain stack — not trivial to spin up casually. These 13 slides are the verifiable record of the five tools running with real workflows: live chat, web-grounded retrieval, autonomous data-science notebook generation, document RAG, and agentic strategy planning. Navigate with arrows or open fullscreen for the full read.

Loading viewer…

Five tools, one local LLM substrate

ToolWhat it doesDistinctive design
AssistantGeneral chatbot for brainstorming, problem-solving, ideationRuns against your chosen local or hosted LLM — no vendor lock-in
Information RetrievalLLM-grounded online tool-calling — DuckDuckGo (web), Wikipedia (encyclopedia), PubMed (medical literature)Maintains privacy by routing LLM reasoning locally while only the queries go out
Data ScienceAutonomous AI agents explore and analyze user-supplied data — EDA, plots, modeling suggestionsLLM-driven notebook generation grounded in the actual dataset
Document AssistantDocument RAG powered by Docling + a dozen LangChain document loaders (PDFs, web pages, Wikipedia, more)Docling handles the hard part: extracting structure from messy PDFs into clean Markdown for retrieval
Plan & SolveAgentic decomposition: turns a user request into an executable, step-by-step strategy planActs as a meta-tool — produces a plan others tools or the user can execute

Why “privacy-first” matters here

Most GenAI tools in 2024 forced you to send every prompt to OpenAI / Anthropic. That worked for individuals; it failed for organizations handling sensitive documents (legal, medical, internal R&D). COELHO GenAI lets the user keep inference fully local on Ollama when needed, swap to Groq for low-latency public-data use cases, or mix providers per tool. The substrate is the same; the trust boundary moves with the task.

This is the same multi-provider routing principle that became the production rotator in COELHO Nexus — just an earlier, narrower form.

Stack

  • LangChain — agent and tool composition primitives, document loaders
  • Ollama — fully local LLM inference
  • Groq — low-latency hosted inference for open models
  • Llama 3.1 (Meta) and Gemma 2 (Google) — open foundation models
  • Docling — structure-aware document parsing for RAG
  • Streamlit — interactive interface
  • Python — implementation

What this project proves

  • Five orthogonal GenAI patterns in one codebase — chat, retrieval, autonomous data science, document RAG, and agentic planning — running on the same LLM substrate
  • Local-first LLMOps was viable in 2024 — Ollama + open models gave a real privacy story before the LLMOps category had a name
  • Multi-provider routing started here — the Ollama/Groq toggle pattern became the production rotator powering COELHO Nexus

Source on GitHub →