
Technician Cheat Sheet: Local LLM Deployment


Version: 1.01.26
Audience: Technician / Engineer


Use this as a fast mapping from client requirements to the right SOP:

  • SOP #1 — LLM Container + Goose UI

    • When: UI needed, Docker allowed, client cares about isolation.
    • Notes: Docker/LLM_Inference, Goose installed on host.
  • SOP #2 — Terminal-Only LLM Container

    • When: Max privacy, technical user, CLI OK.
    • Notes: Docker/LLM_Inference, curl/Invoke-RestMethod.
  • SOP #3 — LM Studio Local Runner

    • When: Easiest local ChatGPT-style app, single user.
    • Notes: No Docker, Windows/Linux app.
  • SOP #4 — Goose + n8n + LLM+Agent Containers

    • When: Automation / scheduling needed (cron-style workflows).
    • Notes: Docker/LLM_Agent_Stack, Docker/Automations_n8n.
  • SOP #5 — Goose Standalone (Windows)

    • When: Non-technical Windows user, wants one app and no Docker.
    • Notes: Windows-only, 7–8B model suggested for 8 GB VRAM.
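
The mapping above can be sketched as a small decision helper. This is a minimal sketch only: the `pick_sop` function name and the yes/no flags are illustrative, not part of any SOP, and it deliberately ignores OS-specific cases like SOP #5's Windows-only constraint.

```sh
# pick_sop UI_NEEDED DOCKER_OK AUTOMATION_NEEDED  (each "yes" or "no")
# Illustrative only -- mirrors the SOP mapping above, simplified.
pick_sop() {
  ui="$1"; docker_ok="$2"; automation="$3"
  if [ "$automation" = yes ]; then
    echo "SOP #4 (Goose + n8n + agent containers)"
  elif [ "$docker_ok" = no ]; then
    # No Docker allowed: LM Studio, or Goose Standalone on Windows
    echo "SOP #3 or SOP #5 (no-Docker options)"
  elif [ "$ui" = yes ]; then
    echo "SOP #1 (LLM container + Goose UI)"
  else
    echo "SOP #2 (terminal-only container)"
  fi
}

pick_sop yes yes no   # prints "SOP #1 (LLM container + Goose UI)"
```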

Hardware sizing by GPU VRAM:

  • < 8 GB VRAM

    • Use 3B–7B Q4 models.
    • Prefer LM Studio or Goose Standalone; keep context small.
  • 8–12 GB VRAM

    • 7B–8B Q4 models comfortable, 14B possible with trade-offs.
    • All SOPs possible; choose by UX and complexity.
  • 12–24 GB VRAM

    • 8B–14B Q4 models are fine.
    • Container-based solutions work well (SOP #1/#2/#4).
  • > 24 GB VRAM

    • High-end or professional cards.
    • Any SOP, heavy workloads, long contexts.
  • AMD GPU

    • Assume CPU fallback unless explicitly validated.
    • Do not promise GPU acceleration.
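
The VRAM tiers above can be encoded as a quick helper for intake notes. A minimal sketch: `suggest_models` is an illustrative name, and the `nvidia-smi` query in the comment assumes an NVIDIA host (per the AMD caveat above, do not assume GPU support elsewhere).

```sh
# Map available VRAM (in GB) to the model tiers above.
# suggest_models is an illustrative helper, not part of any SOP.
suggest_models() {
  vram_gb="$1"
  if   [ "$vram_gb" -lt 8 ];  then echo "3B-7B Q4"
  elif [ "$vram_gb" -le 12 ]; then echo "7B-8B Q4 (14B with trade-offs)"
  elif [ "$vram_gb" -le 24 ]; then echo "8B-14B Q4"
  else                             echo "any tier, long contexts OK"
  fi
}

# On an NVIDIA host you could feed it real numbers, e.g.:
#   vram_mb=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits)
#   suggest_models $((vram_mb / 1024))
suggest_models 8   # prints "7B-8B Q4 (14B with trade-offs)"
```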

Start or stop the base inference stack:

```sh
cd Docker/LLM_Inference
docker compose up -d   # start
docker compose down    # stop
```
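
Containers report "up" before the model server is actually ready, so it helps to poll before handing over. A sketch, assuming the endpoint shown later in this sheet; `wait_for` is an illustrative helper, and the tries/sleep values are arbitrary.

```sh
# Poll until a probe command succeeds, or give up after N tries.
# wait_for is an illustrative helper; tune tries and sleep to taste.
wait_for() {
  tries="$1"; shift
  i=0
  while [ "$i" -lt "$tries" ]; do
    "$@" >/dev/null 2>&1 && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# After `docker compose up -d`, wait for the API to answer:
#   wait_for 30 curl -sf http://localhost:8000/v1/models
```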

For the n8n + agent stack (SOP #4):

```sh
cd Docker/LLM_Agent_Stack
docker compose up -d
cd ../Automations_n8n
docker compose up -d
```

Verify the API is responding:

```sh
curl http://localhost:8000/v1/models
```

Windows alternative (PowerShell):

```powershell
Invoke-RestMethod -Uri "http://localhost:8000/v1/models" -Method Get
```
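
Assuming the server speaks the OpenAI-compatible `/v1/models` format (`{"data":[{"id":...}]}`), you can list just the model ids without extra tooling such as jq. A sketch; `list_model_ids` is an illustrative helper and relies on plain `grep`/`cut`.

```sh
# Extract "id" values from an OpenAI-style /v1/models response.
# Uses grep/cut so it works without jq; assumes "id":"..." string keys.
list_model_ids() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' | cut -d'"' -f4
}

# Usage (assumes the endpoint from above):
#   curl -s http://localhost:8000/v1/models | list_model_ids
echo '{"data":[{"id":"llama-3-8b-q4"},{"id":"mistral-7b-q4"}]}' | list_model_ids
# prints:
#   llama-3-8b-q4
#   mistral-7b-q4
```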

If the data is doctor–patient, attorney–client privileged, PHI, or Secret-class:

  • Prefer SOP #2 (Terminal-Only) or SOP #3 (LM Studio).
  • If Goose is ever used, firewall it completely from the internet and document the exception.

Common pitfalls:

  • Promising AMD GPU acceleration (always caveat it).
  • Forgetting to mount Models/ directory.
  • Using synced folders (OneDrive/Dropbox) for model storage.
  • Enabling cloud providers in LM Studio or Goose without explicit client sign-off.
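
The synced-folder pitfall is easy to catch at install time with a path check. A minimal sketch: `is_synced_path` is an illustrative name, and the patterns cover only the common sync clients named above plus Google Drive (an assumption).

```sh
# Warn if a model directory lives inside a sync client's folder.
# is_synced_path is illustrative; extend the patterns for other sync tools.
is_synced_path() {
  case "$1" in
    *OneDrive*|*Dropbox*|*"Google Drive"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   is_synced_path "$HOME/OneDrive/Models" \
#     && echo "WARNING: move Models/ out of the synced folder"
```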

Versioning:

  • Ensure the SOP version in use is 1.01.26.
  • If making local changes, bump the version per the scheme (e.g., 1.012.26 for the second revision in the same month/year).