Technician Cheat Sheet: Local LLM Deployment

Version: 1.01.26
Audience: Technician / Engineer


1. Quick SOP Selector

Use this as a quick mapping from client needs to SOP:

  • SOP #1: LLM Container + Goose UI
      • When: UI needed, Docker allowed, client cares about isolation.
      • Notes: Docker/LLM_Inference, Goose installed on host.

  • SOP #2: Terminal-Only LLM Container
      • When: Max privacy, technical user, CLI OK.
      • Notes: Docker/LLM_Inference, curl/Invoke-RestMethod.

  • SOP #3: LM Studio Local Runner
      • When: Easiest local ChatGPT-style app, single user.
      • Notes: No Docker, Windows/Linux app.

  • SOP #4: Goose + n8n + LLM+Agent Containers
      • When: Automation / scheduling needed (cron-style workflows).
      • Notes: Docker/LLM_Agent_Stack, Docker/Automations_n8n.

  • SOP #5: Goose Standalone (Windows)
      • When: Non-technical Windows user, wants one app and no Docker.
      • Notes: Windows-only, 7–8B model suggested for 8 GB VRAM.

2. Hardware Triage

  • < 8 GB VRAM
      • Use 3B–7B Q4 models.
      • Prefer LM Studio or Goose Standalone; keep context small.

  • 8–12 GB VRAM
      • 7B–8B Q4 models comfortable, 14B possible with trade-offs.
      • All SOPs possible; choose by UX and complexity.

  • 12–24 GB VRAM
      • 8B–14B Q4 models are fine.
      • Container-based solutions work well (SOP #1/#2/#4).

  • > 24 GB VRAM
      • High-end or professional cards.
      • Any SOP, heavy workloads, long contexts.

  • AMD GPU
      • Assume CPU fallback unless explicitly validated.
      • Do not promise GPU acceleration.
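
The tiers above can be sketched as a small shell helper for on-site triage. This is an illustrative sketch, not part of the SOPs: the function names mib_to_gb and vram_tier are hypothetical. Because the listed ranges overlap at 12 GB and 24 GB, the sketch assumes both of those exact values belong to the 12–24 GB tier.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the SOPs) that mirror the triage tiers above.

# Convert a MiB figure to the nearest whole GB (integer arithmetic).
mib_to_gb() {
  echo $(( ($1 + 512) / 1024 ))
}

# Print the triage advice for a whole-number VRAM size in GB.
# Assumption: exactly 12 GB and exactly 24 GB both map to the 12-24 GB tier.
vram_tier() {
  gb=$1
  if [ "$gb" -lt 8 ]; then
    echo "<8 GB: 3B-7B Q4; prefer LM Studio or Goose Standalone; keep context small"
  elif [ "$gb" -lt 12 ]; then
    echo "8-12 GB: 7B-8B Q4 comfortable; 14B possible with trade-offs; any SOP"
  elif [ "$gb" -le 24 ]; then
    echo "12-24 GB: 8B-14B Q4; container SOPs (#1/#2/#4) work well"
  else
    echo ">24 GB: any SOP; heavy workloads and long contexts"
  fi
}

# Example on an NVIDIA host (nvidia-smi reports memory.total in MiB):
#   mib=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
#   vram_tier "$(mib_to_gb "$mib")"
```

The helper only classifies a number. For AMD GPUs the CPU-fallback rule above still applies, whatever VRAM the card reports.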

3. Common Commands (Reference)

Docker Start/Stop (Any SOP using Docker)

cd Docker/LLM_Inference
docker compose up -d    # start the stack in the background
docker compose down     # stop and remove the containers when finished

For n8n + Agent (SOP #4):

cd Docker/LLM_Agent_Stack
docker compose up -d

cd ../Automations_n8n
docker compose up -d

Quick Health Check

curl http://localhost:8000/v1/models

Windows alternative (PowerShell):

Invoke-RestMethod -Uri "http://localhost:8000/v1/models" -Method Get
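
Right after docker compose up -d the endpoint may not answer yet, so a single health check can fail on a healthy stack. A minimal sketch of a retry loop around the same /v1/models request follows. The function name wait_for_llm, the default of 10 attempts, the 2-second delay, and the PROBE override variable are all assumptions made for illustration, not part of the SOPs.

```shell
#!/bin/sh
# Hypothetical readiness poll (not part of the SOPs).
# Retries the /v1/models request until it succeeds or the attempts run out.
# PROBE is an illustrative hook that swaps out curl, e.g. for a dry run.
wait_for_llm() {
  url=${1:-http://localhost:8000/v1/models}
  attempts=${2:-10}   # assumed default number of tries
  delay=${3:-2}       # assumed seconds to wait between tries
  i=1
  while [ "$i" -le "$attempts" ]; do
    # curl -s: silent; -f: non-zero exit on HTTP errors; -o /dev/null: discard body
    if ${PROBE:-curl -sf -o /dev/null} "$url"; then
      echo "LLM endpoint ready after $i attempt(s)"
      return 0
    fi
    # Sleep only if another attempt will follow.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  echo "LLM endpoint not reachable at $url" >&2
  return 1
}
```

Calling wait_for_llm with no arguments polls the default URL used above. A non-zero exit status means the endpoint never answered, which is the cue to inspect the containers before going further.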

4. Extreme Sensitivity Rule of Thumb

If data is doctor–patient, lawyer–client, privileged legal, PHI, or Secret-class:

  • Prefer SOP #2 (Terminal-Only) or SOP #3 (LM Studio).
  • If Goose is ever used, firewall it completely from the internet and document the exception.

5. Pitfalls to Avoid

  • Promising AMD GPU support (always caveat).
  • Forgetting to mount the Models/ directory.
  • Using synced folders (OneDrive/Dropbox) for model storage.
  • Enabling cloud providers in LM Studio or Goose without explicit client sign-off.
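
Two of the pitfalls above (an unmounted Models/ directory and synced-folder storage) can be caught with a quick pre-flight check. The sketch below is illustrative only: is_synced_path and check_models_dir are hypothetical names, and the check assumes a synced folder can be recognised by "OneDrive" or "Dropbox" appearing somewhere in the path.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the SOPs).

# Return 0 if the path contains "OneDrive" or "Dropbox" (case-sensitive).
# Assumption: that substring is enough to identify a synced folder.
is_synced_path() {
  case "$1" in
    *OneDrive*|*Dropbox*) return 0 ;;
    *) return 1 ;;
  esac
}

# Warn if the models directory is in a synced folder or does not exist.
check_models_dir() {
  dir=$1
  if is_synced_path "$dir"; then
    echo "WARN: $dir appears to be inside a synced folder"
    return 1
  fi
  if [ ! -d "$dir" ]; then
    echo "WARN: $dir does not exist, so there is nothing to mount"
    return 1
  fi
  echo "OK: $dir"
}
```

A call such as check_models_dir ./Models before starting the containers would flag both cases. It does not verify that the compose file actually mounts the directory.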

6. Version / SOP Sync

  • Ensure the SOP version in use is 1.01.26.
  • If making local changes, bump the version per the scheme (e.g., 1.012.26 for the second revision in the same month/year).
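
The version check can be scripted against a saved copy of the sheet. This is a sketch under assumptions: sop_version and check_sop_version are hypothetical names, and the file is assumed to carry a header line of the form "Version: X", as this sheet does.

```shell
#!/bin/sh
# Hypothetical version check (not part of the SOPs).
# Assumes the file contains a header line of the form "Version: X".

# Print the value from the first "Version:" line in the file.
sop_version() {
  sed -n 's/^Version:[[:space:]]*//p' "$1" | head -n 1
}

# Compare the version found in the file with the expected one.
check_sop_version() {
  file=$1
  expected=${2:-1.01.26}
  found=$(sop_version "$file")
  if [ "$found" = "$expected" ]; then
    echo "SOP version OK: $found"
  else
    echo "SOP version mismatch: found '$found', expected '$expected'" >&2
    return 1
  fi
}
```

The comparison is an exact string match, so a locally bumped copy is reported as a mismatch unless its version is passed as the second argument.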