Technician Cheat Sheet: Local LLM Deployment¶
Version: 1.01.26
Audience: Technician / Engineer
1. Quick SOP Selector¶
Use this as a fast mapping:
- SOP #1 — LLM Container + Goose UI
  - When: UI needed, Docker allowed, client cares about isolation.
  - Notes: Docker/LLM_Inference, Goose installed on host.
- SOP #2 — Terminal-Only LLM Container
  - When: Max privacy, technical user, CLI OK.
  - Notes: Docker/LLM_Inference, curl/Invoke-RestMethod.
- SOP #3 — LM Studio Local Runner
  - When: Easiest local ChatGPT-style app, single user.
  - Notes: No Docker, Windows/Linux app.
- SOP #4 — Goose + n8n + LLM + Agent Containers
  - When: Automation / scheduling needed (cron-style workflows).
  - Notes: Docker/LLM_Agent_Stack, Docker/Automations_n8n.
- SOP #5 — Goose Standalone (Windows)
  - When: Non-technical Windows user, wants one app and no Docker.
  - Notes: Windows-only, 7–8B model suggested for 8 GB VRAM.
2. Hardware Triage¶
- < 8 GB VRAM
  - Use 3B–7B Q4 models.
  - Prefer LM Studio or Goose Standalone; keep context small.
- 8–12 GB VRAM
  - 7B–8B Q4 models are comfortable; 14B is possible with trade-offs.
  - All SOPs possible; choose by UX and complexity.
- 12–24 GB VRAM
  - 8B–14B Q4 models are fine.
  - Container-based solutions work well (SOP #1/#2/#4).
- > 24 GB VRAM
  - High-end or professional cards.
  - Any SOP, heavy workloads, long contexts.
- AMD GPU
  - Assume CPU fallback unless explicitly validated.
  - Do not promise GPU acceleration.
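To confirm which bucket a machine falls in, a quick NVIDIA-only check is nvidia-smi; if the command is missing, treat the box as AMD or CPU-only and plan for CPU fallback:

# List each NVIDIA GPU with its total VRAM, then match against the buckets above.
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader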
3. Common Commands (Reference)¶
Docker Start/Stop (Any SOP using Docker)¶
cd Docker/LLM_Inference
docker compose up -d
docker compose down
For n8n + Agent:
cd Docker/LLM_Agent_Stack
docker compose up -d
cd ../Automations_n8n
docker compose up -d
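If a stack misbehaves, confirm the containers actually came up and skim recent logs from the same compose directory:
docker compose ps
docker compose logs --tail 50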
Quick Health Check¶
curl http://localhost:8000/v1/models
Windows alt:
Invoke-RestMethod -Uri "http://localhost:8000/v1/models" -Method Get
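If the models endpoint responds, a one-shot completion confirms inference end to end. This assumes the same OpenAI-compatible server on port 8000; substitute a model ID returned by /v1/models:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "REPLACE_WITH_MODEL_ID", "messages": [{"role": "user", "content": "Say OK."}], "max_tokens": 10}'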
4. Extreme Sensitivity Rule of Thumb¶
If data is doctor–patient, lawyer–client, privileged legal, PHI, or Secret-class:
- Prefer SOP #2 (Terminal-Only) or SOP #3 (LM Studio).
- If Goose is ever used, firewall it completely from the internet and document the exception.
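One way to enforce that exception on Windows is an outbound firewall block for the Goose executable, scoped to internet addresses so Goose can still reach the local LLM endpoint. The install path below is an assumption; point it at the actual binary, run PowerShell as Administrator, and record the rule in the exception note:
New-NetFirewallRule -DisplayName "Block Goose Internet" -Direction Outbound -Program "C:\Users\<user>\AppData\Local\Programs\Goose\goose.exe" -RemoteAddress Internet -Action Block -Profile Any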
5. Pitfalls to Avoid¶
- Promising AMD GPU support (always caveat).
- Forgetting to mount the Models/ directory (see the mount check below).
- Using synced folders (OneDrive/Dropbox) for model storage.
- Enabling cloud providers in LM Studio or Goose without explicit client sign-off.
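A quick way to catch a missing Models/ mount before handoff is to list the model path from inside the running container. The service name llm and mount point /models are assumptions; use whatever names the compose file in use actually defines:
cd Docker/LLM_Inference
docker compose exec llm ls /models
# An empty listing or a "No such file or directory" error usually means the host Models/ directory was not mounted.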
6. Version / SOP Sync¶
- Ensure the SOP version in use is 1.01.26.
- If making local changes, bump the version per the scheme (e.g., 1.012.26 for the second revision in the same month/year).