Wizard Tech Vault

Technician Cheat Sheet: Local LLM Deployment

Version: 1.01.26
Audience: Technician / Engineer

1. Quick SOP Selector

Use this as a fast mapping:

SOP #1 — LLM Container + Goose UI
When: UI needed, Docker allowed, client cares about isolation.
Notes: Docker/LLM_Inference, Goose installed on host.
SOP #2 — Terminal-Only LLM Container
When: Max privacy, technical user, CLI OK.
Notes: Docker/LLM_Inference, curl/Invoke-RestMethod.
SOP #3 — LM Studio Local Runner
When: Easiest local ChatGPT-style app, single user.
Notes: No Docker, Windows/Linux app.
SOP #4 — Goose + n8n + LLM+Agent Containers
When: Automation / scheduling needed (cron-style workflows).
Notes: Docker/LLM_Agent_Stack, Docker/Automations_n8n.
SOP #5 — Goose Standalone (Windows)
When: Non-technical Windows user, wants one app and no Docker.
Notes: Windows-only, 7–8B model suggested for 8 GB VRAM.
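The selector above can be sketched as a small decision function. This is only an illustrative sketch: the `ClientNeeds` fields and the precedence of the checks (automation first, then UI, then the no-Docker tie-break between SOP #3 and SOP #5) are assumptions, not part of the SOPs themselves.

```python
from dataclasses import dataclass


@dataclass
class ClientNeeds:
    # All field names are hypothetical; they mirror the "When:" lines above.
    docker_allowed: bool
    needs_ui: bool
    needs_automation: bool = False
    windows: bool = False
    technical_user: bool = False


def pick_sop(needs: ClientNeeds) -> int:
    """Return a suggested SOP number (1-5) for the given client needs."""
    if needs.needs_automation:
        if not needs.docker_allowed:
            # SOP #4 is the only automation option and it is container-based.
            raise ValueError("Automation (SOP #4) requires Docker")
        return 4  # Goose + n8n + LLM+Agent containers
    if needs.docker_allowed:
        # UI wanted -> container + Goose UI; otherwise terminal-only.
        return 1 if needs.needs_ui else 2
    # No Docker. Assumed tie-break: non-technical Windows users get
    # Goose Standalone, everyone else gets LM Studio.
    if needs.windows and not needs.technical_user:
        return 5
    return 3
```

For example, `pick_sop(ClientNeeds(docker_allowed=True, needs_ui=False))` suggests SOP #2. Treat the result as a starting point; the sensitivity rule in section 4 still takes priority.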

2. Hardware Triage

< 8 GB VRAM
Use 3B–7B Q4 models.
Prefer LM Studio or Goose Standalone; keep context small.
8–12 GB VRAM
7B–8B Q4 models run comfortably; 14B is possible with trade-offs.
All SOPs possible; choose by UX and complexity.
12–24 GB VRAM
8B–14B Q4 models are fine.
Container-based solutions work well (SOP #1/#2/#4).
> 24 GB VRAM
High-end or professional cards.
Any SOP, heavy workloads, long contexts.
AMD GPU
Assume CPU fallback unless explicitly validated.
Do not promise GPU acceleration.
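The triage tiers above can be expressed as a lookup, which may be handy in an intake script. This is a minimal sketch: the function name `triage` is hypothetical, and because the tiers share their boundary values, it assumes exactly 12 GB falls in the 8–12 GB tier and exactly 24 GB in the 12–24 GB tier.

```python
def triage(vram_gb: float, amd_gpu: bool = False) -> str:
    """Map reported VRAM (in GB) to the triage guidance listed above."""
    if amd_gpu:
        # AMD rule overrides the VRAM tiers.
        return "Assume CPU fallback unless explicitly validated; do not promise GPU acceleration"
    if vram_gb < 8:
        return "3B-7B Q4 models; prefer LM Studio or Goose Standalone; keep context small"
    if vram_gb <= 12:  # assumption: 12 GB belongs to this tier
        return "7B-8B Q4 models comfortable, 14B possible with trade-offs; all SOPs possible"
    if vram_gb <= 24:  # assumption: 24 GB belongs to this tier
        return "8B-14B Q4 models; container-based SOPs (#1/#2/#4) work well"
    return "Any SOP, heavy workloads, long contexts"
```

For example, `triage(6)` returns the 3B–7B guidance, and `triage(16, amd_gpu=True)` returns the CPU-fallback caveat regardless of VRAM.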

3. Common Commands (Reference)

Docker Start/Stop (Any SOP using Docker)

cd Docker/LLM_Inference
docker compose up -d    # start the stack in the background
docker compose down     # stop and remove the containers when finished

For n8n + Agent:

cd Docker/LLM_Agent_Stack
docker compose up -d

cd ../Automations_n8n
docker compose up -d

Quick Health Check

curl http://localhost:8000/v1/models

Windows alt:

Invoke-RestMethod -Uri "http://localhost:8000/v1/models" -Method Get
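Where neither curl nor PowerShell is convenient, the same check can be scripted with the Python standard library. This sketch assumes the endpoint returns an OpenAI-style payload of the form `{"data": [{"id": ...}]}`; that shape is not confirmed by this cheat sheet, and the helper names are hypothetical.

```python
import json
import urllib.request


def parse_model_ids(body: str) -> list[str]:
    """Extract model IDs from an (assumed) OpenAI-style /v1/models response."""
    payload = json.loads(body)
    return [model["id"] for model in payload.get("data", [])]


def check_health(url: str = "http://localhost:8000/v1/models",
                 timeout: float = 5.0) -> list[str]:
    """Query the local endpoint and return the model IDs it reports.

    Raises OSError (including urllib.error.URLError) if the server is
    unreachable, so callers can treat any exception as "not healthy".
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_model_ids(resp.read().decode("utf-8"))
```

Calling `check_health()` while the container is running should return a non-empty list. An empty list or an exception suggests the server is down or no model is loaded.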

4. Extreme Sensitivity Rule of Thumb

If data is doctor–patient, lawyer–client, privileged legal, PHI, or Secret-class:

Prefer SOP #2 (Terminal-Only) or SOP #3 (LM Studio).
If Goose is ever used, firewall it completely from the internet and document the exception.

5. Pitfalls to Avoid

Promising AMD GPU support (always caveat).
Forgetting to mount Models/ directory.
Using synced folders (OneDrive/Dropbox) for model storage.
Enabling cloud providers in LM Studio or Goose without explicit client sign-off.
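The synced-folder pitfall can be caught with a quick pre-flight check on the model path. The sketch below is a name-based heuristic only: it flags any path component containing "onedrive" or "dropbox", so it will miss sync clients with other folder names. The function name and marker list are assumptions.

```python
from pathlib import PurePosixPath

# Folder-name fragments that suggest a cloud-synced location (assumed list).
SYNC_MARKERS = ("onedrive", "dropbox")


def in_synced_folder(path: str) -> bool:
    """Return True if any component of `path` looks like a synced folder."""
    # Normalise Windows separators so the same check works for both styles.
    parts = PurePosixPath(path.replace("\\", "/")).parts
    return any(marker in part.lower()
               for part in parts
               for marker in SYNC_MARKERS)
```

For example, `in_synced_folder(r"C:\Users\me\OneDrive\Models")` is `True`, while `in_synced_folder("/srv/llm/Models")` is `False`.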

6. Version / SOP Sync

Ensure the SOP version in use is 1.01.26.
If you make local changes, bump the version per the scheme (e.g., 1.012.26 for the second revision in the same month/year).