SOP: Local LLM Container (Llama-3.1-14B) with Goose UI on Host
Document Type: Standard Operating Procedure (SOP)
Version: 1.01.26
Status: Approved for Use
Audience: Technician + Client
Confidentiality: Internal / Client Delivery
Platforms Supported: Windows 11 + Linux
1. Purpose
To deploy a private, offline-capable local Large Language Model (LLM) container running Llama-3.1-14B (Q4) using Docker Compose, with Goose installed on the host as the user-facing interface.
2. Scope
This SOP applies to private workstation deployments where:
- No cloud dependency is desired
- Reasoning-oriented local inference is needed
- A graphical or desktop UI is preferred

Not included:
- Cloud AI services
- Remote multi-user inference
- Regulatory compliance configurations
- Air-gapped deployments (see Optional Lockdown)
3. Responsibilities
Technician Responsibilities
- Deploy and maintain local model container
- Validate Goose → LLM connectivity
- Confirm performance expectations with client
- Communicate hardware limitations and privacy constraints

Client Responsibilities
- Provide hardware + OS environment
- Approve intended use cases and privacy sensitivity
- Accept performance limitations based on hardware selection

(Optional) IT/Compliance Responsibilities
- Approve local-only AI usage policies if applicable
- Validate network and storage isolation per organization policy
4. Requirements
4.1 Minimum Hardware
- CPU: 8 cores
- RAM: 16 GB
- Disk: 20 GB free
- GPU: Optional (CPU fallback supported)
4.2 Recommended Hardware
- CPU: 12+ cores
- RAM: 32–64 GB
- GPU: NVIDIA RTX 3090 or better
- SSD/NVMe for model storage
4.3 GPU Practical Notes (NVIDIA vs AMD)
- NVIDIA strongly preferred for llama.cpp inference due to CUDA ecosystem maturity
- AMD may not work for this use case unless the ROCm/HIP/Vulkan toolchain is functional; compatibility varies by model, quant, driver, and distro
- AMD may fall back to CPU or significantly degraded Vulkan performance
- CPU-only operation is viable for light workloads but slower; a quick GPU availability check is sketched below
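A quick way to confirm whether NVIDIA acceleration is even possible on a given host is sketched below; the CUDA image tag is only an example, and the containerized test additionally assumes the NVIDIA Container Toolkit is installed.
nvidia-smi                                                                    # host driver installed and GPU visible
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi     # container runtime can reach the GPU
If either command fails, plan for CPU-only operation or remediate drivers before deployment.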
4.4 Supported OS
| Component | Windows 11 | Linux |
|---|---|---|
| Goose UI | Supported | Supported |
| Docker | Docker Desktop | Docker Engine |
| Compose | docker compose | docker compose or Portainer |
| GPU Accel | NVIDIA Preferred | NVIDIA Preferred |
5. Model Selection Note
Example model used in this SOP:
Llama-3.1-14B-Instruct-Q4_K_M, a quantized instruct model roughly comparable in reasoning (though not coding) to high-end GPT-4-class cloud models and widely deployed on consumer hardware.
For the remainder of this SOP it is referred to as Llama-3.1-14B (Q4).
6. Directory Structure (Standardized)
All deployment resources should be organized as follows:
Docker/
  Portainer_Management/
  LLM_Inference/
Models/

- Docker/Portainer_Management/ = Compose file for the Portainer stack
- Docker/LLM_Inference/ = Compose file for the LLM container
- Models/ = Offline GGUF models stored on the host (C:\Models on Windows, /opt/Models on Linux)
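On Linux, this layout can be created in one step from wherever the client keeps deployment files (the equivalent folders can be created on Windows with Explorer or mkdir):
mkdir -p Docker/Portainer_Management Docker/LLM_Inference Models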
7. Procedure — Windows 11
7.1 Install Docker Desktop
Download from: https://www.docker.com/products/docker-desktop/
Enable WSL2 backend when prompted.
7.2 Prepare Model Storage
mkdir C:\Models
Copy the downloaded .gguf model (for example, llama-3.1-14b-instruct-q4_k_m.gguf) into C:\Models.
7.3 Create Compose File
Path: Docker\LLM_Inference\docker-compose.yml
services:
  llm:
    image: ghcr.io/ggerganov/llama.cpp:server   # server variant exposes the OpenAI-compatible HTTP API
    volumes:
      - C:\Models:/models
    ports:
      - "8000:8000"
    command: >
      --model /models/llama-3.1-14b-instruct-q4_k_m.gguf
      --host 0.0.0.0
      --port 8000
    restart: unless-stopped
Note: the filename passed to --model must match the .gguf file placed in C:\Models.
7.4 Start Container
cd Docker\LLM_Inference
docker compose up -d
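Before configuring Goose, it is worth confirming that the container is running and the model has loaded. The /health endpoint below is exposed by current llama.cpp server builds; if your image does not provide it, rely on the container logs instead.
docker compose ps                     # container should be listed as running
docker compose logs llm               # look for a successful model load, no errors
curl http://localhost:8000/health     # expect an OK-style response once the model is loaded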
7.5 Install Goose on Host
Option A — Winget:
winget install block.goose
Option B — Direct Installer:
Download .exe from: https://block.github.io/goose
7.6 Connect Goose to LLM Endpoint
In Goose's provider settings, set the OpenAI-compatible endpoint to:
http://localhost:8000/v1
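A minimal smoke test of the OpenAI-compatible API before wiring up Goose is shown below; it is written for a POSIX shell, so adjust quoting for PowerShell/CMD. The model field is a placeholder, since llama.cpp's server answers with whichever model it has loaded.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "local", "messages": [{"role": "user", "content": "Say hello."}]}'
A JSON response containing a short completion confirms the endpoint Goose will use.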
8. Procedure — Linux
8.1 Install Docker Engine + Compose
sudo apt update
sudo apt install docker.io docker-compose-plugin -y
Note: docker-compose-plugin ships from Docker's official apt repository. If it is not available in your distribution's default repositories, install the distro's Compose v2 package (docker-compose-v2 on recent Ubuntu) or follow https://docs.docker.com/engine/install/ to add Docker's repository.
8.2 Portainer Deployment (Compose Method)
Path: Docker/Portainer_Management/docker-compose.yml
services:
  portainer:
    image: portainer/portainer-ce
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - portainer_data:/data
    ports:
      - "9443:9443"
volumes:
  portainer_data:
Deploy:
cd Docker/Portainer_Management
docker compose up -d
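Optionally confirm Portainer is reachable. It serves HTTPS on 9443 with a self-signed certificate by default, hence -k:
curl -kI https://localhost:9443       # expect HTTP response headers
Then browse to https://localhost:9443 to complete the first-time admin account setup.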
8.3 Model Storage
sudo mkdir -p /opt/Models
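Copy the downloaded GGUF into place; the source path below is only an example.
sudo cp ~/Downloads/llama-3.1-14b-instruct-q4_k_m.gguf /opt/Models/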
8.4 LLM Compose File
Path: Docker/LLM_Inference/docker-compose.yml
Same as the Windows file, with the volume path adjusted to /opt/Models; a reference copy is sketched below.
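For reference, a Linux variant mirroring Section 7.3 might look like the following. The commented GPU block is optional and assumes the NVIDIA Container Toolkit plus a CUDA-capable image and offload flags (for example a server-cuda tag and --n-gpu-layers); verify both against the image actually in use.
services:
  llm:
    image: ghcr.io/ggerganov/llama.cpp:server
    volumes:
      - /opt/Models:/models
    ports:
      - "8000:8000"
    command: >
      --model /models/llama-3.1-14b-instruct-q4_k_m.gguf
      --host 0.0.0.0
      --port 8000
    restart: unless-stopped
    # Optional NVIDIA GPU passthrough (requires NVIDIA Container Toolkit,
    # a CUDA-enabled image tag, and --n-gpu-layers added to the command above):
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: all
    #           capabilities: [gpu]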
8.5 Deploy LLM
cd Docker/LLM_Inference
docker compose up -d
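As on Windows, confirm the container is running and the model has loaded before connecting Goose:
docker compose ps
docker compose logs llm
curl http://localhost:8000/health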
8.6 Install Goose on Host
Install Goose following the official instructions (https://block.github.io/goose) and point the UI to:
http://localhost:8000/v1
9. Validation / Verification
Technician verifies:
- LLM responds at /v1/chat/completions
- Goose sends prompts and receives responses
- Restart persistence works:
docker compose restart llm
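After the restart above, the endpoint should come back without manual intervention. A minimal check is sketched below; the 30-second grace period is arbitrary and depends on model load time.
sleep 30
curl http://localhost:8000/health      # server back up
curl http://localhost:8000/v1/models   # model still being served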
Client verifies:
- Reasoning responses meet expectations
- UI is functional and local
10. Troubleshooting (Common)
| Problem | Cause | Fix |
|---|---|---|
| Slow responses | CPU fallback | Confirm GPU capability |
| No connection | Port issue | Verify 8000:8000 mapping |
| AMD not utilized | Expected | Use CPU or NVIDIA hardware |
| Goose errors | Incorrect endpoint | Reconfigure to http://localhost:8000/v1 |
| No model | Wrong path | Check .gguf placement |
11. Optional Lockdown (High Privacy)
- Apply Windows/Linux firewall outbound deny for Goose
- Remove outbound allow rules for the Docker service so containers cannot reach the internet
- Disable updates for Goose + model containers
- Require client approval for workflow changes
12. Maintenance
- Update models manually (offline)
- Restart containers after updates
- Back up Models/ if versioning matters; a simple approach is sketched below
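A simple Linux backup sketch, assuming models live in /opt/Models; the destination /mnt/backup is only an example.
sudo rsync -a /opt/Models/ /mnt/backup/Models/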
13. Notes / Warnings
- AMD support not guaranteed; may not function
- CPU fallback acceptable for light reasoning
- Offline-first behavior is standard, not optional
14. Revision Control
- Version: 1.01.26
- Editor: Elijah B
- Next Review: Within 90 Days