SOP: Local LLM Container (Llama-3.1-14B) with Goose UI on Host
Document Type: Standard Operating Procedure (SOP)
Version: 1.01.26
Status: Approved for Use
Audience: Technician + Client
Confidentiality: Internal / Client Delivery
Platforms Supported: Windows 11 + Linux
1. Purpose
To deploy a private, offline-capable local Large Language Model (LLM) container running Llama-3.1-14B (Q4) using Docker Compose, with Goose installed on the host as the user-facing interface.
2. Scope
This SOP applies to private workstation deployments where:
- No cloud dependency is desired
- Reasoning-oriented local inference is needed
- A graphical or desktop UI is preferred
Not included:
- Cloud AI services
- Remote multi-user inference
- Regulatory compliance configurations
- Air-gapped deployments (see Optional Lockdown)
3. Responsibilities
Technician Responsibilities
- Deploy and maintain local model container
- Validate Goose → LLM connectivity
- Confirm performance expectations with client
- Communicate hardware limitations and privacy constraints
Client Responsibilities
- Provide hardware + OS environment
- Approve intended use cases and privacy sensitivity
- Accept performance limitations based on hardware selection
(Optional) IT/Compliance Responsibilities
- Approve local-only AI usage policies if applicable
- Validate network and storage isolation per organization policy
4. Requirements
4.1 Minimum Hardware
- CPU: 8 cores
- RAM: 16 GB
- Disk: 20 GB free
- GPU: Optional (CPU fallback supported)
4.2 Recommended Hardware
- CPU: 12+ cores
- RAM: 32–64 GB
- GPU: NVIDIA RTX 3090 or better
- SSD/NVMe for model storage
4.3 GPU Practical Notes (NVIDIA vs AMD)
- NVIDIA strongly preferred for llama.cpp inference due to CUDA ecosystem maturity
- AMD may not work for this use case unless the ROCm/HIP/Vulkan toolchain succeeds; compatibility varies by model, quant, driver, and distro
- AMD may fall back to CPU or significantly degraded Vulkan performance
- CPU-only operation is viable for light workloads but slower (a quick GPU visibility check follows this list)
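Before planning on GPU acceleration, confirm that Docker can actually see an NVIDIA GPU. A minimal check, assuming the NVIDIA driver and the NVIDIA Container Toolkit are already installed on the host (both are assumptions; neither install is covered by this SOP):

```
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```

If this prints the usual nvidia-smi table, containers can reach the GPU; if it errors, plan for CPU fallback.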
4.4 Supported OS
| Component | Windows 11 | Linux |
|---|---|---|
| Goose UI | Supported | Supported |
| Docker | Docker Desktop | Docker Engine |
| Compose | docker compose | docker compose or Portainer |
| GPU Accel | NVIDIA Preferred | NVIDIA Preferred |
5. Model Selection Note
Example model used in this SOP:
Llama-3.1-14B-Instruct-Q4_K_M: roughly comparable to high-end GPT-4-class cloud models in reasoning (though not coding), and widely deployed on consumer hardware.
After this section, referred to as Llama-3.1-14B (Q4).
6. Directory Structure (Standardized)
All deployment resources should be organized as follows:

```
Docker/
  Portainer_Management/
  LLM_Inference/
Models/
```

- Docker/Portainer_Management/ = Compose file for Portainer stack
- Docker/LLM_Inference/ = Compose file for LLM container
- Models/ = Offline GGUF models stored on host
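On Linux (or under WSL on Windows), the skeleton can be created in one step; the names simply mirror the suggested layout above, so adjust if your layout differs:

```
mkdir -p Docker/Portainer_Management Docker/LLM_Inference Models
```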
7. Procedure — Windows 11
7.1 Install Docker Desktop
Download from: https://www.docker.com/products/docker-desktop/
Enable WSL2 backend when prompted.
7.2 Prepare Model Storage
```
mkdir C:\Models
```

Download the .gguf model file into C:\Models.
7.3 Create Compose File
Path: Docker\LLM_Inference\docker-compose.yml
```yaml
services:
  llm:
    # server build exposes the OpenAI-compatible HTTP API used by Goose
    image: ghcr.io/ggerganov/llama.cpp:server
    volumes:
      - C:\Models:/models
    ports:
      - "8000:8000"
    command: >
      --model /models/llama-3-14b-instruct-q4_k_m.gguf
      --host 0.0.0.0
      --port 8000
    restart: unless-stopped
```
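If an NVIDIA GPU is available, the same service can request GPU access. This is a sketch under two assumptions: the CUDA server image tag matches your architecture, and the NVIDIA Container Toolkit is installed on the host. Merge these keys into the service above and keep the rest unchanged:

```yaml
services:
  llm:
    # CUDA build of the llama.cpp server (assumption: NVIDIA Container Toolkit on host)
    image: ghcr.io/ggerganov/llama.cpp:server-cuda
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Adding --n-gpu-layers with a high value (e.g. 99) to the command offloads as many layers as fit in VRAM; reduce it if the model exceeds available memory.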
7.4 Start Container

```
cd Docker\LLM_Inference
docker compose up -d
```

7.5 Install Goose on Host

Option A — Winget:

```
winget install block.goose
```

Option B — Direct Installer:
Download .exe from: https://block.github.io/goose
7.6 Connect Goose to LLM Endpoint
Set endpoint:
```
http://localhost:8000/v1
```
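Before opening Goose, a quick sanity check that the endpoint is live (curl ships with Windows 10/11; this assumes the llama.cpp server's OpenAI-compatible API):

```
curl http://localhost:8000/v1/models
```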
8. Procedure — Linux

8.1 Install Docker Engine + Compose
```
sudo apt install docker.io docker-compose-plugin -y
```
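Package names vary by distribution (newer Ubuntu releases ship the plugin as docker-compose-v2; Docker's own apt repository uses docker-ce with docker-compose-plugin), so adjust the line above for your distro. Two common post-install steps, offered as suggestions rather than requirements:

```
sudo systemctl enable --now docker   # start Docker now and at boot
sudo usermod -aG docker "$USER"      # optional: run docker without sudo (re-login required)
```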
8.2 Portainer Deployment (Compose Method)

Path: Docker/Portainer_Management/docker-compose.yml
```yaml
services:
  portainer:
    image: portainer/portainer-ce
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - portainer_data:/data
    ports:
      - "9443:9443"

volumes:
  portainer_data:
```

Deploy:
```
cd Docker/Portainer_Management
docker compose up -d
```
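The Portainer UI is then reachable at https://localhost:9443 (self-signed certificate; the admin account is created on first login).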
8.3 Model Storage

```
sudo mkdir -p /opt/Models
```

8.4 LLM Compose File
Path: Docker/LLM_Inference/docker-compose.yml
Same as the Windows file, with the bind-mount path adjusted to /opt/Models, as shown below.
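Only the volumes entry changes relative to the Windows Compose file:

```yaml
    volumes:
      - /opt/Models:/models
```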
8.5 Deploy LLM
```
cd Docker/LLM_Inference
docker compose up -d
```

8.6 Install Goose on Host

Install Goose following the official instructions, then point the UI to:
```
http://localhost:8000/v1
```
9. Validation / Verification

Technician verifies:
- LLM responds at /v1/chat/completions (see the sample request after this list)
- Goose sends prompts and receives responses
- Restart persistence works: docker compose restart llm
- No cloud dependency present
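A minimal end-to-end request, assuming the llama.cpp server's OpenAI-compatible API on the endpoint used throughout this SOP (shown with POSIX line continuations; join onto one line for Windows cmd):

```
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Reply with one word: ready?"}]}'
```

A JSON response containing choices[0].message confirms the full local path works.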
Client verifies:
- Reasoning responses meet expectations
- UI is functional and local
10. Troubleshooting (Common)
| Problem | Cause | Fix |
|---|---|---|
| Slow responses | CPU fallback | Confirm GPU capability |
| No connection | Port issue | Verify 8000:8000 mapping |
| AMD not utilized | Expected | Use CPU or NVIDIA hardware |
| Goose errors | Incorrect endpoint | Reconfigure to http://localhost:8000/v1 |
| No model | Wrong path | Check .gguf placement |
11. Optional Lockdown (High Privacy)
- Apply Windows/Linux firewall outbound deny for Goose (example after this list)
- Remove outbound rules for Docker service
- Disable updates for Goose + model containers
- Require client approval for workflow changes
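As one illustration of the firewall item above, Windows can block outbound traffic per executable. The path below is hypothetical; point -Program at the actual goose.exe location:

```powershell
# Path is a placeholder; adjust to the real install location of goose.exe
New-NetFirewallRule -DisplayName "Block Goose Outbound" -Direction Outbound `
  -Program "C:\Users\client\AppData\Local\Programs\Goose\goose.exe" -Action Block
```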
12. Maintenance
- Update models manually (offline)
- Restart containers after updates
- Back up Models/ if versioning matters (see example below)
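A simple dated archive is usually enough; this sketch assumes the Linux model path from section 8.3:

```
tar -czf models-backup-$(date +%F).tar.gz -C /opt Models
```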
13. Notes / Warnings
- AMD support not guaranteed; may not function
- CPU fallback acceptable for light reasoning
- Offline-first behavior is standard, not optional
14. Revision Control
- Version: 1.01.26
- Editor: Elijah B
- Next Review: Within 90 Days