SOP: Local LLM Container (Llama-3.1-14B) with Goose UI on Host

Document Type: Standard Operating Procedure (SOP)
Version: 1.01.26
Status: Approved for Use
Audience: Technician + Client
Confidentiality: Internal / Client Delivery
Platforms Supported: Windows 11 + Linux


1. Purpose

To deploy a private, offline-capable local Large Language Model (LLM) container running Llama-3.1-14B (Q4) using Docker Compose, with Goose installed on the host as the user-facing interface.


2. Scope

This SOP applies to private workstation deployments where:

  • No cloud dependency is desired
  • Reasoning-oriented local inference is needed
  • A graphical or desktop UI is preferred

Not included:

  • Cloud AI services
  • Remote multi-user inference
  • Regulatory compliance configurations
  • Air-gapped deployments (see Optional Lockdown)


3. Responsibilities

Technician Responsibilities

  • Deploy and maintain local model container
  • Validate Goose → LLM connectivity
  • Confirm performance expectations with client
  • Communicate hardware limitations and privacy constraints

Client Responsibilities

  • Provide hardware + OS environment
  • Approve intended use cases and privacy sensitivity
  • Accept performance limitations based on hardware selection

(Optional) IT/Compliance Responsibilities

  • Approve local-only AI usage policies if applicable
  • Validate network and storage isolation per organization policy


4. Requirements

4.1 Minimum Hardware

  • CPU: 8 cores
  • RAM: 16 GB
  • Disk: 20 GB free
  • GPU: Optional (CPU fallback supported)

4.2 Recommended Hardware

  • CPU: 12+ cores
  • RAM: 32–64 GB
  • GPU: NVIDIA RTX 3090 or better
  • SSD/NVMe for model storage

4.3 GPU Practical Notes (NVIDIA vs AMD)

  • NVIDIA is strongly preferred for llama.cpp inference due to the maturity of the CUDA ecosystem (a quick GPU visibility check follows this list)
  • AMD may not work for this use case unless the ROCm/HIP/Vulkan toolchain builds and runs correctly; compatibility varies by model, quantization, driver, and distro
  • AMD setups may fall back to CPU or to significantly degraded Vulkan performance
  • CPU-only operation is viable for light workloads but noticeably slower
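
On NVIDIA hosts, a quick pre-check (a sketch; the CUDA image tag is only an example, and the second command assumes the NVIDIA Container Toolkit is installed, which this SOP does not cover):

# Confirm the NVIDIA driver and GPU are visible on the host
nvidia-smi

# Optional: confirm Docker can pass the GPU through to containers
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi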

4.4 Supported OS

Component   | Windows 11       | Linux
Goose UI    | Supported        | Supported
Docker      | Docker Desktop   | Docker Engine
Compose     | docker compose   | docker compose or Portainer
GPU Accel   | NVIDIA preferred | NVIDIA preferred

5. Model Selection Note

Example model used in this SOP:

Llama-3.1-14B-Instruct-Q4_K_M, a quantized instruction-tuned model positioned as roughly comparable to high-end GPT-4-class cloud models for reasoning (not coding), and widely deployed on consumer hardware.

After this section, referred to as Llama-3.1-14B (Q4).


6. Directory Structure (Standardized)

All deployment resources should be organized as follows:

Docker/
  Portainer_Management/
  LLM_Inference/
Models/

  • Docker/Portainer_Management/ = Compose file for Portainer stack
  • Docker/LLM_Inference/ = Compose file for LLM container
  • Models/ = Offline GGUF models stored on host
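
A minimal sketch for creating this layout on Linux (Windows paths use backslashes instead):

# Create the standardized deployment directories
mkdir -p Docker/Portainer_Management Docker/LLM_Inference Models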

7. Procedure — Windows 11

7.1 Install Docker Desktop

Download from: https://www.docker.com/products/docker-desktop/
Enable WSL2 backend when prompted.

7.2 Prepare Model Storage

mkdir C:\Models
Download the .gguf model into C:\Models; the filename must match the one referenced in the Compose file below.
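
One way to fetch the model is the huggingface-cli tool (a sketch; the <publisher>/<model-repo> value is a placeholder, so confirm the actual repository and filename before use):

# Install the Hugging Face CLI, then download the GGUF file into C:\Models
pip install -U "huggingface_hub[cli]"
huggingface-cli download <publisher>/<model-repo> llama-3.1-14b-instruct-q4_k_m.gguf --local-dir C:\Models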

7.3 Create Compose File

Path: Docker\LLM_Inference\docker-compose.yml

services:
  llm:
    # llama.cpp server image; exposes an OpenAI-compatible API under /v1
    image: ghcr.io/ggerganov/llama.cpp:server
    volumes:
      # Host model directory mounted into the container
      - C:\Models:/models
    ports:
      - "8000:8000"
    command: >
      --model /models/llama-3.1-14b-instruct-q4_k_m.gguf
      --host 0.0.0.0
      --port 8000
    restart: unless-stopped

7.4 Start Container

cd Docker\LLM_Inference
docker compose up -d
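
A quick health check after startup (the /v1/models route is part of the llama.cpp server's OpenAI-compatible API):

# Confirm the container is running
docker compose ps

# Confirm the API answers; the response should list the loaded model
curl http://localhost:8000/v1/models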

7.5 Install Goose on Host

Option A — Winget:

winget install block.goose

Option B — Direct Installer: Download .exe from: https://block.github.io/goose

7.6 Connect Goose to LLM Endpoint

In Goose's provider settings, configure an OpenAI-compatible endpoint:

http://localhost:8000/v1


8. Procedure — Linux

8.1 Install Docker Engine + Compose

sudo apt update
sudo apt install -y docker.io docker-compose-v2
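
Confirm the engine and Compose plugin before proceeding (the usermod step is optional and takes effect after logging out and back in):

# Verify versions
docker --version
docker compose version

# Optional: allow the current user to run docker without sudo
sudo usermod -aG docker $USER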

8.2 Portainer Deployment (Compose Method)

Path: Docker/Portainer_Management/docker-compose.yml

services:
  portainer:
    image: portainer/portainer-ce:latest
    volumes:
      # Docker socket gives Portainer control of the local Docker engine
      - /var/run/docker.sock:/var/run/docker.sock
      - portainer_data:/data
    ports:
      - "9443:9443"
    restart: unless-stopped
volumes:
  portainer_data:

Deploy:

cd Docker/Portainer_Management
docker compose up -d
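
To confirm Portainer is reachable (the -k flag is needed because Portainer serves a self-signed certificate by default):

# Container should show as running
docker compose ps

# Web UI should answer on HTTPS port 9443
curl -k -I https://localhost:9443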

8.3 Model Storage

sudo mkdir -p /opt/Models
sudo chown -R "$USER":"$USER" /opt/Models

Download the .gguf model into /opt/Models.

8.4 LLM Compose File

Path: Docker/LLM_Inference/docker-compose.yml
Same as the Windows file, with the volume path adjusted to /opt/Models (see below).
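
For reference, the volume mapping is the only line that changes from the Windows Compose file:

    volumes:
      # Linux host model directory mounted into the container
      - /opt/Models:/models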

8.5 Deploy LLM

cd Docker/LLM_Inference
docker compose up -d

8.6 Install Goose on Host

Install Goose following the official instructions at https://block.github.io/goose, then point the UI to the same local endpoint:

http://localhost:8000/v1


9. Validation / Verification

Technician verifies:

  • LLM responds at /v1/chat/completions (see the curl sketch below)
  • Goose sends prompts and receives responses
  • Restart persistency works: docker compose restart llm
  • No cloud dependency present
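
A minimal end-to-end check of the chat endpoint (the model field is effectively ignored by the llama.cpp server, since only one model is loaded):

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.1-14b-instruct-q4_k_m", "messages": [{"role": "user", "content": "Reply with the word OK."}]}'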

Client verifies:

  • Reasoning responses meet expectations
  • UI is functional and local


10. Troubleshooting (Common)

Problem          | Cause              | Fix
Slow responses   | CPU fallback       | Confirm GPU capability
No connection    | Port issue         | Verify 8000:8000 mapping
AMD not utilized | Expected           | Use CPU or NVIDIA hardware
Goose errors     | Incorrect endpoint | Reconfigure to http://localhost:8000/v1
No model         | Wrong path         | Check .gguf placement

11. Optional Lockdown (High Privacy)

  • Apply Windows/Linux firewall outbound deny for Goose
  • Remove outbound rules for Docker service
  • Disable updates for Goose + model containers
  • Require client approval for workflow changes
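
A sketch of the Linux outbound-deny step using ufw (this host-wide variant is blunter than a per-application rule; loopback traffic, including Goose → localhost:8000, remains allowed by ufw's default loopback rules, and Windows hosts can apply equivalent outbound block rules in Windows Defender Firewall):

# Default-deny all traffic, then enable the firewall; adjust allowances per policy
sudo ufw default deny outgoing
sudo ufw default deny incoming
sudo ufw enable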

12. Maintenance

  • Update models manually (offline)
  • Restart containers after updates
  • Backup Models/ if versioning matters
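
A simple backup sketch for the Linux model directory (the destination path is an example; adjust to the client's backup target):

# Mirror the model directory to a backup location
rsync -av --delete /opt/Models/ /mnt/backup/Models/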

13. Notes / Warnings

  • AMD support not guaranteed; may not function
  • CPU fallback acceptable for light reasoning
  • Offline-first behavior is standard, not optional

14. Revision Control

  • Version: 1.01.26
  • Editor: Elijah B
  • Next Review: Within 90 Days