SOP: Local LLM Container (Llama-3.1-14B) with Goose UI on Host

Document Type: Standard Operating Procedure (SOP)
Version: 1.01.26
Status: Approved for Use
Audience: Technician + Client
Confidentiality: Internal / Client Delivery
Platforms Supported: Windows 11 + Linux


Purpose

To deploy a private, offline-capable local Large Language Model (LLM) container running Llama-3.1-14B (Q4) using Docker Compose, with Goose installed on the host as the user-facing interface.


Scope

This SOP applies to private workstation deployments where:

  • No cloud dependency is desired
  • Reasoning-oriented local inference is needed
  • A graphical or desktop UI is preferred

Not included:

  • Cloud AI services
  • Remote multi-user inference
  • Regulatory compliance configurations
  • Air-gapped deployments (see Optional Lockdown)

Technician Responsibilities

  • Deploy and maintain local model container
  • Validate Goose → LLM connectivity
  • Confirm performance expectations with client
  • Communicate hardware limitations and privacy constraints

Client Responsibilities

  • Provide hardware + OS environment
  • Approve intended use cases and privacy sensitivity
  • Accept performance limitations based on hardware selection

(Optional) IT/Compliance Responsibilities

  • Approve local-only AI usage policies if applicable
  • Validate network and storage isolation per organization policy

Hardware Requirements

Minimum:

  • CPU: 8 cores
  • RAM: 16 GB
  • Disk: 20 GB free
  • GPU: Optional (CPU fallback supported)

Recommended:

  • CPU: 12+ cores
  • RAM: 32–64 GB
  • GPU: NVIDIA RTX 3090 or better
  • SSD/NVMe for model storage

GPU notes:

  • NVIDIA strongly preferred for llama.cpp inference due to CUDA ecosystem maturity
  • AMD may not work for this use case unless the ROCm/HIP/Vulkan toolchain succeeds; compatibility varies by model, quant, driver, and distro
  • AMD may fall back to CPU or significantly degraded Vulkan performance
  • CPU-only operation is viable for light workloads but slower
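Before committing to a hardware tier, it helps to confirm the GPU is actually visible to Docker. A minimal sketch, assuming the NVIDIA driver and the NVIDIA Container Toolkit (or Docker Desktop's WSL2 GPU support) are already installed; the CUDA image tag is only an example:

nvidia-smi
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

If both commands print the GPU table, GPU-accelerated inference is available; otherwise plan for CPU fallback.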
Platform support:

Component  | Windows 11       | Linux
-----------|------------------|-----------------------------
Goose UI   | Supported        | Supported
Docker     | Docker Desktop   | Docker Engine
Compose    | docker compose   | docker compose or Portainer
GPU Accel  | NVIDIA Preferred | NVIDIA Preferred

Example model used in this SOP:

Llama-3.1-14B-Instruct-Q4_K_M: a 4-bit (Q4_K_M) quantization widely deployed on consumer hardware, roughly comparable to high-end GPT-4-class cloud models for reasoning (though not for coding).

Referred to below as Llama-3.1-14B (Q4).


Directory Structure

All deployment resources should be organized as follows:

Docker/
  Portainer_Management/
  LLM_Inference/
Models/
  • Docker/Portainer_Management/ = Compose file for Portainer stack
  • Docker/LLM_Inference/ = Compose file for LLM container
  • Models/ = Offline GGUF models stored on host
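On Linux the layout can be created in one command (adjust paths for Windows):

mkdir -p Docker/Portainer_Management Docker/LLM_Inference Models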

Windows 11 Deployment

Install Docker Desktop from https://www.docker.com/products/docker-desktop/ and enable the WSL2 backend when prompted.

Create a folder for models:

mkdir C:\Models

Download a .gguf model into C:\Models.
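If the model is pulled from Hugging Face, the download can be scripted. The repository and file names below are placeholders for whichever Q4_K_M GGUF build the client has approved:

curl.exe -L -o C:\Models\llama-3.1-14b-instruct-q4_k_m.gguf "https://huggingface.co/<publisher>/<repo>/resolve/main/<file>.gguf"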

Create the LLM compose file.

Path: Docker\LLM_Inference\docker-compose.yml

services:
  llm:
    image: ghcr.io/ggerganov/llama.cpp:server
    volumes:
      - C:\Models:/models
    ports:
      - "8000:8000"
    command: >
      --model /models/llama-3.1-14b-instruct-q4_k_m.gguf
      --host 0.0.0.0
      --port 8000
    restart: unless-stopped
Start the container:

cd Docker\LLM_Inference
docker compose up -d
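Once the container is up, spot-check the API from the host; the llama.cpp server exposes an OpenAI-compatible /v1/models endpoint, and the logs show model-load progress:

curl.exe http://localhost:8000/v1/models
docker compose logs -f llm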

Install Goose on the host.

Option A — Winget:

winget install block.goose

Option B — Direct Installer: Download .exe from: https://block.github.io/goose

Set endpoint:

http://localhost:8000/v1
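If Goose cannot connect, test the endpoint directly from cmd.exe first. The llama.cpp server accepts OpenAI-style chat requests, and since only one model is loaded the model value below is arbitrary:

curl.exe http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d "{\"model\":\"local\",\"messages\":[{\"role\":\"user\",\"content\":\"Say hello.\"}]}"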

Linux Deployment

Install Docker Engine and the Compose plugin (docker-compose-plugin comes from Docker's apt repository; if it is not available, install Compose v2 per your distribution's documentation):

sudo apt install docker.io docker-compose-plugin -y
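After installation, enable the Docker service and, optionally, let the current user run Docker without sudo (takes effect after logging out and back in):

sudo systemctl enable --now docker
sudo usermod -aG docker "$USER"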

Create the Portainer management stack.

Path: Docker/Portainer_Management/docker-compose.yml

services:
  portainer:
    image: portainer/portainer-ce
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - portainer_data:/data
    ports:
      - "9443:9443"

volumes:
  portainer_data:

Deploy:

cd Docker/Portainer_Management
docker compose up -d
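Confirm Portainer is reachable before continuing; it serves HTTPS with a self-signed certificate, so a curl probe needs -k:

docker compose ps
curl -k -s -o /dev/null -w "%{http_code}\n" https://localhost:9443

A 200 response means the UI is up at https://localhost:9443.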
Create the models directory and download a .gguf model into it:

sudo mkdir -p /opt/Models

Path: Docker/LLM_Inference/docker-compose.yml
Same as the Windows file, with the volume path adjusted to /opt/Models (see the snippet below).
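For reference, the only line that differs from the Windows file is the host side of the volume mapping:

    volumes:
      - /opt/Models:/models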

cd Docker/LLM_Inference
docker compose up -d

Install Goose following the official instructions, then point the UI to:

http://localhost:8000/v1

Validation

Technician verifies:

  • LLM responds at /v1/chat/completions
  • Goose sends prompts and receives responses
  • Restart persistence works:

docker compose restart llm

  • No cloud dependency is present

Client verifies:

  • Reasoning responses meet expectations
  • UI is functional and local
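A scripted spot-check covering the technician items above (Linux paths assumed; the model value is arbitrary because only one model is loaded):

cd Docker/LLM_Inference
docker compose restart llm
sleep 15
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"local","messages":[{"role":"user","content":"Reply with the word OK."}]}'

A well-formed JSON response after the restart demonstrates both connectivity and restart persistence.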

Troubleshooting

Problem          | Cause              | Fix
-----------------|--------------------|------------------------------
Slow responses   | CPU fallback       | Confirm GPU capability
No connection    | Port issue         | Verify the 8000:8000 mapping
AMD not utilized | Expected           | Use CPU or NVIDIA hardware
Goose errors     | Incorrect endpoint | Reconfigure to localhost
No model         | Wrong path         | Check .gguf placement
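When working through this table, a few commands surface the usual causes quickly (run from Docker/LLM_Inference):

docker compose logs llm     # model-load errors, wrong .gguf path
docker compose ps           # container state and port mapping
ss -ltnp | grep 8000        # confirm a listener on port 8000 (Linux)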

Optional Lockdown

  • Apply a Windows/Linux firewall outbound-deny rule for Goose (example rule below)
  • Remove outbound allow rules for the Docker service
  • Disable updates for Goose and the model containers
  • Require client approval for workflow changes
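On Windows, the outbound-deny rule for Goose can be created in PowerShell; the executable path below is a placeholder and must be adjusted to the actual install location:

New-NetFirewallRule -DisplayName "Deny Goose Outbound" -Direction Outbound -Program "C:\Users\<user>\AppData\Local\Goose\goose.exe" -Action Block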

Maintenance

  • Update models manually (offline)
  • Restart containers after updates
  • Back up Models/ if versioning matters (see the example below)
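A dated archive of the model directory is usually sufficient as a backup (Linux path shown; adapt for C:\Models on Windows):

tar -czf "models-backup-$(date +%F).tar.gz" -C /opt Models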

Known Limitations

  • AMD support is not guaranteed and may not function
  • CPU fallback is acceptable for light reasoning workloads
  • Offline-first behavior is standard, not optional

Document Control

  • Version: 1.01.26
  • Editor: Elijah B
  • Next Review: Within 90 Days