
llama.cpp on Strix Halo

llama.cpp on AMD Ryzen AI Max+ 395 with Radeon 8060S

In the BIOS, set the iGPU memory mode (under Advanced) to Auto and the iGPU Memory Size to 0.5GB. Keeping the dedicated carve-out this small allows the ROCm software to manage the memory split itself.
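With that set, most of system RAM should be visible to ROCm as GTT memory rather than dedicated VRAM. A quick check, assuming rocm-smi from the ROCm stack is on the PATH:

```shell
# Show the memory pools ROCm can see on the iGPU. With a 0.5GB
# dedicated allocation, the bulk of system RAM should show up
# under GTT rather than VRAM.
rocm-smi --showmeminfo vram gtt
```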

These instructions assume Artix Linux with OpenRC and the rocm-hip-sdk package installed:

pacman -S rocm-hip-sdk
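To confirm the ROCm runtime sees the iGPU, and to verify the gfx target name used in the cmake invocation below, query rocminfo:

```shell
# List agents known to the ROCm runtime and filter for the GPU
# architecture name; the 8060S should report gfx1151.
rocminfo | grep -i gfx
```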

Get llama.cpp from GitHub:

git clone --depth=1 https://github.com/ggml-org/llama.cpp

Build with cmake:

cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -- -j$(nproc)
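A quick sanity check that the build succeeded and the binary picked up the HIP backend (flag names are per current llama.cpp; verify against your checkout):

```shell
# Print version and build info to confirm the binary runs.
build/bin/llama-server --version

# List the compute devices llama.cpp can see; the 8060S should
# appear as a ROCm/HIP device. (--list-devices is available in
# recent llama.cpp builds.)
build/bin/llama-server --list-devices
```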

Run with a model:

GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 build/bin/llama-server --host 0.0.0.0 \
    --port 8080 \
    --flash-attn on \
    --cache-prompt \
    --cache-type-k q8_0 \
    --cache-type-v q8_0 \
    --temp 0.6 \
    --top-p 0.95 \
    --top-k 20 \
    --min-p 0.0 \
    --presence-penalty 0.0 \
    --repeat-penalty 1.0 \
    --gpu-layers 99 \
    --ctx-size 32768 \
    --mmproj ../models/Huihui-Qwen3.6-35B-A3B-abliterated-mmproj-BF16.gguf \
    --model ../models/Huihui-Qwen3.6-35B-A3B-abliterated-Q8_0.gguf
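Once the server is up it exposes an OpenAI-compatible HTTP API. A smoke test from another shell, using the port set by --port above:

```shell
# Liveness check; reports an ok status once the model has loaded.
curl http://localhost:8080/health

# Minimal chat completion against the OpenAI-compatible endpoint.
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "messages": [{"role": "user", "content": "Say hello."}],
          "max_tokens": 32
        }'
```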

Running llama.cpp with Radeon RX 9070 XT (gfx1201)

TODO: write some stuff here

Running an MCP Server for File System Access

On Artix Linux, using podman:

pacman -S podman crun

Update /etc/containers/registries.conf so that docker.io is searched when an unqualified image name appears in a Dockerfile or is given on the command line:

unqualified-search-registries = ["docker.io"]
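With the registry configured, the filesystem MCP server can then be run as a container. The image name and paths below are illustrative assumptions (the reference server is published as mcp/filesystem in Docker's MCP catalog; substitute the image and directory you actually want to expose):

```shell
# Run the filesystem MCP server over stdio (-i keeps stdin open,
# which MCP's stdio transport requires), bind-mounting one host
# directory into the container. Image name and /data path are
# assumptions -- adjust for your setup.
podman run -i --rm \
    -v /home/user/notes:/data \
    docker.io/mcp/filesystem /data
```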
