Thinking of ditching Ollama and running llama.cpp directly as a systemd user service. Ollama is great for tinkering, but it adds overhead and can be slow to adopt new llama.cpp features. Thanks to Arch Linux it's easy to build llama.cpp straight from GitHub using this AUR package. A rough sketch of the setup is below.
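
A minimal sketch, assuming the AUR package is named `llama.cpp` and installs the `llama-server` binary; the model path and port are placeholders, and the actual package name may differ:

```sh
# Build and install from the AUR (or use an AUR helper like yay/paru)
git clone https://aur.archlinux.org/llama.cpp.git
cd llama.cpp
makepkg -si
```

Then a user unit at `~/.config/systemd/user/llama.service`:

```ini
[Unit]
Description=llama.cpp server

[Service]
# Placeholder model path and port; adjust to your setup (%h expands to the user's home)
ExecStart=/usr/bin/llama-server -m %h/models/model.gguf --host 127.0.0.1 --port 8080
Restart=on-failure

[Install]
WantedBy=default.target
```

Enable it for the current user, and optionally enable lingering so it keeps running without an active login session:

```sh
systemctl --user daemon-reload
systemctl --user enable --now llama.service
loginctl enable-linger "$USER"   # start the user service at boot, not just at login
```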