Gonna be presenting a demo of teaching a local #llm to search Wikipedia with "Function Calling"
Video of my talk about making an #OpenSource #LLM perform function calling on my machine.
@mauve I recommend using Podman. To access the GPU inside the container, run `sudo setsebool container_use_devices=true` so SELinux allows it.
If it's a GPD device you can allocate more VRAM in the BIOS.
@fredy_pferdi Oh that's great to know, TY. I'll look into it. Is this going to use Vulkan for the GPU acceleration? I wasn't sure what my options would be since Ollama seems to only support CUDA and Metal.
@mauve It also supports AMD ROCm, the equivalent of CUDA.
@fredy_pferdi Interesting I may be able to get it running without a container too. https://github.com/rocm-arch/rocm-arch
@mauve I strongly recommend not installing the ROCm drivers directly on your device and using them in a container instead. They are not that stable on those chips and can crash your device. Also, officially only an LTS Ubuntu, CentOS, and a couple of GPUs are supported.
A container, on the other hand, is one command (use the AMD ROCm version). Further info here:
https://hub.docker.com/r/ollama/ollama
https://ollama.com/blog/amd-preview
There is no substantial performance loss when using a container.
@fredy_pferdi Sweet just followed this guide to install it in my ubuntu distrobox container and it's working great :o
https://www.reddit.com/r/steamdeck_linux/comments/102hzav/guide_how_to_install_rocm_for_gpu_julia/
@fredy_pferdi Spoke too soon, ollama dies when I try to load the model. Will need to mess with it another day :) TY again for the tip.
@mauve Distrobox is just an interface for Podman. I think just running the ready-made images or building them yourself is way easier than recreating the install manually with Distrobox.
@mauve
first allow podman to use GPU
`sudo setsebool -P container_use_devices=true`
and then just run this to create the container (`podman create` only creates it, so start it afterwards with `podman start ollama`)
`podman create --name=ollama --security-opt seccomp=unconfined --device /dev/dri --device /dev/kfd -e HSA_OVERRIDE_GFX_VERSION=10.3.0 -e HCC_AMDGPU_TARGETS=gfx1035 -e OLLAMA_DEBUG=1 -v .ollama:/root/.ollama:U,rw -p 11434:11434 -i --tty --restart unless-stopped docker.io/ollama/ollama:rocm`
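A rough sketch of what could come next, assuming the container name `ollama` and port `11434` from the command above. `/api/tags` is the Ollama endpoint that lists local models. The `DRY_RUN` switch and the `run` and `start_and_check` helpers are made up for illustration and are not part of Podman or Ollama.

```shell
#!/usr/bin/env bash
# Sketch only: start the "ollama" container created above and check the API.
# DRY_RUN, run() and start_and_check() are illustrative helpers.
set -eu

DRY_RUN="${DRY_RUN:-1}"                             # 1 = only print the commands
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"  # port published with -p above

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

start_and_check() {
  run podman start ollama
  # /api/tags lists the models available locally; failing here means the API is not up yet.
  run curl --silent --fail "$OLLAMA_URL/api/tags"
  # OLLAMA_DEBUG=1 was set above, so the logs should show what it detected for the GPU.
  run podman logs --tail 20 ollama
}

start_and_check
```

By default it only prints the three commands; run it with `DRY_RUN=0` to actually execute them.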
@fredy_pferdi Cool TY, I found the HSA_OVERRIDE online and it ended up working great in my ubuntu container. 😁 Wish I had this for last night's demo! Also I don't have nearly enough RAM on this thing with 16 GB. TT_TT
@mauve For testing you could allocate 8 GB; that should be enough to run a small model while still using the OS.
Screenshot of the option to allocate more VRAM:
https://www.reddit.com/r/gpdwin/comments/yfivv5/anyone_know_what_do_these_options_mean_in_gpd_win/
@fredy_pferdi Yeah the issue is my Matrix client ends up eating way too much RAM and then I start eating swap. Might also have a memory leak somewhere in my OS wasting RAM after going out of sleep mode
@mauve Yeah there are suspend issues with those GPD devices.
@fredy_pferdi Alas! It's still worth it for me to not have to use Windows or a regular laptop :P
@fredy_pferdi Yeah exactly! I'm running #ChimeraOS on it in desktop mode. Lately been thinking of just installing Manjaro on it instead since the steam bits are a bit janky for me.
@mauve How do you do the speech to text?
@fredy_pferdi It hooks into your entire OS. I use https://github.com/ideasman42/nerd-dictation with a custom script to make it easier to code: https://github.com/RangerMauve/mauve-dictation
Since my OS is a SteamOS derivative I needed to be fancy and install it in userspace: https://github.com/atcq/steam-dictation
Then I have a global shortcut to toggle it on/off
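For anyone wanting to try the same, here is a hypothetical toggle script you could bind to a shortcut. It is not the script from the repos above, just a sketch that assumes `nerd-dictation` is on your PATH and already set up with a model. `nerd-dictation begin` and `nerd-dictation end` are its real subcommands; `DICTATION_CMD`, `is_running`, and `toggle_dictation` are names invented here.

```shell
#!/usr/bin/env bash
# Sketch of a dictation toggle for a global shortcut.
# Assumes nerd-dictation is installed and configured; helper names are illustrative.
set -eu

DICTATION_CMD="${DICTATION_CMD:-nerd-dictation}"  # overridable, e.g. for a full path

is_running() {
  # "nerd-dictation begin" stays alive while dictation is active,
  # so look for that process on the command line.
  pgrep -f "nerd-dictation begin" >/dev/null 2>&1
}

toggle_dictation() {
  if is_running; then
    # Ask the running instance to stop.
    "$DICTATION_CMD" end
  else
    # Start listening in the background so the shortcut returns immediately.
    "$DICTATION_CMD" begin &
  fi
}

toggle_dictation
```

Save it somewhere, make it executable, and point your desktop's keyboard shortcut settings at it.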
@mauve Ooh, that sounds interesting. I'm using an immutable distro too and was not able to get it to work.
@mauve Well presented and thanks for sharing. I was looking exactly for something like this.
BTW for anyone interested in this stuff, come join my Matrix Channel about open source AI. https://matrix.to/#/#userless-agents:mauve.moe