Tried getting a fully local multimodal model to tell me what it sees in my logo, and honestly it's mind-blowing that it can identify anything at all. I used `ollama run llava` on my Steam Deck. Might be a useful tool to integrate with caption generators, or for #blind folks who want an image description without needing an online service.