Tried getting a fully local multimodal model to tell me what it sees in my logo, and it's honestly mind-blowing that it can identify anything at all. I used `ollama run llava` on my Steam Deck. Might be a useful tool to integrate with caption generators or for #blind folks wanting to get a description without needing an online service
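For anyone wanting to try the caption-generator idea, here is a rough sketch of asking a local Ollama server to describe an image from a script instead of the interactive prompt. It assumes Ollama is listening on its default local port (11434) and that the `llava` model has already been pulled. The function names and the default prompt are made up for illustration, not part of Ollama.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_caption_request(image_bytes, prompt="Describe this image."):
    """Build the JSON payload for one non-streaming image description.

    Hypothetical helper: the image is sent as a base64 string in "images".
    """
    return {
        "model": "llava",
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON reply instead of chunks
    }


def describe_image(path, prompt="Describe this image."):
    """Send the image at `path` to the local server and return its description."""
    with open(path, "rb") as f:
        payload = build_caption_request(f.read(), prompt)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `describe_image("logo.png")` would then return the model's description as plain text, which could be dropped into alt text or read out by a screen reader. Nothing leaves the machine, but expect it to be slow on modest hardware.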
@mauve Thanks for the report. If I may ask, how responsive is the model on the Steam Deck, and which LLaVA variant is it?
@techsinger I will say that out of the box it fully ate my system resources to do so and locked my music player and browser. :P I may experiment with giving it fewer CPU cores so I can have space for processing UI stuff
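One possible way to try the "fewer CPU cores" idea is sketched below. It assumes Ollama's `num_thread` option, passed in the `options` field of an API request, limits how many threads the model uses. That is my understanding of the option, not something tested on the Deck. The helper name and the default of two reserved cores are arbitrary choices for illustration.

```python
import os


def ollama_options(reserve_cores=2):
    """Return an options dict that leaves `reserve_cores` logical cores free.

    Hypothetical helper: merge the result into the "options" field of an
    Ollama API request. Always allows at least one thread.
    """
    total = os.cpu_count() or 1  # cpu_count() can return None
    return {"num_thread": max(1, total - reserve_cores)}
```

The returned dict would go into the request body as `"options": ollama_options()`. Whether that is enough to keep the music player and browser responsive is something to measure rather than assume, since memory pressure may matter as much as CPU.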
@techsinger TBH I wouldn't be surprised if there's a faster option out there. I found that for text generation it was way slower than LM Studio, for example. But LM Studio doesn't support multimodal stuff.
@mauve thanks for the info. I appreciate knowing how well it's handled.