Tried getting a fully local multimodal model to tell me what it sees in my logo, and it's honestly mind-blowing that it can identify anything at all. I used `ollama run llava` on my Steam Deck. It might be a useful tool to integrate with caption generators, or for folks who want an image description without needing an online service.
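For the caption-generator idea, here is a minimal sketch of asking a locally running model for an image description over Ollama's HTTP API instead of the interactive `ollama run` prompt. It assumes the Ollama server is running on its default port (11434) with the `llava` model already pulled; the prompt text and the helper names `build_caption_request` and `describe_image` are hypothetical, chosen for illustration.

```python
import base64
import json
import urllib.request

# Assumption: a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_caption_request(image_bytes: bytes, model: str = "llava",
                          prompt: str = "Describe this image in one or two sentences for alt text.") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint (hypothetical helper)."""
    return {
        "model": model,
        "prompt": prompt,
        # Images are sent as a list of base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for one complete JSON reply instead of a token-by-token stream.
        "stream": False,
    }


def describe_image(path: str, model: str = "llava") -> str:
    """Send an image file to the local model and return its text description."""
    with open(path, "rb") as f:
        body = build_caption_request(f.read(), model=model)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # With stream=False the reply is a single JSON object whose
        # "response" field holds the generated text.
        return json.loads(resp.read())["response"]
```

Calling `describe_image("logo.png")` (a placeholder path) returns the model's description as a plain string, which a caption tool could drop straight into an alt-text field. Nothing leaves the machine, which is the point of running it locally.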

@mauve Thanks for the report. If I may ask, how responsive is the model on the Steam Deck, and which LLaVA is it?

@techsinger It's not super fast, but it's fast enough. I think the main slowdown is loading the model into memory; from there, generation was about one word per second for me. I'm not sure which LLaVA it is specifically, but it's probably a 7B one.
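To put numbers on "loading is the slow part" versus "about one word per second", a non-streaming `/api/generate` reply from Ollama includes timing fields (`load_duration`, `eval_count`, `eval_duration`, all durations in nanoseconds). The sketch below, with a hypothetical helper name, turns them into load seconds and tokens per second. Note that tokens are not quite words, so the rate will not match a word count exactly.

```python
# Ollama reports every duration in nanoseconds.
NS_PER_SECOND = 1_000_000_000


def summarize_timings(resp: dict) -> dict:
    """Split a finished /api/generate reply into load time and generation speed."""
    load_s = resp.get("load_duration", 0) / NS_PER_SECOND
    eval_s = resp.get("eval_duration", 0) / NS_PER_SECOND
    tokens = resp.get("eval_count", 0)
    return {
        "load_seconds": load_s,
        "generated_tokens": tokens,
        # Guard against a zero duration (e.g. an empty reply).
        "tokens_per_second": tokens / eval_s if eval_s > 0 else 0.0,
    }
```

Pass it the parsed JSON object from a request made with `"stream": False`. A second request made while the model is still resident in memory should show a much smaller `load_seconds`, which would confirm whether loading really dominates.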

Docs on it are here:


@techsinger I will say that out of the box it fully ate my system resources and locked up my music player and browser. :P I may experiment with giving it fewer CPU cores so there's headroom left for UI stuff.
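One way to try the "fewer CPU cores" idea without touching the system is Ollama's `num_thread` option, which sets how many threads the model uses for computation and can be passed per request under `options`. The sketch below is an assumption-laden starting point: reserving two logical CPUs for the desktop is a guess, not a tested value for the Steam Deck, and `pick_thread_count` and `with_thread_limit` are hypothetical helper names.

```python
import os
from typing import Optional


def pick_thread_count(reserve: int = 2, total: Optional[int] = None) -> int:
    """Leave `reserve` logical CPUs free for other apps, but use at least one."""
    if total is None:
        # os.cpu_count() can return None on unusual platforms.
        total = os.cpu_count() or 1
    return max(1, total - reserve)


def with_thread_limit(body: dict, threads: int) -> dict:
    """Return a copy of an Ollama request body with options.num_thread set."""
    limited = dict(body)
    # Copy any existing options so the caller's dict is not mutated.
    options = dict(limited.get("options", {}))
    options["num_thread"] = threads
    limited["options"] = options
    return limited


# Example request body that leaves two logical CPUs for the UI.
request = with_thread_limit({"model": "llava", "prompt": "Describe this image."},
                            pick_thread_count(reserve=2))
```

Fewer threads should trade generation speed for a more responsive desktop; how much of each you get is something to measure rather than assume.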

@mauve Thanks for the info. I appreciate knowing how well it handles.

@techsinger TBH I wouldn't be surprised if there were a faster option out there. For text generation, I found it was way slower than LM Studio, for example, but LM Studio doesn't support multimodal stuff.


Escape ship from centralized social media run by Mauve.