Tried getting a fully local multi-modal model to tell me what it sees in my logo, and it's honestly mind-blowing that it can identify anything at all. I used `ollama run llava` on my Steam Deck. Might be a useful tool to integrate with caption generators, or for folks wanting to get a description without needing an online service.

@mauve What does multi-modal mean in this context? Trained on data encompassing text, images, and code samples?

@cblgh From what I understand it just means text+images. You can have it take an image as input and then ask it questions about what's in it. There might be other modes of operation too? I really want to make one that interacts with my shell to load/edit files.
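
For anyone wanting to wire this into a caption generator, here is a rough sketch of asking the model about an image from a script instead of the interactive prompt. It assumes the Ollama server is running locally on its default port (11434) and that the `llava` model has already been pulled. The function names `build_payload` and `ask_about_image` are my own, not part of Ollama.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes, question, model="llava"):
    """Build the JSON body for one non-streaming request about an image."""
    return {
        "model": model,
        "prompt": question,
        # Images are sent as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON reply rather than a stream of chunks.
        "stream": False,
    }


def ask_about_image(path, question, url=OLLAMA_URL):
    """Send an image file and a question to the local model, return its answer."""
    with open(path, "rb") as f:
        payload = build_payload(f.read(), question)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `ask_about_image("logo.png", "What is in this image?")` (with `logo.png` standing in for whatever file you have) should return the model's description as a string, all without leaving the machine.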
