**Mauve 👁💜** @mauve@mastodon.mauve.moe · Feb 14, 2024, 20:07

Tried getting a fully local multi-modal model to tell me what it sees in my logo, and it's honestly mind-blowing that it can identify anything at all. I used `ollama run llava` on my Steam Deck. Might be a useful tool to integrate with caption generators, or for #blind folks wanting to get a description without needing an online service.

**Alexander Cobleigh** @cblgh@merveilles.town · Feb 14, 2024, 22:32

@mauve what does multi-modal mean in this context? Trained on data encompassing text, images, and code samples?

**Mauve 👁💜** @mauve@mastodon.mauve.moe · Feb 14, 2024, 23:00

@cblgh From what I understand it just means text+images. You can have it take an image as input and then ask it questions about what's in it. There might be other modes of operation too? I really want to make one that interacts with my shell to load/edit files.
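
For anyone who wants to script the "give it an image, then ask a question" flow rather than use the interactive prompt, here is a minimal sketch. It assumes an Ollama server is already running locally on its default port (11434) with the `llava` model pulled, and that it is reachable through Ollama's `/api/generate` endpoint. The helper names `build_payload` and `describe_image`, and the default prompt, are hypothetical and only for illustration.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Build the JSON body for one non-streaming request carrying a single image."""
    return {
        "model": model,
        "prompt": prompt,
        # Images are sent as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for one complete JSON reply instead of a stream of chunks.
        "stream": False,
    }


def describe_image(path: str, prompt: str = "What do you see in this image?") -> str:
    """Send the image at `path` plus a question to the local model and return its answer."""
    with open(path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling something like `describe_image("logo.png")` (the file name is a placeholder) should return the model's text description, and nothing leaves the machine since the request only goes to localhost.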
