**Mauve 👁💜** @mauve@mastodon.mauve.moe · 2024-03-06T23:27:04Z

Mauve 👁💜 @mauve@mastodon.mauve.moe

Bluh, I wish I had time to fuss with details of model conversion and stuff. I'm trying to see how small a model I could get to do function calling. Tried messing with dolphin-phi but it's a bit too dumb. There's this "adapter" over it that's tuned for function calling, but it's in the `safetensors` format whereas I need it to be in GGUF. 🤷

Mar 06, 2024, 23:27 · · Web · · ·

**Mauve 👁💜** @mauve@mastodon.mauve.moe · Mar 06, 2024, 23:28

**Mauve 👁💜** @mauve@mastodon.mauve.moe · Mar 06, 2024, 23:28

Mar 06, 2024, 23:28

Mauve 👁💜 @mauve@mastodon.mauve.moe

If anyone wants to do it for me I'd be appreciative >:P #AI #LLM

https://huggingface.co/Yhyu13/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora

**Daniel Demmel** @daaain@fosstodon.org · Mar 06, 2024, 23:35

**Daniel Demmel** @daaain@fosstodon.org · Mar 06, 2024, 23:35

Mar 06, 2024, 23:35

Daniel Demmel @daaain@fosstodon.org

@mauve can you even load LoRA on top of a model when using GGUF? Not having too high hopes for reliable function calling at this size yet though. Or maybe with a grammar forcing it?

**Mauve 👁💜** @mauve@mastodon.mauve.moe · Mar 06, 2024, 23:38

**Mauve 👁💜** @mauve@mastodon.mauve.moe · Mar 06, 2024, 23:38

Mar 06, 2024, 23:38

Mauve 👁💜 @mauve@mastodon.mauve.moe

@daaain Yeah I have no clue sadly. I've only used LoRA on stable diffusion models and even then only via DiffusionBee. Apparently ollama supports it though.

https://github.com/ollama/ollama/blob/main/docs/modelfile.md#adapter

Also, I had it wrong, it's supposed to be GGML not GGUF. 😅

Apparently this mistral 7b derivative is better? https://huggingface.co/Yhyu13/dolphin-2.6-mistral-7b-dpo-laser-function-calling

I've managed to get OpenHermes 2.5 to do pretty well at that level though just by prompting.

**Daniel Demmel** @daaain@fosstodon.org · Mar 06, 2024, 23:46

**Daniel Demmel** @daaain@fosstodon.org · Mar 06, 2024, 23:46

Mar 06, 2024, 23:46

Daniel Demmel @daaain@fosstodon.org

@mauve oh nice, that Ollama method looks easy, bakes the adapter integration into a Modelfile ready to use!

But yeah, that Mistral 7B one looks much more promising! Can't see a GGUF version on 🤗 but the conversion should be a relatively simple thing with llama.cpp: https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#prepare-and-quantize – I've never done it but tempted to do it tomorrow just for the sake of it!

That said, while I haven't tried, I did hear good things about the OpenHermes 2.5 finetune and function calling.

**Mauve 👁💜** @mauve@mastodon.mauve.moe · Mar 06, 2024, 23:50

**Mauve 👁💜** @mauve@mastodon.mauve.moe · Mar 06, 2024, 23:50

Mar 06, 2024, 23:50

Mauve 👁💜 @mauve@mastodon.mauve.moe

@daaain Yeah I'm on steam OS so installing deps for the llama.cpp stuff is kind of a PITA right now. I've considered setting stuff up on my mac mini but sadly I only have sparing time to work on this in between actual client work obligations.

If you end up converting this I would v much appreciate a copy. 🙇

TBH OpenHermes is hands down my fave model.

I'm also looking at NexusRaven since it's specifically made for function calling.

https://ollama.com/library/nexusraven:latest

Resources

Developers

What is Mastodon?

mastodon.mauve.moe

More…