Bluh, I wish I had time to fuss with the details of model conversion and stuff. I'm trying to see how small a model I can get to do function calling. Tried messing with dolphin-phi but it's a bit too dumb. There's a LoRA adapter over it that's tuned for function calling, but it's in the `safetensors` format whereas I need it in GGUF. 🤷
If anyone wants to do it for me I'd be appreciative >:P #AI #LLM
https://huggingface.co/Yhyu13/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora
@daaain Yeah I have no clue sadly. I've only used LoRAs on Stable Diffusion models, and even then only via DiffusionBee. Apparently Ollama supports them though.
https://github.com/ollama/ollama/blob/main/docs/modelfile.md#adapter
Also, I had it wrong: the adapter is supposed to be GGML, not GGUF. 😅
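For reference, the Modelfile approach looks something like this (just a sketch, not tested – the base model tag and adapter filename here are my assumptions, and IIRC llama.cpp's convert-lora-to-ggml.py script is what produces the GGML adapter file):

```
# hypothetical Modelfile – base model tag and adapter path are made up
FROM dolphin-phi
# ADAPTER wants a GGML-format LoRA adapter, not the safetensors one
ADAPTER ./ggml-adapter-model.bin
```

Then something like `ollama create dolphin-fc -f Modelfile` should register it, in theory.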
Apparently this Mistral 7B derivative is better? https://huggingface.co/Yhyu13/dolphin-2.6-mistral-7b-dpo-laser-function-calling
I've managed to get OpenHermes 2.5 to do pretty well at that level just by prompting, though.
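Roughly what I mean by "just by prompting" – a sketch against Ollama's REST API, where the function schema and model tag are made-up examples:

```python
# Sketch: coax a plain chat model into function calling via the system prompt.
# The get_weather function and the model tag are hypothetical examples.
import json
import requests

SYSTEM = """You can call this function:
get_weather(city: string) -> current weather for a city

When you need it, reply with ONLY a JSON object, e.g.:
{"function": "get_weather", "arguments": {"city": "..."}}"""

resp = requests.post(
    "http://localhost:11434/api/chat",  # Ollama's local API
    json={
        "model": "openhermes",  # whatever your local tag is
        "stream": False,
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "What's the weather in Berlin?"},
        ],
    },
)
# Assumes the model actually complied and emitted bare JSON 😅
call = json.loads(resp.json()["message"]["content"])
print(call["function"], call["arguments"])
```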
@daaain Yeah, I'm on SteamOS so installing deps for the llama.cpp stuff is kind of a PITA right now. I've considered setting things up on my Mac mini, but sadly I only have spare moments to work on this in between actual client work obligations.
If you end up converting this I would v much appreciate a copy. 🙇
TBH OpenHermes is hands down my fave model.
I'm also looking at NexusRaven since it's specifically made for function calling.
@mauve oh nice, that Ollama method looks easy – it bakes the adapter into a Modelfile that's ready to use!
But yeah, that Mistral 7B one looks much more promising! Can't see a GGUF version on 🤗 but the conversion should be relatively simple with llama.cpp: https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#prepare-and-quantize – I've never done it but I'm tempted to try tomorrow just for the sake of it!
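Per that README section, the dance looks roughly like this (paths are placeholders and I haven't actually run it):

```sh
# convert the HF safetensors checkpoint to GGUF at FP16
python3 convert.py models/dolphin-mistral-7b/

# quantize down to 4-bit (Q4_K_M) so it's actually usable locally
./quantize models/dolphin-mistral-7b/ggml-model-f16.gguf \
           models/dolphin-mistral-7b/ggml-model-Q4_K_M.gguf Q4_K_M
```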
That said, while I haven't tried it myself, I did hear good things about the OpenHermes 2.5 finetune's function calling.