Having tested a bunch of #OpenSource #LLM projects, I gotta say that OpenHermes 2.5 is the most helpful out of the ones I can run locally.
I recently wasted a bunch of time getting Phi-2 to do some summarization work, and it just couldn't stay focused for more than a sentence or two.
@mauve I was tinkering with ollama for a bit, but my local hardware just isn't fast enough to make it useful.
@skryking What have you been using to run the models? I find LM Studio really nice for tinkering. https://lmstudio.ai/
I find Q4 quantized models work pretty well on my Steam Deck.
@mauve ollama will download and host the models and set up an API port for interacting with them. I've done it in a VM and locally... alas, I don't have any hardware that will do much acceleration at the moment. I'm stuck with an old RX 580 card, and it's on a Windows box, so ROCm doesn't work very well, if at all.
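For anyone following along, here is a rough sketch of what talking to that API port can look like from Python, using only the standard library. It assumes Ollama is running locally on its default port (11434) and that the model has already been pulled; the model name and the helper names `build_request` and `generate` are placeholders for illustration, not anything from the thread.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for a locally hosted model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text.

    The generous timeout is deliberate: CPU-only inference can be slow.
    """
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["response"]
```

With the server up, something like `generate("llama2", "Summarize this in one sentence: ...")` should return the model's reply as a string, assuming that model tag is installed.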
@skryking Nice. I only do CPU workloads. Try running Phi-2 some time! It's super low in resource usage. Particularly the Q4 quantized models.
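A back-of-the-envelope sketch of why the Q4 builds are so light: weight storage scales with bits per weight. This assumes Phi-2's published size of roughly 2.7B parameters and counts a flat 4 bits per weight; real Q4 files are somewhat larger because they also store per-block scales, and the runtime needs extra memory for context. The function name is made up for illustration.

```python
def weight_bytes(n_params: int, bits_per_weight: int) -> int:
    """Approximate storage for the weights alone (no scales, no KV cache)."""
    return n_params * bits_per_weight // 8


PHI2_PARAMS = 2_700_000_000  # assumption: Phi-2 at roughly 2.7B parameters

fp16 = weight_bytes(PHI2_PARAMS, 16)
q4 = weight_bytes(PHI2_PARAMS, 4)
print(f"fp16: {fp16 / 1e9:.2f} GB, 4-bit: {q4 / 1e9:.2f} GB")
# prints "fp16: 5.40 GB, 4-bit: 1.35 GB"
```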
@mauve Thanks for the suggestion, I just fired it up... that one is definitely faster than llama2 in CPU-only mode.
@skryking it has less innate knowledge of facts but it is pretty good at "reasoning". I'm gonna teach it to make function calls and traverse datasets + summarize stuff. 😁
@mauve Do you have any documentation or links on how you teach it to use functions?
@skryking This post by @simon is what exposed me to the idea for the first time: https://til.simonwillison.net/llms/python-react-pattern
I also have a slightly improved prompt here: https://gist.github.com/RangerMauve/19be7dca9ced8e1095ed2e00608ded5e
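A minimal sketch of the idea behind that pattern, not the code from either link: the prompt asks the model to write a line like `Action: tool_name: input` when it wants a tool, the surrounding program runs the tool, and the result is fed back as an `Observation:` line. The function `react_loop`, the `ask_model` callable, and the tool registry are all hypothetical names; `ask_model` stands in for whatever sends text to your local model.

```python
import re
from typing import Callable, Dict

# Matches a line such as "Action: add: 2 3" anywhere in the model's reply.
ACTION_RE = re.compile(r"^Action: (\w+): (.*)$", re.MULTILINE)


def react_loop(
    ask_model: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_turns: int = 5,
) -> str:
    """Alternate between model replies and tool calls until no action is requested."""
    transcript = f"Question: {question}\n"
    reply = ""
    for _ in range(max_turns):
        reply = ask_model(transcript)
        transcript += reply + "\n"
        match = ACTION_RE.search(reply)
        if match is None:
            # No tool requested, so treat this reply as the final answer.
            return reply
        name, arg = match.groups()
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown tool {name!r}"
        # Feed the tool result back so the next turn can use it.
        transcript += f"Observation: {observation}\n"
    # Turn limit reached: return whatever the model said last.
    return reply
```

The turn limit is there because small models can get stuck requesting actions forever; in practice the system prompt also has to list the available tools and show an example exchange.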
I'll likely be publishing any new work as open source on GitHub. :) Probably with Rust.
@skryking Nice. I've been wanting to get into Rust for years but didn't have much of a use case. Now with the candle library from HuggingFace and my latest adventures with LLMs I've had an actual reason to write something in it. :) https://github.com/huggingface/candle/