@simon Any clue why your #LLM tool with gpt4all's ggml-replit-code-v1-3b would perform worse than this replicate demo?
Is there a need to tweak the parameters for the model somewhere maybe?
https://replicate.com/replit/replit-code-v1-3b?prediction=zarihvjb2xfluvwsplgye4bude
@mauve worse in terms of speed or quality?
What operating system are you running?
@mauve yeah it would be interesting to figure out if they are using different parameters
@simon I'll dig around. Are there already llm plugins that expose a way to change parameters? Might have a PR up my sleeve if there's time this weekend.
@simon Looked into this, I think top_p and top_k are the main differences. The defaults in gpt4all are way more "loose".
Would a PR that sets different defaults be welcome? Or would you prefer to just have the flags exposed like your llama-cpp example?
https://github.com/simonw/llm-gpt4all/blob/main/llm_gpt4all.py#L112
https://docs.gpt4all.io/gpt4all_python.html#the-generate-method-api
I'll try hardcoding some values and running a generation again to see if it's "better" in the meantime.
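Something like this is roughly what I have in mind for hardcoding, going off the generate() docs linked above (the values here are just guesses at settings closer to the replicate demo, not anything confirmed):

```python
# Sketch: call ggml-replit-code-v1-3b through the gpt4all Python bindings
# with the sampling parameters hardcoded instead of using the library defaults.
# Parameter names (max_tokens, temp, top_k, top_p) come from the generate()
# docs linked above; the values are guesses to try, not known-good settings.
from gpt4all import GPT4All

model = GPT4All("ggml-replit-code-v1-3b.bin")

output = model.generate(
    "def fibonacci(n):",
    max_tokens=200,
    temp=0.2,   # lower temperature than the gpt4all default
    top_k=4,    # much smaller candidate pool than the gpt4all default
    top_p=1.0,
)
print(output)
```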
@mauve exposing options like llama-cpp does would be fantastic
@simon PR: https://github.com/simonw/llm-gpt4all/pull/17
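Roughly the shape of it is the nested pydantic Options pattern from the llm plugin docs; the actual field names and defaults in the PR may end up different, so treat this as a sketch:

```python
# Sketch of exposing gpt4all sampling parameters as llm options, following
# the nested Options class pattern described in the llm plugin docs.
# Field names and defaults here are illustrative only.
from typing import Optional

import llm
from pydantic import Field


class Gpt4AllModel(llm.Model):
    model_id = "ggml-replit-code-v1-3b"

    class Options(llm.Options):
        temp: Optional[float] = Field(
            description="Sampling temperature (higher = more random)",
            default=None,
        )
        top_k: Optional[int] = Field(
            description="Sample from only the top K most likely tokens",
            default=None,
        )
        top_p: Optional[float] = Field(
            description="Nucleus sampling: cumulative probability cutoff",
            default=None,
        )
```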
Gonna need to mess with the parameters more another day though. But my gut feeling is we can up the quality of the output significantly by turning down the temperature a bit and setting top_p to 1 and top_k to 4 like in the replicate.com demo.
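Once the options are exposed, trying the replicate-style settings from the CLI should be something like this (assuming the option names end up as temp / top_k / top_p):

```
llm -m ggml-replit-code-v1-3b 'def fibonacci(n):' \
  -o temp 0.2 -o top_k 4 -o top_p 1
```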
@mauve my plan at the moment is to make it much easier for people to experiment with and share alternative configurations for different models
@simon Nice, like a file format for the configs so folks could pass them around and track changes in git?
@simon I'm on a steam deck and the speed is actually great!