@simon Any clue why your #LLM tool with gpt4all's ggml-replit-code-v1-3b would perform worse than this replicate demo?
Is there a need to tweak the parameters for the model somewhere maybe?
https://replicate.com/replit/replit-code-v1-3b?prediction=zarihvjb2xfluvwsplgye4bude
@mauve worse in terms of speed or quality?
What operating system are you running?
@mauve yeah it would be interesting to figure out if they are using different parameters
@simon I'll dig around. Are there already llm plugins that expose a way to change parameters? Might have a PR up my sleeve if there's time this weekend.
@simon Looked into this, I think top_p and top_k are the main differences. The defaults in gpt4all are way more "loose".
Would a PR that sets different defaults be welcome? Or would you prefer to just have the flags exposed like your llama-cpp example?
https://github.com/simonw/llm-gpt4all/blob/main/llm_gpt4all.py#L112
https://docs.gpt4all.io/gpt4all_python.html#the-generate-method-api
I'll try hardcoding some values and running a generation again to see if it's "better" in the meantime.
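Something like this is roughly what I have in mind for hardcoding, going off the generate() docs linked above (the values here are just guesses at settings closer to the replicate demo, not anything confirmed):

```python
# Sketch: call ggml-replit-code-v1-3b through the gpt4all Python bindings
# with the sampling parameters hardcoded instead of using the library defaults.
# Parameter names (max_tokens, temp, top_k, top_p) come from the generate()
# docs linked above; the values are guesses to try, not known-good settings.
from gpt4all import GPT4All

model = GPT4All("ggml-replit-code-v1-3b.bin")

output = model.generate(
    "def fibonacci(n):",
    max_tokens=200,
    temp=0.2,   # lower temperature than the gpt4all default
    top_k=4,    # much smaller candidate pool than the gpt4all default
    top_p=1.0,
)
print(output)
```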
@mauve exposing options like llama-cpp does would be fantastic
@simon PR: https://github.com/simonw/llm-gpt4all/pull/17
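Roughly the shape of it is the nested pydantic Options pattern from the llm plugin docs; the actual field names and defaults in the PR may end up different, so treat this as a sketch:

```python
# Sketch of exposing gpt4all sampling parameters as llm options, following
# the nested Options class pattern described in the llm plugin docs.
# Field names and defaults here are illustrative only.
from typing import Optional

import llm
from pydantic import Field


class Gpt4AllModel(llm.Model):
    model_id = "ggml-replit-code-v1-3b"

    class Options(llm.Options):
        temp: Optional[float] = Field(
            description="Sampling temperature (higher = more random)",
            default=None,
        )
        top_k: Optional[int] = Field(
            description="Sample from only the top K most likely tokens",
            default=None,
        )
        top_p: Optional[float] = Field(
            description="Nucleus sampling: cumulative probability cutoff",
            default=None,
        )
```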
Gonna need to mess with the parameters more another day though. But my gut feeling is we can up the quality of the output significantly by turning down the temperature a bit and setting top_p to 1 and top_k to 4 like in the replicate.com demo.
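Once the options are exposed, trying the replicate-style settings from the CLI should be something like this (assuming the option names end up as temp / top_k / top_p):

```
llm -m ggml-replit-code-v1-3b 'def fibonacci(n):' \
  -o temp 0.2 -o top_k 4 -o top_p 1
```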
@mauve my plan at the moment is to make it much easier for people to experiment with and share alternative configurations for different models
@simon Nice, like a file format for the configs so folks could pass them around and track changes in git?
@simon I'm on a steam deck and the speed is actually great!