Idea: language model tokenizer that only supports the top 255 most common words plus punctuation. Use this to limit an existing LLM's output via a GBNF grammar definition. Have the existing LLM "reword" training data into this limited grammar. Use that to train nanochat with a highly compressed token space. ??? Profit?
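(Rough sketch of the grammar half of this — a Python snippet that emits a llama.cpp-style GBNF grammar restricting output to a small allow-list. The word list below is a hypothetical placeholder, not real top-255 frequency data:)

```python
# Sketch: emit a llama.cpp-style GBNF grammar that only allows a fixed
# vocabulary plus basic punctuation. Note this constrains the *text* the
# model can emit at sampling time; building an actual 255-entry tokenizer
# for nanochat would be a separate step.

# Placeholder stand-in for the "top 255 most common words".
TOP_WORDS = ["the", "be", "to", "of", "and", "a", "in", "that", "have", "it"]
PUNCTUATION = [".", ",", "!", "?"]

def gbnf_alt(literals):
    """Render literals as a GBNF alternation: "a" | "b" | ..."""
    return " | ".join(f'"{lit}"' for lit in literals)

def build_grammar(words, punct):
    # root: a word, then any mix of (space + word) or trailing punctuation
    return "\n".join([
        'root ::= word (" " word | punct)*',
        f"word ::= {gbnf_alt(words)}",
        f"punct ::= {gbnf_alt(punct)}",
    ])

if __name__ == "__main__":
    print(build_grammar(TOP_WORDS, PUNCTUATION))
```

The emitted grammar could then be handed to llama.cpp (e.g. via its `--grammar` / `--grammar-file` options) to constrain the "rewording" model's generations.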
@mauve Like Thing Explainer (https://en.wikipedia.org/wiki/Thing_Explainer), but you need 1000 words. 255 is not enough. Might as well make it 1024!
@brandon yeah, like an automated Thing Explainer. I guess if you're going up to 1024 words you might as well just use an existing model and enforce the grammar restrictions on it 😅