Idea: a language model tokenizer that only supports the top 255 most common words plus punctuation. Constrain an existing LLM's output to that vocabulary via a GBNF grammar definition, have it "reword" training data into the restricted grammar, then use the reworded corpus to train nanochat with a highly compressed token space. ??? Profit?
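A minimal sketch of the grammar-generation step, assuming a word list file (`top255_words.txt`, one word per line) and a basic punctuation set; the file name, punctuation choice, and sentence structure are assumptions, not part of the original note. It emits a GBNF grammar (llama.cpp's grammar format) that only allows space-separated words from the fixed vocabulary, optionally ended by punctuation:

```python
# Sketch: build a GBNF grammar restricting generation to a fixed 255-word vocabulary.
# Assumes "top255_words.txt" exists with one word per line (hypothetical file name).

def build_gbnf(words: list[str], punctuation: str = ".,!?;:") -> str:
    def quote(s: str) -> str:
        # Escape backslashes and double quotes for GBNF string literals
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

    word_alts = " | ".join(quote(w) for w in words)
    punct_alts = " | ".join(quote(c) for c in punctuation)
    return "\n".join([
        # One or more space-separated words, optional trailing punctuation, newline.
        'root  ::= word (" " word)* punct? "\\n"',
        f"word  ::= {word_alts}",
        f"punct ::= {punct_alts}",
    ])

if __name__ == "__main__":
    with open("top255_words.txt") as f:
        vocab = [line.strip() for line in f if line.strip()][:255]
    with open("limited.gbnf", "w") as f:
        f.write(build_gbnf(vocab))
```

The resulting `limited.gbnf` could then be passed to llama.cpp's grammar-constrained sampling when prompting the existing LLM to reword the training data; the reworded text would tokenize almost 1:1 onto the tiny word-level vocabulary before training nanochat.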