Bouta skip the language model and autistically read all the training data into my own brain instead.
@mauve how's your retention %?
@dym I only have 2 braincells so NGL it's pretty low. :P I am becoming less aligned since it's an "uncensored" dataset tho so at least there's that.
@mauve this looks like parsing out the "Answer the following question:" prompt was the trigger for picking one of the "-" items at the end of the text
@mauve probably a result of getting into the long tail of the probabilities. When there aren't many occurrences of the preceding sequence of words, there's barely any evidence for what should come next...
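(Toy sketch to make that point concrete, not anything from the actual training pipeline: a two-braincell bigram model on a made-up corpus. A context seen a couple of times has a stable continuation; a context seen once gives a "distribution" that's pure noise.)

```python
# Minimal bigram "long tail" illustration -- toy corpus, not the dolphin data.
from collections import Counter, defaultdict
import random

corpus = (
    "answer the following question : what did he see ? "
    "- he was staring at the list "
    "answer the following question : what did he see ?"
).split()

# Count bigram continuations: context word -> Counter of next words.
continuations = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    continuations[prev][nxt] += 1

def sample_next(context: str) -> str:
    counts = continuations[context]
    words = list(counts)
    weights = [counts[w] for w in words]
    return random.choices(words, weights=weights)[0]

print(continuations["question"])  # seen twice, always followed by ":" -- stable
print(continuations["-"])         # seen once -- one data point, no real signal
print(sample_next("-"))           # so sampling from this context is arbitrary
```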
Seriously though, what the hell are some of these outputs they're teaching these things? "he was staring at the beautiful mexican girl" as an "answer" to a random rant.
https://huggingface.co/datasets/cognitivecomputations/dolphin?row=30
Maybe this is a side effect of using AI to generate datasets?
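(If anyone wants to spot-check rows like that one locally, here's a hedged sketch using the Hugging Face `datasets` library. The data_files name is an assumption on my part; check the repo's file listing before running.)

```python
# Assumes the dolphin repo ships raw .jsonl files; the file name is a guess.
from datasets import load_dataset

ds = load_dataset(
    "cognitivecomputations/dolphin",
    data_files="flan1m-alpaca-uncensored.jsonl",  # assumed file name
    split="train",
)

row = ds[30]  # row index taken from the dataset-viewer URL above
for key, value in row.items():
    print(f"{key}: {str(value)[:200]}")  # truncate long fields
```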