@indutny I'm not sure. I think folks are excited by the prospect of an open source alternative to o1, in which case that'd just be the massive 600b model, which IIRC is what powers the deepseek app. I found the distilled models to not be less useful than regular qwen2.5 for my use cases 😅 I think you could make it more useful with the right prompting and a multi-shot approach. Maybe have it ask for more human guidance instead of looping.
@indutny If you have 20 GB of RAM, this model might be more representative of the capabilities: https://unsloth.ai/blog/deepseekr1-dynamic
@mauve I didn’t, but isn’t the distilled model the one everyone runs and is excited about?