These 30B-A3B models really act like 10 3B models trying to yap over each other at the same time instead of one more competent model.

@mauve They are useful for deep-research web scraping, or more generally for collecting and structuring data. Not for actually synthesizing "creatively", but for collecting information: together with, for example, some MCP tools, you can leverage them and get more reliable tool-use output than from small dense models. For synthesizing all the collected data, a model with more active parameters is needed to get reliable enough results.

@fredy_pferdi I found their tool call capabilities to also be lacking. It works way faster than dense models but I think the percentage of bullshit it adds to the context makes it moot relative to tighter contexts with smarter models. 🤷
