I've been thinking about how to implement naive Bayesian filtering on the fediverse.

One option is to have filter hooks built into the servers. Unfortunately, not a lot of servers have this (Pleroma only, I think).

Another option is a proxy that goes *in front of* the server. That way, it will work with any AP-compatible server.

On the downside, there's more setup required by the admin. But since this is how CDNs usually work, folks are at least familiar with the idea.
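To make the proxy idea concrete, here's a minimal sketch of just the routing decision such a proxy would make on an inbox POST. Everything here is illustrative: `UPSTREAM`, `classify`, and `handle_inbox_post` are made-up names, and a real deployment would also need to terminate TLS, verify HTTP Signatures, and actually relay the request.

```python
# Sketch of the filtering decision a proxy in front of an AP server
# might make. `classify` is a placeholder for a real filter (e.g. the
# naive Bayes approach discussed below); here it's just a keyword check.
import json

UPSTREAM = "https://social.example/inbox"  # the real server (assumed URL)

def classify(activity: dict) -> bool:
    """Return True if the activity should be dropped."""
    obj = activity.get("object", {})
    content = obj.get("content", "") if isinstance(obj, dict) else ""
    return "v1agra" in content.lower()

def handle_inbox_post(body: bytes):
    """Decide whether to forward an incoming activity upstream."""
    activity = json.loads(body)
    if classify(activity):
        return ("drop", None)        # silently discard, or 202 + log
    return ("forward", UPSTREAM)     # pass through to the real inbox
```

The nice property is that nothing here depends on the upstream implementation; any AP-compatible server behind the proxy would work.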

@TakeV naive bayesian spam filtering is a common technique for filtering email. It concentrates on the content of the message, plus metadata in email headers, to determine if a message is "spam" or "ham".

en.wikipedia.org/wiki/Naive_Ba

Modified for the different payload format, the technique works really well for Activity Streams content. Not just for spam, but also for abuse and harassment.

@evan @TakeV is this technique of spam filtering still effective? I see spam very occasionally show up on my local timeline, but it's usually promoting very random stuff and swiftly dealt with. I haven't seen any relentless promotion of repl!ca v!aagra w@tches on here at all.

@aeva It's a technique for filtering out any unwanted content; not just commercial spam. So, abuse and harassment, too. And it depends on the content, not on the origin.

@evan do you see this being implemented like email where it's up to the user to train their filter, or do you think we'll all end up stuck with shared pre-trained models policing everything? I worry because the broader trend seems to be automated policing with proprietary ai models that end up with horrible demographic biases and zero accountability, and it seems like it's only a matter of time before we're stuck with them here because they're more convenient if you aren't affected by the bias.

@evan like, idk I just think of the herculean (exploitative outsourced) efforts that "open ai" put into gpt to get it to stop spewing racist filth at the drop of a hat (gpt 2 vs chat gpt) and even then they still ended up with a model that readily makes very banal sexist assumptions about gender. I worry what the crowd sourced version of that will look like when we can't even keep personal beef out of shared block lists (it keeps happening).

@evan now, of course there's a huge difference in training cost between naive bayes spam filters and large language models, but I think the general concerns about auditability still hold if people are likely to flock to shared pre-trained filters for being more turn-key.

@evan there's also another thing, which is that a classic email filter operates on a per-message basis, on the theory that false positives do happen and you don't want your email to hair-trigger block a legitimate sender because of one. But the zeitgeist these days is to classify individuals into a stark binary of always good vs always bad (and there's usually no opportunity to appeal if you get put into the bad category).

@evan this shows up all the time with automated policing systems (eg, youtube automatically demonetizing LGBT channels etc, twitter marking ordinary posts as TOS violations, etc), but this also shows up almost exactly the same when people approach community moderation with a zero tolerance approach. I'm not above this either, I have no idea how many people I've blocked here and elsewhere over a simple misunderstanding on my part.

@evan Anyways, this has been on my mind for a while. I picked up a shared block list on twitter during gamer gate, but stopped signing up for them when it was pointed out that these always have very high false positive rates. After that I switched to recursively blocking right wing influencer types, but I'm sure that also had its own problems. By the time I left twitter I ended up with a block list so long it broke the exporter tool lol.

@evan interestingly enough I haven't needed that kind of thing here at all, so my block list is usually just manual blocks, and I usually take the time to fill out a reason on the account note form in case I second guess myself. I think the difference is mainly the structural aspects of the network that inhibit sealioning, my local instance being very well moderated, and maybe a bit of luck idk.

@aeva @evan IMO instance blocking makes it so much easier since jerks often congregate. Just blocking the top 10 or 100 most blocked instances can get you very far while leaving a lot of room for nuance.

@mauve @evan well yeah, but so far I've blocked 11 instances it looks like: one journalism instance, the bbc instance, a small handful of single user instances, one that's personal beef that I should probably unblock once I've calmed down about it, and the rest were obvious troll instances that showed up in my mentions. Absolutely peanuts compared to what I needed to do to make twitter usable.

@aeva Yeah very similar experience on my end. I think not being super vocal or being an obvious "target" for outrage helps a lot too? I think a lot of PoC and trans folks for example get extra harassment just for existing within view of folks that want to go out of their way to hate them.

I think who people on your instance follow/are followed by can also have a huge effect so large instances like .social increase the opportunities for beef.
