I've been thinking about how to implement naive Bayesian filtering on the fediverse.
One option is to have filter hooks built into the servers. Unfortunately, not many servers support them (only Pleroma, I think).
Another option is a proxy that goes *in front of* the server. That way, it will work with any AP-compatible server.
On the downside, there's more setup required of the admin. But since it's the way CDNs usually work, folks are at least familiar with the idea.
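Roughly, the proxy would sit on the inbox route, score each incoming activity, and only forward the ones that pass. A minimal sketch in Python, assuming Flask and requests; the upstream address and the classify() stub are hypothetical placeholders, not any real server's API:

```python
# Sketch of a filtering reverse proxy in front of an AP server.
# Assumes Flask and requests. UPSTREAM and classify() are hypothetical
# placeholders, not part of any real ActivityPub server's API.
from flask import Flask, Response, request
import requests

UPSTREAM = "http://localhost:3000"  # the real ActivityPub server

app = Flask(__name__)

def classify(activity: dict) -> bool:
    """Return True if the activity looks like ham (stub for a trained model)."""
    return True

@app.route("/users/<name>/inbox", methods=["POST"])
def inbox(name: str) -> Response:
    activity = request.get_json(force=True, silent=True) or {}
    if not classify(activity):
        # Quietly drop (or quarantine) suspected spam; 202 is what many
        # servers return for inbox deliveries anyway.
        return Response(status=202)
    # Forward the request unchanged to the real inbox. In practice you'd
    # need to pass the body and HTTP Signature headers through untouched
    # so the upstream server can still verify the delivery.
    upstream = requests.post(
        f"{UPSTREAM}/users/{name}/inbox",
        data=request.get_data(),
        headers={k: v for k, v in request.headers if k.lower() != "host"},
    )
    return Response(upstream.content, status=upstream.status_code)
```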
@evan What would be the purpose for that?
@TakeV Naive Bayesian spam filtering is a common technique for filtering email. It looks at the content of the message, plus metadata in the email headers, to determine whether a message is "spam" or "ham".
https://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering
Modified for the different payload format, the technique works really well for Activity Streams content. Not just for spam, but also for abuse and harassment.
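For a sense of how little machinery this needs, here's a toy naive Bayes scorer over the `content` of a Note, pure stdlib; the tokenizer and training examples are made up for illustration:

```python
# Toy naive Bayes scorer for the content of an Activity Streams Note,
# pure stdlib; tokenizer and training data are illustrative only.
import math
import re
from collections import Counter

def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(tokens(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            # log prior + sum of Laplace-smoothed log likelihoods
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for t in tokens(text):
                score += math.log(
                    (self.word_counts[label][t] + 1) / (total_words + vocab)
                )
            log_scores[label] = score
        # normalize the two log scores into P(spam | text)
        m = max(log_scores.values())
        exp = {k: math.exp(v - m) for k, v in log_scores.items()}
        return exp["spam"] / (exp["spam"] + exp["ham"])

nb = NaiveBayes()
nb.train("buy followers now limited promo deal", "spam")
nb.train("great bike ride along the river this morning", "ham")
note = {"type": "Note", "content": "limited promo deal on followers"}
print(round(nb.spam_probability(note["content"]), 3))  # close to 1.0
```

A real filter would also fold in metadata (actor, instance, link count) as extra tokens, the same way email filters use header features.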
@aeva It's a technique for filtering out any unwanted content, not just commercial spam. So, abuse and harassment, too. And it depends on the content, not on the origin.
@evan do you see this being implemented like email, where it's up to the user to train their filter, or do you think we'll all end up stuck with shared pre-trained models policing everything? I worry because the broader trend seems to be automated policing with proprietary AI models that end up with horrible demographic biases and zero accountability, and it seems like it's only a matter of time before we're stuck with them here, because they're more convenient if you aren't affected by the bias.
@evan like, idk, I just think of the herculean (exploitative, outsourced) efforts that "OpenAI" put into GPT to get it to stop spewing racist filth at the drop of a hat (GPT-2 vs ChatGPT), and even then they still ended up with a model that readily makes very banal sexist assumptions about gender. I worry what the crowd-sourced version of that will look like when we can't even keep personal beef out of shared block lists (it keeps happening).
@evan now, of course there's a huge difference in training cost between naive Bayes spam filters and large language models, but I think the general concerns about auditability still hold if people are likely to flock to shared pre-trained filters for being more turn-key.
@evan there's also another thing, which is that a classic email filter operates on a per-message basis, on the theory that false positives do happen and you don't want your email to hair-trigger block a legitimate sender because of one. But the zeitgeist these days is to classify individuals into a stark binary of always-good vs always-bad (and there's usually no opportunity to appeal if you get put into the bad category).
@evan this shows up all the time with automated policing systems (e.g., YouTube automatically demonetizing LGBT channels, Twitter marking ordinary posts as TOS violations, etc.), but it also shows up almost exactly the same when people approach community moderation with a zero-tolerance mindset. I'm not above this either; I have no idea how many people I've blocked here and elsewhere over a simple misunderstanding on my part.
@evan Anyways, this has been on my mind for a while. I picked up a shared block list on Twitter during Gamergate, but stopped signing up for them when it was pointed out that these always have very high false positive rates. After that I switched to recursively blocking right-wing influencer types, but I'm sure that had its own problems. By the time I left Twitter I'd ended up with a block list so long it broke the exporter tool lol.
@aeva Yeah, very similar experience on my end. I think not being super vocal, or not being an obvious "target" for outrage, helps a lot too? I think a lot of PoC and trans folks, for example, get extra harassment just for existing within view of folks that want to go out of their way to hate them.
I think who people on your instance follow/are followed by can also have a huge effect, so large instances like .social increase the opportunities for beef.
@mauve @evan well yeah, but so far I've blocked 11 instances, it looks like: one journalism instance, the BBC instance, a small handful of single-user instances, one that's personal beef that I should probably unblock once I've calmed down about it, and the rest were obvious troll instances that showed up in my mentions. Absolutely peanuts compared to what I needed to do to make Twitter usable.