Simulating network requests to test a new feature in @distributed

You know how when you follow an account on Mastodon you don't get to see any of the user's older posts unless someone else on your instance follows them? Well, if it's a distributed press site it'll attempt to "backfill" your instance with all the older posts once your follow request is accepted. 😎

@mauve like flooding people's timelines with the backfilling of posts

@thisismissem I assumed timelines would sort by published time no? I'm gonna run some tests before deploying this to production ofc

@thisismissem In mastodon? i'd love to see the source code for it if you know what part of it it'd be in.

@mauve I don't have it off the top of my head but follow from /app/lib/activitypub/activity/create.rb

@thisismissem It seems to be taking the created_at part out at least. 🤔 github.com/mastodon/mastodon/b

I'll look more after testing. Gonna let my personal instance take the bulk of the damage :P

@mauve yeah, I've heard of like, birdsite bridge that did similar backfilling clogging up people's timelines, but that might be specific to that implementation.

@thisismissem @mauve so... imo the ActivityPub way of doing this is to publish your last X posts at an "outbox" endpoint specified in your profile. Then a remote server following you can parse the outbox and get some content to fill things with. I did this with some software I was building and Mastodon just... didn't grab any of the outbox posts and I was confused as to why not
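A minimal sketch of the outbox approach described above: walk an `OrderedCollection` and its pages to collect the most recent activities. The function name `iter_outbox`, the `fetch` callable (anything that returns parsed JSON for a URL), and the `limit` / `max_pages` parameters are assumptions made for illustration; none of this is taken from Mastodon or any other implementation.

```python
from typing import Callable, Iterator


def iter_outbox(
    outbox_url: str,
    fetch: Callable[[str], dict],
    limit: int = 20,
    max_pages: int = 10,
) -> Iterator[dict]:
    """Yield up to `limit` items from an outbox, in the order the server lists them."""
    collection = fetch(outbox_url)
    # `first` may be a URL or an embedded page; a small outbox may simply
    # inline `orderedItems` on the collection itself.
    page = collection.get("first", collection)
    seen = 0
    pages = 0
    while page is not None and seen < limit and pages < max_pages:
        if isinstance(page, str):
            page = fetch(page)
        pages += 1
        for item in page.get("orderedItems", []):
            if seen >= limit:
                return
            yield item
            seen += 1
        # `next` is absent on the last page, which ends the loop.
        page = page.get("next")
```

Passing `fetch` in as a parameter keeps the sketch independent of any HTTP library and makes it easy to try against canned JSON before pointing it at a real server.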

@darius @thisismissem @mauve yeah the outbox is there to support "web browser" use cases, fetch the latest activities and page through to see history

the thing with mastodon and timeline sorting is that the timeline sorting algorithm isn't using creation date, it's using arrival date. this is so someone can't create a post set in the far future to have it pinned at the top of your timeline.

@darius @thisismissem @mauve so delivering an activity ~now would insert the post at the top of the TL even if it was backdated in the `published` property.
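To make the difference concrete, here is a small model of the behaviour described above: one timeline sorted by arrival time and one sorted by the claimed `published` time. The `Entry` class and both function names are hypothetical, and this only models what the thread describes; it is not Mastodon's actual code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Entry:
    post_id: str
    published: datetime  # claimed by the sender
    arrived: datetime    # recorded locally on delivery


def by_arrival(entries):
    """Newest-first ordering using the local arrival time."""
    return [e.post_id for e in sorted(entries, key=lambda e: e.arrived, reverse=True)]


def by_published(entries):
    """Newest-first ordering using the sender's claimed time."""
    return [e.post_id for e in sorted(entries, key=lambda e: e.published, reverse=True)]


utc = timezone.utc
entries = [
    Entry("fresh", datetime(2024, 5, 1, 12, 0, tzinfo=utc), datetime(2024, 5, 1, 12, 0, tzinfo=utc)),
    Entry("backfilled", datetime(2019, 1, 1, tzinfo=utc), datetime(2024, 5, 1, 12, 2, tzinfo=utc)),
    Entry("future-dated", datetime(2099, 1, 1, tzinfo=utc), datetime(2024, 5, 1, 11, 0, tzinfo=utc)),
]
```

With these three entries, sorting by arrival puts the backfilled 2019 post at the top, while sorting by `published` would pin the future-dated post there instead.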

@trwnh @thisismissem @mauve Oh interesting. But also: why not use arrival date for future dated posts and creation date for back dated posts? (Genuine question btw. I'm sure I'm missing some weird edge case race condition)
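The rule proposed in that question amounts to sorting by whichever of the two timestamps is earlier. A tiny sketch, assuming both values are timezone-aware datetimes (the name `effective_sort_time` is made up for illustration):

```python
from datetime import datetime, timezone


def effective_sort_time(published: datetime, arrived: datetime) -> datetime:
    """Future-dated posts fall back to arrival time; backdated posts keep their claimed time."""
    return min(published, arrived)


utc = timezone.utc
arrived = datetime(2024, 5, 1, 12, 0, tzinfo=utc)
backdated = datetime(2019, 1, 1, tzinfo=utc)
future = datetime(2099, 1, 1, tzinfo=utc)
```

Here `effective_sort_time(backdated, arrived)` returns the 2019 timestamp, and `effective_sort_time(future, arrived)` returns the arrival time. Mixing naive and aware datetimes would raise a `TypeError`, so a sender that mislabels its timezone still has to be handled before this comparison.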

@darius @trwnh @thisismissem @mauve this is always a bit messy as posts can get skewed if servers don't correctly label timezones, but that is a good practice, yes.

@darius @trwnh @thisismissem @mauve Because of Announce, as the obvious case. But more generally, when an object was created doesn't have any real connection to when it should be presented to people. Mastodon is doing more or less the right thing, here.

I wish mastodon would be more prompt in backfilling from the outbox, so that this wouldn't even seem necessary. But even that is basically the right behavior.

@jenniferplusplus @trwnh @thisismissem @mauve I see what you mean by time created vs when it should be presented. That said, if we consider the context of pulling from an outbox to backfill, then the context seems pretty clearly (to me, I could be wrong) "do not show this in the home timeline, this is just for filling in our database for views more generally"

DOES Mastodon pull from the outbox? I didn't observe it doing it at all

@darius @jenniferplusplus @thisismissem @mauve nope, mastodon p much ignores the outbox

and yeah it makes sense to *not* do timeline insertion when backfilling, backfilling seems like the kind of thing that is only/primarily useful for viewing a profile and not keeping up with current activities

@darius @jenniferplusplus @thisismissem @mauve another way to look at it is that there are really two dates, one is the date the object claims to be published, and the other is the date that the activity arrived in your inbox. most fedi dev is used to thinking in terms of objects, but it would be more technically correct and spec-accurate to think in terms of activities

@trwnh @jenniferplusplus @thisismissem @mauve one of my goals for my next year of funded work is to make it easier for devs to think in AP terms rather than name-your-API terms

@darius @jenniferplusplus @thisismissem @mauve hey if there's room for me lemme know! that's a laudable goal that i'd be happy to cooperate on

@trwnh @darius @thisismissem @mauve objects have dereferenceable URIs. Why wouldn't I make that a first class entity in my data model?

@jenniferplusplus @darius @thisismissem @mauve activities do too, as well! or they should. activities are also objects.

the advantage of thinking in terms of activities is that it's a better representation of reality, with AP serving as a specialization of LDN (Linked Data Notifications), you're basically notifying your followers/audience/recipients that "something happened". it's the reason we POST Create activities and not just raw non-activity objects.

@trwnh @darius @thisismissem @mauve
Unless the activity is intentionally transient. The spec says this twice in two paragraphs. I don't know what it means for an activity to *not* be transient. Activities are actions, not ongoing persistent state. The objects they refer to are the persistent state.

@jenniferplusplus @trwnh @thisismissem @mauve I see the transient language in the spec but it's about ids and guids and referencing things globally. I don't read it as saying anything about activities being temporary things that can be thrown away (though they allow for the possibility). Every time I've worked with an Event/Object model (at least in years as an MMORPG dev) we consider both Events & Objects to be first-class entities. All of these things should be first-class and treated as such

@erincandescent @darius @jenniferplusplus @thisismissem @mauve idk, i can see uses for it, the spec calls out "intentionally transient" and not just transience in general. without going too far from current usage, say you want to let someone know you Liked something but you don't want them to be able to Add it to the `likes` collection.

@jenniferplusplus @darius @thisismissem @mauve @trwnh there are so many things which get easier when your activities have IDs

For example, is Undo(Follow) undoing the active follow or is it stale?
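A rough sketch of that point: if each active follow is remembered by the `id` of its Follow activity, an Undo that names an older, already-replaced Follow can be recognised as stale. The storage shape (a dict keyed by Follow id) and the function name are assumptions for illustration only.

```python
from typing import Dict, Tuple


def apply_undo_follow(active: Dict[str, Tuple[str, str]], undo: dict) -> bool:
    """Remove the follow named by `undo` if it is still active.

    `active` maps Follow activity id -> (follower, followee).
    Returns False when the Undo is stale or comes from the wrong actor.
    """
    obj = undo.get("object")
    # The undone Follow may be embedded or referenced by id only.
    follow_id = obj if isinstance(obj, str) else (obj or {}).get("id")
    entry = active.get(follow_id)
    if entry is None or entry[0] != undo.get("actor"):
        return False
    del active[follow_id]
    return True


# alice followed bob, unfollowed, then followed again; only follow/2 is active.
active = {"https://a.example/follow/2": ("https://a.example/alice", "https://b.example/bob")}
late_undo = {
    "type": "Undo",
    "actor": "https://a.example/alice",
    "object": "https://a.example/follow/1",
}
```

A late redelivery of the Undo for `follow/1` leaves the newer `follow/2` untouched. Without ids, the receiver could only match on actor and target and would drop the current follow.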

@mauve @erincandescent @darius @thisismissem @trwnh Undo is supposed to unwind the side effects of the arbitrary undone activity. Consider even for a simple case: Undo(Like)? You can't unsend the notification. Now what about less simple cases? What if you receive an Undo(Delete)? What if you receive an Undo(Undo)? The logical complexity it would take to support this is extreme.

And the idea that you can simply undo things in a distributed poly-central system like this is fantasy.

@jenniferplusplus @darius @thisismissem @mauve @trwnh I can’t say I’m a big fan of the Undo activity in practice, but the problem applies to an Unfollow activity all the same.

Having ids makes certain forms of state resolution easier. What would really make them easier is a monotonically increasing but otherwise opaque ordering token included in every activity, but I have a hell of a lot more distributed systems experience in 2024 than I did in 2014.
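To illustrate the ordering-token idea: a receiver keeps the highest token it has applied per relationship and ignores anything older. ActivityPub defines no such field, so the integer `token` and the `OrderedState` class below are purely hypothetical.

```python
class OrderedState:
    """Apply only the newest state change per key, using a sender-issued token."""

    def __init__(self):
        self._latest = {}  # key -> (token, state)

    def apply(self, key, token: int, state: str) -> bool:
        current = self._latest.get(key)
        if current is not None and token <= current[0]:
            return False  # stale or duplicate delivery, ignore it
        self._latest[key] = (token, state)
        return True

    def state(self, key):
        current = self._latest.get(key)
        return None if current is None else current[1]


relationship = ("alice", "bob")
log = OrderedState()
log.apply(relationship, 1, "following")
log.apply(relationship, 3, "following")    # the re-follow arrives first
log.apply(relationship, 2, "not-following")  # the older unfollow arrives late
```

After these three deliveries `log.state(relationship)` is still `"following"`, because the late unfollow carries a lower token than the re-follow that was already applied.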

@erincandescent @darius @thisismissem @mauve @trwnh
I don't think it even makes sense to view the AP network as a distributed system. It's a heterogenous network of independent systems. There is no correct state for the system because there's no singular system or even singular state.

The mechanisms you're talking about are useful for preventing desynchronization, but that's not actually a goal. At least not a goal that makes sense to me. Some desync will happen intentionally, by design.

@jenniferplusplus @darius @thisismissem @mauve transient means you don't want to look at it later. for example an in-game notification.

activities are not really "actions", they are "notifications with side effects", and that notification resource should be persistent if you want to refer to it later. consider an old-facebook "activity feed" or "activity stream" -- there, you see things like "x created a post" or "x liked a post" as discrete entries in the feed. many cases demand activities.

@trwnh @darius @thisismissem @mauve that's only true if your application model is to just store and replay activities. I treat activitypub as a wire protocol. It's not my domain model.

@jenniferplusplus @darius @thisismissem @mauve that's fine but also doing that should come with the recognition that by discarding the activity you are losing some information. it's not "enough" to concern yourself only with what was created, sometimes the act of creation itself is important. this is why Activities come with extra properties that aren't present on the Object base class -- `result` comes to mind as a particularly interesting one.

@trwnh @darius @thisismissem @mauve result *could* be interesting, if it was actually well specified. But it's not. And either way, it's entirely possible to retain that result without requiring that the activity itself be dereferenceable.

@darius @trwnh @thisismissem @mauve My understanding is that mastodon does pull from the outbox, but it may not happen immediately. There's background tasks that fetch remote objects into the local cache on some schedule, and I haven't looked into the implementation details.

@jenniferplusplus @darius @thisismissem @mauve p sure mastodon will pull from the following

- `featured` collection (pinned posts) when a profile is discovered for the first time
- `replies` collection when a post is encountered for the first time (but only the first page of that collection, which in mastodon always contains self-replies, with all other replies being forced into page 2 at least)

@trwnh @jenniferplusplus @darius @thisismissem @mauve this is good to know! when we were testing with users they were confused they couldn't find any previous posts on their instance and thought the integration wasn't working, even though mastodon's ui has a really small message (!) saying it isn't pulling this information. that's why we're backfilling but clearly missed out on this bit of activitypub folklore :B

@jenniferplusplus @darius @trwnh @thisismissem Are you sure? I have yet to see this happen in any impls. We did it in reader.distributed.press specifically because nobody else seemed to be doing it which is a major PITA for me. :P

@mauve @darius @trwnh @thisismissem I'm pretty sure mastodon sends a request to the outbox endpoint. I don't know what they do with that, in part because I don't have a useful outbox yet.

@jenniferplusplus @mauve @darius @thisismissem at most i think they might pull the totalItems to get a post count for your profile?

@trwnh @jenniferplusplus @mauve @thisismissem yeah. They definitely pull featured/pinned posts. that is the hack that I had to do to get around the lack of outbox parsing

@darius @trwnh @jenniferplusplus @thisismissem I think I'd rather "spam" the timeline with a few posts in a row than have users miss those posts forever. Maybe we can increase the interval between posts for backfill to limit "spam", but not having them at all is awful UX for anyone trying to read stuff on the fediverse.

@mauve @trwnh @jenniferplusplus @thisismissem idk, I think we need to distinguish between pushed and pulled data. We say there is no polling in ActivityPub but stuff like the outbox does provide a place to poll and I think pushed data should be handled fundamentally different from pulled data.

If you're fetching a profile for the first time it seems obvious to me it should not drop straight into anyone's live feeds. But also: that's a fetch and not someone publishing something to a local inbox
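One way to picture the push/pull distinction: every post is stored, but only posts that were delivered to an inbox are inserted into home timelines. The `Source` enum and `Instance` class are a hypothetical model, not how any existing server is structured.

```python
from enum import Enum


class Source(Enum):
    PUSH = "delivered to inbox"
    PULL = "fetched from outbox or profile"


class Instance:
    def __init__(self):
        self.posts = {}          # everything known locally, keyed by post id
        self.home_timeline = []  # newest first

    def ingest(self, post: dict, source: Source) -> None:
        # Always store the post so profile views and search can see it.
        self.posts[post["id"]] = post
        # Only live deliveries are surfaced in the home timeline.
        if source is Source.PUSH:
            self.home_timeline.insert(0, post["id"])


instance = Instance()
instance.ingest({"id": "old-1"}, Source.PULL)
instance.ingest({"id": "old-2"}, Source.PULL)
instance.ingest({"id": "new-1"}, Source.PUSH)
```

After this, `instance.posts` holds all three posts while `instance.home_timeline` contains only `"new-1"`, which is the "fill the database, not the feed" behaviour suggested for backfills.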

@darius @mauve @trwnh @jenniferplusplus @thisismissem Heh I never understood the "no polling in ActivityPub" part -- the Outbox is *right* there :)

@dmitri @darius @mauve @jenniferplusplus @thisismissem i think evan would like to remind people that AP supports both push and pull/poll, yeah :p

@trwnh @dmitri @darius @mauve @jenniferplusplus there was actually a whole big controversy a few months ago where people didn't understand that activitypub wasn't just push based, and that pulling data was core to the protocol too.

Led to accusations of scraping, when that wasn't the case at all.

@darius @trwnh @jenniferplusplus @mauve @thisismissem

it might make a difference if the user is known on the instance and then they pull (?)

Normally it does not.
Just curious how much you fill, Darius (doing the same with 10 last).

It is super-hindering that mastodon does not implement the ActivityPub Client To Server API.
Currently doing a thing for European public broadcasters and any Actor in AP was thought of as just a link.
That was a clever design decision and somehow destroyed by mastodon. We can't force webfinger. Any broadcaster wants just their existing domains as Actors, no matter what a user would choose.

just btw: Unfortunately our book is in German but we recently did a talk about AP at the Public Spaces Conf digitalcourage.social/@sl007/1

@jenniferplusplus @darius @thisismissem @mauve yeah you can simulate the same effect with spinning up a *oma instance and following some people then taking it offline for a bit. when you come back online your home tl will be out of order because post deliveries get retried at different intervals lol

@trwnh @darius @mauve tbh, I'd think it'd be entirely reasonable to reject activities that are not within 15 minutes or so of the current server time.

@thisismissem @darius @mauve this is a good way to have data loss from time zone related issues

i think mastodon has a window of 12 hours for HTTP signatures for this reason

@thisismissem @trwnh @darius @mauve that sounds like it would break things after you've had an outage and now the other servers are redelivering the backlog to you
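A small sketch of the trade-off in this last exchange: a freshness check with a 15-minute window rejects a retried delivery that a 12-hour window would accept. The 12-hour figure is only what was recalled in the thread, not a verified Mastodon setting, and `within_window` is a made-up helper.

```python
from datetime import datetime, timedelta, timezone


def within_window(claimed: datetime, now: datetime, window: timedelta) -> bool:
    """True if the claimed time is within `window` of the server's clock, in either direction."""
    return abs(now - claimed) <= window


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
# An activity created three hours ago and redelivered after an outage.
retried = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)

strict = within_window(retried, now, timedelta(minutes=15))
lenient = within_window(retried, now, timedelta(hours=12))
```

Here `strict` is `False` and `lenient` is `True`: the tight window drops the redelivered backlog, which is the data-loss concern raised in the replies.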