There's been some interesting discussion lately in the ATProto dev community about what to do with data that rightfully 'belongs', in some sense, to a group of people rather than an individual. I'll give a short summary of some ideas I encountered recently, and share how they connect to our thinking around building real-time messaging and collaboration spaces in Roomy.
Brittany Ellich recently shared some interesting work focused on the identity aspect of this problem, which in ATProto, in practice, comes down to control of the rotation keys for a DID:PLC. DID:PLC supports multiple rotation keys, which means you can have multiple 'admins' controlling a DID, but there are some limitations, so you may need to delegate control to a service that can manage group access for you. This is related to what we have been building with Leaf, where we also now identify group 'streams' with a DID:PLC corresponding to a group identity, control over which is (as with most PDS users) delegated to the server by default.
In a very similar vein, a team connected with Protocol Labs have been experimenting with a 'Shared Data Server', or SDS (as opposed to a Personal Data Server or PDS, which ATProto is built around).
In response in the same thread, Boris Mann had this 'aha moment' crystallising one approach to the problem focused on authorisation:
if you make these “shared accounts” fully work as first class regular accounts, then they will automatically work with any app, in which case using the PDS service entry in the DID doc makes sense.
This is exciting! The idea that group identities could look to the broader network just like individual users, in terms of meeting the same APIs, etc, would create a clear pathway towards interop. If the community went in this direction, we might one day want to reconsider the term 'Personal' in PDS.
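To make the interop argument a bit more concrete: clients generally find an account's data server by reading the service entry with the id `#atproto_pds` from its resolved DID document. Here is a minimal sketch of that lookup, assuming a made-up group DID, handle and hostname. None of these values come from the projects mentioned above.

```python
from typing import Optional

# Hypothetical DID document for a group account. The DID, handle and hostname
# are invented for illustration; the "#atproto_pds" service entry has the
# same shape as the one in an individual user's DID document.
group_did_doc = {
    "id": "did:plc:examplegroup",
    "alsoKnownAs": ["at://book.club.example"],
    "service": [
        {
            "id": "#atproto_pds",
            "type": "AtprotoPersonalDataServer",
            "serviceEndpoint": "https://shared-data.example",
        }
    ],
}


def pds_endpoint(did_doc: dict) -> Optional[str]:
    """Return the data server URL advertised in a DID document, if any."""
    for service in did_doc.get("service", []):
        # Some documents use the fully qualified form "<did>#atproto_pds".
        if service.get("id", "").endswith("#atproto_pds"):
            return service.get("serviceEndpoint")
    return None
```

A client that resolves accounts this way has no need to know whether the DID belongs to a person or a group, which is the appeal of the approach.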
Also in the thread I first linked to, Nick Gerakines shared this reflection focused on how we can model group data while preserving user ownership:
There has been an evolving and maturing idea where a communal identity “owns” wrapper records that had an inner reference to community specific content records.
For example, the community did:web:book.club exists and has a PDS. Nick creates a week 2 thoughts post at at://did:plc:nick/club.book.post/abce1234 through the book club AppView and as a part of that process, the AppView also creates the record at://did:web:book.club/club.book.post wrapper/bcde2345 that has a strong ref to Nick’s post.
The AppView does this to have moderation controls for the wrapper records it maintains, but users “own” their own posts.
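Here is how I picture the wrapper pattern, as a rough in-memory sketch rather than anything Nick or an actual AppView has published. The two dictionaries stand in for repos, and the wrapper collection name, the CID value and the `subject` field name are placeholders I made up. The only ATProto fact relied on is that a strong ref pairs a record's URI with its CID.

```python
# In-memory stand-ins for the two repos, keyed by AT URI.
user_repo = {
    "at://did:plc:nick/club.book.post/abce1234": {
        "$type": "club.book.post",
        "text": "Week 2 thoughts",
    }
}
community_repo = {}

# Placeholder collection name for wrapper records (an assumption).
WRAPPER_COLLECTION = "club.book.postWrapper"


def wrap(post_uri: str, post_cid: str, rkey: str) -> str:
    """Create a community-owned wrapper holding a strong ref to a user's post."""
    wrapper_uri = f"at://did:web:book.club/{WRAPPER_COLLECTION}/{rkey}"
    community_repo[wrapper_uri] = {
        "$type": WRAPPER_COLLECTION,
        # A strong ref pins both the location (uri) and the exact version (cid).
        "subject": {"uri": post_uri, "cid": post_cid},
    }
    return wrapper_uri


def visible_posts() -> list:
    """Posts the community shows: wrapped by the community AND still in the author's repo."""
    return [
        user_repo[wrapper["subject"]["uri"]]
        for wrapper in community_repo.values()
        if wrapper["subject"]["uri"] in user_repo
    ]
```

In this model, a moderator hides a post by deleting the wrapper from the community repo, which leaves Nick's record untouched. Nick can also delete his own post, and the wrapper then simply points at nothing.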
This makes a lot of sense as an ideal to build towards. We also really want users to be able to have meaningful control over their contributions to a group. But to me, the tricky part is what happens when there is collaborative group state, and what we do when we want a consistent view of that state.
For context, Roomy is a realtime community messaging and collaboration app we're building with Muni Town. We expect users to have an ATProto account, or we create one for them, and we use this as our identity system, but we store most application data off-protocol on a Leaf server, which we have been designing with a focus on generality as much as possible. We take a lot of inspiration from ATProto for Leaf, and we ultimately hope to interoperate with ATProto on the app data we produce as much as it helps and makes sense for users, so we consciously try to design things in a way that has “resonances” with the protocol (e.g. we always think through the design by comparison to ATProto concepts), but without trying to predict how the protocol will evolve. We're just trying to build it and work it out as we go. The two main things that would make the PDS not quite right for our app data currently are the lack of solid approaches to privacy and group identity.
In our evolving work on Leaf and Roomy, we have taken inspiration from ATProto, especially around the modular system architecture and focus on authenticated data that can be replicated across the network, and we've also drawn on a lot of different approaches to group data and particularly collaborative data structures from the local-first world. The topic of group data - particularly, the problem of managing multi-writer state - opens up a crossing between these related but distinct developer communities.
Collaboration as a set of operations: CRDTs and Event Sourcing
In local-first there has been a lot of excitement around Conflict-Free Replicated Data Types, or CRDTs, but many teams have found that the complexity and overhead of CRDTs have not been appropriate for what they were trying to build. A related but simpler concept is event sourcing, which is how we ultimately decided to approach building Roomy. We think both are really cool, and gel with our intention to make experiences that give users very direct control and ownership over the data they create, in line with the ideals set out by Ink and Switch in their essay coining the term local-first.
Event sourcing and CRDTs are both systems for deterministically arriving at the current state of a system by applying a set of operations to that system. The main difference is that event sourcing requires the entire set of operations to have a canonical linear order, and only guarantees the state will be correct if everyone applies the operations in that same order. This makes event sourcing very simple to implement, because there are never conflicts.
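As a toy illustration of that ordering requirement, here is a minimal event-sourcing sketch. The event kinds and field names are invented for the example and are not Roomy's actual events. State is just a fold over the log, and replaying the same events in a different order can give a different answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # "send" or "edit" (hypothetical event kinds)
    msg_id: str
    text: str


def apply(state: dict, event: Event) -> dict:
    """Pure reducer: return the next state without mutating the previous one."""
    new_state = dict(state)
    if event.kind == "send":
        new_state[event.msg_id] = event.text
    elif event.kind == "edit" and event.msg_id in new_state:
        # An edit only makes sense for a message that already exists.
        new_state[event.msg_id] = event.text
    return new_state


def replay(log) -> dict:
    """Fold the whole log, in order, into the current state."""
    state = {}
    for event in log:
        state = apply(state, event)
    return state


log = [Event("send", "m1", "helo"), Event("edit", "m1", "hello")]

print(replay(log))            # {'m1': 'hello'}
print(replay(reversed(log)))  # {'m1': 'helo'}: the edit arrived before the message existed
```

Nothing in the reducer has to reason about conflicts. The cost is that something has to decide the one true order of the log.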
CRDTs, on the other hand, are actually built to handle conflicts and resolve them in a deterministic way. To keep things as ~temporally flexible~ as possible they encode only the bare minimum of information about what came before into each operation itself, and then are very careful about how to handle conflicts and merges. This makes it possible for different users to apply the same operations in different orders, while ensuring that once all the communication is done, they are all still looking at the same document. This is amazing, especially for documents! Documents are a very particular example of a large, complex data structure where there very well may be a lot of people trying to change it at once, some of whom may be offline. Absent broader technological support for peer-to-peer networking, having a solid way to make sense of offline changes to data structures is probably the best use case we currently have for CRDTs, but it's also not the only solution for that.
In Roomy, our primary focus is on real-time community messaging. Users of real-time chat apps don't typically have such a flexible notion of temporality that they would accept that a message sent mid-conversation yesterday, from a device that then went offline until today, really belongs in the middle of yesterday's conversation. If nobody else saw it then, it's arguable that in some sense it didn't really happen then, from the perspective of the community. We still want to support offline use of the app, and personally I still really hope to see wider support for peer-to-peer networking, but we've accepted that in general for normal app usage, it's ok for the server to be the authority of what happened and when.
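For contrast, here is about the simplest CRDT I can think of: a last-writer-wins register, where each write carries a logical counter plus a replica name as a tie-breaker. It is a sketch to show the order-independence property, not how document CRDT libraries handle text, and the names and values are made up.

```python
from functools import reduce
from itertools import permutations
from typing import NamedTuple


class Write(NamedTuple):
    counter: int   # logical clock value attached by the writer
    replica: str   # tie-breaker when two writers use the same counter
    value: str


def merge(a: Write, b: Write) -> Write:
    """Deterministic conflict resolution: the highest (counter, replica) wins."""
    return max(a, b, key=lambda w: (w.counter, w.replica))


writes = [
    Write(1, "alice", "draft"),
    Write(2, "bob", "bob's edit"),      # concurrent with carol's write
    Write(2, "carol", "carol's edit"),
]

# Merge the same three writes in every possible arrival order.
results = {reduce(merge, order) for order in permutations(writes)}

print(results)  # a single Write, whichever order the writes arrived in
```

Because `merge` is commutative, associative and idempotent, every replica ends up with the same value no matter how the writes were delivered. The conflict between bob and carol is resolved by rule rather than by a canonical order.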
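A minimal sketch of what 'the server is the authority' means for ordering, with a hypothetical `Stream` class that is not Leaf's actual API. The server assigns each event the next sequence number when it arrives, and the time the client says it wrote the message is kept only as metadata.

```python
class Stream:
    """Toy append-only log where arrival order at the server is canonical."""

    def __init__(self):
        self.events = []

    def append(self, author: str, text: str, composed_at: str) -> dict:
        event = {
            "seq": len(self.events),     # assigned by the server on arrival
            "author": author,
            "text": text,
            "composed_at": composed_at,  # client-reported, informational only
        }
        self.events.append(event)
        return event


stream = Stream()
stream.append("alice", "shall we meet at 5?", composed_at="day1 10:00")
stream.append("bob", "works for me", composed_at="day1 10:02")
# Carol wrote this on day 1 but her device was offline until day 2.
late = stream.append("carol", "can we do 6 instead?", composed_at="day1 10:01")

print(late["seq"])  # 2: it lands at the end of the log, not mid-conversation
```

Clients replaying the log see Carol's message after Bob's, and can still show the client-reported time next to it if that helps readers make sense of it.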
Event-sourced XRPC ops
In Roomy, a space is where a community keeps all its communal activities, and a client arrives at the current state of a space by applying the log of events in a Leaf stream from the first to the most recent. By 'applying', I mean that each of these events maps to a set of SQL statements, which can be run on a local SQLite db to (more-or-less) deterministically arrive at a consistent final state.
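Roughly, the idea looks like the sketch below, using Python's built-in `sqlite3` module. The event types, field names and table schema are invented for illustration and are not Roomy's real ones.

```python
import sqlite3


def to_sql(event: dict) -> list:
    """Map one event to a list of (statement, parameters) pairs."""
    kind = event["type"]
    if kind == "createMessage":
        return [(
            "INSERT INTO messages (id, room, body) VALUES (?, ?, ?)",
            (event["id"], event["room"], event["body"]),
        )]
    if kind == "editMessage":
        return [("UPDATE messages SET body = ? WHERE id = ?", (event["body"], event["id"]))]
    if kind == "deleteMessage":
        return [("DELETE FROM messages WHERE id = ?", (event["id"],))]
    return []  # unknown event types are skipped


def materialise(events) -> sqlite3.Connection:
    """Replay the log, first event to last, into a fresh local database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE messages (id TEXT PRIMARY KEY, room TEXT, body TEXT)")
    for event in events:
        for statement, params in to_sql(event):
            db.execute(statement, params)
    db.commit()
    return db


events = [
    {"type": "createMessage", "id": "m1", "room": "general", "body": "helo"},
    {"type": "createMessage", "id": "m2", "room": "general", "body": "oops"},
    {"type": "editMessage", "id": "m1", "body": "hello"},
    {"type": "deleteMessage", "id": "m2"},
]

db = materialise(events)
print(db.execute("SELECT id, body FROM messages").fetchall())  # [('m1', 'hello')]
```

Any client that replays the same log in the same order ends up with the same table contents, which is the whole point.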
We have been iterating on NSIDs for these Roomy events for a while, but recently we decided to more explicitly work towards publishing schemas for these events as ATProto Lexicons. In doing so, we realised that our stream had been mixing and matching notions of ATProto 'records' (which live on the PDS) and XRPC 'procedures' (which are methods altering data enacted by the PDS or by an AppView). If we take a step back and think about what's going on conceptually from the user's perspective, all of these events are more like the latter. The difference here is that the primary subject of these XRPC operations is a local DB on the user's device, rather than something like a PDS which is responsible for serving the current state of any given record, as seen at the head of the chain of commits to the user's data repo.
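To show the distinction in Lexicon terms, here are two schema documents written as Python dictionaries. The NSIDs and properties are hypothetical examples, not our published Lexicons. What matters is the `type` of the main definition: a `record` describes data stored in a repo, while a `procedure` describes a call that changes something.

```python
# A record Lexicon: describes the shape of data that lives in a repo.
message_record = {
    "lexicon": 1,
    "id": "com.example.chat.message",  # hypothetical NSID
    "defs": {
        "main": {
            "type": "record",
            "key": "tid",
            "record": {
                "type": "object",
                "required": ["text", "createdAt"],
                "properties": {
                    "text": {"type": "string"},
                    "createdAt": {"type": "string", "format": "datetime"},
                },
            },
        }
    },
}

# A procedure Lexicon: describes an XRPC method that alters state.
send_message_procedure = {
    "lexicon": 1,
    "id": "com.example.chat.sendMessage",  # hypothetical NSID
    "defs": {
        "main": {
            "type": "procedure",
            "input": {
                "encoding": "application/json",
                "schema": {
                    "type": "object",
                    "required": ["room", "text"],
                    "properties": {
                        "room": {"type": "string"},
                        "text": {"type": "string"},
                    },
                },
            },
        }
    },
}


def main_type(lexicon: dict) -> str:
    """Return the primary definition type of a Lexicon document."""
    return lexicon["defs"]["main"]["type"]
```

Seen this way, an event in a stream reads more like a logged call to `sendMessage` than like a `message` record at rest.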
That said, Leaf is actually very flexible in that it also supports materialisation of events to produce the current state on the server. Could Leaf implement PDS XRPC methods and materialise the current state of any given data to serve as ATProto records? If that were possible, would it help us interoperate with any of these other emerging projects focused on community data on ATProto?
Another, perhaps more ATProto-y approach we've discussed is to stop thinking of the source of truth for the log of events as a single stream on a Leaf database, and instead think of a stream as a feed that listens to the firehose for users posting event records to their PDS, and then aggregates these into a stream.
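A very rough sketch of that second approach follows. The firehose here is a plain list of flattened, made-up commit events rather than the real wire format, and the collection name and `space` field are assumptions for the example.

```python
# Hypothetical collection that users would write event records into.
EVENT_COLLECTION = "com.example.chat.event"


def aggregate(firehose, space: str) -> list:
    """Build one space's stream from event records seen on the firehose."""
    stream = []
    for commit in firehose:
        if commit["collection"] != EVENT_COLLECTION:
            continue  # some other app's records
        if commit["record"].get("space") != space:
            continue  # an event for a different space
        stream.append({
            "seq": len(stream),  # the aggregator's order becomes the stream order
            "uri": f"at://{commit['did']}/{commit['collection']}/{commit['rkey']}",
            "record": commit["record"],
        })
    return stream


firehose = [
    {"did": "did:plc:alice", "collection": EVENT_COLLECTION, "rkey": "a1",
     "record": {"space": "book-club", "kind": "send", "text": "hi all"}},
    {"did": "did:plc:bob", "collection": "app.bsky.feed.post", "rkey": "b1",
     "record": {"text": "unrelated post"}},
    {"did": "did:plc:bob", "collection": EVENT_COLLECTION, "rkey": "b2",
     "record": {"space": "book-club", "kind": "send", "text": "hello!"}},
]

stream = aggregate(firehose, "book-club")
print([event["uri"] for event in stream])
```

Each event stays in its author's repo, and the aggregator only contributes the ordering. It is still a single authority for the sequence, though, so the consistency question moves rather than disappears.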
Our primary focus right now is getting Roomy and Leaf working on our own terms, but I wanted to share how our thinking is evolving in case it's useful to the broader community as we are ideating towards convergence on solutions to group data, and to express our strong desire to participate more deeply in these conversations as we find our footing.