New Architecture Design Proposal for Lemmy

hruzgar@feddit.de · 1 year ago

souperk@reddthat.com · edit-2 1 year ago

I admire your enthusiasm, so I would like to chime in with my 2 cents. What I see is a solution to a problem that has not been defined, so we cannot evaluate whether the solution actually solves it.

Were I to redesign Lemmy, I would start by defining the requirements of that software. Things to consider here would be:

What would be the total number of users?
What would be the total number of communities?
What would be the total number of instances?
What would be the spread of users across instances? Are there categories we can define? (For example, a large instance may have millions of users, while a small instance may have between 1 and 1,000.)
The same questions apply to communities.
What would be the number of posts, comments, and upvotes/downvotes for each instance or community category?
What’s the average size of a post or comment?
Probably countless more, but you have to restrict yourself to the ones with the most impact.
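
To make the point concrete, here is a minimal back-of-envelope sketch of how answers to these questions could be turned into numbers. Every figure in it (user counts, posts per user per day, comments per post, average item size) is a made-up placeholder, not a measurement of Lemmy, and `InstanceProfile` and `daily_storage_bytes` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class InstanceProfile:
    """Hypothetical category of instance; all values are placeholders."""
    name: str
    users: int
    posts_per_user_per_day: float
    comments_per_post: float
    avg_item_bytes: int  # assumed average size of a post or comment


def daily_storage_bytes(p: InstanceProfile) -> int:
    """Rough estimate of new post and comment bytes created per day."""
    posts = p.users * p.posts_per_user_per_day
    comments = posts * p.comments_per_post
    return round((posts + comments) * p.avg_item_bytes)


# Two illustrative categories; the numbers are assumptions, not real data.
small = InstanceProfile("small", 1_000, 0.1, 5, 1_000)
large = InstanceProfile("large", 1_000_000, 0.1, 5, 1_000)

for profile in (small, large):
    print(profile.name, daily_storage_bytes(profile), "bytes/day")
```

Even with rough inputs, an estimate like this shows which categories dominate the load and therefore which requirements deserve the most attention.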

Then, I would define operations like:

Creating a post or comment, or upvoting/downvoting.
Retrieving posts (ordered) for a community.
Retrieving comments (ordered) for a post.
Retrieving posts (ordered) for a specific feed (subscribed, local, all).
Reporting a user.
Banning a user.

Then, I would look deep into Lemmy’s architecture in order to understand the cost of these operations (time, memory, and developer effort). My understanding is that Lemmy uses a database to store all the data you subscribe to, including posts, comments, upvotes/downvotes, and stats over time. With all the data in a database, most read operations become an SQL query. Write operations, on the other hand, are relayed to other instances using the ActivityPub protocol.
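
As a rough illustration of "a read operation becomes an SQL query", here is a minimal sketch of retrieving ordered posts for a community. The `post` table, its columns, and the `top_posts` helper are hypothetical and are not Lemmy's actual schema or code; SQLite is used only so the example is self-contained.

```python
import sqlite3

# In-memory database with a simplified, hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE post (
        id        INTEGER PRIMARY KEY,
        community TEXT    NOT NULL,
        title     TEXT    NOT NULL,
        score     INTEGER NOT NULL DEFAULT 0
    );
    -- The index lets the ordered read avoid sorting the whole table.
    CREATE INDEX idx_post_community_score ON post (community, score DESC);
    """
)
conn.executemany(
    "INSERT INTO post (community, title, score) VALUES (?, ?, ?)",
    [("rust", "A", 2), ("rust", "B", 5), ("python", "C", 9)],
)


def top_posts(conn: sqlite3.Connection, community: str, limit: int = 10):
    """Return (title, score) rows for one community, highest score first."""
    cursor = conn.execute(
        "SELECT title, score FROM post "
        "WHERE community = ? ORDER BY score DESC, id ASC LIMIT ?",
        (community, limit),
    )
    return cursor.fetchall()


print(top_posts(conn, "rust"))  # [('B', 5), ('A', 2)]
```

The cost of a read like this depends mostly on the table size and the available indexes, which is why the requirements above matter before any redesign.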

Here I would stop for a bit and see how I can help Lemmy right now: what is the most value I can offer with as little effort as possible, i.e. the lowest-hanging fruit? For the time being, I believe that would be moderation. Basic features are missing, and there are many moderation issues where someone could help with ideation, testing, or implementation. However, a deep dive into moderation domain logic may not be for everyone, nor does it have to be; there are also plenty of performance issues to contribute to.

This experience would give you the context needed to design a better architecture for Lemmy.

Last but not least, I suggest starting small. Distributed systems are complex; even seasoned veterans have difficulty getting their heads around them. For example, even counting becomes a problem once the data is large enough.
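
To show why counting is harder than it looks, here is a minimal sketch of a grow-only counter (a G-Counter, one well-known CRDT). It is not what Lemmy does, just one illustrative approach: each node only increments its own slot, and merging takes the per-node maximum, so receiving the same update twice does not double count. The class and node names are hypothetical.

```python
class GCounter:
    """Grow-only counter: one slot per node, merged by per-node maximum."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        # A node only ever changes its own slot.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Taking the maximum makes merging idempotent and order-independent.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


a = GCounter("instance-a")
b = GCounter("instance-b")
a.increment(3)
b.increment(2)

a.merge(b)
a.merge(b)  # a duplicated message does not inflate the total
b.merge(a)
print(a.value(), b.value())  # 5 5
```

Note that this only handles increments; supporting decrements (for example, a removed upvote) needs a more involved structure, which is exactly the kind of complexity worth discovering on a small scale first.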