New Architecture Design Proposal for Lemmy - eviltoast

Hey Lemmy community!

Hope you’re all doing well. As some of you might know, I’ve been diving into Computer Science at the Technical University of Munich. Lately, I’ve been really getting into the whole ‘decentralization’ vibe, and it got me thinking about Lemmy. So, here’s my take on how we might be able to tweak things. Would love to hear your thoughts!

Storage: Let’s talk data. I’m thinking decentralized storage – could be IPFS, could be something else. The main idea? Every comment, every post, we store it there. Now, I’m a little on the fence about pictures, but that’s something we can circle back to.
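To make the storage idea more concrete, here is a minimal Python sketch of content addressing, the core idea behind IPFS. Each post or comment is stored under a hash of its own content, so any node can verify what it receives. It simplifies things in two ways. An in-memory dict stands in for the network, and a plain SHA-256 hex digest stands in for a real IPFS CID, which wraps the hash with version, codec and encoding information. The class name `ContentAddressedStore` is hypothetical.

```python
import hashlib
import json


class ContentAddressedStore:
    """In-memory stand-in for a decentralized store such as IPFS (assumption)."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, item: dict) -> str:
        # Serialize deterministically so identical content always yields the same key.
        data = json.dumps(item, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def get(self, key: str) -> dict:
        data = self._blocks[key]
        # Integrity check: the stored bytes must still hash to their address.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("content does not match its address")
        return json.loads(data)


store = ContentAddressedStore()
post_key = store.put({"type": "post", "title": "Hello Lemmy", "body": "First post!"})
print(store.get(post_key)["title"])  # Hello Lemmy
```

One consequence of this design is that editing a post produces a new address rather than overwriting the old entry. That is part of why deletion needs its own mechanism.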

Moderation: While we’re all for storing data, we probably shouldn’t dump everything on the user, right? What if each instance had a local database that filters what gets shown and what stays hidden? A simple mapping from post/comment ID to True/False (visible or hidden) could let each instance curate what folks see, making Lemmy’s experience unique for every instance. Instances could also open-source these databases to give users full transparency about what is being blocked.
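Here is a minimal Python sketch of the ID-to-True/False idea. It assumes posts are identified by string IDs and that anything an instance has not reviewed is shown by default. `ModerationList` and `Post` are hypothetical names and are not part of Lemmy.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    title: str


class ModerationList:
    """Per-instance mapping from post/comment ID to visibility (True = shown)."""

    def __init__(self) -> None:
        self._visible: dict[str, bool] = {}

    def set_visible(self, item_id: str, visible: bool) -> None:
        self._visible[item_id] = visible

    def is_visible(self, item_id: str) -> bool:
        # Unreviewed items are shown by default (an assumption; an instance
        # could just as well default to hidden).
        return self._visible.get(item_id, True)

    def filter(self, posts: list[Post]) -> list[Post]:
        # Keep only the posts this instance has not marked as hidden.
        return [p for p in posts if self.is_visible(p.post_id)]

    def export(self) -> dict[str, bool]:
        # A copy of the decisions that an instance could publish for transparency.
        return dict(self._visible)


mods = ModerationList()
mods.set_visible("p2", False)
posts = [Post("p1", "Welcome"), Post("p2", "Spam"), Post("p3", "Discussion")]
print([p.post_id for p in mods.filter(posts)])  # ['p1', 'p3']
```

Publishing the output of `export()` is one way the open-sourced moderation databases could work in practice.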

Speed: Here’s the catch – decentralized storage can sometimes feel like a snail race. So how about putting a cache like Redis in front of it? It would act as a speed booster: we wouldn’t always have to reach out to the vast decentralized storage, and users would get a snappier experience.
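The Redis idea is the classic cache-aside pattern. Below is a minimal sketch in which a plain dict stands in for Redis, and `slow_fetch` is a hypothetical placeholder for a lookup in decentralized storage. With the redis-py client, the cache calls would map to `get` and `set(key, value, ex=ttl_seconds)`.

```python
import time
from typing import Callable


class CacheAside:
    """Check the fast cache first; fall back to slow storage on a miss.

    A plain dict stands in for Redis here (assumption).
    """

    def __init__(self, fetch_from_storage: Callable[[str], bytes], ttl_seconds: float = 60.0) -> None:
        self._fetch = fetch_from_storage
        self._ttl = ttl_seconds
        self._cache: dict[str, tuple[float, bytes]] = {}  # key -> (expiry, value)
        self.misses = 0

    def get(self, key: str) -> bytes:
        now = time.monotonic()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no trip to decentralized storage
        self.misses += 1
        value = self._fetch(key)  # slow path
        self._cache[key] = (now + self._ttl, value)
        return value


fetched = []


def slow_fetch(key: str) -> bytes:
    # Hypothetical placeholder for a slow decentralized-storage lookup.
    fetched.append(key)
    return f"content of {key}".encode()


cache = CacheAside(slow_fetch, ttl_seconds=30)
cache.get("post-1")
cache.get("post-1")
print(len(fetched))  # 1
```

The TTL bounds how stale a cached post can get. This matters if moderation or deletion decisions change after an item was cached.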

Obstacles & Possible Solutions:

Data Deletion: We want our storage sleek, not bloated. How about a voting system? If a bunch of verified servers (like 20 of them with a 90% agreement rate) think a post isn’t up to snuff, it gets deleted.
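Here is one possible reading of the deletion rule, sketched in Python. At least 20 verified servers must have voted, and at least 90% of those votes must favor removal. The thresholds come from the proposal. The vote format (server domain mapped to True/False) and the function name `should_delete` are assumptions. `Fraction` keeps the 90% comparison exact instead of subject to floating-point rounding.

```python
from fractions import Fraction

MIN_VOTERS = 20                       # "like 20 of them" from the proposal
REQUIRED_AGREEMENT = Fraction(9, 10)  # 90% agreement rate


def should_delete(votes: dict[str, bool], verified_servers: set[str]) -> bool:
    """votes maps a server domain to True (delete) or False (keep)."""
    # Count only votes from servers on the verified list; dict keys mean one vote per server.
    counted = {server: v for server, v in votes.items() if server in verified_servers}
    if len(counted) < MIN_VOTERS:
        return False  # not enough verified servers have weighed in yet
    in_favor = sum(counted.values())
    return Fraction(in_favor, len(counted)) >= REQUIRED_AGREEMENT
```

Keying votes by server domain means each server counts once. Ignoring unverified servers also limits how much a newly created server can sway the result.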

Spam: The decentralized world is fantastic, but spam can be a pain. The voting system might help, but we might have to look for other solutions in this regard.

This is my rough sketch of what Lemmy’s architecture could evolve into – a blend of decentralization with a solid user experience. But hey, this isn’t a monologue. Let’s turn it into a conversation. Eager to hear your insights and suggestions!

  • souperk@reddthat.com · 1 year ago

    I admire your enthusiasm, so I would like to chime in with my 2 cents. I see a solution to an undefined problem, so we cannot evaluate whether the solution actually solves it.

    Were I to redesign Lemmy, I would start by defining the requirements of that software. Things to consider here would be:

    1. What would be the total number of users?
    2. What would be the total number of communities?
    3. What would be the total number of instances?
    4. What would be the spread of users across instances? Are there categories we can define? (For example, a large instance may have millions of users, but a small instance may have 1-1000)
    5. The same questions for communities.
    6. What would be the number of posts, comments, and upvotes/downvotes for each instance or community category?
    7. What’s the average size of a post or comment?
    8. Probably countless more, but you have to restrict yourself to the ones with the most impact.
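    Answers to these questions feed straight into back-of-envelope estimates. Here is a minimal Python sketch in which every number is a hypothetical placeholder, not a measurement of any real instance. 10,000 posts and 100,000 comments per day, at 2 KB and 500 bytes respectively, add up to about 25.55 GB of text per year. That figure excludes images, votes, and indexes.

```python
from dataclasses import dataclass


@dataclass
class WorkloadAssumptions:
    # All values are hypothetical placeholders to be replaced with real measurements.
    posts_per_day: int
    comments_per_day: int
    avg_post_bytes: int
    avg_comment_bytes: int


def yearly_text_storage_bytes(w: WorkloadAssumptions, days: int = 365) -> int:
    """Rough storage for post/comment text only (no images, votes, or indexes)."""
    per_day = w.posts_per_day * w.avg_post_bytes + w.comments_per_day * w.avg_comment_bytes
    return per_day * days


example = WorkloadAssumptions(posts_per_day=10_000, comments_per_day=100_000,
                              avg_post_bytes=2_000, avg_comment_bytes=500)
print(yearly_text_storage_bytes(example) / 1e9, "GB")  # 25.55 GB
```

    The same pattern extends to votes, instance counts, and federation traffic once those numbers are known.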

    Then, I would define operations like:

    1. Creating a post or a comment, or upvoting/downvoting.
    2. Retrieving posts (ordered) for a community.
    3. Retrieving comments (ordered) for a post.
    4. Retrieving posts (ordered) for a specific feed (subscribed, local, all).
    5. Reporting a user.
    6. Banning a user.

    Then, I would look deep into Lemmy’s architecture to understand the cost of these operations (time, memory, and developer effort). My understanding is that Lemmy uses a database to store all the data you subscribe to, including posts, comments, upvotes/downvotes, and stats over time. With all the data in a database, most read operations become a SQL query. Write operations, on the other hand, are relayed to other instances using the ActivityPub protocol.
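    To illustrate "most read operations become a SQL query", here is operation 2 (retrieving posts, ordered, for a community) written as a query. This sketch rests on several assumptions. Python's built-in `sqlite3` stands in for Lemmy's PostgreSQL. The `post` table is a simplified hypothetical schema, not Lemmy's real one. Ordering by raw score stands in for Lemmy's actual sort options.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        community_id INTEGER NOT NULL,
        title TEXT NOT NULL,
        score INTEGER NOT NULL DEFAULT 0,
        published TEXT NOT NULL
    );
    -- Index so the per-community, score-ordered read does not scan the whole table.
    CREATE INDEX idx_post_community_score ON post (community_id, score DESC);
""")
conn.executemany(
    "INSERT INTO post (community_id, title, score, published) VALUES (?, ?, ?, ?)",
    [
        (1, "Architecture proposal", 10, "2023-07-01T10:00:00"),
        (1, "Moderation tooling", 42, "2023-07-02T09:00:00"),
        (2, "Unrelated community", 99, "2023-07-02T12:00:00"),
    ],
)


def top_posts(community_id: int, limit: int = 20) -> list[tuple[int, str, int]]:
    """Retrieve posts for one community, highest score first."""
    return conn.execute(
        "SELECT id, title, score FROM post WHERE community_id = ? "
        "ORDER BY score DESC, published DESC LIMIT ?",
        (community_id, limit),
    ).fetchall()


print(top_posts(1))  # [(2, 'Moderation tooling', 42), (1, 'Architecture proposal', 10)]
```

    Evaluating the complexity of an operation then largely means asking how such a query behaves, and which indexes it needs, at the data volumes estimated earlier.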

    Here I would pause and see how I can help Lemmy right now: what’s the most value I can offer with as little effort as possible, i.e. the lowest-hanging fruit? For the time being, I believe that would be moderation. Basic features are missing, and there are many moderation issues someone could help with through ideation, testing, or implementation. However, a deep dive into moderation domain logic may not be for everyone, nor does it have to be. There are also plenty of performance issues to contribute to.

    This experience would give you the context needed to design a better architecture for Lemmy.

    Last but not least, I suggest starting small. Distributed systems are complex; even seasoned veterans have difficulty getting their heads around them. For example, even something as simple as counting becomes a problem with large enough data.
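    One well-known technique for counting across replicas is a CRDT such as the grow-only counter (G-Counter). The sketch below is illustrative and does not describe how Lemmy counts votes today. Each instance increments only its own entry, and merging takes the per-instance maximum. As a result, instances converge on the same total regardless of message order or duplication. The instance IDs are hypothetical. Supporting downvotes or retracted votes would need a PN-Counter, which pairs two G-Counters, one for increments and one for decrements.

```python
class GCounter:
    """Grow-only counter CRDT: each instance increments only its own slot."""

    def __init__(self, instance_id: str) -> None:
        self.instance_id = instance_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.instance_id] = self.counts.get(self.instance_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent, so replicas
        # converge no matter how often or in what order they exchange state.
        for inst, n in other.counts.items():
            self.counts[inst] = max(self.counts.get(inst, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())


a, b = GCounter("lemmy.example"), GCounter("reddthat.example")
a.increment(3)
b.increment(5)
a.merge(b)
b.merge(a)
a.merge(b)  # repeated merges are harmless
print(a.value(), b.value())  # 8 8
```

    The trade-off is that each counter stores one entry per participating instance. That is exactly the kind of cost worth estimating before committing to a design.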