Update: Pushing back against the wave of bot accounts on Lemmy

This is an update to my previous post about suspicious inactive accounts on a handful of instances: (https://sh.itjust.works/post/998307).

I ended up messaging the admins at the 16 instances shown in the attached image. I pointed out their wild user numbers, and referenced the lemmy.ninja post detailing how that instance scrubbed suspicious accounts from their user database.

6 admins responded. They had all noticed the odd accounts and either thought the numbers were wrong, or weren’t sure how to purge the suspicious accounts without nuking their databases. In the end they managed to delete a combined total of about 338k dormant accounts from their instances. (One of the instances seems to have gone down since then.)

I never received a reply from the other 10 instance admins, though 8 of those 10 instances appear to be down (as of 27 July 2023). 2 instances are still up and unchanged.

Between the actively removed accounts and the downed instances, this represents a loss of 930,004 inactive Lemmy accounts!

You can see the drop in the graphs on The Federation. The total number of Lemmy accounts has been cut in half over the past 3 weeks, from a peak of 2.18M to today’s 1.09M. The change is mostly from these 16 instances.
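
As a rough sanity check on "mostly", here is a minimal sketch of the arithmetic using only the figures quoted in this post. The variable names are mine, and the 2.18M and 1.09M totals are rounded, so treat the resulting share as approximate.

```python
# Figures quoted in the post; variable names are illustrative only.
peak_accounts = 2_180_000      # "peak of 2.18M" (rounded in the post)
current_accounts = 1_090_000   # "today's 1.09M" (rounded in the post)
removed_from_16 = 930_004      # purged accounts plus accounts on downed instances

# Total drop in Lemmy accounts over the ~3 weeks described above.
total_drop = peak_accounts - current_accounts

# Fraction of that drop attributable to the 16 instances.
share_from_16 = removed_from_16 / total_drop

print(f"Total drop: {total_drop:,}")
print(f"Share from the 16 instances: {share_from_16:.1%}")
```

With these rounded inputs, the 16 instances account for roughly 85% of the drop.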

I have to admit, I did not expect such a large change when I started this! Hopefully this bodes well for Lemmy’s future as a place where actual humans interact, rather than a cesspool of automated comments and upvote/downvote brigading.

That’s all I have for now. Keep your stick on the ice; we’re all in this together.

  • PriorProject@lemmy.world
    1 year ago

    No major social media site publishes estimates on bot activity, so unless someone is citing a research paper with a reasonable bot-id technique, they’re speculating. That said, there are a few useful things we can say with only modest speculation:

    1. No commercial social media site has as trivial a sign up process as these instances. They had no email verification, no captcha, and no validation or gating process of any kind. Scripts created these users with a single API call, hitting it as fast as the server would respond. So on the account validation front, Reddit does better than these instances at keeping bots out.
    2. Every commercial social media site has a security team that attempts to monitor bots and has the capability to remove them. Some of these admins were aware of the signups, and others didn’t know how to respond. So on the monitoring and response front, Reddit is more sophisticated at detecting and responding to bots.
    3. I believe these instances had zero or one active users vs 100k+ bot accounts. It’s hard to say what the bot rates are on commercial social media sites, but I think we can confidently bound it to something lower than 100k to 1 in favor of bots.
    4. The aggregate number of bots represented about half the total lemmyverse. I’m sure someone will disagree with me, but I would be pretty surprised if half the signups at commercial sites are malicious. But that’s more plausible than 100k to 1.
    5. But on the other hand, the activity of these bots is public, and they demonstrably didn’t do anything. At least some of the malicious/clandestine bot accounts on commercial social media sites are active… so maybe here Lemmy gets the win since this massive wave of bots went unused. Now, that doesn’t mean that OTHER more sophisticated and undetected bots aren’t active on Lemmy just as they are on other social sites. But my bet is there are few to none, because Lemmy doesn’t matter enough to be worth attacking by the people who are able to run sophisticated bots. But this is hard to prove one way or another.

    TLDR: This signup wave was so unsophisticated it would never have been possible on a major social site with a security team. But it also didn’t do any tangible damage, unlike clandestine bot activity on major social sites. Depending on what metrics you use to compare (and how made up your metrics are, since this is all about activity that attempts to stay hidden), either side can come out on top.