Why wordfreq will not be updated - AI spam - eviltoast
  • UnseriousAcademic@awful.systems
    English
    arrow-up
    36
    ·
    2 months ago

    Man I feel this, particularly the sudden shutting down of data access because all the platforms want OpenAI money. I spent three years building a tool that pulled follower relation data from Twitter and exponentially crawled its way outwards from a few seed accounts to millions of users. Using that data it was able to make a compressed summary network, identify community structures, give names to the communities based on words in user profiles, and then use sampled tweet data to tell us the extent to which different communities interacted.
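
    For readers unfamiliar with this kind of pipeline, here is a rough sketch of the crawl, cluster, and name steps described above. It is not the tool itself: `snowball_crawl`, `label_propagation`, `name_community`, and the `get_followers` callback are hypothetical names, the crawl reads from whatever follower source you pass in rather than any real Twitter API, and simple label propagation stands in for whatever community detection method was actually used.

```python
from collections import Counter, defaultdict, deque


def snowball_crawl(get_followers, seeds, max_users):
    """Breadth-first crawl outward from seed accounts.

    get_followers is a hypothetical callback: user -> iterable of followers.
    Returns the set of users reached and a list of (follower, followed) edges.
    """
    seen = set(seeds)
    queue = deque(seeds)
    edges = []
    while queue:
        user = queue.popleft()
        for follower in get_followers(user):
            if follower not in seen:
                if len(seen) >= max_users:
                    continue  # budget exhausted: skip users we cannot add
                seen.add(follower)
                queue.append(follower)
            edges.append((follower, user))
    return seen, edges


def label_propagation(edges, rounds=10):
    """Group users by repeatedly adopting the most common neighbour label.

    Treats the follow graph as undirected. Ties go to the smallest label so
    the result is deterministic.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    labels = {node: node for node in neighbours}
    for _ in range(rounds):
        changed = False
        for node in sorted(neighbours):
            counts = Counter(labels[n] for n in neighbours[node])
            best = min(counts, key=lambda lab: (-counts[lab], lab))
            if labels[node] != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    groups = defaultdict(set)
    for node, lab in labels.items():
        groups[lab].add(node)
    return list(groups.values())


def name_community(members, profiles):
    """Name a community after the most frequent word in its members' bios."""
    words = Counter()
    for user in members:
        words.update(profiles.get(user, "").lower().split())
    return words.most_common(1)[0][0] if words else ""
```

    A real version would also need rate limiting, stop-word filtering for the names, and a way to compress millions of nodes into a summary network, none of which is shown here.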

    I spent 8 months in ethics committees to get approval to do it and got a prototype working, but rather than just publish I wanted to make it accessible to the academic community, so I spent even more time building an interface, making it user-friendly, improving performance, making it more stable, etc.

    I wanted to ensure that when we published our results I could also say “here is this method we’ve developed, and here you can test it and use it too for free, even if you don’t know how to code”. Some people at my institution wanted me to explore commercialising but I always intended to go open source. I’m not a professional developer by any means so the project was always going to be a janky academic thing, but it worked for our purposes and was a new way of working with social media data to ask questions that couldn’t be answered before.

    Then the API got put behind a $48K a month paywall and the project was dead. Then everywhere else started shutting their doors too. I don’t do social media research anymore.

    • ahopefullycuterrobot@awful.systems
      English
      arrow-up
      12
      ·
      2 months ago

      After my own heart right here. I followed some version of Luca Hammer’s guide to categorise everyone I followed on Twitter into communities, then created rss feeds of them using nitter. It was fascinating seeing how they clustered together. I think I still have an old gephi file with that output. I did this before Musk bought Twitter, since I knew he was going to wreck it.

      Basically, I would have killed for this tool.

      (I’m now wondering if anyone’s published a guide on this for bluesky.)

    • YourNetworkIsHaunted@awful.systems
      English
      arrow-up
      1
      ·
      2 months ago

      I would wager that, more than the costs of serving these API calls, preserving the opacity of the resultant network is probably part of the advantage these companies get from locking down their APIs. Given how much flak they already get for the mental and social damage done by social media and Twitter specifically, I suspect they’re very happy to preserve as much of the black boxiness as they can so they can point to the value users get and their ad revenue and say that all the costs are unfortunate coincidences rather than central problems with the paradigm.

  • Serinus@lemmy.world
    English
    arrow-up
    11
    arrow-down
    6
    ·
    2 months ago

    It’s an excellent read on a safe site. I appreciate OP doing the opposite of clickbait, but if this interests you at all, check it out.

    • V0ldek@awful.systems
      English
      arrow-up
      7
      ·
      2 months ago

      but if this interests you at all, check it out.

      ye that’s how most normal people use the internet? what’s the alternative strategy, checking it out if it doesn’t interest you?

      • UnseriousAcademic@awful.systems
        English
        arrow-up
        6
        ·
        2 months ago

        To be fair I’ve spent an inordinate amount of time looking at stuff on the Internet that doesn’t interest me. Especially since my workplace moved their employee training online.

      • Serinus@lemmy.world
        English
        arrow-up
        4
        ·
        2 months ago

        The alternative is reading the headline and skipping the article.

        The punchline is in the title, yes, but the article is still worth reading. Maybe I didn’t phrase that well.