I'm Starting A Search Engine For The Fediverse - eviltoast

Hey everyone,

This isn’t an announcement; I just wanted people’s thoughts on this.

I think everyone knows searching the fediverse can be better. Googling doesn’t work too well, etc. So I wanted to do my part and help out.

Indexing all posts, etc., is quite a lot to handle, so I wanted to start small and focus on video search. I’ve started indexing videos from PeerTube and other video websites (even YouTube, though that could be removed to focus on independent sites).
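OP doesn’t say how the index is built. Purely as an illustration, here is a toy inverted index in Python that merges videos from several sources into one searchable set. The record shape (`url`, `title`, `source`) and the example URLs are hypothetical, and a real crawler would need a per-platform adapter (PeerTube, for example, has a REST API) to produce such records.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lower-case the text and split it on runs of non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


class VideoIndex:
    """Toy in-memory inverted index mapping each title term to video URLs."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)  # term -> set of video URLs
        self.videos = {}                  # video URL -> record

    def add(self, video: dict) -> None:
        # Hypothetical record shape {"url", "title", "source"}; each platform
        # would need its own adapter to produce it. Records are keyed by URL.
        self.videos[video["url"]] = video
        for term in tokenize(video["title"]):
            self.postings[term].add(video["url"])

    def search(self, query: str) -> list:
        """Return videos whose titles contain every query term (AND search)."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.videos[url] for url in sorted(hits)]


idx = VideoIndex()
idx.add({"url": "https://peertube.example/w/1", "title": "Intro to Rust", "source": "peertube"})
idx.add({"url": "https://video.example/v/2", "title": "Rust for beginners", "source": "other"})
print([v["url"] for v in idx.search("rust")])
# prints ['https://peertube.example/w/1', 'https://video.example/v/2']
```

A production search engine would add ranking, stemming and descriptions, but the core lookup (term to set of documents, intersected across query terms) is the same idea.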

I know PeerTube has its own search engine for videos, and I will be reaching out to them. Compared to theirs, I’m planning for my site to include other video sources and be easier to use.

So I’d like some feedback from you guys:

  • What do you think about indexing videos posted on the fediverse and other independent platforms?
  • Are there similar services?
  • Am I just wasting my time?
    • MHLoppy@fedia.io · ↑11 · 11 months ago

      It’s worth noting that since FedSearch, Mastodon has actually natively implemented opt-in search on posts.
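For a crawler, respecting that kind of opt-in could be as simple as checking each author’s actor document before indexing. The sketch below assumes Mastodon’s `indexable` actor property (which, as far as I know, arrived with its opt-in search in version 4.2). The actors and posts are invented, and real actor documents would have to be fetched and cached.

```python
def may_index(actor: dict) -> bool:
    """True only if the actor document explicitly opts in to search.

    Assumes the opt-in is a boolean `indexable` property on the ActivityPub
    actor (Mastodon's convention). A missing key, False, or any non-boolean
    value is treated as "do not index", which keeps the scheme opt-in.
    """
    return actor.get("indexable") is True


def filter_indexable(posts: list, actors: dict) -> list:
    """Keep only posts whose author (looked up by actor ID) has opted in.

    Posts whose author is unknown are dropped rather than guessed at.
    """
    return [p for p in posts if may_index(actors.get(p.get("attributedTo"), {}))]


# Made-up actors and posts for illustration.
actors = {
    "https://a.example/users/alice": {"indexable": True},
    "https://b.example/users/bob": {"indexable": False},
    "https://c.example/users/carol": {},  # software that sends no flag at all
}
posts = [
    {"id": "1", "attributedTo": "https://a.example/users/alice"},
    {"id": "2", "attributedTo": "https://b.example/users/bob"},
    {"id": "3", "attributedTo": "https://c.example/users/carol"},
]
print([p["id"] for p in filter_indexable(posts, actors)])  # prints ['1']
```

Treating "no flag" as "no" is the conservative choice: it means servers that never implemented the flag are simply not indexed.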

    • lautan@lemmy.ca (OP) · ↑11 ↓2 · 11 months ago

      That’s a good point. But those people can be banned? I guess Reddit handles this by moderation and archiving old posts.

      • JustEnoughDucks@feddit.nl · ↑4 · 11 months ago

        Yes, but moderation teams on the fediverse are very small, and by its nature, a troll can make hundreds of accounts on different servers, all trolling, that would each need to be individually sought out and banned.

        It is a game of cat & 100 mice

      • gabe [he/him]@literature.cafe · ↑5 ↓4 · 11 months ago

        People will take the harassment off-site, especially if they are dedicated enough, or use the search to scrape for potential personal info to release publicly.

        • deweydecibel@lemmy.world · ↑12 · 11 months ago

          How is that different from Reddit? If trolls want to search and scrape and find information on people, they’re going to. You can’t put your information on the open Internet and not appreciate there’s always a danger of that.

          • Lucia · ↑1 · edited · 11 months ago

            deleted by creator

          • TimLovesTech (AuDHD)(he/him)@badatbeing.social · ↑1 ↓3 · 11 months ago

            There is more of an effort barrier if the trolls have to do all the scraping and sorting themselves than if they can just pop a right-wing lightning-rod term into search and get a list of targets.

    • 0x1C3B00DA@kbin.social · ↑5 ↓1 · 11 months ago

      That post wasn’t claiming that a search engine would only be used by trolls; it was explaining that they shut down their project because a chunk of the fediverse thinks that and complains about any search engine project. Discoverability is one of the network’s biggest challenges, and a search engine could really help with that.

      • TimLovesTech (AuDHD)(he/him)@badatbeing.social · ↑4 · 11 months ago

        Yes, not only used by trolls, but it would be a tool that trolls could leverage. I think the fediverse makes it easier to establish instances for marginalized groups, and it also has more admins who just don’t want trolls, because nobody here is making money off them like the corporate socials are. Adding a search that tries to vacuum up everyone’s posts across the fediverse and make them easily sortable and targetable without instance admins’ permission isn’t cool. If someone is running a general instance that covers nothing a troll could latch onto and wants it catalogued and searchable, that’s fine by me. I just don’t think bots should be doing that to the fediverse as a whole without admin permission.

        • 0x1C3B00DA@kbin.social · ↑2 · 11 months ago

          I don’t think an admin’s permission has anything to do with it. If you post publicly on the fediverse, your posts are public. You should have the option to opt out of any indexing (just like you do for the rest of the open web). But saying it’s OK for you to read this post if it happens to come across your feed, yet you shouldn’t be allowed to find it via a search, is ridiculous. Users get to choose with each post whether it’s public or not, but they don’t get to control how people consume those public posts.

          • TimLovesTech (AuDHD)(he/him)@badatbeing.social · ↑1 · 11 months ago

            Reading a post and having a bot thrashing a server indexing everything are two different things. If a user used the site like that, they would be throttled and, if they kept at it, banned. It is also one thing to read/interact with a site as that adds value to the site as a whole. A bot that just mass-hits links cataloguing everything is only a strain on a server an admin needs to support, with no upside for the instance, as it’s a bot ingesting and no real interaction actually takes place.
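The throttling mentioned here is typically done with a rate limiter. As one illustrative choice (not a claim about what any fediverse server actually runs), a token bucket gives each client a request budget that refills over time. The 1 request/second and burst-of-5 limits below are arbitrary, and the follow-up ban for repeat offenders is not shown.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock        # injectable so the logic can be tested
        self.tokens = capacity    # start full: a new client may burst
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to the time elapsed, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # a server would answer this request with HTTP 429


# One bucket per client address (hypothetical limits: 1 req/s, burst of 5).
buckets: Dict[str, TokenBucket] = {}


def allow_request(client_ip: str) -> bool:
    bucket = buckets.get(client_ip)
    if bucket is None:
        bucket = buckets[client_ip] = TokenBucket(rate=1.0, capacity=5)
    return bucket.allow()
```

A human reader clicking around never notices the limit, while a bot hammering links drains its bucket almost immediately.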

            • 0x1C3B00DA@kbin.social · ↑1 · 11 months ago

              > and having a bot thrashing a server indexing everything

              This is a completely separate argument and one that we already have mechanisms for. Servers can use status codes and headers to warn about rate limits and block offenders.
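On the crawler side, respecting those signals takes little code. A minimal sketch, assuming the crawler sees the standard HTTP `Retry-After` header on 429 (Too Many Requests) or 503 responses, which may hold either a number of seconds or an HTTP-date. The 60-second fallback is an arbitrary choice.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay(status: int, headers: dict, now: datetime,
                default: float = 60.0) -> Optional[float]:
    """Seconds to wait before retrying this host, or None if no back-off applies.

    429 Too Many Requests and 503 Service Unavailable may carry a Retry-After
    header whose value is either a number of seconds or an HTTP-date.
    `now` must be timezone-aware (UTC).
    """
    if status not in (429, 503):
        return None
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("retry-after")
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to a fixed pause
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A well-behaved crawler would sleep for the returned delay before contacting that host again, and keep backing off if the host keeps answering 429.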

              > It is also one thing to read/interact with a site as that adds value to the site as a whole

              A search index adds value as well; that’s why this keeps coming up. And, again, there are existing mechanisms to handle this: a robots.txt file can indicate that you don’t want to be crawled, and offenders can be IP-blocked.
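Python’s standard library already includes a robots.txt parser, so honouring the file is cheap for a crawler. A minimal sketch, assuming the robots.txt body has already been downloaded; the bot name, rules and URLs are made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent string for the search crawler.
AGENT = "ExampleFediSearchBot"


def crawl_plan(robots_txt: str, urls: list, agent: str = AGENT):
    """Return (URLs the agent may fetch, seconds to wait between requests).

    `robots_txt` is the already-downloaded body of https://<host>/robots.txt;
    the download itself is left out so the example runs offline.
    """
    policy = RobotFileParser()
    policy.parse(robots_txt.splitlines())
    allowed = [u for u in urls if policy.can_fetch(agent, u)]
    delay = policy.crawl_delay(agent) or 0  # None when no Crawl-delay applies
    return allowed, delay


robots = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5

User-agent: BadBot
Disallow: /
"""
urls = ["https://example.social/videos/1", "https://example.social/private/2"]
print(crawl_plan(robots, urls))  # prints (['https://example.social/videos/1'], 5)
```

robots.txt is advisory, which is why the IP-blocking fallback still matters for crawlers that ignore it.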

              • Rednax@lemmy.world · ↑1 · 11 months ago

                Shouldn’t a dedicated search engine use/index ActivityPub instead of the HTML interface?

                If so, instances can simply defederate from search engine instances. So the point you are trying to make still holds.
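If the index were fed over ActivityPub rather than by scraping HTML, much of the opt-out would come from federation itself: an instance that defederates stops delivering its posts. A sketch of the indexer’s side under that assumption; the local blocklist is a hypothetical extra for instances that want to signal an opt-out explicitly, and the addressing check follows Mastodon’s public/unlisted convention.

```python
from urllib.parse import urlparse

# ActivityStreams' special "everyone" collection. Only the full IRI is checked
# here, although some servers also emit the compact forms "as:Public"/"Public".
PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def _as_list(value) -> list:
    """AS2 addressing fields may hold one value or a list of values."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def should_index(activity: dict, blocked_domains: set) -> bool:
    """Index only public Create activities from domains that haven't opted out."""
    if activity.get("type") != "Create":
        return False
    obj = activity.get("object")
    if not isinstance(obj, dict):
        return False  # a bare object ID would need a separate fetch
    actor = activity.get("actor", "")
    if isinstance(actor, dict):
        actor = actor.get("id", "")
    domain = urlparse(actor).hostname or ""
    if domain in blocked_domains:  # hypothetical explicit opt-out list
        return False
    # Mastodon-style convention: Public in `to` means a public post, while
    # Public only in `cc` means "unlisted", so those are skipped too.
    return PUBLIC in _as_list(obj.get("to"))
```

Skipping unlisted posts mirrors the author’s intent: they chose to keep those out of public timelines, so keeping them out of search is the consistent choice.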

    • ggsu7@futurology.today · ↑9 ↓21 · 11 months ago

      >muh trolls

      God, shut up, please. Why do you have to ruin something amazing like searching the entire fediverse with meaningless arguments about muh trolls?

      • gabe [he/him]@literature.cafe · ↑11 ↓8 · edited · 11 months ago

        But they are correct. There are vulnerable groups of people who have a harassment risk against them. We share the fediverse with others; be mindful of that. Making a search engine or an archiver for Lemmy is such a good idea given how it functions! But for the wider fediverse… that’s directly contradictory to its culture unless instances and users can opt in.

        • rglullis@communick.news · ↑11 ↓1 · 11 months ago

          > There are vulnerable groups of people who have a harassment risk against them.

          People that are at risk for what they write on the public internet should be protected and empowered by having better privacy tools, not by pretending that they can have a “safe space” on the public internet.

          There is no such thing as privacy on the internet. The Fediverse makes it seem that it mitigates the surveillance problem by spreading the information around and not having it under the control of one single large entity, but the truth is that the Fediverse makes it actually easier for dedicated malicious actors to collect data and reach their targets.

          • smeg@feddit.uk · ↑5 ↓2 · 11 months ago

            Exactly. If you’re worried about someone searching for your post, then you should not be posting it online (or at least ensure you use an account that’s anonymous enough that it can’t be associated with you). If you want private chats, set up a group in Matrix or Signal!

        • 0x1C3B00DA@kbin.social · ↑3 ↓1 · 11 months ago

          Those vulnerable groups should have the tools to protect themselves, but that shouldn’t stop the rest of us from having a functional and discoverable system. The internet, and the fediverse specifically, have always been a semi-public space and searchability has been a part of that since the beginning.

        • ggsu7@futurology.today · ↑9 ↓25 · 11 months ago

          Unpopular opinion: overly sensitive people should not be allowed to use the internet. Why should everything revolve around their insecurities? Grow a thicker skin or stop using the internet.

          • kaffiene@lemmy.world · ↑7 ↓1 · 11 months ago

            If people have set up communities that work with standards that you don’t like, you don’t have to be part of it. Equally you don’t get to dictate standards for them.

            • Womble@lemmy.world · ↑3 ↓2 · 11 months ago

              But equally equally, if they set up their own communities in public, just in an obscure location, they shouldn’t complain that their public posts are public. Security by obscurity is no security. Frankly, it’s the worst of all worlds to have a place like that, as it encourages people to feel safe while the rug could be pulled out from under them at any moment.

          • spiderman@ani.social · ↑5 ↓2 · 11 months ago

            I don’t support this statement, but if there is a safe space on some part of the fediverse dedicated to people who suffer from insecurities or other problems, why can’t it be indexed so that more people can find a safe space for them too?

            Most of the fediverse platforms have good moderation tools that can even ban a whole instance, so why are we still trying to gatekeep the safe spaces?

            • TimLovesTech (AuDHD)(he/him)@badatbeing.social · ↑2 · 11 months ago

              And once you have this index of a vulnerable group of people, how do you gatekeep it from the trolls? Moderation tools only work on this platform; the problem is the trolls who take that info to everything these people interact with and make it a game to give them no peace.

      • spiderman@ani.social · ↑5 ↓4 · edited · 11 months ago

        Yes, anything you post on the internet can be indexed. If someone wants to post something in their little private garden, there are options for that too. The fediverse has the potential to grow, and if we try to stop everything that could help it grow by saying “no, only trolls will use it”, at some point nobody except the people who complain will use it. Do you want the fediverse to be your own little echo chamber?