I'm Starting A Search Engine For The Fediverse - eviltoast

Hey everyone,

This isn’t an announcement, I just wanted people’s thoughts on this.

I think everyone knows searching the fediverse could be better. Googling doesn’t work too well, etc. So I wanted to do my part and help out.

Indexing all posts, etc. is quite a lot to handle, so I wanted to start small and just focus on video search. I’ve started indexing videos from PeerTube and other video websites. (Even YouTube, but this could be removed to just focus on independent sites.)

I know PeerTube has their own search engine for videos. I will be reaching out to them. Compared to theirs, I’m planning for my site to have other video sources and be easier to use.

So that leads to feedback from you guys.

  • What do you think about indexing videos posted on the fediverse and other independent platforms?
  • Are there similar services?
  • Am I just wasting my time?
  • Valmond@lemmy.mindoki.com
    11 months ago

    I love the idea, especially from a technical standpoint!

    How big is the fediverse today? How many posts are there? What kind of algorithms are you using to store the results? Do you scan sites and then their connected sites, or do you have a premade list?

    More technical information please 😊!

    • lautan@lemmy.caOP
      11 months ago

      The fediverse is a few thousand servers running Mastodon, Lemmy, etc. I can’t say how many posts there are, but it’s a lot.

      So on the more technical side, I plan on using a lightweight, fast search engine called Sonic (it’s written in Rust). I have already used it in other projects and it can handle billions of messages / posts. But it has a cost: it doesn’t have faceted search, for example if you want to exclude certain texts from the results. I think this is a fair trade-off. The other solution would be to use something more mature like Elasticsearch, but it’ll be expensive (I’m assuming not much money will be made from this, and I’m talking about donations).
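One way to soften that trade-off is to do the exclusion in application code after the search returns. The sketch below is only an illustration of that idea, not part of the plan described above. It assumes the search returns a list of object IDs and that the indexed text is kept in a separate store; `exclude_terms`, `result_ids`, and `texts_by_id` are hypothetical names.

```python
def exclude_terms(result_ids, texts_by_id, excluded):
    """Drop results whose stored text contains any excluded term.

    result_ids  -- IDs returned by the search engine (assumed shape)
    texts_by_id -- hypothetical side store mapping ID -> indexed text
    excluded    -- terms the user does not want to see
    """
    lowered = [term.lower() for term in excluded]
    kept = []
    for object_id in result_ids:
        # Missing texts are treated as empty, so they are never excluded.
        text = texts_by_id.get(object_id, "").lower()
        if not any(term in text for term in lowered):
            kept.append(object_id)
    return kept


if __name__ == "__main__":
    texts = {
        "v1": "Rust tutorial",
        "v2": "Cooking pasta",
        "v3": "Rust and pasta",
    }
    print(exclude_terms(["v1", "v2", "v3"], texts, ["pasta"]))
```

This is plain substring matching, so it costs one text lookup per result and is only practical on a single page of results at a time.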

      For scanning sites there are premade lists to start with and it’ll be possible to scan new sites from other instances if found. So a bit of both.
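The "bit of both" approach (start from a premade list, then follow links to newly found instances) can be sketched as a breadth-first walk. This is a minimal illustration under assumptions, not the actual crawler: `discover_instances` and `fetch_peers` are hypothetical names, and `fetch_peers` stands in for however the crawler learns which instances a given host federates with.

```python
from collections import deque


def discover_instances(seeds, fetch_peers, limit=1000):
    """Breadth-first discovery of instances starting from a seed list.

    seeds       -- premade list of hostnames to start from
    fetch_peers -- hypothetical callable: hostname -> iterable of hostnames
    limit       -- stop after this many instances have been visited
    """
    seen = set()
    queue = deque(seeds)
    order = []
    while queue and len(order) < limit:
        host = queue.popleft()
        if host in seen:
            continue
        seen.add(host)
        order.append(host)
        # Queue every linked instance that has not been visited yet.
        for peer in fetch_peers(host):
            if peer not in seen:
                queue.append(peer)
    return order


if __name__ == "__main__":
    # Fake federation graph used only for this demo.
    graph = {
        "a.example": ["b.example", "c.example"],
        "b.example": ["a.example", "d.example"],
        "c.example": [],
        "d.example": [],
    }
    print(discover_instances(["a.example"], lambda h: graph.get(h, [])))
```

The `limit` parameter keeps a first crawl bounded; a real crawler would also need timeouts, rate limiting, and a way to respect instances that opt out of indexing.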