meirl

idunnololz_test@lemmy.ml · 1 year ago

Could still be a software issue. Someone said this already, but it's possible that Lemmy.world is using a load balancer and multiple servers, and that the authentication tokens on those servers are out of sync. So if you hit server 1 and you are signed in to server 1, you're good. If you hit server 2, you're signed out all of a sudden. This could also explain why the issue started abruptly today: maybe the load on the server wasn't that bad yesterday, so the load balancer didn't kick in. This is all speculation. We'll have to wait for an official message to confirm anything.

azvasKvklenko@sh.itjust.works · 1 year ago

I set up infrastructure for web apps, and what you are describing is still most likely a server config issue, not an issue with Lemmy itself, unless Lemmy is lacking something needed for load balancing (in which case the bug is really a missing feature, though I don't think that's the case). I don't know how Lemmy stores and reads its sessions, but usually that doesn't matter from the application code's standpoint. When preparing a multi-host setup, you as the admin need to make sure every instance accesses the same session data, or whatever other application data needs to be shared anyway. There are many options:

Database: not good for DB performance and usually avoided. The problem cannot occur here, as all the instances have to access the same database (or replicas) in the first place
Filesystem: the problem can occur here, but can be worked around with CIFS or NFS, at a cost in performance
Redis: good for performance, and as many hosts as you want can access the same Redis instance (unless Redis is overloaded, which is pretty hard to do with small session values)
Memcached: also an option, but all sessions would be gone on service restart
…?
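
To make the difference concrete, here is a minimal Python sketch, not how Lemmy actually handles sessions (I don't know that). `AppServer`, `login` and `whoami` are made-up names, and a plain dict stands in for whatever the shared store would be (Redis, a database, etc.). It only shows why per-host session storage signs you out when you land on the other host, and why a shared store doesn't.

```python
import secrets


class AppServer:
    """Hypothetical app host; `session_store` is any dict-like object."""

    def __init__(self, name, session_store):
        self.name = name
        self.sessions = session_store

    def login(self, user):
        # Issue a random token and remember which user it belongs to.
        token = secrets.token_hex(16)
        self.sessions[token] = user
        return token

    def whoami(self, token):
        # Returns None when this host has never seen the token.
        return self.sessions.get(token)


# Case 1: each host keeps its own sessions (e.g. local memory or local disk).
server1 = AppServer("server1", {})
server2 = AppServer("server2", {})
token = server1.login("alice")
local_results = (server1.whoami(token), server2.whoami(token))

# Case 2: both hosts read and write the same store (the dict is a stand-in
# for a shared backend such as Redis or the database).
shared_store = {}
server1 = AppServer("server1", shared_store)
server2 = AppServer("server2", shared_store)
token = server1.login("alice")
shared_results = (server1.whoami(token), server2.whoami(token))

print(local_results)
print(shared_results)
```

In the first case the second host doesn't recognize the token, which from the user's side looks like being randomly signed out. In the second case it doesn't matter which host the balancer picks.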

The load balancing scenario where all requests are handled by one host, and the other host only takes requests when the first is overloaded, is very unlikely. The most common balancing algorithms are roundrobin - which means (more or less) split connections (not load!) equally across all targets, and leastconn - which means hit the host that currently has the fewest active connections. I mean, of course they could've used a 'fallback' algorithm, but it's rather inefficient in most scenarios.
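
A rough Python sketch of how those three strategies pick a target, just to illustrate the idea. This is not any real load balancer's code; the function names, host names and connection counts are made up for the example.

```python
from itertools import cycle


def round_robin(backends):
    """Hand out backends in turn, regardless of how busy each one is."""
    return cycle(backends)


def least_conn(active_connections):
    """Pick the backend with the fewest active connections right now."""
    return min(active_connections, key=active_connections.get)


def fallback(backends, overloaded):
    """Always use the first backend that is not overloaded (hypothetical)."""
    for backend in backends:
        if backend not in overloaded:
            return backend
    return None


backends = ["server1", "server2"]

rr = round_robin(backends)
rr_picks = [next(rr) for _ in range(4)]

# Assumed snapshot of active connections per host.
lc_pick = least_conn({"server1": 3, "server2": 1})

fb_normal = fallback(backends, overloaded=set())
fb_busy = fallback(backends, overloaded={"server1"})

print(rr_picks, lc_pick, fb_normal, fb_busy)
```

With roundrobin or leastconn, both hosts see traffic all the time, so unsynced sessions would show up right away rather than only under heavy load. Only the fallback-style strategy matches the "second server kicks in when the first is overloaded" theory.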

Or maybe the issue is somewhere else entirely, e.g. caused by a full-page or CDN cache.