The Rule - eviltoast
  • Ibaudia@lemmy.world
    2 months ago

    That must be a crazy model. I ran one of their models on my 1660 and it worked just fine.

    • AdrianTheFrog@lemmy.world
      2 months ago

      I don’t have access to Llama 3.1 405B, but I can see that Llama 3 70B takes up ~145 GB, so by linear scaling 405B would probably take about 840 GB just to download the uncompressed fp16 (16 bits per weight) model. With 8-bit quantization it would probably take closer to 420 GB, and with 4-bit closer to 210 GB. 4-bit quantization is really going to start harming the model outputs, and it’s still probably not going to fit in your RAM, let alone VRAM.

      So yes, it is a crazy model. You’d probably need at least 3 or 4 A100s to have a good experience with it.
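
For anyone who wants to redo the back-of-the-envelope math from the comment above, here is a minimal sketch. It assumes size scales linearly with parameter count and bits per weight, starting from the ~145 GB figure quoted for the 70B fp16 download. The function names are made up for illustration, and the 80 GB per-GPU figure assumes the larger A100 variant and counts weights only (no KV cache or activations), so treat the GPU count as a floor rather than a recommendation.

```python
import math


def scaled_size_gb(ref_size_gb, ref_params_b, target_params_b,
                   bits_per_weight, ref_bits=16):
    """Scale a known checkpoint size linearly with parameter count and bit width.

    Hypothetical helper: real checkpoints also carry metadata, and quantized
    formats store scales/zero-points, so this is only a rough estimate.
    """
    return ref_size_gb * (target_params_b / ref_params_b) * (bits_per_weight / ref_bits)


def min_gpus_for_weights(size_gb, vram_per_gpu_gb=80):
    """Lower bound on GPU count needed just to hold the weights.

    Assumes 80 GB per card (the larger A100 variant); ignores KV cache,
    activations, and framework overhead.
    """
    return math.ceil(size_gb / vram_per_gpu_gb)


if __name__ == "__main__":
    # Reference point from the comment: Llama 3 70B at fp16 is ~145 GB.
    for bits in (16, 8, 4):
        size = scaled_size_gb(145, 70, 405, bits)
        gpus = min_gpus_for_weights(size)
        print(f"{bits}-bit: ~{size:.0f} GB, at least {gpus} x 80 GB GPUs for weights alone")
```

The results land within a gigabyte or so of the rounded 840 / 420 / 210 GB figures, and the 4-bit case needs at least three 80 GB cards for the weights alone, which lines up with the "3 or 4 A100s" guess once you leave some headroom.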