Self-hosted LLM - eviltoast

Hello internet users. I have tried GPT4All and like it, but it is very slow on my laptop. Does anyone here know of a solution I could run on my server (Debian 12, AMD CPU, Intel Arc A380 GPU) and use through a web interface?

  • Kir@feddit.it
    10 months ago

    Have you been able to use it with your AMD GPU? I have a 6800 and would like to test something.

    • passepartout@feddit.de
      10 months ago

      Yes. Since we have similar GPUs, you could try the following to run it in a Docker container on Linux (taken from here and slightly modified):

      #!/bin/bash
      
      model=microsoft/phi-2
      # share a volume with the Docker container to avoid downloading weights every run
      volume="<path-to-your-data-directory>/data"
      
      # the two -e variables are specific to the GPU architecture; adjust them for your card
      docker run \
        -e HSA_OVERRIDE_GFX_VERSION=10.3.0 \
        -e PYTORCH_ROCM_ARCH="gfx1031" \
        --device /dev/kfd --device /dev/dri \
        --shm-size 1g -p 8080:80 \
        -v "$volume:/data" \
        ghcr.io/huggingface/text-generation-inference:1.4-rocm \
        --model-id "$model"
      

      Note that the ROCm version has a different image tag and that you need to mount your GPU devices into the container. The two environment variables are specific to my (and maybe also your) GPU architecture. The first run will take a while, since it has to download the image and the model weights.
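
Once the container is up, the server should be reachable over HTTP on the mapped host port 8080. Here is a minimal sketch of sending a prompt to text-generation-inference's `/generate` endpoint with curl. The helper names `build_payload` and `tgi_generate` and the `TGI_URL` variable are made up for illustration, and the sketch assumes the prompt contains no double quotes or backslashes, since it does no JSON escaping.

```shell
#!/bin/bash

# Assumption: the container from the script above is running and
# host port 8080 maps to the server inside it (-p 8080:80).
TGI_URL="${TGI_URL:-http://localhost:8080}"

# Build the JSON body for the /generate endpoint.
# The prompt is inserted as-is, so it must not contain double quotes or backslashes.
build_payload() {
  local prompt=$1 max_new_tokens=${2:-50}
  printf '{"inputs":"%s","parameters":{"max_new_tokens":%d}}' "$prompt" "$max_new_tokens"
}

# Send a prompt to the server and print the raw JSON response.
tgi_generate() {
  curl -sS "$TGI_URL/generate" \
    -X POST \
    -H 'Content-Type: application/json' \
    -d "$(build_payload "$1" "${2:-50}")"
}

# Example call (needs the running container):
# tgi_generate "What is a self-hosted LLM?" 50
```

If the model has finished loading, the response should be a JSON object with a `generated_text` field. For prompts with quotes or newlines, building the body with a tool like `jq` would be safer than `printf`.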