Self hosted LLM - eviltoast

Hello internet users. I have tried GPT4All and like it, but it is very slow on my laptop. I was wondering if anyone here knows of any solutions I could run on my server (Debian 12, AMD CPU, Intel A380 GPU) through a web interface. Has anyone found a good way to do this?

  • passepartout@feddit.de · 10 months ago

    I tried Hugging Face TGI yesterday, but all of the reasonable models need at least 16 gigs of VRAM. The only model I got working (on a desktop machine with an AMD 6700 XT GPU) was Microsoft Phi-2.

    • Kir@feddit.it · 10 months ago

      Have you been able to use it with your AMD GPU? I have a 6800 and would like to test something.

      • passepartout@feddit.de · 10 months ago (edited)

        Yes, since we have similar GPUs you could try the following to run it in a Docker container on Linux, taken from here and slightly modified:

        #!/bin/bash

        model=microsoft/phi-2
        # share a volume with the Docker container to avoid downloading weights every run
        volume=<path-to-your-data-directory>/data

        docker run \
            -e HSA_OVERRIDE_GFX_VERSION=10.3.0 -e PYTORCH_ROCM_ARCH="gfx1031" \
            --device /dev/kfd --device /dev/dri --shm-size 1g \
            -p 8080:80 -v "$volume:/data" \
            ghcr.io/huggingface/text-generation-inference:1.4-rocm --model-id "$model"

        Note how the ROCm version has a different image tag and that you need to mount your GPU devices into the container. The two environment variables are specific to my (and maybe also your) GPU architecture. It will take a while to download, though.
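
        Once the container is running you can talk to it over plain HTTP. A rough sketch in Python of what a request could look like (this assumes the port mapping above and TGI's /generate endpoint; adjust host and port to your setup):

        # query_tgi.py - minimal sketch for talking to a local TGI container
        import requests

        def generate(prompt: str, max_new_tokens: int = 100) -> str:
            # TGI's /generate endpoint takes "inputs" plus a "parameters" object
            response = requests.post(
                "http://localhost:8080/generate",
                json={"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}},
                timeout=120,
            )
            response.raise_for_status()
            return response.json()["generated_text"]

        if __name__ == "__main__":
            print(generate("Write a haiku about self-hosting."))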

    • HumanPerson@sh.itjust.works (OP) · 10 months ago

      I know the GPT4All models run fine on my desktop with 8 GB of VRAM. It does use a decent chunk of my regular RAM, though. Could the GPT4All models work on Hugging Face, or do they use different formats? Sorry if I am completely misunderstanding Hugging Face, I hadn’t heard of it until now.

      • passepartout@feddit.de · 10 months ago

        Hugging Face TGI is just a piece of software handling the models, like GPT4All. Here is a list of models officially supported by TGI, although they state that you can try different ones as well. You follow the link and look for the files section. The size of the model files (safetensors or pickle binaries) gives a good estimate of how much VRAM you will need. Sadly this is more than most consumer graphics cards have, except for SantaCoder and Microsoft Phi.
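
        If you'd rather check the sizes programmatically instead of clicking through the files section, a rough sketch using the huggingface_hub package could look like this (it just sums the weight file sizes; actual VRAM use will be somewhat higher because of the KV cache and activations):

        # estimate_vram.py - rough sketch: sum a model's weight file sizes on the Hub
        from huggingface_hub import HfApi

        def weight_size_gb(repo_id: str) -> float:
            # files_metadata=True makes the API include per-file sizes
            info = HfApi().model_info(repo_id, files_metadata=True)
            weights = [s for s in info.siblings if s.rfilename.endswith((".safetensors", ".bin"))]
            return sum(s.size or 0 for s in weights) / 1024**3

        if __name__ == "__main__":
            for repo in ("microsoft/phi-2", "bigcode/santacoder"):
                print(f"{repo}: ~{weight_size_gb(repo):.1f} GB of weights")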

        • HumanPerson@sh.itjust.works (OP) · 10 months ago

          I don’t really want to try to get that to work. I wonder how hard it would be to create my own web UI using GPT4All’s Python package.
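
          For what it’s worth, a bare-bones version probably isn’t much code. A rough sketch with Flask and the gpt4all Python package (the model file name and port are just placeholders, use whatever model you already have):

          # webui.py - rough sketch of a tiny web UI around the gpt4all Python package
          # pip install flask gpt4all
          from flask import Flask, request
          from gpt4all import GPT4All

          app = Flask(__name__)
          # downloads the model file on first run if it isn't already cached
          model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf")

          PAGE = """
          <form method="post">
            <textarea name="prompt" rows="4" cols="80">{prompt}</textarea><br>
            <button type="submit">Send</button>
          </form>
          <pre>{answer}</pre>
          """

          @app.route("/", methods=["GET", "POST"])
          def chat():
              prompt = request.form.get("prompt", "")
              answer = model.generate(prompt, max_tokens=300) if prompt else ""
              return PAGE.format(prompt=prompt, answer=answer)

          if __name__ == "__main__":
              # bind to all interfaces so other machines on the network can reach it
              app.run(host="0.0.0.0", port=5000)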