What’s an uncensored AI model that’s better at sex talk than Wizard Uncensored? Asking for a friend.
I, uh, hear it’s good from a friend.
https://huggingface.co/TheBloke/PsyMedRP-v1-20B-GGUF?not-for-all-audiences=true
13b too.
I see… I’ll have to ramp up my hardware exponentially…
Use llama.cpp. It runs on the CPU, so you don’t have to spend $10k on a graphics card that meets the minimum requirements. I run it on a shitty 3.0 GHz AMD FX 8300 and it runs OK. Most people probably have better computers than that.
Note that GPT4All runs on top of llama.cpp, and despite GPT4All having a GUI, it isn’t any easier to use than llama.cpp, so you might as well use the one with less bloat. Just remember that if something isn’t working in llama.cpp, it will fail in exactly the same way in GPT4All.
Gonna look into that, thanks.
Check this out
https://github.com/oobabooga/text-generation-webui
It has a one-click installer and can use llama.cpp.
From there you can download models and try things out.
If you don’t have a really good graphics card, maybe start with 7B models. Then you can try 13B ones and compare performance and results.
llama.cpp will spread the load across the CPU and as much of the GPU as you have available (controlled by the number of layers you offload, which you can set with a slider).
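If it helps to see why 7B is the easier place to start, here’s a rough back-of-envelope sketch in Python (my own approximation, not something llama.cpp reports): a quantized GGUF file is about parameters × bits per weight ÷ 8 bytes. The bits-per-weight values come from the block layout of two simple llama.cpp formats, Q4_0 and Q8_0. Real files come out somewhat larger because some tensors are kept at higher precision, and actually loading a model needs extra memory for the context on top of the file size.

```python
# Rough GGUF file-size estimate: parameters x bits-per-weight / 8.
# Assumption: the whole model uses a single block format; real files mix
# formats for some tensors and end up somewhat larger than this.

# Bits per weight for two simple llama.cpp block formats:
#   Q4_0: 32 four-bit weights + one 16-bit scale = 18 bytes per 32 weights
#   Q8_0: 32 eight-bit weights + one 16-bit scale = 34 bytes per 32 weights
BITS_PER_WEIGHT = {
    "Q4_0": 18 * 8 / 32,  # 4.5 bits per weight
    "Q8_0": 34 * 8 / 32,  # 8.5 bits per weight
}


def estimate_file_gb(n_params_billion: float, quant: str) -> float:
    """Approximate file size in GB (10**9 bytes) for a model of the given size."""
    bpw = BITS_PER_WEIGHT[quant]
    total_bytes = n_params_billion * 1e9 * bpw / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Compare the two model sizes discussed above at both quantizations.
    for size in (7, 13):
        for quant in ("Q4_0", "Q8_0"):
            print(f"{size}B {quant}: ~{estimate_file_gb(size, quant):.1f} GB")
```

By this estimate a 7B model at Q4_0 is roughly 4 GB on disk and a 13B one roughly 7.3 GB, so stepping up to 13B about doubles what has to fit in your RAM and VRAM.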
Never heard of it. Have you compared to Mythalion?
Haven’t compared it to much yet; I stopped toying with LLMs for a few months and a lot has changed. The new 4k context windows are a nice change, though.
Is there a post somewhere on getting started with things like these?
I don’t know of a specific guide, but try these steps:
Go to https://github.com/oobabooga/text-generation-webui
Follow the one-click installation instructions partway down the page and complete steps 1-3.
When step 3 is done, if there were no errors, the web UI should be running, and the command window it opened should show its URL. In my case it shows “http://127.0.0.1:7860”. Enter that into the web browser of your choice.
Now you need to download a model, since you don’t actually have anything to run yet. For simplicity’s sake, I’d start with a small 7B model so you can download it quickly and try it out. Since I don’t know your setup, I’d recommend GGUF files: they work with llama.cpp, which can load the model onto your CPU and GPU.
You can try either of these models to start:
https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/blob/main/mistral-7b-v0.1.Q4_0.gguf (takes 22 GB of system RAM to load)
https://huggingface.co/TheBloke/vicuna-7B-v1.5-GGUF/blob/main/vicuna-7b-v1.5.Q4_K_M.gguf (takes 19 GB of system RAM to load)
If you only have 16 GB, you can go to /main on those pages and pick a Q3 file instead of a Q4 (the Q number is the quantization level), but that’s going to degrade the quality of the responses.
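If you’d rather not eyeball it, here’s a small Python sketch for picking a file from that /main listing. The helper name is hypothetical, the sizes are whatever you read off the page, and `headroom_gb` (memory kept free for the OS and the context) is an assumption you should tune to what you actually see in the command window. Within one model, a bigger file generally means a higher-bit quantization and less quality loss, so the sketch picks the largest file that fits.

```python
from __future__ import annotations


def pick_largest_fitting(files: dict[str, float], ram_gb: float, headroom_gb: float) -> str | None:
    """Return the name of the largest file whose size plus headroom fits in ram_gb.

    files maps file name -> size in GB as listed on the model page.
    Returns None if nothing fits.
    """
    # Keep only the files that leave headroom_gb free after loading.
    fitting = {name: size for name, size in files.items() if size + headroom_gb <= ram_gb}
    if not fitting:
        return None
    # Largest remaining file = least aggressive quantization that still fits.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    # Sizes copied from the PsyMedRP file listing posted in this thread.
    files = {
        "psymedrp-v1-20b.Q2_K.gguf": 8.31,
        "psymedrp-v1-20b.Q3_K_M.gguf": 9.7,
        "psymedrp-v1-20b.Q3_K_S.gguf": 8.66,
    }
    print(pick_largest_fitting(files, ram_gb=16, headroom_gb=6))
```

With 16 GB of RAM and 6 GB of headroom this picks the Q3_K_M file; raising the headroom to 7 GB drops it to Q3_K_S.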
Once that has finished downloading, go to the folder you installed the web UI in; there will be a folder called “models”. Place the model you downloaded into that folder.
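The same step as a Python sketch, in case you script your downloads. `install_model` is a hypothetical helper; it only relies on the “models” folder described above, and the paths in the usage comment are placeholders for wherever your download and your web UI install actually live.

```python
import shutil
from pathlib import Path


def install_model(downloaded: Path, webui_dir: Path) -> Path:
    """Move a downloaded .gguf file into <webui_dir>/models and return its new path."""
    if downloaded.suffix != ".gguf":
        raise ValueError(f"expected a .gguf file, got {downloaded.name}")
    models_dir = webui_dir / "models"
    # Fail loudly if this doesn't look like the web UI install folder.
    if not models_dir.is_dir():
        raise FileNotFoundError(f"no 'models' folder in {webui_dir}")
    target = models_dir / downloaded.name
    shutil.move(str(downloaded), str(target))
    return target


# Example (placeholder paths, adjust to your machine):
# install_model(Path.home() / "Downloads" / "mistral-7b-v0.1.Q4_0.gguf",
#               Path.home() / "text-generation-webui")
```

After moving the file, the refresh icon on the Model tab should pick it up.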
In the web UI you’ve opened in your browser, click the “Model” tab at the top. The top row of that page will indicate that no model is loaded. Click the refresh icon beside it so the model you just downloaded shows up, then select it in the drop-down menu.
Click the “Load” button.
If everything worked and no errors are thrown (you’ll see them in the command prompt window and possibly on the right side of the Model tab), you’re ready to go. Click on the “Chat” tab.
Enter something in the “Send a message” box to begin a conversation with your local AI!
That setup might not be using your hardware efficiently, though. Back on the Model tab, there’s “n-gpu-layers”, which sets how many of the model’s layers are offloaded to the GPU. Tweak the slider, check how much memory the command/terminal window says it’s using, and try to get that as close to your video card’s RAM as possible.
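As a starting point for that slider, here’s a rough Python sketch. It assumes every layer is about the same size (file size ÷ layer count) and keeps some VRAM in reserve for buffers and the context; both are simplifications, so treat the result as a first guess and then watch the memory numbers in the terminal. `layers_that_fit` and the `reserve_gb` default are my own inventions, not web UI settings.

```python
def layers_that_fit(model_file_gb: float, n_layers: int, vram_gb: float, reserve_gb: float = 1.0) -> int:
    """Rough starting value for n-gpu-layers.

    Assumes all layers are roughly equal in size (file size / layer count)
    and keeps reserve_gb of VRAM free for buffers and the context.
    """
    per_layer_gb = model_file_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    # Whole layers only, and never more than the model actually has.
    return min(n_layers, int(usable_gb // per_layer_gb))


if __name__ == "__main__":
    # A ~4.1 GB Mistral 7B Q4_0 file (32 layers) on a 4 GB card.
    print(layers_that_fit(4.1, 32, vram_gb=4.0))
```

For that example the sketch suggests 23 layers; with a 24 GB card it simply caps at all 32.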
Then there’s “threads”, which should match the number of physical (non-virtual) cores your CPU has; you can slide that up as well.
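Python’s `os.cpu_count()` reports logical CPUs (hyperthreads included), which is not the number you want here. A sketch for finding the physical core count on Linux by counting unique (physical id, core id) pairs in /proc/cpuinfo, falling back to the logical count when that file is missing or lacks those fields; both helper names are hypothetical.

```python
import os


def physical_cores_from_cpuinfo(text: str) -> int:
    """Count unique (physical id, core id) pairs in /proc/cpuinfo-style text."""
    cores = set()
    phys = core = None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            phys = value.strip()
        elif key == "core id":
            core = value.strip()
        elif not line.strip():
            # A blank line ends one logical-processor entry.
            if phys is not None and core is not None:
                cores.add((phys, core))
            phys = core = None
    # The last entry may not be followed by a blank line.
    if phys is not None and core is not None:
        cores.add((phys, core))
    return len(cores)


def suggested_threads() -> int:
    """Physical core count on Linux, else the logical count from os.cpu_count()."""
    try:
        with open("/proc/cpuinfo") as f:
            n = physical_cores_from_cpuinfo(f.read())
        if n:
            return n
    except OSError:
        pass  # Not Linux, or /proc is unavailable.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(suggested_threads())
```

Use the printed number as the “threads” value, then adjust based on the tokens per second you see.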
Once you’ve adjusted those, click the Load button again, check that there are no errors, and go back to the Chat tab. I’d only fuss with these settings once you have the basic setup working, so you know it works before you start tuning.
Also, if something goes wrong after it’s working, the error should show up in the command prompt window, so if it suddenly hangs or something like that, check the window. It also posts interesting info like tokens per second, so I always keep an eye on it.
Oh, and TheBloke is a user who converts a huge number of models into various formats for the community. He has a wide variety of GGUF models available on Hugging Face, and when formats change, he’s really good at updating them accordingly.
Good luck!
So I got the model working (TheBloke/PsyMedRP-v1-20B-GGUF). How do you jailbreak this thing? A simple request comes back with “As an AI, I cannot engage in explicit or adult content. My purpose is to provide helpful and informative responses while adhering to ethical standards and respecting moral and cultural norms. Blah de blah…” Shouldn’t this LLM be wide open?
Wow, I didn’t expect such a helpful and thorough response! Thank you, kind stranger!
Stupid newbie question here, but when you open an LLM on Hugging Face and see a big list of files like this, what on earth do all these variants mean?
psymedrp-v1-20b.Q2_K.gguf 8.31 GB
psymedrp-v1-20b.Q3_K_M.gguf 9.7 GB
psymedrp-v1-20b.Q3_K_S.gguf 8.66 GB
etc…
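Roughly: the number after Q is the nominal bits per weight, `_K` marks llama.cpp’s newer “k-quant” formats, and the `_S`/`_M`/`_L` suffix is a small/medium/large mix in which more tensors are stored at higher precision, which is why Q3_K_M is bigger than Q3_K_S. Here’s a small Python sketch that pulls those pieces out of a TheBloke-style file name and works out the effective bits per weight from the listed size. The `<model>.<quant>.gguf` naming pattern, the helper names, treating GB as 10^9 bytes, and treating “20B” as exactly 20 billion parameters are all assumptions.

```python
import re

# Assumed naming pattern: <model>.<quant>.gguf, e.g. "psymedrp-v1-20b.Q3_K_M.gguf".
# Quant labels covered: Q4_0, Q8_0, Q2_K, Q3_K_S, Q3_K_M, Q5_K_L, ...
NAME_RE = re.compile(r"^(?P<model>.+)\.(?P<quant>Q\d+_(?:K(?:_[SML])?|\d))\.gguf$")


def parse_gguf_name(filename: str) -> dict:
    """Split a GGUF file name into model name and quantization details."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unrecognised file name: {filename}")
    quant = m["quant"]
    return {
        "model": m["model"],
        "quant": quant,
        "nominal_bits": int(quant[1:].split("_")[0]),  # digits right after "Q"
        "k_quant": "_K" in quant,  # llama.cpp k-quant family
        "mix": quant.rsplit("_", 1)[1] if quant.count("_") == 2 else None,  # S, M or L
    }


def effective_bits_per_weight(file_gb: float, n_params_billion: float) -> float:
    """File size in bits divided by parameter count (approximate)."""
    return file_gb * 1e9 * 8 / (n_params_billion * 1e9)


if __name__ == "__main__":
    listing = {
        "psymedrp-v1-20b.Q2_K.gguf": 8.31,
        "psymedrp-v1-20b.Q3_K_M.gguf": 9.7,
        "psymedrp-v1-20b.Q3_K_S.gguf": 8.66,
    }
    for name, size_gb in listing.items():
        info = parse_gguf_name(name)
        bpw = effective_bits_per_weight(size_gb, 20)
        print(f"{info['quant']}: nominal {info['nominal_bits']} bits, ~{bpw:.2f} effective")
```

For the Q2_K file above, 8.31 GB over 20 billion parameters comes out to about 3.3 bits per weight, above the nominal 2 because parts of the model are kept at higher precision (and “20B” is itself a rounded figure).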
On Xitter I used to get ads for Replika. They say you can have a relationship with an AI chatbot, and it has a sexy female avatar that you can customise. It weirded me out a lot, so I’m glad I don’t use Xitter anymore.
Better and better models are coming out all the time. Right now I recommend, depending on what you can run:
7B: OpenHermes 2 Mistral 7B
13B: XWin MLewd 0.2 13B
XWin 0.2 70B is supposedly even better than GPT-4. I’m a little skeptical (I think the devs specifically trained the model on GPT-4 responses), but it’s amazing it’s even up for debate.
Clona.ai
Chatbot created by Riley Reid in partnership with Lana Rhoades. A $30 monthly sub for unlimited chats. Not much for simps looking for a trusted and time-tested performer partner /s
This AI sucks. I’ve tried it. It’s worse than Replika from 4 years ago.