Testing the Limits: My GTX 1070 Rig vs Mistral Small 22B

Smokeydope@lemmy.world · 3 months ago

Smokeydope@lemmy.world · edit-2 3 months ago

Hey @brucethemoose, hope you don’t mind if I ping you one more time. Today I loaded up Qwen 14B and 32B. Yes, 32B (Q3_K_S). I didn’t do much testing with 14B, but it spoke well and fast. I was more excited to play with the 32B once I found out it would run, to be honest. It just barely makes the mark of tolerable speed at just under 2 T/s (really more like 1.7 with some context loaded in). I really do mean barely; the people who think 5 T/s is slow would eat their hearts out. That reasoning and coherence, though? Off the charts. I like the way it speaks more than Mistral Small too. So wow, just wow, is all I can say. Can’t believe all the good models that came out in such a short time and the leaps made in the past two months. Thank you again for recommending Qwen; I don’t think I would have tried the 32B without your input.

brucethemoose@lemmy.world · edit-2 3 months ago

Good! Try the IQ quantizations as well (the M, XS, and XXS variants), especially if you try a 14B, as they “squeeze” the model into less space better than the Q3_K quantizations.
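For a rough feel of what “squeezing” buys you, here is a back-of-the-envelope sketch. It assumes the weight size is simply parameter count times bits per weight. The bits-per-weight figures below are illustrative placeholders, not official numbers for any specific quant, and the 1 GB overhead for context and buffers is also just a guess. The 8 GB figure is the GTX 1070's VRAM.

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate quantized weight size in GB (1 GB = 10^9 bytes)."""
    # params (in billions) * bits per weight / 8 bits per byte = billions of bytes
    return params_billion * bits_per_weight / 8


def fits_in_vram(size_gb: float, vram_gb: float, overhead_gb: float = 1.0) -> bool:
    """True if the weights plus an assumed overhead fit entirely in VRAM."""
    return size_gb + overhead_gb <= vram_gb


if __name__ == "__main__":
    VRAM_GB = 8.0  # GTX 1070
    # Hypothetical bits-per-weight values, for illustration only.
    for label, params_b, bpw in [
        ("14B @ ~3.5 bpw", 14, 3.5),
        ("14B @ ~3.1 bpw", 14, 3.1),
        ("32B @ ~3.5 bpw", 32, 3.5),
    ]:
        size = estimate_weights_gb(params_b, bpw)
        verdict = "fits" if fits_in_vram(size, VRAM_GB) else "needs partial offload"
        print(f"{label}: ~{size:.1f} GB, {verdict}")
```

Real GGUF files come out a bit different because some tensors are kept at higher precision, so treat this as a ballpark only. Still, it shows why a few tenths of a bit per weight matter when you are right at the edge of 8 GB.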

Yeah, I’m liking the 32B as well. If you are looking for speed just for utilitarian Q/A, you might want to keep a Deepseek Lite V2 Code GGUF on hand, as it’s uber fast when partially offloaded.