Researchers Jailbreak AI by Flooding It With Bullshit Jargon (www.404media.co)
Posted by cm0002@lemmy.cafe to Technology@lemmy.zip · English · 11 days ago · 17 comments
Cross-posted to: chatgpt@lemmy.world, technology@lemmy.ml
iAvicenna@lemmy.world · 11 days ago: I wonder if they tried this on DeepSeek with Tiananmen Square queries.
SheeEttin@lemmy.zip · 11 days ago: No, those filters are applied by a separate system to the output text after it’s been generated.
iAvicenna@lemmy.world · 11 days ago: Makes sense, though I wonder if you can also tweak the initial prompt so that the output is full of jargon too, so that the output filter also misses the context.
SheeEttin@lemmy.zip · 10 days ago: Yes. I tried it, and it only filtered English and Chinese. If I told it to use Spanish, it didn’t get killed.