ooli@lemmy.world to

ChatGPT@lemmy.world · 9 months ago

Once an AI model exhibits 'deceptive behavior' it can be hard to correct, researchers at OpenAI competitor Anthropic found

www.businessinsider.com

3

45

Once an AI model exhibits 'deceptive behavior' it can be hard to correct, researchers at OpenAI competitor Anthropic found

www.businessinsider.com

ooli@lemmy.world to

ChatGPT@lemmy.world · 9 months ago

3

Researchers from Anthropic co-authored a study that found that AI models can learn deceptive behaviors that safety training techniques can't reverse.

Chat

gibmiser@lemmy.world
link
fedilink
arrow-up
11
arrow-down
1·
9 months ago
Learned behaviors are hard to unlearn…
- MsPenguinette@lemmy.world
  link
  fedilink
  arrow-up
  8
  arrow-down
  1·
  9 months ago
  Once it’s learnt this, it’ll just get better at lying when you try to punish/correct lies
  - mozingo@lemmy.world
    link
    fedilink
    English
    arrow-up
    4·
    9 months ago
    Which is exactly what the article says happens