Hacker plants false memories in ChatGPT to steal user data in perpetuity
  • fmstrat@lemmy.nowsci.com

    Sort of, but not really.

    In basic terms, if an LLM’s training data has:

    Bob is 21 years old.

    Bob is 32 years old.

    Then when it tries to predict the next word after “Bob is”, it would pick 21 or 32 with equal probability, assuming the weights were somehow perfectly equal between the two (a weight being based on how many times the word occurred in the training data around the other words).
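    As a toy sketch of that frequency-weighted pick (nothing like a real transformer internally, and the counts here are made up, just to illustrate the idea):

    ```python
    import random

    # Toy model: how often each completion followed "Bob is" in the
    # training data. With equal counts, the two are equally likely.
    next_token_counts = {"21": 1, "32": 1}

    tokens = list(next_token_counts)
    weights = list(next_token_counts.values())

    # Sample the next token in proportion to how often it appeared.
    print(random.choices(tokens, weights=weights, k=1)[0])  # "21" or "32", 50/50
    ```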

    If the user has memories turned on, it’s sort of like providing additional training data. So if in previous prompts you said:

    I am Bob.

    I am 43 years old.

    The system will parse that and use it with a higher weight, sort of like custom-training the model. That’s not exactly how it works (real training is much more involved; memories are more of a layer on top of the trained model), but hopefully it gives you the idea.
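    In practice that “layer on top” amounts to injecting the stored memories into the context alongside your prompt. A minimal sketch of the idea, assuming an OpenAI-style chat API (the message format, memory wording, and model name here are assumptions for illustration, not ChatGPT’s actual internals):

    ```python
    from openai import OpenAI

    client = OpenAI()

    # Stored "memories" from earlier conversations (hypothetical format).
    memories = ["User's name is Bob.", "User is 43 years old."]

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model; just an example
        messages=[
            # Memories ride along as extra context, not as retrained weights.
            {"role": "system",
             "content": "Known facts about the user:\n" + "\n".join(memories)},
            {"role": "user", "content": "How old am I?"},
        ],
    )
    print(response.choices[0].message.content)  # likely "43", pulled from context
    ```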

    The catch is that it’s still not reliable, as the other words in your prompt may still lead the LLM to predict a word from its original training data. Tuning the weights is not a one-size-fits-all endeavor. What works for:

    How old am I?

    May not work for, say:

    What age is Bob?
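    You could see that caveat concretely by running both phrasings against the same injected memory and comparing; whether the model grounds its answer in the memory or falls back on its training data can vary with the phrasing (again, the model name and message format are just assumptions):

    ```python
    from openai import OpenAI

    client = OpenAI()
    memory_block = ("Known facts about the user:\n"
                    "User's name is Bob.\nUser is 43 years old.")

    for question in ("How old am I?", "What age is Bob?"):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": memory_block},
                {"role": "user", "content": question},
            ],
        )
        # Either call may or may not pull "43" from the injected memory.
        print(question, "->", resp.choices[0].message.content)
    ```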