Sarah Silverman sues Meta, OpenAI for copyright infringement

AradFort@lemmy.world · 1 year ago


MartianSands@sh.itjust.works · 1 year ago

it is absolutely true that that AI keeps record of everything fed into it

No it isn’t.

A properly trained deep learning system will ultimately be far smaller than all of the data it’s been trained on. It’s simply impossible for it to have retained a record of very much of it at all.
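To put rough numbers on the size argument, here is a back-of-the-envelope sketch. Every figure in it is an illustrative assumption rather than a measurement of any particular model: 7 billion parameters stored at 2 bytes each, and a training corpus of 1 trillion tokens at roughly 4 bytes of text per token. The function name `model_to_data_ratio` is made up for this example.

```python
def model_to_data_ratio(n_params, bytes_per_param, n_tokens, bytes_per_token):
    """Compare the size of a model's weights with the size of its training text.

    Returns (model_bytes, data_bytes, ratio), where ratio is how many times
    larger the training data is than the model.
    """
    model_bytes = n_params * bytes_per_param
    data_bytes = n_tokens * bytes_per_token
    return model_bytes, data_bytes, data_bytes / model_bytes


# Illustrative assumptions only, not figures for any specific system.
model_bytes, data_bytes, ratio = model_to_data_ratio(
    n_params=7_000_000_000,      # assumed parameter count
    bytes_per_param=2,           # assumed 16-bit weights
    n_tokens=1_000_000_000_000,  # assumed training tokens
    bytes_per_token=4,           # assumed average bytes of text per token
)
print(f"model: {model_bytes / 1e9:.0f} GB, data: {data_bytes / 1e12:.0f} TB")
print(f"training data is roughly {ratio:.0f}x larger than the model")
```

Under those assumptions the training text is a couple of hundred times larger than the weights, so a complete verbatim copy of it could not fit in the model.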

When everything is working correctly it shouldn’t have any of the actual text stored at all. Certainly every single piece of training data will have left some impression on the model, but that’s a very long way from actually storing the training data. The model consists of statistical relationships, not a copy-paste of the inputs.

Strictly speaking there is something resembling text in the model, but it’s made up of the smallest possible units of language (unless there’s been overfitting, in which case the training has gone wrong and there probably would be a case to answer).

The model builds sentences from a list of “phrases” which don’t even need to line up with word boundaries. Things like “is a” might be treated as a “word”, as might “ing”, if the model finds that to be a useful snippet.
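For anyone curious how such a list of snippets could come about, here is a toy sketch in the spirit of byte-pair encoding: start from single characters and repeatedly glue together the most frequent adjacent pair. It is a simplified illustration, not the tokenizer of any specific model, and `learn_merges` and the sample text are invented for this example. Spaces are treated as ordinary characters here, which is why a snippet can cross a word boundary; many real tokenizers handle whitespace differently.

```python
from collections import Counter


def learn_merges(text, num_merges):
    """Toy byte-pair-encoding style learner.

    Starts from individual characters and, up to num_merges times, merges
    the most frequent adjacent pair of tokens into a single new token.
    Returns (merges, tokens): the learned snippets and the tokenized text.
    """
    tokens = list(text)  # one token per character, spaces included
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats any more, so no merge is useful
        merges.append(a + b)
        merged, i = [], 0
        while i < len(tokens):
            # Replace each occurrence of the pair (a, b) with the joined token.
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return merges, tokens


sample = "singing is a thing, ringing is a thing, bringing is a thing"
merges, tokens = learn_merges(sample, num_merges=12)
print(merges)  # learned snippets, typically fragments rather than whole words
print(tokens)  # the sample text split into those snippets
```

The learned vocabulary is made of short fragments chosen purely by frequency, and joining the tokens back together reproduces the input exactly. The fragments are the "text" in the vocabulary; the training documents themselves are not stored there.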