Google updates privacy policy to train its AI on everything you post online

fne8w2ah@lemmy.world · 1 year ago

agitatedpotato@lemmy.world · 1 year ago

There’s no situation in which I can envision scraping the open internet being a good way to train AI. Stop doing things cheaply and curate it yourself, or you’re gonna get what you paid for, which in this case is mostly free trash content.

Flying Squid@lemmy.world · 1 year ago

What’s going to be fun is when they start scraping their own output and it becomes a recursive nightmare.

agitatedpotato@lemmy.world · edit-2 1 year ago

As if the internet isn’t full of regurgitated lies already, can’t wait for the AI to reinforce them into itself.

Zarxrax@lemmy.world · 1 year ago

They need massive amounts of data. There is simply no way to manually curate data on that scale, short of hiring like a million people. It’s very likely that they do use some sort of automated filtering to curate the data though.

Preston Maness ☭@lemmygrad.ml · 1 year ago

They need massive amounts of data. There is simply no way to manually curate data on that scale, short of hiring like a million people. It’s very likely that they do use some sort of automated filtering to curate the data though.

If we can throw tens of millions of soldiers into meat grinders for wars, then I think hiring a few million people to curate data is table stakes by comparison.

dystop@lemmy.world · 1 year ago

To be fair everything on the web is already being used to train AI.

Chainweasel@lemmy.world · 1 year ago

Sounds like a good way to speed up sample collapse. Eventually AI training data will include enough AI-written comments and articles that the training data, or “sample”, will be contaminated by it, and training on it will make the AI worse, kind of like inbreeding in humans bringing out recessive genetic disorders such as diseases and birth defects. Once that happens, they’ll need to find a way to accurately detect and remove AI data from the sample to continue improving the AI. But even with the early generations of AI we’re using now, it’s incredibly hard to automate that with any kind of real accuracy; attempts have produced a lot of false positives and let a lot of genuinely AI-generated text through.
It’s interesting that the limiting factor on AI development may be the fact that it was released to the public before further training could be done.

Greenskye@lemmy.world · 1 year ago

A big hurdle for AI is that it really can’t ‘learn’, at least not like humans can, where we filter out bad data or go back and correct previous assumptions (not that we do this perfectly). Seems like anyone who’s able to truly figure out how to teach AI without needing super-clean data sets will have basically unlocked something pretty close to the singularity. Which makes me assume that we’re honestly nowhere close to figuring that out and that sample collapse is much more likely (with possibly the internet as a whole being effectively ruined, same as voice calls have been effectively ruined by rampant spam).

grte@lemmy.ca · 1 year ago

As AI-generated content proliferates across the internet, won’t this lead to (I don’t know, I’m not an expert on this) a worse and worse environment to actually learn from human input? Like, more and more spoiled data inputs should lead to some weird results, wouldn’t you think?