Child sex abuse images found in dataset training image generators - eviltoast
  • sbv@sh.itjust.works
    11 months ago

    However, because he knew that the dataset was “being fed by essentially unguided crawling” of the web, including “a significant amount of explicit material,” he also didn’t rule out the possibility that image generators could also be directly referencing CSAM included in the LAION-5B dataset.

    I wonder what the minimum dataset size would be to produce a useful model. Dealing with a massive uncurated dataset seems like a bad idea if you’re concerned about harmful outputs.