How the hell does no one talk about this?

101@feddit.org · 2 months ago

How the hell does no one talk about this?

fool@programming.dev · 2 months ago

Thanks for the detailed reply! :P

I’d like to engage with every part of what you pointed out – real discussions are always exciting!

…they pay the journals, not the other way around…

Yes of course. It’s not at all relevant?

It’s arguably relevant. Researchers pay journals to display their years of work, then these journals resell that work to AI companies, which in turn put indirect pressure on researchers for more work. It’s a form of labor where the pay direction is reversed. Yes, researchers are aware that their papers can be used for profit (like medical tech), but they never imagined that their work would be sold en masse to ethically dubious, historically copyright-violating, pollution-heavy server farms. Now, I see that you don’t agree with this, since you say:

…not only is it very literally transparent and most models open-weight, and most libraries open-source, but it’s making knowledge massively more accessible.

but I can’t help but feel obliged to share the following evidence.

Though a Stanford report notes that most new models are open source (Lynch, 2024), the models with the most market share (see this Forbes list) are not open-source. Of those fifty, only Cleanlab, Cohere, Hugging Face (duh), LangChain (among other Python stuff like scikit-learn or tensorflow), Weaviate, TogetherAI and notably Mistral are open source. Among the giants, OpenAI’s GPT-4 et al., Claude, and Gemini are closed-source, though Meta’s LLaMa is open-source.
Transparency is… I’ll cede that it is improving! But it’s also lacking. According to the Stanford 2024 Foundation Model Transparency Index, which uses 100 indicators such as data filtration transparency, copyright transparency, and pollution transparency (Bommasani et al., 2024, p. 27 fig. 8), developers were opaque, including open-source developers. The pertinent summary notes that the mean FMTI company score improved from 37 to 58 over the last year, but information about copyright data, licenses, and guardrails has remained opaque.

I see you also argue that:

With [the decline of effort in average people’s fact-finding] in mind I see no reason not to feed [AI] products of the scientific method, [which is] the most rigorous and highest solution to the problems of epistemology we’ve come up with thus far.

And… I partly agree with you on this. As another commenter said, “[AI] is not going back in the bottle”, so might as well make it not totally hallucinatory. Of course, this should be done in an ethical way, one that respects the rights to the data of all involved.

But about your next point regarding data usage:

…if you actually read the terms and conditions when you signed up to Facebook… and if you listened to the experts then you and these artists would not feel like you were being treated unfairly, because not only did you allow it to happen, you all encouraged it. Now that it might actually be used for good, you are upset. It’s disheartening. I’m sorry, most of you signed it all away by 2006. Data is forever.

That’s a mischaracterization of a lot of views. Yes, a lot of people willfully ignored surveillance capitalism, but we never encouraged it, nor did we flip from approving to disapproving just because the data we intentionally or inadvertently produced began to be “used for good”. One of the earliest surveillance capitalism investigators, Harvard Business School professor Shoshana Zuboff, confirms that we were just scared and uneducated about these things outside of our control.

“Every single piece of research, going all the way back to the early 2000s, shows that whenever you expose people to what’s really going on behind the scenes with surveillance capitalism, they don’t want anything to do [with] it. The only reason we keep engaging with it is because we feel like we have no choice. …[it] is a colossal market failure. Because it is not giving people what people want. …everything that’s inside that choice [i.e. the choice of picking between convenience and privacy] has been designed to keep us in ignorance.” (Kulwin, 2019)

This kind of thing – corporate giants giving up thousands of papers to AI – is another instance of people being scared. But it’s not fearmongering. Fearmongering implies that we’re making up fright where it doesn’t really exist; however, there is indeed an awful, fear-inducing precedent set by this action. Researchers now have to live with the idea that corporations, these vast economic superpowers, can suddenly and easily pivot into using all of their content to fuel AI and make millions. This is the same content they spent years on and intended for open use by peers in objectively humanity-supporting ways, the same content for which they had few storage, publishing, or hosting options other than said publishers. Yes, they signed the ToS and now they’re eating it. We’re evolving towards the future at breakneck pace – what’s next? they worry, what’s next?

(Comment 1/2)

fool@programming.dev · 2 months ago

Speaking of fearmongering, you note that:

an artist getting their style copied

So if I go to an art gallery for inspiration I must declare this in a contract too? This is absurd. But to be fair I’m not surprised. Intellectual property is altogether an absurd notion in the digital age, and insanity like “copyrighting styles” is just the sharpest most obvious edge of it.

I think also the fearmongering about artists is overplayed by people who are not artists.

Ignoring the false equivalency between getting inspiration at an art gallery and feeding millions of artworks into a non-human AI for automated, high-speed, legally dubious replication and derivation, copyright is how creative workers keep their careers and stay incentivized. Your Twitter experiences are anecdotal; in more generalized reality:

Chinese illustrator jobs purportedly dropped by 70% in part due to image generators
Lesser-known artists are being hindered from making themselves known as visual art venues limit themselves to known artists in order to reduce AI-generated issues – the opposite of democratizing art
Artists have reported using image generators to avoid losing their jobs
Artists’ works, such as those by Hollie Mengert and Karen Hallion among others, have been used in training data without compensation, attribution, or consent – the resulting style mimicries have been described as “invasive” (someone can steal your mode of self-expression) and reputationally damaging, even when they are solely “surface-level”

The above four points were taken from the Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society (Jiang et al., 2023, sections 4.1 and 4.2).

Help me understand your viewpoint. Is copyright nonsensical? Are we hypocrites for worrying about the ways our hosts are using our produced goods? There is a lot of liability and a lot of worry here, but I’m having trouble reconciling: you seem to be implying that this liability and worry are unfounded, but evidence seems to point elsewhere.

Thanks for talking with me! ^ᴗ^

(Comment 2/2)