Microstock Photography Forum - General > Image Sleuth

AI In The News

Uncle Pete:
https://www.msn.com/en-us/money/companies/ny-times-sues-openai-microsoft-for-infringing-copyrighted-works/ar-AA1m75sX?ocid=00000000&pc=U528&cvid=36cdc2530ee347549955a4670eb08328&ei=17

NEW YORK (Reuters) - The New York Times sued OpenAI and Microsoft on Wednesday, accusing them of using millions of the newspaper's articles without permission to help train chatbots to provide information to readers.

The newspaper's complaint, filed in Manhattan federal court, accused OpenAI and Microsoft of trying to "free-ride on The Times's massive investment in its journalism" by using it to provide alternative means to deliver information to readers.

"There is nothing 'transformative' about using The Times's content without payment to create products that substitute for The Times and steal audiences away from it," the Times said.

The case is New York Times Co v Microsoft Corp et al, U.S. District Court, Southern District of New York, No. 23-11195.

cascoly:
You beat me to it - I was about to post this too. It looks like a major case that may lead to a legal conclusion, though I distrust the ability of knowledge-deficient judges to really understand the issues involved (much like the Copyright Office's decision).

I've been reading Wolfram (of Mathematica fame) on how ChatGPT works (available on Kindle Unlimited for free):

https://www.amazon.com/What-ChatGPT-Doing-Does-Work-ebook/dp/B0BY59PT5Z

It gets complicated quickly, and he explains why there is no connection between training and generation, since the process from source to dataset is not commutative.

As with MJ et al., the data used for training is massive (even the NYTimes' huge content is dwarfed by several orders of magnitude). It should come down to whether the scraping amounts to fair use.

However, there are some significant differences from AI-gen images, as the Times alleges wholesale reproduction of significant quantities of text by ChatGPT - something no one has been able to show re AI-image generation.

DanielVisuals:
Very interesting topic, thank you for sharing.

Uncle Pete:

Yes, and I agree: there is no connection between training and generation since the process from source to dataset is not commutative, but these cases still have to be decided. Reading more of the background, the fair use question has already been decided in the past. NYT is trying to make the same claim that has already been defeated: a second bite at the same arguments.

I understand their point that copying and using is not transformative if the original data is then repeated. NYT says bits of information are traceable directly to their articles and publications.

In past cases, like the one from photographers in CA, the claimants have not been able to show direct copying and use in the output. They need to prove a connection from the training data, directly to the output results. That hasn't happened yet.

cascoly:
Some further commentary on the case - it turns out the copied text was not a random article but cherry-picked:
 
AI #44: Copyright Confrontation
Zvi Mowshowitz newsletter
   
The New York Times has thrown down the gauntlet, suing OpenAI and Microsoft for copyright infringement. Others are complaining about recreated images in the otherwise deeply awesome MidJourney v6.0. As is usually the case, the critics misunderstand the technology involved, complain about infringements that inflict no substantial damages, engineer many of the complaints being made and make cringeworthy accusations.

That does not, however, mean that The New York Times case is baseless. There are still very real copyright issues at the heart of Generative AI. This suit is a serious effort by top lawyers. It has strong legal merit. They are likely to win if the case is not settled.

In a handful of famous cases, there seems to be an exception. Exactly as in the MidJourney examples, why are we seeing NYT article text almost exactly (but not quite) copied anyway in some cases? Because it is iconic.

Kevin Bryan: NYT/OpenAI lawsuit completely misunderstands how LLMs work, and judges getting this wrong will do huge damage to AI. Basic point: LLMs DON'T "STORE" UNDERLYING TRAINING TEXT. It is impossible - the parameter size of GPT-3.5 or 4 is not enough to losslessly encode the training set.

Ok, now let's see NYT examples. Here GPT spits out almost perfectly the opening paragraphs of a "snow fall" article from 2012. But this text is all over the internet - super famous article! That's why GPT's posterior predictions given the previous article paragraph are so good.

Likewise, in the famous Guy Fieri Times Square review, GPT repeats almost perfectly whole paragraphs. But these paragraphs have also been repeated dozens of times across the internet! That's why the LLM posterior probability next word distribution picks them up.

In practice, one can think of this as ChatGPT committing copyright infringement if and only if everyone else is committing copyright infringement on that exact same passage, making it so often duplicated that it learned this is something people reproduce.
 
My take? OpenAI can't really defend this practice without some heavy changes to the instructions and a whole lot of litigating about how the tech works. It will be smarter to settle than fight
….

>>>

bold text my emphasis -- all caps in original

… much more detail in the newsletter:

For free subscription: Don't Worry About the Vase | Substack

https://thezvi.substack.com/?utm_source=substack&utm_medium=email

A world made of gears. Doing both speed premium short term updates and long term world model building. Currently focused on weekly AI updates. Explorations include AI, policy, rationality, medicine and fertility, education and games.

By Zvi Mowshowitz

