MicrostockGroup


Topic: AI In The News


Uncle Pete

  • Great Place by a Great Lake - My Home Port
« on: December 27, 2023, 14:10 »
https://www.msn.com/en-us/money/companies/ny-times-sues-openai-microsoft-for-infringing-copyrighted-works/ar-AA1m75sX?ocid=00000000&pc=U528&cvid=36cdc2530ee347549955a4670eb08328&ei=17

NEW YORK (Reuters) - The New York Times sued OpenAI and Microsoft on Wednesday, accusing them of using millions of the newspaper's articles without permission to help train chatbots to provide information to readers.

The newspaper's complaint, filed in Manhattan federal court, accused OpenAI and Microsoft of trying to "free-ride on The Times's massive investment in its journalism" by using it to provide alternative means to deliver information to readers.

"There is nothing 'transformative' about using The Times's content without payment to create products that substitute for The Times and steal audiences away from it," the Times said.


The case is New York Times Co v Microsoft Corp et al, U.S. District Court, Southern District of New York, No. 23-11195.


« Reply #1 on: December 27, 2023, 16:04 »
You beat me to it; I was about to post this too. It looks like a major case that may lead to a legal conclusion, though I distrust the ability of knowledge-deficient judges to really understand the issues involved (much like the Copyright Office's decision

I've been reading Wolfram (of Mathematica fame) on how ChatGPT works (available on Kindle Unlimited for free):

https://www.amazon.com/What-ChatGPT-Doing-Does-Work-ebook/dp/B0BY59PT5Z

It gets complicated quickly, and he explains why there is no connection between training and generation, since the process from source to dataset is not commutative.

As with MJ (Midjourney) et al., the data used for training is massive (even the NYT's huge archive is dwarfed by several orders of magnitude). It should come down to whether the scraping amounts to fair use.

However, there are some significant differences from AI-generated images: the Times alleges wholesale reproduction of significant quantities of text by ChatGPT, something no one has been able to show for AI image generation.

« Reply #2 on: December 27, 2023, 18:15 »
Very interesting topic, thank you for sharing.

Uncle Pete

« Reply #3 on: December 28, 2023, 11:50 »

Yes, and I agree: there is no connection between training and generation, since the process from source to dataset is not commutative. Still, these cases have to be decided. Reading more of the background, the fair use question has already been decided in past cases. The NYT is trying to make the same claim that has already been defeated, taking a second bite at the same arguments.

I understand their point: copying and using the content is not transformative if the original text is then repeated. The NYT says bits of information are traceable directly to its articles and publications.

In past cases, like the one brought by photographers in CA, the claimants have not been able to show direct copying and use in the output. They need to prove a direct connection from the training data to the output. That hasn't happened yet.

« Reply #4 on: December 29, 2023, 15:57 »
Some further commentary on the case. It turns out the copied text was not a random article but cherry-picked:
 
AI #44: Copyright Confrontation
Zvi Mowshowitz newsletter

   
The New York Times has thrown down the gauntlet, suing OpenAI and Microsoft for copyright infringement. Others are complaining about recreated images in the otherwise deeply awesome MidJourney v6.0. As is usually the case, the critics misunderstand the technology involved, complain about infringements that inflict no substantial damages, engineer many of the complaints being made and make cringeworthy accusations.

That does not, however, mean that The New York Times case is baseless. There are still very real copyright issues at the heart of Generative AI. This suit is a serious effort by top lawyers. It has strong legal merit. They are likely to win if the case is not settled.

In a handful of famous cases, there seems to be an exception. Exactly as in the MidJourney examples, why are we seeing NYT article text almost exactly (but not quite) copied anyway in some cases? Because it is iconic.

Kevin Bryan: NYT/OpenAI lawsuit completely misunderstands how LLMs work, and judges getting this wrong will do huge damage to AI. Basic point: LLMs DON'T "STORE" UNDERLYING TRAINING TEXT. It is impossible; the parameter size of GPT-3.5 or 4 is not enough to losslessly encode the training set.

OK, now let's see NYT examples. Here GPT spits out almost perfectly the opening paragraphs of the "Snow Fall" article from 2012. But this text is all over the internet - super famous article! That's why GPT's posterior predictions given the previous article paragraph are so good.

Likewise, in the famous Guy Fieri Times Square review, GPT repeats almost perfectly whole paragraphs. But these paragraphs have also been repeated dozens of times across the internet! That's why the LLM posterior probability next word distribution picks them up.

In practice, one can think of this as ChatGPT committing copyright infringement if and only if everyone else is committing copyright infringement on that exact same passage, making it so often duplicated that it learned this is something people reproduce.
 
My take? OpenAI can't really defend this practice without some heavy changes to the instructions and a whole lot of litigating about how the tech works. It will be smarter to settle than fight.

>>>

Bold text is my emphasis; all caps in original.

Much more detail in the newsletter:

For free subscription: Don't Worry About the Vase | Substack

https://thezvi.substack.com/?utm_source=substack&utm_medium=email

A world made of gears. Doing both speed premium short term updates and long term world model building. Currently focused on weekly AI updates. Explorations include AI, policy, rationality, medicine and fertility, education and games.

By Zvi Mowshowitz


« Last Edit: December 29, 2023, 16:00 by cascoly »

« Reply #5 on: December 30, 2023, 12:46 »
"We know you copied us: you used the word milquetoast in generated articles. No human has ever used that word other than NY Times reporters!"

Uncle Pete

« Reply #6 on: December 31, 2023, 14:11 »
Is it transformative?

Wikipedia:

The transformative nature of computer based analytical processes such as text mining, web mining and data mining has led many to form the view that such uses would be protected under fair use. This view was substantiated by the rulings of Judge Denny Chin in Authors Guild, Inc. v. Google, Inc., a case involving mass digitisation of millions of books from research library collections. As part of the ruling that found the book digitisation project was fair use, the judge stated "Google Books is also transformative in the sense that it has transformed book text into data for purposes of substantive research, including data mining and text mining in new areas"

Text and data mining was subject to further review in Authors Guild v. HathiTrust, a case derived from the same digitization project mentioned above. Judge Harold Baer, in finding that the defendant's uses were transformative, stated that "the search capabilities of the [HathiTrust Digital Library] have already given rise to new methods of academic inquiry such as text mining."


I'm pointing this out because the New York Times is trying to make the same claim as in the two cases above, which were already decided in favor of fair use. Sometimes a case like this would be refused and not heard, since the issue has already been decided in the past.

Uncle Pete

« Reply #7 on: January 31, 2024, 18:31 »
Under appeal, but I think still interesting: "willfully blind to infringement." If we notify these sites and they do nothing, they can be sued.

Vacating the district court's order granting in part and denying in part Redbubble's motion for judgment as a matter of law, the panel held that a party is liable for contributory infringement when it continues to supply its product to one whom it knows or has reason to know is engaging in trademark infringement. A party meets this standard if it is willfully blind to infringement. Agreeing with other circuits, the panel held that contributory trademark liability requires the defendant to have knowledge of specific infringers or instances of infringement. General knowledge of infringement on the defendant's platform, even of the plaintiff's trademarks, is not enough to show willful blindness. The panel remanded for reconsideration of Redbubble's motion under the correct legal standard.


https://cdn.ca9.uscourts.gov/datastore/opinions/2023/07/24/21-56150.pdf


 
