Shutterstock "Contributor Fund"

Mrblues101 · December 31, 2022, 18:59

Quote from: Uncle Pete on December 31, 2022, 17:29
Original content is NOT modified or used, so I'd say, that's not important or relevant. Images and descriptions are only used to train the AI.

Exactly, in the same way your brain changes when you read a book and you learn about globalization (for example) being able to talk about it, or write about it, but without this meaning that you are infringing the copyright of the original book. You learn about something, and now you are able to create something new by using the new learning.

This is how machine learning works, and law now understand that new creations by AI based on machine learning, owe nothing to the elements they have used to learn.

Zero Talent · January 01, 2023, 19:15

Quote from: Mrblues101 on December 31, 2022, 18:59
Quote from: Uncle Pete on December 31, 2022, 17:29
Original content is NOT modified or used, so I'd say, that's not important or relevant. Images and descriptions are only used to train the AI.

Exactly, in the same way your brain changes when you read a book and you learn about globalization (for example) being able to talk about it, or write about it, but without this meaning that you are infringing the copyright of the original book. You learn about something, and now you are able to create something new by using the new learning.

This is how machine learning works, and law now understand that new creations by AI based on machine learning, owe nothing to the elements they have used to learn.

All this is true, however one must be aware about plagiarism when something "new" is created.
There is a level beyond which plagiarism fades away, but I can imagine that when someone is trying to create some niche images, with few training samples, elements from the images used for training may appear with little or no modification in the "newly" created content.
This is no different than copying entire paragraphs from a book and then claiming that you are the author of the "newly" created content.

This will most likely continue to be something for the lawyers to argue about.

cascoly · January 01, 2023, 19:30

Quote from: Zero Talent on January 01, 2023, 19:15...

All this is true, however one must be aware about plagiarism when something "new" is created.
There is a level beyond which plagiarism fades away, but I can imagine that when someone is trying to create some niche images, with few training samples, elements from the images used for training may appear with little or no modification in the "newly" created content.
This is no different than copying entire paragraphs from a book and then claiming that you are the author of the "newly" created content.

This will most likely continue to be something for the lawyers to argue about.

bold added

that's a strawman argument - no training set contains only a 'few' images, and no 'elements' of any image is copied. that's just not how ML works

Zero Talent · January 02, 2023, 00:51

Quote from: cascoly on January 01, 2023, 19:30
Quote from: Zero Talent on January 01, 2023, 19:15...

All this is true, however one must be aware about plagiarism when something "new" is created.
There is a level beyond which plagiarism fades away, but I can imagine that when someone is trying to create some niche images, with few training samples, elements from the images used for training may appear with little or no modification in the "newly" created content.
This is no different than copying entire paragraphs from a book and then claiming that you are the author of the "newly" created content.

This will most likely continue to be something for the lawyers to argue about.
bold added

that's a strawman argument - no training set contains only a 'few' images, and no 'elements' of any image is copied. that's just not how ML works

I know how ML works (using several variants in my daily job).

So how can you tell the number of elements used to train a specific request, without being involved in the algorithm development? What you say may be true only if there a minimum threshold for the training set, a threshold beyond which individual image characteristics are fading away.
You have to know it, before making such statements.
If such threshold doesn't exist, some requests may simply plagiarize the few images used to respond to that query.

If there is only one image describing, let's say a clown, in the training set, then it's very likely that all queries requesting clowns will plagiarize that unique clown image, because that's the only thing the algorithm has learned about clowns.

cascoly · January 02, 2023, 02:02

Quote from: Zero Talent on January 02, 2023, 00:51
Quote from: cascoly on January 01, 2023, 19:30
Quote from: Zero Talent on January 01, 2023, 19:15...

All this is true, however one must be aware about plagiarism when something "new" is created.
There is a level beyond which plagiarism fades away, but I can imagine that when someone is trying to create some niche images, with few training samples, elements from the images used for training may appear with little or no modification in the "newly" created content.
This is no different than copying entire paragraphs from a book and then claiming that you are the author of the "newly" created content.

This will most likely continue to be something for the lawyers to argue about.
bold added

that's a strawman argument - no training set contains only a 'few' images, and no 'elements' of any image is copied. that's just not how ML works

I know how ML works (using several variants in my daily job).

So how can you tell the number of elements used to train a specific request, without being involved in the algorithm development? What you say may be true only if there a minimum threshold for the training set, a threshold beyond which individual image characteristics are fading away.
You have to know it, before making such statements.
If such threshold doesn't exist, some requests may simply plagiarize the few images used to respond to that query.

If there is only one image describing, let's say a clown, in the training set, then it's very likely that all queries requesting clowns will plagiarize that unique clown image, because that's the only thing the algorithm has learned about clowns.

still setting up strawmen for your arguments - how about a real-life example of a request that won't have hundreds if not many thousands of images it's been trained on? simple way would be to find any search on SS that has < 100 images & then ask dall-e et al for that.

SS has 200,000,000 images for their training set - web scrapers can have many more (one recent example claimed 2 billion) so the chances of your scenario are pretty small and no one has been able to demonstrate this sort of result.

Zero Talent · January 02, 2023, 04:49

Quote from: cascoly on January 02, 2023, 02:02
Quote from: Zero Talent on January 02, 2023, 00:51
Quote from: cascoly on January 01, 2023, 19:30
Quote from: Zero Talent on January 01, 2023, 19:15...

All this is true, however one must be aware about plagiarism when something "new" is created.
There is a level beyond which plagiarism fades away, but I can imagine that when someone is trying to create some niche images, with few training samples, elements from the images used for training may appear with little or no modification in the "newly" created content.
This is no different than copying entire paragraphs from a book and then claiming that you are the author of the "newly" created content.

This will most likely continue to be something for the lawyers to argue about.
bold added

that's a strawman argument - no training set contains only a 'few' images, and no 'elements' of any image is copied. that's just not how ML works

I know how ML works (using several variants in my daily job).

So how can you tell the number of elements used to train a specific request, without being involved in the algorithm development? What you say may be true only if there a minimum threshold for the training set, a threshold beyond which individual image characteristics are fading away.
You have to know it, before making such statements.
If such threshold doesn't exist, some requests may simply plagiarize the few images used to respond to that query.

If there is only one image describing, let's say a clown, in the training set, then it's very likely that all queries requesting clowns will plagiarize that unique clown image, because that's the only thing the algorithm has learned about clowns.

still setting up strawmen for your arguments - how about a real-life example of a request that won't have hundreds if not many thousands of images it's been trained on? simple way would be to find any search on SS that has < 100 images & then ask dall-e et al for that.

SS has 200,000,000 images for their training set - web scrapers can have many more (one recent example claimed 2 billion) so the chances of your scenario are pretty small and no one has been able to demonstrate this sort of result.

I only pointed out that this scenario is possible and you seem to agree with me, even if you call it a "strawman argument".

I'm not going to spend time looking for an example. It will take too long for me, but it may pop-up, eventually. This is what crowd-sourcing is good at.

The world is not stuck in it the present. There will always be some new things, for which only a limited set of photos will be available for training.

When such case will be found, then the case and maybe even the system might be challenged by lawyers, the same way plagiarism is normally challenged.

cascoly · January 02, 2023, 22:22

Quote from: Zero Talent on January 02, 2023, 04:49

I only pointed out that this scenario is possible and you seem to agree with me, even if you call it a "strawman argument".

I'm not going to spend time looking for an example. It will take too long for me, but it may pop-up, eventually. This is what crowd-sourcing is good at.

The world is not stuck in it the present. There will always be some new things, for which only a limited set of photos will be available for training.

When such case will be found, then the case and maybe even the system might be challenged by lawyers, the same way plagiarism is normally challenged.

you obviously don't know what a strawman argument is...
you made a silly, irrelevant claim, so it IS your responsibility to at least give an example of a phrase that would only find one artist's images out of 300,000,000

i actually did do a search for 'shaman puri india' - on SS 26 of only 27 are mine; google images show mine as 22 of first 25. then i used that phrase in DALL-E and it gave 4 completely different images, none of which remotely resembled mine in sadhu or temple bkgd

and when i required Puri in the google search, it showed only 30 images total, 17 of mine. one of the images was a map of korea, one retail ad for a box of sp.ices and 2 others w no shaman at all

your turn!!!!!!!!!!!

Zero Talent · January 03, 2023, 03:54

Quote from: cascoly on January 02, 2023, 22:22
Quote from: Zero Talent on January 02, 2023, 04:49

I only pointed out that this scenario is possible and you seem to agree with me, even if you call it a "strawman argument".

I'm not going to spend time looking for an example. It will take too long for me, but it may pop-up, eventually. This is what crowd-sourcing is good at.

The world is not stuck in it the present. There will always be some new things, for which only a limited set of photos will be available for training.

When such case will be found, then the case and maybe even the system might be challenged by lawyers, the same way plagiarism is normally challenged.
you obviously don't know what a strawman argument is...
you made a silly, irrelevant claim, so it IS your responsibility to at least give an example of a phrase that would only find one artist's images out of 300,000,000

i actually did do a search for 'shaman puri india' - on SS 26 of only 27 are mine; google images show mine as 22 of first 25. then i used that phrase in DALL-E and it gave 4 completely different images, none of which remotely resembled mine in sadhu or temple bkgd

and when i required Puri in the google search, it showed only 30 images total, 17 of mine. one of the images was a map of korea, one retail ad for a box of sp.ices and 2 others w no shaman at all

your turn!!!!!!!!!!!

"Une hirondelle ne fait pas le printemps".

This is very much applicable in science when you want to validate a hypothesis.
The fact that you found 1, 2, 10, or 1,000 examples matching it, is not sufficient to make it a theory.
One single counter-example is enough to disprove it.

You have no idea if your images were even used by the algorithm when you did your isolated "experiments" (which is rather likely to be true, since its output was garbage)

My advice for those who have niche images (maybe even for you with your rather unique temple) is to opt out of the AI training deal, as soon as it will become possible, so the customers have no other option but to buy from you and delay as long as possible the competition from AI on your unique topics.

Lowls · January 03, 2023, 13:09

Quote from: Zero Talent on January 03, 2023, 03:54
Quote from: cascoly on January 02, 2023, 22:22
Quote from: Zero Talent on January 02, 2023, 04:49

I only pointed out that this scenario is possible and you seem to agree with me, even if you call it a "strawman argument".

I'm not going to spend time looking for an example. It will take too long for me, but it may pop-up, eventually. This is what crowd-sourcing is good at.

The world is not stuck in it the present. There will always be some new things, for which only a limited set of photos will be available for training.

When such case will be found, then the case and maybe even the system might be challenged by lawyers, the same way plagiarism is normally challenged.
you obviously don't know what a strawman argument is...
you made a silly, irrelevant claim, so it IS your responsibility to at least give an example of a phrase that would only find one artist's images out of 300,000,000

i actually did do a search for 'shaman puri india' - on SS 26 of only 27 are mine; google images show mine as 22 of first 25. then i used that phrase in DALL-E and it gave 4 completely different images, none of which remotely resembled mine in sadhu or temple bkgd

and when i required Puri in the google search, it showed only 30 images total, 17 of mine. one of the images was a map of korea, one retail ad for a box of sp.ices and 2 others w no shaman at all

your turn!!!!!!!!!!!

"Une hirondelle ne fait pas le printemps".

This is very much applicable in science when you want to validate a hypothesis.
The fact that you found 1, 2, 10, or 1,000 examples matching it, is not sufficient to make it a theory.
One single counter-example is enough to disprove it.

You have no idea if your images were even used by the algorithm when you did your isolated "experiments" (which is rather likely to be true, since its output was garbage)

My advice for those who have niche images (maybe even for you with your rather unique temple) is to opt out of the AI training deal, as soon as it will become possible, so the customers have no other option but to buy from you and delay as long as possible the competition from AI on your unique topics.

There is an image of a flower and it is the only image of that flower on SS. Don't know why but it is. II'll happily send a link to the image to help support an argument ... when I get $20.00.

stoker2014 · January 03, 2023, 14:15

Quote from: wds on December 14, 2022, 13:30
What is the "Contributor Fund" column in the Shutterstock "Earnings Summary"?

You will only be paid once every six months.

The more people refuse, the more money you can earn from it.

Diana Herrmann · January 03, 2023, 16:01

Quote from: Lowls on January 03, 2023, 13:09
Quote from: Zero Talent on January 03, 2023, 03:54
Quote from: cascoly on January 02, 2023, 22:22
Quote from: Zero Talent on January 02, 2023, 04:49

I only pointed out that this scenario is possible and you seem to agree with me, even if you call it a "strawman argument".

I'm not going to spend time looking for an example. It will take too long for me, but it may pop-up, eventually. This is what crowd-sourcing is good at.

The world is not stuck in it the present. There will always be some new things, for which only a limited set of photos will be available for training.

When such case will be found, then the case and maybe even the system might be challenged by lawyers, the same way plagiarism is normally challenged.
you obviously don't know what a strawman argument is...
you made a silly, irrelevant claim, so it IS your responsibility to at least give an example of a phrase that would only find one artist's images out of 300,000,000

i actually did do a search for 'shaman puri india' - on SS 26 of only 27 are mine; google images show mine as 22 of first 25. then i used that phrase in DALL-E and it gave 4 completely different images, none of which remotely resembled mine in sadhu or temple bkgd

and when i required Puri in the google search, it showed only 30 images total, 17 of mine. one of the images was a map of korea, one retail ad for a box of sp.ices and 2 others w no shaman at all

your turn!!!!!!!!!!!

"Une hirondelle ne fait pas le printemps".

This is very much applicable in science when you want to validate a hypothesis.
The fact that you found 1, 2, 10, or 1,000 examples matching it, is not sufficient to make it a theory.
One single counter-example is enough to disprove it.

You have no idea if your images were even used by the algorithm when you did your isolated "experiments" (which is rather likely to be true, since its output was garbage)

My advice for those who have niche images (maybe even for you with your rather unique temple) is to opt out of the AI training deal, as soon as it will become possible, so the customers have no other option but to buy from you and delay as long as possible the competition from AI on your unique topics.
There is an image of a flower and it is the only image of that flower on SS. Don't know why but it is. II'll happily send a link to the image to help support an argument ... when I get $20.00.

Make an AI image of that flower and if it's the same, then you have an example. If it's different you prove that AI doesn't steal and copy.

cascoly · January 03, 2023, 19:31

Quote from: Zero Talent on January 03, 2023, 03:54
Quote from: cascoly on January 02, 2023, 22:22
Quote from: Zero Talent on January 02, 2023, 04:49

I only pointed out that this scenario is possible and you seem to agree with me, even if you call it a "strawman argument".

I'm not going to spend time looking for an example. It will take too long for me, but it may pop-up, eventually. This is what crowd-sourcing is good at.

The world is not stuck in it the present. There will always be some new things, for which only a limited set of photos will be available for training.

When such case will be found, then the case and maybe even the system might be challenged by lawyers, the same way plagiarism is normally challenged.
you obviously don't know what a strawman argument is...
you made a silly, irrelevant claim, so it IS your responsibility to at least give an example of a phrase that would only find one artist's images out of 300,000,000

i actually did do a search for 'shaman puri india' - on SS 26 of only 27 are mine; google images show mine as 22 of first 25. then i used that phrase in DALL-E and it gave 4 completely different images, none of which remotely resembled mine in sadhu or temple bkgd

and when i required Puri in the google search, it showed only 30 images total, 17 of mine. one of the images was a map of korea, one retail ad for a box of sp.ices and 2 others w no shaman at all

your turn!!!!!!!!!!!

"Une hirondelle ne fait pas le printemps".

This is very much applicable in science when you want to validate a hypothesis.
The fact that you found 1, 2, 10, or 1,000 examples matching it, is not sufficient to make it a theory.
One single counter-example is enough to disprove it.

you're making my case for me! i never claimed there was NO possibility, but it's extremely unlikely and you shouldn't skip those nice spring days! you're throwing out an entire new tool because of something that that's unlikely to happen in our lifetime - hence homme de paille

you're also distorting the argument - if your theory won't survive 1 odd result, then it is a poor theory - but that's never been the case here.

Quote
You have no idea if your images were even used by the algorithm when you did your isolated "experiments" (which is rather likely to be true, since its output was garbage)

again, making my point - you have yet to even suggest a possible phrase that would use only 1 artist's images. even if you enter an image for the ai, the result will not match the original!

re 'garbage' - not sure what you mean since you didn't see my result - my results were pretty decent, tho i dont usually ask for a photo, just illustrations. did you h ave different results? or are you just making it up as you go along?

Quote
My advice for those who have niche images (maybe even for you with your rather unique temple) is to opt out of the AI training deal, as soon as it will become possible, so the customers have no other option but to buy from you and delay as long as possible the competition from AI on your unique topics.

actually the reason i chose that phrase was because i thought it highly unlikely anyone would ever use that phrase & it had so few results on SS. (the phrase came to mind since i had recently blogged on this experience)

instead people are more likely to find it thru more generic tags like 'hindu shaman' (most of mine still on 1st page of 6

Zero Talent · January 03, 2023, 19:56

Quote from: cascoly on January 03, 2023, 19:31you're making my case for me! i never claimed there was NO possibility, but it's extremely unlikely and you shouldn't skip those nice spring days! you're throwing out an entire new tool because of something that that's unlikely to happen in our lifetime - hence homme de paille

you're also distorting the argument - if your theory won't survive 1 odd result, then it is a poor theory - but that's never been the case here.

Then we are talking about the same thing and there is no strawman argument.

The only difference is that while you belive that the plagiarism exceptions (you admit possible) are harmless, I believe that they have a real chance to lead to lawsuits.

This is most likely, why SS is planning to obtain the contributors' consent, before allowing further use of their images in AI training. They want to cover their a@@ and prevent plagiarism accusations, when those exceptions (you admit possible) will happen.

PS. The output was garbage because you said that none of the results represented the unique image you tried to compare it against. So a customer attempting to create that unique image via AI will fail and will have no option but to buy it directly from you.

Anita Potter · January 04, 2023, 11:43

I got a rock.

cascoly · January 04, 2023, 18:53

some final comments

Quote from: Zero Talent on January 03, 2023, 19:56

This is most likely, why SS is planning to obtain the contributors' consent, before allowing further use of their images in AI training. They want to cover their a@@ and prevent plagiarism accusations, when those exceptions (you admit possible) will happen.

that's not what they said - they already trained on their existing dataset. they said they might give a way to opt out of FUTURE inclusion

Quote
PS. The output was garbage because you said that none of the results represented the unique image you tried to compare it against. So a customer attempting to create that unique image via AI will fail and will have no option but to buy it directly from you.

that's not what i said! since you refused to back up your claim, i did a quick test using a phrase that used mostly my images on SS, and highly placed images on google. according to your untested claims, that might result in i mages that clearly violated my copyright - as i reported - none of the results looked like mine, BUT they did show a shaman in India which was what was asked for - rather than garbage, the algorithm performed just as it should, not as you predicted

Zero Talent · January 04, 2023, 20:58

Quote from: cascoly on January 04, 2023, 18:53
some final comments

Quote from: Zero Talent on January 03, 2023, 19:56

This is most likely, why SS is planning to obtain the contributors' consent, before allowing further use of their images in AI training. They want to cover their a@@ and prevent plagiarism accusations, when those exceptions (you admit possible) will happen.

that's not what they said - they already trained on their existing dataset. they said they might give a way to opt out of FUTURE inclusion
Quote
PS. The output was garbage because you said that none of the results represented the unique image you tried to compare it against. So a customer attempting to create that unique image via AI will fail and will have no option but to buy it directly from you.

that's not what i said! since you refused to back up your claim, i did a quick test using a phrase that used mostly my images on SS, and highly placed images on google. according to your untested claims, that might result in i mages that clearly violated my copyright - as i reported - none of the results looked like mine, BUT they did show a shaman in India which was what was asked for - rather than garbage, the algorithm performed just as it should, not as you predicted

Everybody is learning, not just the AI!

If maybe SS was of your opinion initially (i.e. there is no legal risk in producing the plagiarism exceptions), now they realised that such possibilities do exist (as you also admited), and they want to be covered legaly.

So they switched from your opinion to mine

!

As I said before, you don't know how the algorithm works, you don't know if all images, or only a subset was used for AI training, you don't know if there is any sample threshold required before the algo is responding to a query, etc, etc, etc.

Your isolated experiment is proving nothing, hence "Une hirondelle ne fait pas le printemps".

In theory, and probably also in practice, the more this algo will be used, the higher the chance for those plagiarism examples to pop-up.

That's what SS realised (most probably), and they want to be ready for it.
See? No "strawman argument", just you making unverified assumptions and jumping to conclusions!

DeStock · January 17, 2023, 22:06

Question:
Does any know how the calculation is made?

Scenario 1) Is it a % of the total of deal made by Shutterstock for licensing their images for training divided by the number of images and then distributed as a pro rata of who has what amount of images?

For example, with the deal made with Meta. Meta pays SS $100,000 for the one-time right to use 406 million images for a training set. SS keeps 70% and uses $30,000 to pay contributors based on the amount of images they have in that 406 million.

Scenario 2) Or is it a fixed fee based on the amount of images . Whatever SS gets from a deal, contributor always gets paid same amount ( doesn't seem like it)

It would be interesting to know what is the value of an image when it is used for training and if it's more ( or less) than for a download.

cascoly · January 18, 2023, 18:58

Quote from: DeStock on January 17, 2023, 22:06
...

For example, with the deal made with Meta. Meta pays SS $100,000 for the one-time right to use 406 million images for a training set. SS keeps 70% and uses $30,000 to pay contributors based on the amount of images they have in that 406 million.

Scenario 2) Or is it a fixed fee based on the amount of images . Whatever SS gets from a deal, contributor always gets paid same amount ( doesn't seem like it)

It would be interesting to know what is the value of an image when it is used for training and if it's more ( or less) than for a download.

whatever the scenario it's going to be tiny - I did some quick calcs earlier ( $ rec'd/portfolio size), but no one else posted their estimates, so no way to know if my theory was correct.

using your figures, payment from a Meta deal would be .00025 cents/image - slightly less than what we get for a download!

Zero Talent · February 02, 2023, 03:09

Following up on the arguments around the risks of AI plagiarism debated above, here is an interesting paper (check the pdf in the link):

https://arxiv.org/abs/2301.13188

"In this work, we demonstrate that state-of-the-art diffusion models do memorize and regenerate individual training examples"

SpaceStockFootage · February 02, 2023, 04:14

Stable Diffusion, as the kids are fond of saying... did her dirty.

cobalt · February 02, 2023, 06:16

Quote from: Zero Talent on February 02, 2023, 03:09
Following up on the arguments around the risks of AI plagiarism debated above, here is an interesting paper (check the pdf in the link):

https://arxiv.org/abs/2301.13188

"In this work, we demonstrate that state-of-the-art diffusion models do memorize and regenerate individual training examples"

Things like these and European data protection laws...fun times ahead...

I like that ss is compensating artists, but it must be clear that all training files are properly licensed

Uncle Pete · February 03, 2023, 19:46

Quote from: SpaceStockFootage on February 02, 2023, 04:14
Stable Diffusion, as the kids are fond of saying... did her dirty.

Not enough examples for proper training data. Besides they shouldn't be doing photo realistic images of specific real people!

Reimar · July 13, 2023, 16:28

Quote from: Reimar on December 20, 2022, 16:41
The latest e-mail from SS has: "The Contributor Fund will release earnings every 6 months"
If that's all we get for 6 months of usage, it is very underwhelming.

Well, since a payment in December 2022, now June 2023 has past and still no additional Contributor funds. Are they late? Did SS change their minds?

Roscoe · July 13, 2023, 16:44

Quote from: Reimar on July 13, 2023, 16:28
Quote from: Reimar on December 20, 2022, 16:41
The latest e-mail from SS has: "The Contributor Fund will release earnings every 6 months"
If that's all we get for 6 months of usage, it is very underwhelming.

Well, since a payment in December 2022, now June 2023 has past and still no additional Contributor funds. Are they late? Did SS change their minds?

I got a payment in May 2023, and a lot of others did so too if I'm not mistaken.

Reimar · July 13, 2023, 23:34

Thanks Roscoe, I forgot about that. I got $40 in May too. So I guess the next one will in time for my birthday early November.

MicrostockGroup Sponsors

Shutterstock "Contributor Fund"