Shutterstock+AWS press release: your images as AI training data

Jo Ann Snover:

I think the announcement means that Shutterstock will make images and metadata available to train AI systems, or for other non-traditional uses, with no compensation to the owners of the images or metadata (Shutterstock doesn't do any keywording). In spite of their claims about rigorous review, keyword spam is rife on their site.

"The datasets include collections of images and 3D models from Shutterstock.AI's library of 400 million visual assets, along with metadata backed by rigorous human and AI review. The datasets span multiple industry categories, and have been curated to align with some of the most common computer vision applications in ecommerce, travel and tourism, self-driving cars, and consumer electronics."

Here's some pricing information on the AWS web site:

How does that $10,000 price get shared out among contributors?

From the description on AWS of what you get for your (minimum) $10,000, you're not licensing content; rather, "This data license gives you the right to train models for the duration of the subscription. Data sets will be published to your S3 bucket." I could easily see Shutterstock deciding that no royalties are due for training models, even though without contributor content they'd have nothing to offer.

Contributors get their share: a reset in January.

If keyword spam really is a problem, then everything will be classified as "background" :-)

“A machine learning algorithm walks into a bar. The bartender asks, ‘What’ll you have?’ The algorithm says, ‘What’s everyone else having?’”

The comments in this thread could pass for a brainstorming session on behalf of a company that is putting effort into paying us less, especially since it is indeed possible to separate the background noise from the valuable data.

I wonder if hidden in one of the changes to the TOS was something saying they could profit off our keywording (intellectual property) without paying us.


