MicrostockGroup Sponsors

Envato Elements

Topic: How to handle 100 Million files (Read 8207 times)


« on: July 15, 2011, 17:02 »
Hi everyone,

I thought I'd start a thread about the future. It won't be long before all the sites have 100 million files of all kinds of file types. How can this be organized? What is the best strategy for the buyer, the contributor, and the agency?

Different agencies will come up with different strategies, but since we all face the same problem, I wonder if the community can add some brain power and ideas that might be beneficial for everyone.

We all know there is an oversupply of images and a lot of duplicated content (how many red roses on white do you need?). At the same time, contributors who want to offer a "complete" range of files on a certain subject, to strengthen their profile and attract loyal buyers, don't like having their subjects limited just because something similar is already on the site (but I want my red rose on white in my portfolio!). Buyers want an efficient search that is fast and gives results adjusted to their personal preferences (ideally a search engine that learns their taste, without them having to turn a lot of knobs and dials or click buttons and sliders).

It is a difficult challenge.

Getty Images just announced changes to their contract that will even allow RM content to be moved to RF if it doesn't sell (something I agree with; I have signed the contract). On istock, I can move my slow sellers into the partner program and even deactivate them from istock altogether. Again, something I support. We also have a dollar bin, although we can't add to it at the moment.

I don't know what other sites do; maybe someone can explain what their strategies are.

Personally, I would propose simply separating the main collection from the personal portfolio. Files that don't sell over a given time frame should be removed from the main collection, just as the dollar bin files are no longer visible in the search. However, they could stay in the artist's personal webshop if he or she wants that, or the artist could have the option to remove them and take them to a different site.

Allowing artists the freedom to handle their own portfolios is very motivating; it helps them develop their own style and build a loyal following of buyers. The artist can also promote this personal portfolio through social media networks and his or her own website. If the content gets spread around over many different sites, it becomes much harder to build a following as an artist, especially if some sites don't even show the artist's name (or even attribute the wrong one).

I think you could easily remove 30% of the files from the main collection if they can stay in the artist's portfolio. You could also add a "Contributor's choice" option for files that the artist thinks have to stay in the main collection. On istock, E+ could serve that function.

A system like that can handle a very, very large volume of files. If the non-sellers are removed from the main collection every three years, the collection would probably stay very trim and up to date. You could even keep it at roughly the same size. The personal portfolio can keep growing, and the non-sellers can also be placed with another sales outlet if necessary. I mean, how many pictures can you shoot over a lifetime? 30,000? I've seen personal portfolios of that size, and for a private portfolio that isn't a problem. It would be like having your own webshop embedded in the agency.
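To make the proposed culling rule concrete, here is a rough sketch. It is not how any agency actually does it: the `StockFile` record, the three-year window and the `contributors_choice` flag are assumptions made purely for illustration. Every file stays in the artist's personal portfolio; only its visibility in the main search changes.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class StockFile:
    # Hypothetical record; a real agency would store far more metadata.
    file_id: str
    uploaded: date
    last_sale: Optional[date] = None   # None if the file has never sold
    contributors_choice: bool = False  # artist wants it kept in the main search


def in_main_collection(f: StockFile, today: date, window_years: int = 3) -> bool:
    """Assumed rule: keep a file in the main search if the artist flagged it,
    or if it sold (or was uploaded) within the last `window_years`."""
    if f.contributors_choice:
        return True
    window = timedelta(days=365 * window_years)
    reference = f.last_sale or f.uploaded  # never-sold files are judged from upload
    return today - reference <= window


def split_collection(
    files: List[StockFile], today: date, window_years: int = 3
) -> Tuple[List[StockFile], List[StockFile]]:
    """Return (main search, personal-portfolio-only). Nothing is deleted."""
    main, portfolio_only = [], []
    for f in files:
        (main if in_main_collection(f, today, window_years) else portfolio_only).append(f)
    return main, portfolio_only


if __name__ == "__main__":
    today = date(2011, 7, 15)
    files = [
        StockFile("rose_on_white", date(2006, 1, 1)),
        StockFile("business_team", date(2007, 5, 1), last_sale=date(2011, 6, 1)),
        StockFile("rare_butterfly", date(2005, 3, 1), contributors_choice=True),
    ]
    main, portfolio_only = split_collection(files, today)
    print([f.file_id for f in main])            # ['business_team', 'rare_butterfly']
    print([f.file_id for f in portfolio_only])  # ['rose_on_white']
```

Because the rule is recomputed from the last sale date, a file that starts selling again would move back into the main search on its own; lowering `window_years` makes the cull more aggressive.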

What do you think??
« Last Edit: July 15, 2011, 17:14 by cobalt »


« Reply #1 on: July 15, 2011, 17:20 »
I think that the size of the collections is irrelevant if you have great search tools. Google does a (largely) great job searching through masses of content and we don't care about the size of the pool of crud they searched, just about getting a reasonable number of relevant results.

We have a number of legacy problems in existing collections - poor keywording and categorizing, for example - but I don't see size per se as a problem.

There have been lots of good ideas that appeared and then went nowhere: for example, a cross-agency search tool that put up a huge matrix of tiny squares of images in response to a search and you picked a few to seed a more targeted search of the type of thing you want. It was really surprising how you could rule out and rule in image candidates from small thumbs and then use the ones you like to get better results.
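The "pick a few thumbnails to seed a better search" idea is what information retrieval calls relevance feedback. Below is a rough Rocchio-style sketch of it. It assumes each image already has a numeric feature vector (how those vectors are computed, from keywords, colours or anything else, is left open), and the weights `alpha`, `beta` and `gamma` are illustrative, not values any agency is known to use.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def refine_query(query: Vector, picked: List[Vector], rejected: List[Vector],
                 alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Rocchio-style update: pull the query toward thumbnails the buyer picked
    and push it away from the ones they ruled out."""
    new = [alpha * q for q in query]
    if picked:
        new = [v + beta * c for v, c in zip(new, centroid(picked))]
    if rejected:
        new = [v - gamma * c for v, c in zip(new, centroid(rejected))]
    return new


def rerank(images: Dict[str, Vector], query: Vector) -> List[str]:
    """Order image ids by cosine similarity to the (refined) query."""
    return sorted(images, key=lambda i: cosine(images[i], query), reverse=True)


if __name__ == "__main__":
    # Toy 2-D features, purely hypothetical: [plain white background, outdoor scene]
    images = {"rose_white": [1.0, 0.0], "rose_field": [0.0, 1.0], "rose_mixed": [0.7, 0.7]}
    q = refine_query([0.5, 0.5], picked=[images["rose_white"]],
                     rejected=[images["rose_field"]])
    print(rerank(images, q))  # ['rose_white', 'rose_mixed', 'rose_field']
```

Each round of picking from the grid would just call `refine_query` again with the new selections, so the results narrow down step by step.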

I don't know if any of the agencies have the market heft to do a really great search job - they're squeezing to get more profits now, not to invest in better search technology. Google could do the job, but I don't know if stock image/video/audio content is a big enough pie for them to want to participate.

« Reply #2 on: July 15, 2011, 17:24 »
Interesting topic.  

Maybe it can't be done, i.e. maybe it's not possible to have an archive 10 times larger than what exists today,  and maintain any meaningful standards of quality, or make it searchable in truly useful ways.  Arguably, the big microstocks are already in some degree of chaos, and have tried to push 'crowdsourcing' too far.  Improving overall quality and searchability means investing time and money in skilled reviewers, keyworders and software developers - more than is being spent today.

I like the idea of an agency having both public and private collections although I'm not sure how I'd actually make use of it.  

« Reply #3 on: July 15, 2011, 17:37 »
"I like the idea of an agency having both public and private collections although I'm not sure how I'd actually make use of it.  "

Well, for instance, take all the artists who shoot "lifestyle" or business. Every year they have to create similar images of business teams or families having dinner, just to make sure the models are shown with the latest clothes, hairstyles and home design, and are using the latest electronic gadgets.

Some images may be so generic that they have a very long shelf life (teenagers in jeans and T-shirts sitting in a group and chatting), but others will look very dated after three years (mobile phones that look like bricks).

So when they stop selling, you can still keep them in your portfolio if you want. Maybe a customer needs an older style ('80s revival party, insurance aimed at seniors). If the buyer is loyal to you or likes how you shoot, they will want to look at all your files first.

On the other side there are specialists, for instance someone who collects images of all the butterfly species of the world, with correct terminology, descriptions, etc. They would only ever have a percentage of their files in the main collection (maybe the most beautiful, eye-catching butterflies), but they would certainly attract loyal buyers. If their slow-selling files are removed, their portfolio is weakened. Some very exotic, rare butterfly may only be bought once every 10 years. But the specialist will have it, and the customer is happy to find a real expert.

This also works for someone who shoots landscapes or specializes in images from a certain region (traditional clothing, food, houses). They can happily shoot their region in all seasons and have images from all the festivals (editorial, even video), but only part of the portfolio will be in the main search. Over many years the personal portfolio can be properly developed with a lot of attention to detail. A large personal portfolio encourages you to develop your own style; it will be less generic.

The agency then has more specialized, regional content they can add to galleries and lightboxes to promote to different customer groups.

etc...
« Last Edit: July 15, 2011, 17:46 by cobalt »

Ed

« Reply #4 on: July 15, 2011, 17:37 »
My thoughts are there's a market for every image.

If Getty is so short sighted that they think older, retro images aren't going to sell, then more power to them - the same contributors will have those images listed on another agency and they'll get picked up there.  Wait, weren't contributors upset in 2006 or 2007 when Getty added retro images to the iStock collection for the first time?  Don't they keep doing that at iStock?  ;D ;D ;D ;D

There are always people rotating in and out of the agencies. Images will come and go. I contribute to multiple agencies to get my work out to the public, and if an agency wants to trash it, then so be it... I'll sell it somewhere else.

I think that the size of the collections is irrelevant if you have great search tools. Google does a (largely) great job searching through masses of content and we don't care about the size of the pool of crud they searched, just about getting a reasonable number of relevant results.

We have a number of legacy problems in existing collections - poor keywording and categorizing, for example - but I don't see size per se as a problem.

There have been lots of good ideas that appeared and then went nowhere: for example, a cross-agency search tool that put up a huge matrix of tiny squares of images in response to a search and you picked a few to seed a more targeted search of the type of thing you want. It was really surprising how you could rule out and rule in image candidates from small thumbs and then use the ones you like to get better results.

I don't know if any of the agencies have the market heft to do a really great search job - they're squeezing to get more profits now, not to invest in better search technology. Google could do the job, but I don't know if stock image/video/audio content is a big enough pie for them to want to participate.

I agree 100%

« Reply #5 on: July 15, 2011, 17:54 »
"I think that the size of the collections is irrelevant if you have great search tools."

I'd love a best match that is intelligent and learns buyer behaviour. If the buyer prefers cheap files, give him best match results with 80% cheap files; if money is no problem, increase the share of higher-priced content. If the software detects similar buying patterns between two customers, show each of them the files the other has bought, like "other customers also bought these files", but you don't have to point them out on a special page. Just add them to the mix.
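A rough sketch of the two ideas in this post, assuming the site knows each buyer's purchase history and each file's price: mix the top results so the share of cheap files mirrors what this buyer has bought before, and slip "other customers also bought" files into the normal results instead of showing them on a separate page. The price cut-off, the 50% default for new buyers and the blending rate are arbitrary illustration values, not anything an agency has published.

```python
from collections import Counter
from typing import Dict, List, Set


def cheap_share(purchase_prices: List[float], cheap_limit: float) -> float:
    """Fraction of this buyer's past purchases at or below cheap_limit.
    New buyers with no history get an assumed neutral 50%."""
    if not purchase_prices:
        return 0.5
    return sum(p <= cheap_limit for p in purchase_prices) / len(purchase_prices)


def mix_by_price(ranked: List[str], price_of: Dict[str, float], share: float,
                 cheap_limit: float, k: int) -> List[str]:
    """Pick up to k results so that about `share` of them are cheap,
    keeping the original relevance order."""
    cheap = [f for f in ranked if price_of[f] <= cheap_limit]
    premium = [f for f in ranked if price_of[f] > cheap_limit]
    n_cheap = min(len(cheap), round(share * k))
    n_premium = min(len(premium), k - n_cheap)
    picked = set(cheap[:n_cheap] + premium[:n_premium])
    return [f for f in ranked if f in picked]


def also_bought(buyer: str, purchases: Dict[str, Set[str]], min_overlap: int = 1) -> List[str]:
    """Files bought by customers who share at least min_overlap purchases
    with this buyer, most frequently co-purchased first."""
    mine = purchases[buyer]
    counts: Counter = Counter()
    for other, theirs in purchases.items():
        if other != buyer and len(mine & theirs) >= min_overlap:
            counts.update(theirs - mine)
    return [f for f, _ in counts.most_common()]


def blend(results: List[str], extras: List[str], every: int = 4) -> List[str]:
    """Slip 'also bought' files into the normal results, one after every `every` slots."""
    queue = [e for e in extras if e not in results]
    out: List[str] = []
    for i, f in enumerate(results, start=1):
        out.append(f)
        if i % every == 0 and queue:
            out.append(queue.pop(0))
    return out
```

A buyer whose history is four cheap files out of five gets `cheap_share(...) == 0.8`, so `mix_by_price` returns four cheap files and one premium file for `k=5`; `blend` then mixes the co-purchase suggestions in without any special page.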

Obviously add regional data etc...but I think many sites already do that.

So I agree that great search tools can handle many more files. Maybe the two could be combined: a great search engine and personal portfolios.

Maybe then files wouldn't have to be removed from the site at all.

lisafx

« Reply #6 on: July 15, 2011, 18:02 »
Good idea to start thinking about these issues now.  Saturation is already a huge issue and only going to get worse.  

I would probably start by evaluating all the "dead" portfolios.  There have to be many thousands of people who just aren't active on the sites at all anymore - portfolios sitting idle and ignored for years.  I would probably purge those of anything but exceptionally good or unique images.

I would oppose having images that are on the servers but don't show in the searches.  Microstock is not the same as RM or expensive trad RF.  You don't recoup your production costs in a couple of sales.  For the prices we are getting for these images, it won't be worthwhile doing it at all if we can't count on continued exposure for good images, as long as they keep selling.  

By the same token, I don't think we should have to promote our own images.  That's what we pay our agents anything from 50% to (an obscene) 85% of the sales to do for us.  Marketing our work is their job, otherwise why have an agent at all?  Any marketing efforts I put forth are going to be directed to bringing buyers to my own site, not Istock or similar.  

I think JoAnn's solution is the only viable one for the long term.  The sites need to improve search engines - ideally in a way that best serves buyers, rather than short term profits.  Probably some mix of classic bestsellers along with an emphasis on newer, fresher images is ideal.  Before Istock stratified into collections, I think they were getting pretty close to an ideal search algorithm.

« Reply #7 on: July 15, 2011, 18:25 »
I am only thinking of removing files from searches that don't sell (after 3, 4 or 5 years), just like you suggested purging old portfolios. These would automatically be transformed into nearly 100% "personal portfolios".

The artist can then return if he wants to and add new content.

Maybe even have the possibility to add an old file to the main collection if suddenly it starts to sell again. 

« Reply #8 on: July 15, 2011, 18:41 »
I am only thinking of removing files from searches that don't sell (after 3, 4 or 5 years), just like you suggested purging old portfolios. These would automatically be transformed into nearly 100% "personal portfolios".

The artist can then return if he wants to and add new content.

Maybe even have the possibility to add an old file to the main collection if suddenly it starts to sell again. 

I think culling will be the way of the near future to push out the inevitable 100 mil threshold.  And I suspect that it will be done in more frequent periods, say non-selling in two years.  We're going to have to keep shooting, shooting, shooting to pipeline in new stuff to replace the culled images.  It will weed out the part timers and really only be aligned with those who can turn and burn.

lisafx

« Reply #9 on: July 15, 2011, 18:46 »
I am only thinking of removing files from searches that don't sell (after 3, 4 or 5 years), just like you suggested purging old portfolios. These would automatically be transformed into nearly 100% "personal portfolios".

The artist can then return if he wants to and add new content.

Maybe even have the possibility to add an old file to the main collection if suddenly it starts to sell again. 

I think culling will be the way of the near future to push out the inevitable 100 mil threshold.  And I suspect that it will be done in more frequent periods, say non-selling in two years.  We're going to have to keep shooting, shooting, shooting to pipeline in new stuff to replace the culled images.  It will weed out the part timers and really only be aligned with those who can turn and burn.

You are both probably right about culling.  Guess I had better bone up on some other job skills, because being pushed in to producing like a factory doesn't appeal to me at all. 

« Reply #10 on: July 15, 2011, 19:00 »
I think that the size of the collections is irrelevant if you have great search tools. Google does a (largely) great job searching through masses of content and we don't care about the size of the pool of crud they searched, just about getting a reasonable number of relevant results.

We have a number of legacy problems in existing collections - poor keywording and categorizing, for example - but I don't see size per se as a problem.
Exactly. Efficiently searching 100 million files is a technological "problem" which Google et al solved a long time ago. Is the World Wide Web 'oversaturated' with webpages just because there are billions of them?

« Reply #11 on: July 15, 2011, 19:02 »
I am only thinking of removing files from searches that don't sell (after 3, 4 or 5 years), just like you suggested purging old portfolios. These would automatically be transformed into nearly 100% "personal portfolios".

The artist can then return if he wants to and add new content.

Maybe even have the possibility to add an old file to the main collection if suddenly it starts to sell again.  

I think culling will be the way of the near future to push out the inevitable 100 mil threshold.  And I suspect that it will be done in more frequent periods, say non-selling in two years.  We're going to have to keep shooting, shooting, shooting to pipeline in new stuff to replace the culled images.  It will weed out the part timers and really only be aligned with those who can turn and burn.


You are both probably right about culling.  Guess I had better bone up on some other job skills, because being pushed in to producing like a factory doesn't appeal to me at all.  


^^ Me either, it becomes work then.
« Last Edit: July 15, 2011, 19:09 by Mantis »

« Reply #12 on: July 15, 2011, 19:04 »
Why cull anything?  Storage is dirt cheap and getting cheaper all the time.  It probably costs more to delete the file than it does to keep it indefinitely.

Processing power is likewise cheap and getting cheaper. Server costs to support complex searches of massive databases aren't really an issue any longer.

As someone said, search algorithms can be refined to provide reasonable returns that match very closely the desires of the searcher, even for very large databases. Of course, that assumes agencies don't artificially skew algorithms to return results they want to sell rather than returning what a customer is looking to buy (as we've seen one agency do to a rather ridiculous degree lately). I don't think that's going to turn out to be a "sustainable" business practice in the long run. People hate bait-and-switch.

So, I just don't see any real reason to cull images.  As I said, I think the economics of the situation are rapidly approaching the point where it is too costly to cull images (if we haven't already reached that point).

The question worth asking is how contributors can generate significant incomes in an ever increasingly saturated market.

« Reply #13 on: July 15, 2011, 19:08 »
I think that making a search that really works makes the total # of files irrelevant for the agency. If there are a million isolated apples and someone searches for isolated apple a good search engine would show them those apples and the buyer will buy. That makes a happy customer and a happy agency. Now the chance that my isolated apple will sell is pretty small, but that is how this works already.

The real problem will be fixing or culling the bad keywords. Either someone needs to actually do this, or they need to make a search engine that can somehow tell and deliver relevant content w/o pages and pages of near identical images. DT's image flagging is one way to do it, but it doesn't seem like their program really works.
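One standard way to avoid "pages and pages of near identical images" is to compute a small perceptual hash per thumbnail and show only the best-ranked image from each group of near-matches. The sketch below uses the simple average-hash technique; it is a swapped-in illustration, not how DT's flagging or any other agency's system is known to work, and it assumes thumbnails have already been shrunk to tiny grayscale grids (e.g. 8x8).

```python
from typing import Dict, List

Thumb = List[List[int]]  # small grayscale grid, values 0-255 (assumed input format)


def average_hash(gray: Thumb) -> int:
    """Average hash: one bit per pixel, set when the pixel is brighter than the mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | int(p > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def collapse_near_duplicates(ranked: List[str], thumbs: Dict[str, Thumb],
                             max_distance: int = 5) -> List[str]:
    """Walk the results in ranked order and keep an image only if its hash differs
    from every already-kept image by more than max_distance bits."""
    kept: List[str] = []
    kept_hashes: List[int] = []
    for image_id in ranked:
        h = average_hash(thumbs[image_id])
        if all(hamming(h, k) > max_distance for k in kept_hashes):
            kept.append(image_id)
            kept_hashes.append(h)
    return kept
```

The default `max_distance=5` is only a guess for 64-bit hashes from 8x8 thumbnails and would need tuning against real collections; the collapsed images could still be shown behind a "more like this" link rather than hidden.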

Someone else mentioned a field for what is actually in the image - that would be pretty nice, but who wants to go back and do it for the old images? It would be like iStock's disambiguation mess, although if you pushed images w/o this field to the back of the search that would be a pretty good incentive.

Rather than having the search try to guess what you want based on previous experience, it would be nice to have the ability to have lots of settings and have them stay the way you set them until you reset them. So if you just want cheap files, you set it that way.

« Reply #14 on: July 15, 2011, 19:19 »
I'm confused.  Isn't "turn and burn" a major part of the problem?

As to "our art", sorry guys but, figuratively speaking, micro is producing Widgets, with some very few exceptions.

 And technology is the enabler that has allowed this situation to occur.  Just as it has in almost all other industries.

I don't think there is a viable solution, except to wait until normal market forces sort it out. But I fear that is some way off yet.
« Last Edit: July 15, 2011, 19:25 by bizair »

« Reply #15 on: July 15, 2011, 19:44 »
I'm confused.  Isn't "turn and burn" a major part of the problem?

As to "our art", sorry guys but, figuratively speaking, micro is producing Widgets, with some very few exceptions.

 And technology is the enabler that has allowed this situation to occur.  Just as it has in almost all other industries.

I don't think there is a viable solution, except to wait until normal market forces sort it out. But I fear that is some way off yet.

I was responding to a brainstorming question that the OP posed: how to handle 100 mil files. So for me, all things being equal, culling may be a possible approach. If things are not equal, meaning a technology evolves that can streamline keywording and capital investment in that new technology becomes a reality, then some of these other suggestions are awesome. When I said turn and burn, I meant it in this context: if they culled at a high rate (every 2 years), we as contributors would have to turn and burn images to keep up. Part timers who simply could not produce the volume they'd need would (or may) give up. It would change how images are produced and who produces them and, consequently, weed out a lot of the "noise" in current submissions. Just my opinion, of course.

« Reply #16 on: July 16, 2011, 08:37 »
I think that if someone could devise a truly universal keyword template, every stock shooter could host and share their images with the search engine and the related download and e-commerce software.


« Reply #17 on: July 16, 2011, 08:46 »

I'd love a best match that is intelligent and learns buyer behaviour. If the buyer prefers cheap files, give him best match results with 80% cheap files; if money is no problem, increase the share of higher-priced content. If the software detects similar buying patterns between two customers, show each of them the files the other has bought, like "other customers also bought these files", but you don't have to point them out on a special page. Just add them to the mix.

Obviously add regional data etc...but I think many sites already do that.


I, personally, don't really like when things are decided for me. I'd rather make my own choices on what price range I want to see, I really don't want suggestions as to what other similar-minded people are buying (though that could be useful in what to avoid, LOL), and I think regional data/localized searches are somewhat useless in this global world. Designers have customers the world over. I don't know of anyone who designs solely for their "region".

« Reply #18 on: July 16, 2011, 09:23 »
I agree that all options should be available for you as a buyer to set the way you like. But why should you get the same initial best match like someone in China? Or Africa?

I am thinking of general searches like "business team", where a buyer in China probably cannot use a best match that serves up only American business teams. I know this buyer could add "chinese" to his search, but how many people will do that? And what if he compares the results to a Chinese stock house that immediately gives ethnic results, where he would add "american" if he wants to target the US?
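What is described here amounts to a soft regional boost rather than a filter. A minimal sketch, assuming each result carries a relevance score and a set of region tags (both hypothetical, as is the boost factor): files matching the buyer's region are nudged up, but nothing is removed, so a buyer in China still sees American business teams, just further down.

```python
from typing import List, Set, Tuple

Result = Tuple[str, float, Set[str]]  # (image_id, relevance, region tags), hypothetical shape


def regional_best_match(results: List[Result], buyer_region: str,
                        boost: float = 0.2) -> List[str]:
    """Re-rank without filtering: images tagged with the buyer's region get their
    relevance multiplied by (1 + boost); every image stays in the list."""
    def score(item: Result) -> float:
        _, relevance, regions = item
        return relevance * (1 + boost) if buyer_region in regions else relevance

    return [image_id for image_id, _, _ in sorted(results, key=score, reverse=True)]


if __name__ == "__main__":
    results = [
        ("us_business_team", 0.90, {"US"}),
        ("cn_business_team", 0.80, {"CN"}),
        ("generic_handshake", 0.85, set()),
    ]
    print(regional_best_match(results, "CN"))
    # ['cn_business_team', 'us_business_team', 'generic_handshake']
    print(regional_best_match(results, "DE"))
    # ['us_business_team', 'generic_handshake', 'cn_business_team']
```

Setting `boost=0` switches the regional default off entirely, which is one way to leave the final choice with buyers who prefer to set their own filters.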



 

« Reply #19 on: July 16, 2011, 09:32 »
I, personally, don't really like when things are decided for me. I'd rather make my own choices on what price range I want to see, I really don't want suggestions as to what other similar-minded people are buying (though that could be useful in what to avoid, LOL), and I think regional data/localized searches are somewhat useless in this global world. Designers have customers the world over. I don't know of anyone who designs solely for their "region".

I agree. I don't mind choices, but I wouldn't want to see images from the collection being hidden from me just because someone else thinks that today I am still looking for an inexpensive image, or an image from my region, etc.

I don't think searches are ever going to be perfect for everyone on a global level.

« Reply #20 on: July 16, 2011, 09:54 »
I agree that all options should be available for you as a buyer to set the way you like. But why should you get the same initial best match like someone in China? Or Africa?

I am thinking of general searches like "business team", where a buyer in China probably cannot use a best match that serves up only American business teams. I know this buyer could add "chinese" to his search, but how many people will do that? And what if he compares the results to a Chinese stock house that immediately gives ethnic results, where he would add "american" if he wants to target the US?

What if I have a client in China or Africa though? Or what if someone in China has an American client? I don't like all these assumptions that are made about people. If I want Chinese or African or American, I can search for it. I don't want someone who thinks they know my business and my clients better than me making decisions for me.

« Reply #21 on: July 16, 2011, 11:37 »
The real problem will be fixing or culling the bad keywords. Either someone needs to actually do this, or they need to make a search engine that can somehow tell and deliver relevant content w/o pages and pages of near identical images.

Someone else mentioned a field for what is actually in the image - that would be pretty nice, but who wants to go back and do it for the old images.

Actually cleaning up keywords would take money because you have to pay skilled people, and so far the microstocks haven't made that investment. They've tried various schemes to get contributors and/or buyers to do it for nothing, and they haven't really worked.

« Reply #22 on: July 16, 2011, 12:03 »
"I don't want someone who thinks they know my business and my clients better than me making decisions for me."

Best match is always an assumption of what you might like. That is why it is called best match.

What you see in the search now is what the company thinks you will like to buy. There is a program that goes over the hundreds of thousands of files in the database and comes up with a selection for you.

All stock sites do this now.

Cas, I have a question: how often do you as a buyer go to the artists portfolio? Do you ever look at their landing page? Do you read the artists bio? Do you look at their lightboxes? Do you bookmark an artist and make notes about their speciality?
« Last Edit: July 16, 2011, 12:06 by cobalt »

SNP

  • Canadian Photographer
« Reply #23 on: July 16, 2011, 12:04 »
I don't think the size of the collection is irrelevant. It is already clear that some uploaded files never see the light of day if an unfavorable best match shift happens shortly after they're uploaded. In some open discussions with more experienced contributors and TPTB, it has been loosely acknowledged that some files never come up in best match again.

In a perfect model that does not risk cannibalization of sales, I'd like to see poorly performing files separated into tiers of collections, as you stated, Jasmin. After x number of years without sales, move those files into reduced pricing tiers and entirely out of the main collection. This shouldn't, however, be the purpose of the partner program... as long as sites like Shutterstock include all levels of content, good and bad, a partner program model that includes only bargain images will never be able to compete, unless you argue that iStock castoffs are better than SS quality content (which does not seem to be the case).

The iStock collection is FAR too large now. It doesn't make sense as a contributor to cull your own images if they're not culling the entire database, because then you potentially lose out in best match shifts that favour older content, and that happens quite often IMO.

I don't like the idea of a separate site with its own search (essentially the PP sites), because that potentially pulls buyers away from iStock content. So a dollar-bin-type collection on the main site would be my preference, with files culled from the main collection and showing only in the dollar bin.

« Reply #24 on: July 16, 2011, 12:14 »
Actually, I don't think there are too many images on the sites. Quite the contrary, I think there is loads of stuff missing.

Think of how many different jobs there are in the world. Just count the different areas a medical doctor can specialize in. Or a gardener? Or a builder? Or restaurants?

All the different professions and all the different businesses around the globe need stock images for their advertising. Many need very specific images showing regional content (people, locations). Others always need the latest technical gadgets or clothing styles.

All the different festivals of the world? Even Xmas is celebrated differently around the world.

Or family relationships? Images that make you feel "at home" will be very different across the globe.

But many of these images will not sell in the high volume necessary to keep them in the main search. And buyers in emerging markets will also not want to pay higher prices for a rare image. Anyway, it can be supplied by a local artist with regional production costs.

So I believe it will become very important to develop the "local face" of a website. I know istock can be searched in 12 different languages and in some countries they have their own office (and newsletter etc...).

I predict that the regionalization of the agencies will become a lot more important in the future.
« Last Edit: July 16, 2011, 12:18 by cobalt »
