Author Topic: Improving the integrity of our search engine  (Read 5073 times)

marthamarks

« on: September 02, 2013, 22:10 »
+3
Tonight I did a few test searches on the SYS global search page. It seems to work very well. Fast and accurate by keywords.

However, I'm finding it pulls up many images that should not pull up for a given search.

For example, I searched for "elk" and got this page of 19 images:

http://www.symbiostock.info/index2.php?search_item=elk&search_order=1


Of the 19 images shown, only 7 are actually of "elk."  The other 12 are an assortment of "moose", "caribou" and "deer".  None of those creatures, lovely as they are, are the same as an elk.

If I were a buyer searching on SYS globally for an image of an "elk" and found all that, I think I'd be quite turned off.

So I'd like to offer a modest suggestion (and this includes me, too) that we all make a collective effort to keyword more accurately.

Thoughts??


EDITED TO ADD: Curious, I just searched on "moose" and found 21 images of -- TA DA! -- moose. No elk, deer, or caribou. Not sure exactly why that is.
« Last Edit: September 02, 2013, 22:15 by marthamarks »


Leo Blanchette

« Reply #1 on: September 02, 2013, 22:38 »
0
http://naturetravels.wordpress.com/2007/10/29/what%E2%80%99s-the-difference-between-a-moose-and-an-elk/

 :D

I think AJT and Cascoly need a lot of support since they are the quiet unsung heroes who are hosting free search engines. In fact a huge amount of the network's success and "happy secure we matter and this proves it" feeling comes from these search engines.

+50 on this thread!

I think they both need a visual overhaul and perhaps a little help policing images. For instance a small "voting" system on images would help immensely.

Let's say I have "rat" images on one of the search engines and I find they are crowded out by pictures that are obviously "mice" or, worse, "hamsters". I could give a little "x" vote on a button that pops up on hover, thus adding images to a queue for them to quickly look at. So, for example, they would check their "queue" and see the hamster with the prompt "RAT?" beside it. They could then do a YES or NO action.

(edit - giving them even less work, I could actually extend the voting system into Symbiostock by way of a plugin, delegating the work to everyone. So with a large number of "agreed" votes the image would be taken off the given search)

It saves them from having to police a HUGE network of 100,000s of images one by one. Obviously this could end up being a big community effort, and the end result is that the critics are definitely wrong when they say Symbiostock would be full of noise.
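A minimal sketch of the vote-and-queue idea described above, in Python. Everything here is an assumption for illustration: the class name, the site ids, and the threshold of three distinct voters are made up, and nothing like this exists in Symbiostock as far as this thread says. It only builds the review queue for a human YES/NO; it never touches anyone's keywords.

```python
from collections import defaultdict

# Hypothetical threshold: how many distinct voters must agree before a
# flag reaches the review queue. The thread does not specify a number.
AGREE_THRESHOLD = 3


class KeywordFlagQueue:
    """Collects 'x' votes on (image, search term) pairs."""

    def __init__(self, threshold=AGREE_THRESHOLD):
        self.threshold = threshold
        # (image_id, term) -> set of voter ids, so repeat votes count once
        self._votes = defaultdict(set)

    def flag(self, image_id, term, voter_id):
        """Record one voter's claim that `term` does not describe the image."""
        self._votes[(image_id, term.lower())].add(voter_id)

    def review_queue(self):
        """Return pairs with enough distinct voters, most-flagged first."""
        ready = [(key, len(voters)) for key, voters in self._votes.items()
                 if len(voters) >= self.threshold]
        ready.sort(key=lambda item: item[1], reverse=True)
        return [key for key, _ in ready]


queue = KeywordFlagQueue()
for voter in ("site-a", "site-b", "site-c", "site-a"):  # site-a votes twice
    queue.flag("hamster-001", "rat", voter)
queue.flag("mouse-042", "rat", "site-b")  # only one vote, stays out of the queue

print(queue.review_queue())
```

Counting distinct voters rather than raw clicks is a deliberate choice: one person clicking repeatedly cannot push an image into the queue alone.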



Second, they definitely need some ways to profit. There are plenty of ways to do this, either by paid inclusion or even benefiting 3rd parties by providing site info. It's a broad subject.
« Last Edit: September 02, 2013, 22:40 by Leo »

marthamarks

« Reply #2 on: September 02, 2013, 22:54 »
0
Thanks, Leo, for your input here. Several thoughts.

1. No way, Jose, did I mean to criticize Cascoly or AJT. In truth, I didn't even look to see whose images those were. Just saw images accurately labeled "moose", "caribou", and "deer" pop up on the "elk" search page. So obviously, somewhere in their list of keywords there had to be the word "elk", which just didn't seem right to me.

2. Agree it would be great to find a way to financially support the ongoing development of our search engines. Not sure how you guys might set that up, but I will gladly contribute to the effort if I think it will help and if I see others are doing that too.

3. Elk = Moose in Scandinavia?  Who knew???   :o

Leo Blanchette

« Reply #3 on: September 02, 2013, 23:03 »
+1
There's lots of ways search engines profit from being...search engines. I think we can probably address those things - search accuracy / integrity and getting them something for their work.


stockphoto-images.com

« Reply #4 on: September 02, 2013, 23:11 »
+1
When I saw your post, addressing the issue of irrelevant images popping up, I expected worse to be honest.

Of course, for the reasons you mentioned, some images are not accurate (more the keywords aren't accurate) but I think there is a little more to it than just fixing the search engines:

1. Let's also hope for some extra common sense on the buyer's side. The buyer might look for an elk but since it's not a pet like a "tabby cat", the buyer could resort to using the animal's proper Latin name, which should be included in any animal image anyway. To some people who shoot photos of wildlife for "fun", a moose, elk or deer is all the same to them. However, if the photographer would go to the lengths of describing the animal accordingly with its Latin name, more research is required and hopefully the correct keywords are actually being used.

2. Well it's partially already mentioned in #1: The photographers also have to be a little more specific. Animals should always have the Latin name in the keywords.

3. AJT and Cascoly are gods like Leo (sorry Leo, I just had to) - they work incredibly hard without expecting a paycheck. I'm very happy to see the great functionality of the meta searches and their (so far) good accuracy. Keep in mind that some microstock agencies employ an entire department just to optimize search algorithms etc.

Leo, your idea of using multiple votes on incorrect keywords would make sense. It does not seem feasible to me to single out "keyword cops" whose sole purpose is browsing for inaccurate keywords all day. If funky keywords come up, they should be confirmed by three or four people (to maximize the chance of reaching native English speakers - since the keywords are in English...).

I would support this approach.

marthamarks

« Reply #5 on: September 02, 2013, 23:11 »
+1
There's lots of ways search engines profit from being...search engines. I think we can probably address those things - search accuracy / integrity and getting them something for their work.

+50

I 100% support that idea, and also your edited-in suggestion of having some kind of voting system from within the SYS network.

All of us probably are carrying some spammy keywords with our images. I've been finding 'em in my stuff these last few weeks as I've Yoasted my way through hundreds of images keyworded years ago. Often too fast, copying and pasting lists of keywords from other images that maybe weren't exactly the same but got keyworded alike anyway.

Even if we don't intend to commit keyword spam, it's easy to do. Just gotta try to winnow the spam out.
« Last Edit: September 03, 2013, 06:59 by marthamarks »

marthamarks

« Reply #6 on: September 02, 2013, 23:16 »
0
OOPS! Somehow I managed to turn my last post into a quote by Leo. Dunno how I did that but... 

Hmmm.

Anyway, the latter part of what was written above and attributed to Leo came from me.

Oops!


EDITED: I fixed the problem above. Thanks, Les, for the tip!
« Last Edit: September 03, 2013, 07:02 by marthamarks »

Leo Blanchette

« Reply #7 on: September 02, 2013, 23:32 »
0
One way to freely update the search engine's accuracy is to have humans fill out a special captcha in different places (like sign-in).

Basically once a queue begins to be formed you have this on everyone's sites:

symbiostock captcha:

"please choose the keyword which does not describe this image"

<thumbnail image>

fur, cute, brown, teen, kitten

If it's a picture of a cat, the user would click the word "teen" or whatever.

It's actually an unknown captcha system, so it could register CORRECT either way and let the viewer through.
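Roughly how that captcha could work, as a Python sketch. The function names, the image id, and the tally format are all hypothetical; the only parts taken from the post are the prompt, the kitten example, and the rule that every answer passes because the right answer is not known in advance.

```python
import random


def build_challenge(image_id, trusted_keywords, suspect_keyword, rng=random):
    """Mix one suspect keyword in with trusted ones and shuffle the choices."""
    choices = list(trusted_keywords) + [suspect_keyword]
    rng.shuffle(choices)
    return {"image_id": image_id, "choices": choices, "suspect": suspect_keyword}


def handle_answer(challenge, clicked, tally):
    """Record whether the user picked the suspect keyword; always pass."""
    key = (challenge["image_id"], challenge["suspect"])
    agree, total = tally.get(key, (0, 0))
    tally[key] = (agree + (clicked == challenge["suspect"]), total + 1)
    return True  # the correct answer is unknown, so every answer passes


tally = {}
challenge = build_challenge(
    "kitten-007", ["fur", "cute", "brown", "kitten"], "teen",
    rng=random.Random(0),  # seeded only so the example is repeatable
)
print("please choose the keyword which does not describe this image")
print(", ".join(challenge["choices"]))

handle_answer(challenge, "teen", tally)   # one visitor clicks the suspect word
handle_answer(challenge, "brown", tally)  # another clicks something else
print(tally)
```

The tally keeps (agreements, total answers) per image and keyword, so a site owner could later see how often visitors singled out a given word.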

« Reply #8 on: September 02, 2013, 23:45 »
+4
I would not appreciate anything that can alter the keywords on my images without my say-so.

I understand that we all make mistakes, but my experience with Dreamstime's keyword "reporting" tool is that 99% of the supposed problems are either (a) nonsense suggestions (I had a meadow full of dandelions get reported for "dandelions") or (b) a result of breaking up phrases into separate words. Same thing with iStock's wiki - I didn't agree with the vast majority of the changes.

I follow some pretty tight rules when keywording - for example, I've seen cases where people put names of multiple countries on a picture of a tropical beach because it could be any of them. I think this is wrong. The only way to do location information is where it actually was - city, state, country, region. No one other than me can be sure where things were taken. I don't care what anyone else thinks about where it looks like.

I'm not advocating individual control because I'm a spammer who wants to be free to continue :)

Some way to suggest to the image owner that they have an error is fine. But we are independent sites - as many others have pointed out - so barring some problem with a malicious spammer trying to ruin the network on purpose I don't think there should be any central control of what keywords are on what images.

« Reply #9 on: September 03, 2013, 01:45 »
0
One "advantage" of having to re-upload most of my images is that I can now keyword in phrases instead of single words. For instance, "cavalier king charles spaniel": taken singly there is no "king" in the picture, but the phrase is a valid keyword. Descriptions are also searched, and as someone mentioned the other day, they were looking for "bird" and some of my cat pictures came up, valid because of the description.

It would be good to have some kind of alert system for keywords that bear no relation to the images, though, and I would support this, especially as the network grows with people who have no experience with stock. Better than blocking them.

« Reply #10 on: September 03, 2013, 05:11 »
+2
I don't want my keywords changed without my permission, but I am always open to suggestions. Early in the iStock game, we were told to keyword with anything that might be related to what was in the image, but those rules have changed over the years. I am going through every one of my photos before uploading and I think they are correct. But if someone else has a different take on it, I am willing to listen. Ultimately, though, the change should be left to the site owner.


I also think that bad keywording will frustrate some buyers and they won't buy there. That means the problem will self-correct on the individual site (lost sales) but hopefully won't turn off buyers to the whole network.

« Reply #11 on: September 03, 2013, 06:20 »
0
A question for Leo - someone mentioned turning off the partial word search feature, giving an example that a search on Asian would also return Caucasian. What governs the returns on a network search?  Is it the setting on the network partner or would the original site search overrule the settings on the partner site?
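To make the Asian/Caucasian example concrete, here is a small Python comparison of partial (substring) matching against whole-word matching. It is a sketch of the general technique only; the function names are invented, and it says nothing about how Symbiostock or the network search engines actually match, or which site's setting wins.

```python
import re


def partial_match(query, keywords):
    """Substring search: 'asian' also matches inside 'caucasian'."""
    q = query.lower()
    return [k for k in keywords if q in k.lower()]


def whole_word_match(query, keywords):
    """Match the query only as a complete word within each keyword."""
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    return [k for k in keywords if pattern.search(k)]


keywords = ["Asian", "Caucasian", "asian woman", "Eurasian"]
print(partial_match("asian", keywords))     # all four keywords
print(whole_word_match("asian", keywords))  # ['Asian', 'asian woman']
```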

ShazamImages

  • ShazamImages.com
« Reply #12 on: September 03, 2013, 06:24 »
+2
I would not appreciate anything that can alter the keywords on my images without my say-so.

I don't want my keywords changed without my permission, but I am always open to suggestions.

I also would not want to have the keywords on my images altered without my approval (or have my images show up at the back of the search as a result).  I don't mind if someone wants to notify me of a keyword that they think could be incorrect, but I should have the final say as to what keywords are on my images.

As I stated in another thread, there are plenty of keywords that might seem inappropriate because complex searches don't work at this time (on the base version).  So an artist that wants to let the buyer know that there is "copy space" in the image, currently has to break the phrase up into two words: "copy" and "space".  So when a buyer now wants to search on "space" (to find images related to astronomy), they end up with all sorts of other images.
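The "copy space" problem can be shown in a few lines of Python. This is only an illustration under assumed names (the two indexing functions and the sample keywords are made up); it contrasts indexing every word separately with indexing each phrase as one unit.

```python
def split_into_words(keywords):
    """Index every word separately, as a single-word keyword field forces."""
    return {word for kw in keywords for word in kw.lower().split()}


def keep_phrases(keywords):
    """Index each keyword or phrase as one unit."""
    return {kw.lower().strip() for kw in keywords}


image_keywords = ["copy space", "businessman", "office"]

# Split index: a search for "space" (astronomy) hits the office photo.
print("space" in split_into_words(image_keywords))  # True

# Phrase index: "space" alone no longer matches, "copy space" still does.
print("space" in keep_phrases(image_keywords))       # False
print("copy space" in keep_phrases(image_keywords))  # True
```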

In addition, I was doing searches on the major players the other day and they ALL had images that were keyworded incorrectly.  I was surprised by how many images showed up on the first few pages that seemed to be totally off in respect to their keywords.  If the major players can't fix the keywording issue (with the billions of dollars that they are making), then there is little chance that small mom-and-pop outfits will be able to come up with a solution.

Having a voting system will only end up with sites forming groups and attacking others so that their images will get to the front of the search and push others to the back.  I've seen it done before time and again.

« Reply #13 on: September 03, 2013, 06:41 »
0
There is no easy solution.


And maybe the major players are major because they are keyword stuffing.  :o 

« Reply #14 on: September 03, 2013, 06:56 »
0
Having a voting system will only end up with sites forming groups and attacking others so that their images will get to the front of the search and push others to the back.  I've seen it done before time and again.

If the whole thing is open source it is probably possible for groups or individuals to build and connect to their own alternative search systems if they feel they have a better model of how it will work best.

« Reply #15 on: September 03, 2013, 07:04 »
0
Why vote? Why not just a notification system? That way, if you find a lot of images coming up that you still consider wrong after checking the description as well, and you think they will damage your searches, you can just exclude that site from the results. I do not know if they would still come up in the symbiostock.info network results, but they would not in your own.

Spectral-Design.net

« Reply #16 on: September 03, 2013, 07:06 »
0
As others pointed out, I do not believe in rating systems, nor would I want anybody to alter any keywords. The core and heart of Symbiostock is independent photographers. I see the limitations of this concept, since it brings a chaos of different licenses, prices, quality standards etc., but that comes with the desired solution and I think it is good.

« Reply #17 on: September 03, 2013, 08:48 »
0
Leo's link to the elk/moose article proves that the search results for "elk" are actually very good and accurate since we never know WHO will do the search.
In addition to that, every SYS site owner has a fundamental interest in accurate keywords because he/she is responsible for the success of his/her website. If there are lots of inapplicable keywords to be found, visitors will probably avoid the site in the future.

Bottom line, I don't see a need for moderation of textual content.

marthamarks

« Reply #18 on: September 03, 2013, 10:11 »
0
If there are lots of inapplicable keywords to be found, visitors will probably avoid the site in the future.

I'm okay with not doing anything "official." Nobody wants somebody else messing with their keywords.

But when it comes to the global search, it's all our skin in the game. If would-be buyers find too many irrelevant images pulled up in their search, they won't just skip a particular site. They'll likely skip everybody's sites.

travelwitness

« Reply #19 on: September 03, 2013, 10:37 »
0
There are 2 things already built into the base theme that could help clean up the search results.

  • Weighting results using the internal rating system already built into the base theme - ie: using the best work from each portfolio.

  • Favouring search results to match specialities - there is already a weighted keyword option in the base theme that could be exploited.

« Reply #20 on: September 03, 2013, 12:52 »
0
If there are lots of inapplicable keywords to be found, visitors will probably avoid the site in the future.

I'm okay with not doing anything "official." Nobody wants somebody else messing with their keywords.

But when it comes to the global search, it's all our skin in the game. If would-be buyers find too many irrelevant images pulled up in their search, they won't just skip a particular site. They'll likely skip everybody's sites.

Agree.
At one time, the general wisdom was The More Keywords, The Better.
I don't think that's true anymore, at least not for the main Internet search engines.

« Reply #21 on: September 03, 2013, 13:31 »
0
lots of topics to address:

first, I don't want to make any decisions about which keywords are correct and which not (see the thread I started earlier about 'keyword spam'), since we'll all disagree about where to draw the line.

there's no easy way to notify site owners of 'flagged' keywords since there's no direct way to contact them; each site decides whether to have a public email somewhere on their site.  the email in the symbiocard is (rightly) encrypted

regarding the examples - I agree that a beach in Bermuda should not have Hawaii or Riviera as keywords; but I don't see a problem with moose, elk & deer --  there are 2 major viewpoints - first is to make the description & keywords as accurate as possible - precise and scientific.  however, searchers are NOT necessarily that precise, or even knowledgeable.  if someone is searching for an elk, they might be able to use a moose just as easily; and many people don't know the difference between an ape & a monkey.  so my personal rule is to have the description be as accurate as possible, using scientific names only when I am positive of the identification; but I am looser when it comes to keywording

another source of search confusion is if the engine uses descriptions -- while accurate, descriptions may not actually describe what's in the picture -- eg, there's a perfectly appropriate description of a cat 'thinking about catching a bird' -- fine description, but it will appear if 'bird' is searched using the description

right now, I use keywords only in my search engine, but i'm playing with various ways to weight the search.  we can handle most searches now, since a few words will bring up a manageable set of results; but something stronger will be needed as we grow.  (even the most common keywords produce greatly reduced results when combined.)
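One possible way to weight a search, sketched in Python. This is not the weighting used by any of the network search engines (the post above only says various approaches are being tried); the weights, field names, and sample images are assumptions. The idea is that a keyword hit counts for more than a description hit, so the cat 'thinking about catching a bird' still shows up for 'bird' but ranks below actual birds.

```python
# Hypothetical weights: a keyword hit counts more than a description hit.
KEYWORD_WEIGHT = 3.0
DESCRIPTION_WEIGHT = 1.0


def score(image, query_terms):
    """Sum weighted hits across an image's keywords and description words."""
    keywords = {k.lower() for k in image["keywords"]}
    description_words = set(image["description"].lower().split())
    total = 0.0
    for term in query_terms:
        term = term.lower()
        if term in keywords:
            total += KEYWORD_WEIGHT
        elif term in description_words:
            total += DESCRIPTION_WEIGHT
    return total


def search(images, query):
    """Return ids of matching images, highest score first."""
    terms = query.split()
    scored = [(score(img, terms), img["id"]) for img in images]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [img_id for s, img_id in ranked if s > 0]


images = [
    {"id": "cat-1", "keywords": ["cat", "pet"],
     "description": "A cat thinking about catching a bird"},
    {"id": "robin-1", "keywords": ["bird", "robin"],
     "description": "A robin on a branch"},
]
print(search(images, "bird"))  # ['robin-1', 'cat-1']
```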

another option is condensing results by combining similars -- for 'red food' this reduces the results from 1200 to 600 and the user can click on the image to see similar; 'beautiful woman' reduces 3500 to about 750 using this very crude filter
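For anyone curious what "combining similars" might look like, here is one crude approach in Python: group results whose keyword sets overlap heavily and show only the first image of each group. The actual filter behind the numbers quoted above is not described in the thread, so the Jaccard measure, the 0.6 threshold, and the sample data are all assumptions.

```python
def jaccard(a, b):
    """Share of keywords two images have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def condense(results, threshold=0.6):
    """Greedily group results; each group is represented by its first image."""
    groups = []  # list of lists; groups[i][0] is the representative
    for image in results:
        for group in groups:
            if jaccard(image["keywords"], group[0]["keywords"]) >= threshold:
                group.append(image)
                break
        else:
            groups.append([image])  # no close match, start a new group
    return groups


results = [
    {"id": "apple-1", "keywords": ["red", "food", "apple", "fruit"]},
    {"id": "apple-2", "keywords": ["red", "food", "apple", "fruit", "fresh"]},
    {"id": "chili-1", "keywords": ["red", "food", "chili", "spicy"]},
]
groups = condense(results)
print([[img["id"] for img in g] for g in groups])
# [['apple-1', 'apple-2'], ['chili-1']]
```

The searcher would see two thumbnails instead of three, and clicking the apple would open the rest of its group.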

i'm always interested in other ideas for searching

steve

« Reply #22 on: September 03, 2013, 14:38 »
+1
On the matter of accuracy in keywording nature subjects, there was a similar discussion on the Alamy forum a few months ago. A knowledgeable ornithological type declared there was no such thing as a 'seagull', only varieties of gull.

Someone else then pointed out that there had been many searches on Alamy for 'seagull'; it's likely that most buyers won't be worried about the particular species as long as there is some sort of gull in whatever situation they want.

The ornithologist went off to add 'seagull' to his keywords.

I agree with Steve above, descriptions should be as precise as possible, but some slack in keywords for the non-expert buyers is a good thing.


 
