Author Topic: unique phrases - keywords question  (Read 4212 times)

« on: March 14, 2015, 06:53 »
0
Background:

As part of a project I am working on, I have been writing functionality which parses embedded IPTC image metadata.

With respect to this I am currently specifically focused on the different ways in which multi-word phrases or unique phrases are used.

This question relates specifically to Shutterstock. Item 6 of this article at Shutterstock talks about unique phrases.

Quote
6) For multiple words where you would like to create a unique phrase, consider using quotation marks. Example: clay pot. Depending on whether quotation marks are used, this phrase can be understood as two separate keywords or one keyword. In response to clay pot our search engine would only return an image if both clay and pot are separate keywords associated with the image. Now, if someone wanted to treat this keyword as a phrase, they would place it in quotations, "clay pot". The search engine would then return all images where the phrase "clay pot" was used as the keyword. It does not return images which only used the individual keywords clay and pot.
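
The rules in the quoted article can be sketched in a few lines of Python. This only illustrates the documented behaviour and is not Shutterstock's actual code; the function name `matches` and the sample keyword lists are made up for the example.

```python
def matches(query, keywords):
    """Return True if an image with these keywords would match the query,
    following the rules in the quoted article (hypothetical helper)."""
    stored = {k.strip().lower() for k in keywords}
    query = query.strip().lower()
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        # Quoted query: the whole phrase must be stored as one keyword.
        return query[1:-1].strip() in stored
    # Unquoted query: every word must be stored as a separate keyword.
    return all(word in stored for word in query.split())


separate = ["clay", "pot", "garden"]    # clay and pot as separate keywords
phrase = ["clay pot", "garden"]         # "clay pot" stored as one keyword

print(matches('clay pot', separate))    # True
print(matches('clay pot', phrase))      # False
print(matches('"clay pot"', separate))  # False
print(matches('"clay pot"', phrase))    # True
```

Under these rules a quoted and an unquoted search should return different sets of images, which is what makes identical results look suspicious.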


Question:

When I search for "clay pot" in quotes, the results appear to be images that have clay and pot as separate keywords, with seemingly none carrying the exact unique phrase. It is as if the quotes in the search were being ignored.

Can anyone explain what is going on or help me understand this better? Perhaps someone who has specifically used a unique phrase in quotes at Shutterstock. Is it that the unique phrase "clay pot" is there but is not listed in the visible customer-facing keywords?


Shelma1

« Reply #1 on: March 14, 2015, 07:23 »
0
That's an ooooold article. Recently SS began splitting up phrases into individual keywords, so I don't think that function works any more.

Semmick Photo

« Reply #2 on: March 14, 2015, 07:32 »
0
That's a bug. It has been reported, SS knows about it, and they can't fix it or won't fix it.

When I submit this keyword, Notre-Dame de l'Immaculee Conception, it ends up like this (as separate keywords):

Notre-Dame
de
l
Immaculee
Conception


But apparently the complete keyword is attached to the image in the back end.
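
A minimal sketch of a split that would produce the list above, assuming the submitted phrase has an apostrophe after the l and that the splitter breaks on whitespace and apostrophes while leaving hyphens alone. That rule is a guess from the observed output, not Shutterstock's code, and `split_keyword` is a made-up name.

```python
import re


def split_keyword(phrase):
    """Split a phrase on whitespace and apostrophes, keeping hyphens.

    Hypothetical rule inferred from the keywords shown on the site.
    """
    return [part for part in re.split(r"[\s'’]+", phrase) if part]


print(split_keyword("Notre-Dame de l'Immaculee Conception"))
# ['Notre-Dame', 'de', 'l', 'Immaculee', 'Conception']
```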

Semmick Photo

« Reply #3 on: March 14, 2015, 07:35 »
0
And their search is removing the apostrophe as well.

Basically, anyone searching with the correct keyword will never find the images of Notre-Dame de l'Immaculee Conception.

Some tech company

« Reply #4 on: March 14, 2015, 07:46 »
0
Interesting.  I always include keyword phrases on the idea that it might improve the search.  Following your example I just searched on clay pot (no quotations) and got 24,000 images of clay pots (or at least on the first page they were all relevant).  Then I searched on "clay pot" in quotations and got the same 24,000 images in the same order.  So it made no difference.  I guess bothering with phrases is a waste of time unless it helps at other agencies.  Thanks for pointing that out.

« Reply #5 on: March 14, 2015, 07:52 »
0
Thanks very much for both of your informative answers.

But apparently the complete keyword is attached to the image in the back end.

How are you able to reach this conclusion ?

I guess bothering with phrases is a waste of time unless it helps at other agencies.  Thanks for pointing that out.

I suppose that depends upon whether it is a bug or a 'new normal' !

Semmick Photo

« Reply #6 on: March 14, 2015, 08:02 »
0
Thanks very much for both of your informative answers.

But apparently the complete keyword is attached to the image in the back end.

How are you able to reach this conclusion ?


SS told me. I would need to find that email, if you want me to.

Semmick Photo

« Reply #7 on: March 14, 2015, 08:07 »
0
Quote
the words are being broken up on the front end (screen) but this does not effect the search engine (back end).

« Reply #8 on: March 14, 2015, 11:43 »
0
Quote
the words are being broken up on the front end (screen) but this does not effect the search engine (back end).

That's useful to know. Thanks very much.

In your post above you said that "Notre-Dame de l'Immacule-Conception" as a phrase gets turned into separate constituent keywords. Including de and l (but with the apostrophe stripped). When you submit your image for review do you leave in the de and l keywords ?

Uncle Pete

« Reply #9 on: March 14, 2015, 11:55 »
+1
From SS this week: comma separated phrases, or phrases in general will only negatively affect the image in search because the search engine will only include those images if that exact phrase is used (since the phrase is essentially a 'keyword'). I'm not sure that is still true.

Yes the system has been breaking these up for some time, and removing duplicates in the process.

Keyword UPLOAD Example, "auto racing", "Car Racing", "Car",  "Auto", "racing", only the last three show in my keywords as the duplicates are removed by the system.
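
For illustration, here is a sketch of a split-and-dedupe step that leaves three words from that upload. It assumes phrases are broken on spaces and duplicates are compared case-insensitively; the order and capitalisation the site actually keeps are not known, and `split_and_dedupe` is a made-up name.

```python
def split_and_dedupe(keywords):
    """Break phrases into single words and drop case-insensitive duplicates."""
    seen = set()
    result = []
    for keyword in keywords:
        for word in keyword.split():
            key = word.lower()
            if key not in seen:  # keep only the first occurrence of each word
                seen.add(key)
                result.append(key)
    return result


uploaded = ["auto racing", "Car Racing", "Car", "Auto", "racing"]
print(split_and_dedupe(uploaded))  # ['auto', 'racing', 'car']
```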

Keyword SEARCH test examples:

"Grand Prix" 
Grand prix Stock Photos, Illustrations, and Vector Art (31,064)

Grand Prix not in quotes
Grand Prix Stock Photos, Illustrations, and Vector Art (31,064)

Same images in the same order and yes I did add "Grand Prix" in quotes to one of my images, as a test, (yesterday) before doing this.

It appears to me, that it no longer makes a difference, uploading or searching, and the quotes and phrases have become irrelevant. Someone else might want to do some more tests.

Also yes the system has been messing with apostrophes. Making them illegal characters or just dropping them.

Another test:

Grand Prix Racing Stock Photos, Illustrations, and Vector Art (24,607)

Grand Prix Race Stock Photos, Illustrations, and Vector Art (24,607)

It didn't use to be this way. Now Race and Racing are interchangeable, and either one in the keywords will suffice. Just saved me one more keyword spot.  :)
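
One common way for a search engine to treat Race and Racing as the same word is stemming, where word endings are stripped before matching. Whether SS does this is an assumption; the sketch below is a deliberately crude stemmer (`naive_stem` is a made-up name, and real engines use far more careful rules) that only shows why the two queries would end up identical.

```python
def naive_stem(word):
    """Strip one common English ending, keeping at least three letters."""
    w = word.lower()
    for suffix in ("ing", "es", "s", "e"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def normalize_query(query):
    """Stem every word of a search query."""
    return [naive_stem(word) for word in query.split()]


print(normalize_query("Grand Prix Racing"))  # ['grand', 'prix', 'rac']
print(normalize_query("Grand Prix Race"))    # ['grand', 'prix', 'rac']
```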
« Last Edit: March 14, 2015, 11:58 by Uncle Pete »

« Reply #10 on: March 14, 2015, 12:04 »
0
From SS this week: comma separated phrases, or phrases in general will only negatively affect the image in search because the search engine will only include those images if that exact phrase is used (since the phrase is essentially a 'keyword')

Hi Pete. Have you got a link for that ?

Uncle Pete

« Reply #11 on: March 14, 2015, 12:18 »
0
No, it was a private email = no link.

And I don't believe it's accurate, since I just tested with the "Grand Prix" search and the results are identical. I believe at this point, that phrases within quotes are irrelevant and have no effect on searches anymore.

If someone can find an example that contradicts what I posted, please do.

"Grand Prix" in quotes vs Grand Prix no quotes, identical results, in the same order.

As I suggested, someone else go test and see what you find. It won't take more than five minutes.






« Reply #12 on: March 14, 2015, 15:31 »
0
I tried it on a few pairs I've used in quotes like "Navajo sandstone" and "Red river"

both return same # if used without quotes, so the search is not impaired.   the problem comes when someone enters 'navajo' or 'red' and gets thousands of images that are incorrect since those paired tags are not the same as the 2 parts searched individually

Uncle Pete

« Reply #13 on: March 15, 2015, 09:12 »
+1
True if I understand your message. One word searches will produce terrible results, everywhere.

What I was getting at, is the identical results for a phrase in quotes, as we uploaded it, or a phrase in quotes as someone searching or just entering three words.

All the "in quotes" phrases and special effort is now irrelevant and unnecessary. There are no more Unique Phrases on SS.

Let's hope that buyers are smart enough to search for Red Navajo Blanket if that's what they wanted. No one should use a one word search if they have any idea what they want to find. There are 50 million images.  :)



« Reply #14 on: March 15, 2015, 15:40 »
0

not quite what I meant -- if my 2 linked pairs become single words, someone searching for 'red' is going to get my red river images ; search for Navajo, they'll get 'navajo sandstone' which is a geological formation

so the search is degraded by automatically storing words individually - probably another instance of making changes without bothering to look at the implications

Uncle Pete

« Reply #15 on: March 15, 2015, 17:06 »
+1
Ah, sorry I missed the point.

I think the search is Word AND word AND word. Not all matching words OR included. You may want to go test your results and see.

I don't think this is a new change. It may have been going on for months and no one noticed until I was testing for selective matches and wondering why some of my phrases were single words, mostly things like someone's name, which fall right into what you wrote.

People have complained since last year about phrases being broken up.

My test phrase was Juan Pablo Garcia - an Indy Lights driver and I have the only seven images with his name in them. (part of the discovery was my error and illegal characters and having to go back and edit the images, one by one)

(in quotes) Juan Pablo Garcia Stock Photos, Illustrations, and Vector Art (7)
(just three words) Juan Pablo Garcia Stock Photos, Illustrations, and Vector Art (7)

Take a look:  http://tinyurl.com/ktp64fj

If what you were saying is true, I would get millions of images with Juan or Pablo or Garcia. I don't think it's broken, I think it's very specific. Only images with all three words are found.
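
The difference between the two kinds of matching can be sketched like this. The three images and their keywords are invented for the example, and `search_and` / `search_or` are made-up names; nothing here is taken from the real site.

```python
# Invented catalog: image name -> set of lowercase single-word keywords.
images = {
    "driver_1": {"juan", "pablo", "garcia", "indy", "lights"},
    "portrait_2": {"juan", "portrait", "man"},
    "street_3": {"garcia", "street", "madrid"},
}


def search_and(query, catalog):
    """Word AND word AND word: every search word must be a keyword."""
    terms = set(query.lower().split())
    return sorted(name for name, kws in catalog.items() if terms <= kws)


def search_or(query, catalog):
    """Word OR word OR word: any single search word is enough."""
    terms = set(query.lower().split())
    return sorted(name for name, kws in catalog.items() if terms & kws)


print(search_and("Juan Pablo Garcia", images))  # ['driver_1']
print(search_or("Juan Pablo Garcia", images))   # ['driver_1', 'portrait_2', 'street_3']
```

A result count of 7 for the three-word search is what AND matching would give; OR matching would pull in every image tagged with any one of the names.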




What has gone for sure is "words in quotes" matching only when we uploaded the identical "words in quotes": a specific phrase match limited to the people who uploaded that phrase.

Now everyone with those three words gets included. The search has been expanded and we all get better exposure.

No, I don't think it has harmed anything or produced worse results.


« Last Edit: March 15, 2015, 17:08 by Uncle Pete »

Semmick Photo

« Reply #16 on: March 16, 2015, 02:42 »
0
Quote
the words are being broken up on the front end (screen) but this does not effect the search engine (back end).

That's useful to know. Thanks very much.

In your post above you said that "Notre-Dame de l'Immacule-Conception" as a phrase gets turned into separate constituent keywords. Including de and l (but with the apostrophe stripped). When you submit your image for review do you leave in the de and l keywords ?

No, it's not happening like that.

I add the keyword to my images, for example :

Notre-Dame de l'Immacule-Conception


Shutterstock's editor will then turn special characters into gibberish, like so

Notre-Dame de [email protected]%[email protected]%e-Conception

I then, delete the keyword from the image and replace it with the correct one

Notre-Dame de l'Immacule-Conception

Then I submit the image, and it will be sitting in the queue with this keyword phrase correctly added

Notre-Dame de l'Immacule-Conception

Once the image goes online, the keyword phrase at the front end will be seen as individual keywords.

For example: Close up will become

Close
Up


The split up happens as soon as SS processes the image, not before that, they sit in the queue with the correct phrases


cuppacoffee

« Reply #17 on: March 16, 2015, 06:58 »
+1
It's the difference between inputting the words while on the site vs your input of the original text.

It has to do with the character set one uses when composing the words vs the character set used to read those words. You may be using Unicode, UTF, ASCII, or something else when you enter text, and some programs don't read all of those different sets. On the server side you can't be sure what encoding is used.

All text is usually converted to a stream of ASCII text characters when you hit send from your side. If a message contains characters that aren't in the ASCII character set, some programs and services use different ways of converting those characters for transmission and reception. Both the sending and receiving computer have to use the same character encoding if the final results are to show up as you intended. This doesn't always happen. Special characters are complex codes, not characters, and not all systems read them the same. Throw in the differences between Mac and Windows character codes, and how different programs and browsers resolve the characters, and you get gibberish: code strings instead of the special characters. Other sites have the same problems too, and some remove the offending characters by default. If the translation process is not fine-tuned to accept different character coding you end up with gibberish (a code instead of the punctuation). This has been going on forever on any and all sites.
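
A small Python illustration of that mismatch, using an accented word as a stand-in for a keyword. It shows three outcomes: text written as UTF-8 and read back as Latin-1 (gibberish), text forced into ASCII with the failures ignored (the accented letter vanishes), and accents folded to plain letters first. Which of these, if any, a given site does is not known; the variable names are made up.

```python
import unicodedata

original = "Immaculée"

# Sender writes UTF-8, receiver reads Latin-1: one letter becomes two odd ones.
garbled = original.encode("utf-8").decode("latin-1")  # 'ImmaculÃ©e'

# Forcing the text into ASCII and ignoring failures drops the accented letter.
dropped = original.encode("ascii", errors="ignore").decode("ascii")
print(dropped)  # Immacule

# Decomposing first (NFKD) separates the accent from the letter,
# so only the accent is lost and the plain 'e' survives.
folded = (
    unicodedata.normalize("NFKD", original)
    .encode("ascii", errors="ignore")
    .decode("ascii")
)
print(folded)  # Immaculee
```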

Semmick Photo

« Reply #18 on: March 16, 2015, 07:07 »
0
Maybe, but that has nothing to do with the splitting up of the phrases. Maybe I shouldn't have mentioned the special characters in this instance.

However, 123 has no problem reading my keywords and keeps everything the same way I add it to the meta data.

cuppacoffee

« Reply #19 on: March 16, 2015, 07:28 »
0
I think it has to do with the volume of images on the sites now. Full-text search algorithms take longer. When reading a small number of text fields a full-text search engine can take its time and do a better job of associating certain words with each other. When the number of documents to search is huge, the problem of full-text search involves both indexing and searching. The indexing step scans the text of all the image fields and builds a list of search terms (a concordance). In the search stage, when performing a specific query, only the index is referenced, rather than the text of the original fields. Bottom line, more sites are going to a different search algorithm, and it looks like SS has found this easier and faster and, most important, it takes fewer server resources. The search results for a phrase compared with its individual words seem to be similar, if not perfect. Place names are a problem though.
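
The index-then-search idea can be sketched as a tiny inverted index. This is a generic textbook structure, not a description of what SS actually runs; the catalog, the image ids and the function names are all invented for the example.

```python
from collections import defaultdict


def build_index(catalog):
    """Indexing step: map every single word to the ids of images using it."""
    index = defaultdict(set)
    for image_id, keywords in catalog.items():
        for keyword in keywords:
            for word in keyword.lower().split():  # phrases are broken up here
                index[word].add(image_id)
    return index


def search(index, query):
    """Search step: consult only the index and intersect the id sets."""
    words = query.lower().replace('"', "").split()  # quotes are ignored
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


catalog = {
    101: ["Navajo sandstone", "canyon"],
    102: ["Navajo", "blanket", "red"],
    103: ["Red river", "water"],
}
index = build_index(catalog)
print(sorted(search(index, '"navajo sandstone"')))  # [101]
print(sorted(search(index, "navajo sandstone")))    # [101]
print(sorted(search(index, "navajo")))              # [101, 102]
print(sorted(search(index, "red")))                 # [102, 103]
```

With an index like this the quoted and unquoted searches necessarily agree, and a one-word search for navajo or red also picks up the images whose phrases were broken apart, which is the trade-off discussed in this thread.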

« Reply #20 on: March 16, 2015, 08:07 »
0
All text is usually converted to a stream of ASCII text characters when you hit send from your side.

Yes. I guess that you specifically mean - when entering keywords on the contributor side at Shutterstock ? I think it wants a subset of original ASCII. And it is worth noting that Shutterstock specifically says to keyword in English. A few exceptions apart - English mostly does not use accents. Pete also says that it doesn't like accents.

The Shutterstock search engine accepts Unicode characters, e.g. 医生

I would be very interested to know why that search ( 医生 ) produces such radically different results at Shutterstock vs iStock. If anyone has any idea. The characters can be copied and pasted into the searches.
^ ETA: I think it must have been a bug at iStock which has now been fixed or which has corrected itself

Shutterstock search seems very fast and impressive.
« Last Edit: March 17, 2015, 11:43 by bunhill »

Semmick Photo

« Reply #21 on: March 16, 2015, 08:16 »
0
Place names in a different language are spelled differently and can't be translated to English. Anyone searching for the name using correct spelling with special characters will get zero results.


 
