Author Topic: How to setup new "related images" feature? (Read 16466 times)

Chico · « **on:** April 16, 2013, 07:03 »

OK, I ran the "related images process". It was fast, with no errors, and the mail was received.

Then I went to the widgets area, but I don't see any "related images" widget. What am I doing wrong? And where is the right place to put the widget?

Can some good soul help?

Running the latest update, 1.3.1.

THX

ajt · « **Reply #1 on:** April 16, 2013, 07:14 »

It is called there "Similar images". Drag it to "Image page bottom" and save.

Chico · « **Reply #2 on:** April 16, 2013, 07:33 »

Quote from: ajt on April 16, 2013, 07:14

It is called there "Similar images". Drag it to "Image page bottom" and save.

Thanks a lot. Works fine!

Chico · « **Reply #3 on:** April 16, 2013, 08:27 »

I did some uploads a few minutes ago and the new images don't have "similar files" displayed.

I guess I have to run the "related images process" after each upload, right? Any way to make it permanent?

steheap · « **Reply #4 on:** April 16, 2013, 08:55 »

This is a fantastic addition!

A question though - the choice of similar images sometimes comes up with some strange similars. When I search google for my "bengal cat licking lips" the first image in my similars is one of a pair of hands isolated against white, followed by the expected series of cat images. I may be rubbish at keywording, but I don't think there is a great deal of similarity between those sets of keywords. There will definitely be some similars - isolated, white background, pair, etc. but others are much closer.

How is the order established, and is there something we can do to impact the similar search as we hone our skills?

Steve

Edit - did another test with "Solid gold coins treasure chest" - the similars appear to be ordered the wrong way round - the closest matches are at the bottom of the sidebar rather than at the top. Is it as simple as that?

franky242 · « **Reply #5 on:** April 19, 2013, 16:07 »

One question: The generation of related images always stops at around 200 images for me - I gave it several tries and each time received the confirmation mails for 100 and 200 images being processed, but that's it, no matter how long I wait. Site: franky242.net/shop/ - about 2,800 photos available. Has anyone been able to process more than 200 images? Memory should not be a problem since I have plenty allocated (bluehost) and everything else works fine, even processing large amounts of uploaded images.

Cheers, Frank

Leo Blanchette · « **Reply #6 on:** April 19, 2013, 16:12 »

So just to confirm - it does complete the task, but it stops emailing you after so many?

It's possible your server takes measures to avoid spam.

steheap · « **Reply #7 on:** April 19, 2013, 16:21 »

I'm at 342 images now and I get 4 emails - 100, 200, 300 and 342. So it works for me, so far!

steve

franky242 · « **Reply #8 on:** April 19, 2013, 16:46 »

nope, it does NOT complete, the wheel keeps on turning forever (I did wait for about an hour each time) and if I check which images feature related images, they end up each time about the last 200 ones uploaded, older ones do not have the related images module on their page.

Quote from: Leo on April 19, 2013, 16:12

So just to confirm - it does complete the task, but it stops emailing you after so many?

It's possible your server takes measures to avoid spam.

steheap · « **Reply #9 on:** April 19, 2013, 16:49 »

Looking at the email timestamps, it takes 4 minutes to do 340 images and I normally have around 45 keywords per image.

Steve

Chico · « **Reply #10 on:** April 19, 2013, 16:53 »

Just ran "update related images" with 91 files online and the mail arrived OK.

Leo Blanchette · « **Reply #11 on:** April 19, 2013, 17:04 »

Here's a little tidbit for you -

The more images you have, the more time it will take per image.

For instance, if you have 1000 images, each of those 1000 images is checked against those 1000 images.

If you have 3000 images, each of those 3000 images is checked against 3000 images.

See a pattern? It multiplies your time.
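
To put numbers on that pattern, here is a minimal sketch. It assumes every image really is checked against every image in the library, as described above; the function name `total_checks` is made up for illustration and is not part of the theme.

```python
def total_checks(n_images):
    """Total image-to-image checks when each image is checked against all images."""
    return n_images * n_images


for n in (1000, 3000):
    print(f"{n} images -> {total_checks(n)} checks")
```

Tripling the library from 1000 to 3000 images multiplies the work by nine rather than three, which may be why a run that is quick on a few hundred images can hit a server time limit on a few thousand.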

franky242 · « **Reply #12 on:** April 19, 2013, 17:04 »

hm, timestamp between my mails of 100 and 200 images is 4 mins, I also have about 40-50 keywords. Seems that my server is much slower then, although also bluehost. Might this be part of the problem? Is the script simply taking too long to process more of my images?

Quote from: steheap on April 19, 2013, 16:49

Looking at the email timestamps, it takes 4 minutes to do 340 images and I normally have around 45 keywords per image.

Steve

franky242 · « **Reply #13 on:** April 19, 2013, 17:05 »

Ah, this might be the difference to Steve: With my 2700 images available, I guess this might be the problem. And now?

Quote from: Leo on April 19, 2013, 17:04

Here's a little tidbit for you -

The more images you have, the more time it will take per image.

For instance, if you have 1000 images, each of those 1000 images is checked against those 1000 images.

If you have 3000 images, each of those 3000 images is checked against 3000 images.

See a pattern? It multiplies your time.

Leo Blanchette · « **Reply #14 on:** April 19, 2013, 17:23 »

I might be confused. Did you say it abandons the files after a while - or it just stops sending emails? ie, after a certain point files do not get updated.

cascoly · « **Reply #15 on:** April 19, 2013, 17:57 »

my 275 images run to completion

i'd doubt there's a spam filter at so few emails -- i'd send 20+ emails to myself within a minute or 2 when some of my online games were updating & I was debugging

a more likely possibility is it's running out of time; not sure how WP reports such crashes - my ASP pages break and then show whatever part of the page had been generated, and then show an error message saying script timed out (these pages with script errors also show up in google webmaster tools as 404 errors)

cascoly · « **Reply #16 on:** April 19, 2013, 18:33 »

not sure how the similar is supposed to work, so not sure if this is a bug or a feature:

I start with

http://cascoly-images.com/pix/image/a-group-of-skiers-have-a-leisurely-lunch-outdoors-2/

but none of the similars show other 'skier' images that I get if I search for
http://cascoly-images.com/pix/search-images/skier/

I then choose
http://cascoly-images.com/pix/image/skiers-prepare-for-their-next-run/

and one of the similars is
http://cascoly-images.com/pix/image/antique-map-of-merovingian-france/

but THAT image doesn't show any similars that are not also maps even though there are 45 France images

==========================

are there weights assigned to the detection of similars based on keywords?

does keyword position matter?

added next day: I checked the code & appears answer is no to both of these - explained more below

franky242 · « **Reply #17 on:** April 20, 2013, 03:40 »

Quote from: Leo on April 19, 2013, 17:23

I might be confused. Did you say it abandons the files after a while - or it just stops sending emails? ie, after a certain point files do not get updated.

It abandons Processing Files somewhere between 200 and 300 files - I receive no more mails after the second one (200 processed) and if I check back (even after 10 hours now) only the last 200 something submitted images display related ones on their image detail page.

Hence I could imagine either we find a way that the script continues to run or it could check maybe upon start which images were processed recently (<24h should be sufficient) and start with the next one. That way I could process all images by calling the script several times - might also be saving resources for other users that upload frequently and do not need to process all images each time?

Sorry for not being more precise before, English is not my native language! :-)

Leo Blanchette · « **Reply #18 on:** April 20, 2013, 03:48 »

I could never have guessed you were speaking English as a second language. I'm just now learning one myself - but half the world does this as a requirement

I *think* it's one of two issues:

I have the script time limit set to infinity, but your server might not be allowing that. If you turn on error reporting in your functions.php (I'd have to insert the code again) I'm pretty sure it would show a "memory" or "time" error on the page, assuming you don't leave it.

Remind me again what your hosting is? I'll check it tomorrow afternoon. Next week I'm looking forward to some big improvements.

cascoly · « **Reply #19 on:** April 20, 2013, 14:26 »

Quote from: franky242 on April 20, 2013, 03:40

It abandons Processing Files somewhere between 200 and 300 files - I receive no more mails after the second one (200 processed) and if I check back (even after 10 hours now) only the last 200 something submitted images display related ones on their image detail page.

Hence I could imagine either we find a way that the script continues to run or it could check maybe upon start which images were processed recently (<24h should be sufficient) and start with the next one. That way I could process all images by calling the script several times - might also be saving resources for other users that upload frequently and do not need to process all images each time?

....

it really does sound like a script that times out, but without an error being displayed

a simple fix would be to process the images from most recently updated (this can be done in the sql select that sets up the processing) then, if the script fails, the next time it would process new items rather than the ones already done

cascoly · « **Reply #20 on:** April 20, 2013, 14:36 »

Quote from: steheap on April 16, 2013, 08:55

This is a fantastic addition!

A question though - the choice of similar images sometimes comes up with some strange similars. When I search google for my "bengal cat licking lips" the first image in my similars is one of a pair of hands isolated against white, followed by the expected series of cat images. I may be rubbish at keywording, but I don't think there is a great deal of similarity between those sets of keywords. There will definitely be some similars - isolated, white background, pair, etc. but others are much closer.

How is the order established, and is there something we can do to impact the similar search as we hone our skills?

right now, it appears the matches are performed by giving 1 point for every keyword in common, then displaying the X highest scores; this distorts the results by unfairly weighting common keywords -- keywording style plays in here: if you use many, vague keywords you'll get less exact results than if you use fewer, specific ones

common keywords like 'seattle' or 'blue' are less useful in matching than uncommon ones like 'totem' or 'tiger', so a better result would be to weight the matches, giving less relevance to common keywords. shouldn't take Leo more than a day or 2 to do this!
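
As a rough illustration of the one-point-per-shared-keyword scoring described above (a sketch of the idea only, not the widget's actual code; the keyword sets and the function names `common_keyword_score` and `rank_similars` are invented for the example):

```python
# Hypothetical keyword sets, invented for illustration only.
library = {
    "hands": {"isolated", "white", "background", "pair", "hands"},
    "tiger": {"tiger", "cat", "stripes"},
    "map": {"antique", "map", "france"},
}
target = {"bengal", "cat", "isolated", "white", "background", "pair"}


def common_keyword_score(a, b):
    """One point for every keyword the two images share."""
    return len(a & b)


def rank_similars(target, library, top_n=2):
    """Return the names of the top_n highest-scoring images, best first."""
    scored = [(common_keyword_score(target, kw), name) for name, kw in library.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_n] if score > 0]


ranked = rank_similars(target, library)
print(ranked)
```

With these made-up keywords the "hands" image shares four generic keywords with the cat image and outranks the only other cat picture, which shares just one specific keyword. That is the kind of distortion the weighting idea is meant to reduce.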

cascoly · « **Reply #21 on:** April 20, 2013, 14:42 »

Quote from: Leo on April 20, 2013, 03:48

I could never have guessed you were speaking English as a second language. I'm just now learning one myself - but half the world does this as a requirement

I *think* it's one of two issues:

I have the script time limit set to infinity, but your server might not be allowing that. If you turn on error reporting in your functions.php (I'd have to insert the code again) I'm pretty sure it would show a "memory" or "time" error on the page, assuming you don't leave it.

Remind me again what your hosting is? I'll check it tomorrow afternoon. Next week I'm looking forward to some big improvements.

most servers will not allow infinity, since it's too easy to miscode an infinite loop, so even if you set it in code, it'll be overridden. when I have processes like this, I set the timeout to a larger number at the start, then reset it after the process finishes to keep the host happy

Pilens · « **Reply #22 on:** April 20, 2013, 14:55 »

Quote from: cascoly on April 20, 2013, 14:36

Quote from: steheap on April 16, 2013, 08:55
This is a fantastic addition!

A question though - the choice of similar images sometimes comes up with some strange similars. When I search google for my "bengal cat licking lips" the first image in my similars is one of a pair of hands isolated against white, followed by the expected series of cat images. I may be rubbish at keywording, but I don't think there is a great deal of similarity between those sets of keywords. There will definitely be some similars - isolated, white background, pair, etc. but others are much closer.

How is the order established, and is there something we can do to impact the similar search as we hone our skills?

right now, it appears the matches are performed by giving 1 point for every keyword in common, then displaying the X highest scores; this distorts the results by unfairly weighting common keywords -- keywording style plays in here: if you use many, vague keywords you'll get less exact results than if you use fewer, specific ones

common keywords like 'seattle' or 'blue' are less useful in matching than uncommon ones like 'totem' or 'tiger', so a better result would be to weight the matches, giving less relevance to common keywords. shouldn't take Leo more than a day or 2 to do this!

It might be actually easier to realize than it sounds. Some sort of keyword statistic must be already available for the tag cloud widget. So maybe this can be tapped for refining selection of similars...

cascoly · « **Reply #23 on:** April 20, 2013, 15:42 »

Quote from: Pilens on April 20, 2013, 14:55

It might be actually easier to realize than it sounds. Some sort of keyword statistic must be already available for the tag cloud widget. So maybe this can be tapped for refining selection of similars...

right, the count for each keyword is stored, so the weighting could be done as a normalization from 1 to 100 -- if the highest keyword count is 237, then each keyword's weight would be 101 - 100*([count]/237), rounded down

237 --> 1
155 --> 35
1
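
The formula above can be sketched as follows. The helper names and the keyword counts for 'france' and 'skier' are assumptions for illustration; only the 237 and 155 values come from the post.

```python
def keyword_weight(count, max_count):
    """101 - 100 * (count / max_count), rounded down, using integer math."""
    return (101 * max_count - 100 * count) // max_count


def weighted_score(a, b, counts, max_count):
    """Sum the weights of the keywords two images share."""
    return sum(keyword_weight(counts[k], max_count) for k in a & b)


print(keyword_weight(237, 237))  # 1
print(keyword_weight(155, 237))  # 35

# Hypothetical counts: 'france' is common, 'skier' is rare.
counts = {"france": 155, "skier": 12}
skiing = {"skier", "france"}
leeks = {"france"}
other_skiers = {"skier"}
print(weighted_score(skiing, leeks, counts, 237))
print(weighted_score(skiing, other_skiers, counts, 237))
```

Under the simple one-point scheme both pairs would score 1. With the weights, the pair sharing the rare keyword scores well above the pair sharing only the common one.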

Pilens · « **Reply #24 on:** April 20, 2013, 16:47 »

Quote from: cascoly on April 20, 2013, 15:42

Quote from: Pilens on April 20, 2013, 14:55

It might be actually easier to realize than it sounds. Some sort of keyword statistic must be already available for the tag cloud widget. So maybe this can be tapped for refining selection of similars...

right, the count for each keyword is stored, so the weighting could be done as a normalization from 1 to 100 -- if the highest keyword count is 237, then each keyword's weight would be 101 - 100*([count]/237), rounded down

237 --> 1
155 --> 35
1

I'd be interested to see what similars your formula would dig up as opposed to the existing related images widget.

cascoly · « **Reply #25 on:** April 20, 2013, 17:46 »

Quote from: Pilens on April 20, 2013, 16:47

I'd be interested to see what similars your formula would dig up as opposed to the existing related images widget.

ok -- check out http://cascoly.com/symbiostock-related-search.asp

I was curious too, so I set up a little system in excel that let me calculate the similarity tables for the 2 approaches. happily my prediction seems to work, at least at this level. I purposely set it up so there'd be ambiguities like 'leeks from france' which would be selected by the simple algorithm for the image "skiing France", but is not selected by the weighted approach. plus, the weighted model should perform even better in a larger database -- the main problem will be creating and calculating the matrices, but that might be incorporated in the process Leo uses now to set everything up

Leo Blanchette · « **Reply #26 on:** April 20, 2013, 18:49 »

You guys have been holding out on me! Looks like we've got some great techs on this project. I'll be on the big issues Monday

Pilens · « **Reply #27 on:** April 21, 2013, 01:37 »

Quote from: cascoly on April 20, 2013, 17:46

Quote from: Pilens on April 20, 2013, 16:47

I'd be interested to see what similars your formula would dig up as opposed to the existing related images widget.

ok -- check out http://cascoly.com/symbiostock-related-search.asp

I was curious too, so I set up a little system in excel that let me calculate the similarity tables for the 2 approaches. happily my prediction seems to work, at least at this level. I purposely set it up so there'd be ambiguities like 'leeks from france' which would be selected by the simple algorithm for the image "skiing France", but is not selected by the weighted approach. plus, the weighted model should perform even better in a larger database -- the main problem will be creating and calculating the matrices, but that might be incorporated in the process Leo uses now to set everything up

Wow! You put quite an effort into this. I agree, at this level your approach seems to work. I also think this should work even better in a larger database. Still, I have a hard time imagining it won't produce any oddities at all. In any case it'd be great to see this going live some day...

Leo Blanchette · « **Reply #28 on:** April 21, 2013, 01:43 »

http://cascoly.com/symbiostock-related-search.asp
WOA

BTW - I've credited the source a few times, but the related images comes from here:

http://wordpress.org/support/topic/custom-query-related-posts-by-common-tag-amount?replies=8

I simply had to modify it slightly to work with the 'image' custom post type.

cascoly · « **Reply #29 on:** April 21, 2013, 02:12 »

Quote from: Pilens on April 21, 2013, 01:37

Wow! You put quite an effort into this. I agree, at this level your approach seems to work. I also think this should work even better in a larger database. Still, I have a hard time imagining it won't produce any oddities at all. In any case it'd be great to see this going live some day...

right, 100% isn't the goal - just a better fit most of the time. the one concern is that processing grows quadratically with increasing size - ie, double the # of images takes 4 times the cycles. but there are a number of tricks to sidestep that, too

Kerioak~Christine · « **Reply #30 on:** April 21, 2013, 04:17 »

I have been away and not had an internet connection for a few days, so coming back and finding this new feature is great.

franky242 · « **Reply #31 on:** April 22, 2013, 05:25 »

Quote from: cascoly on April 20, 2013, 14:26

Quote from: franky242 on April 20, 2013, 03:40

It abandons Processing Files somewhere between 200 and 300 files - I receive no more mails after the second one (200 processed) and if I check back (even after 10 hours now) only the last 200 something submitted images display related ones on their image detail page.

Hence I could imagine either we find a way that the script continues to run or it could check maybe upon start which images were processed recently (<24h should be sufficient) and start with the next one. That way I could process all images by calling the script several times - might also be saving resources for other users that upload frequently and do not need to process all images each time?

....

it really does sound like a script that times out, but without an error being displayed

a simple fix would be to process the images from most recently updated (this can be done in the sql select that sets up the processing) then, if the script fails, the next time it would process new items rather than the ones already done

sorry, I've been offline over the weekend... :-)

I tried it again this morning and again, I only received mails for 100 and 200 images being processed. After checking my images online it became obvious that only my latest approx. 260 images feature the "similar images" widget. Hence I guess we need to find a way for less resource-hungry processing - since in my case (2,800 photos online now) the processing of a hundred photos takes about 6 minutes, I figured a complete run (assuming it would work) would take about 168 minutes (about 3 hours!) - even if the server would allow a script to run for such a long time, I doubt this makes sense every time you upload new images.

Hence the approach of processing images that were not processed yet - that way if you start the script enough times, all images will be processed. Maybe add a second option "process all images" so that from time to time you can build the whole references from scratch?

cascoly · « **Reply #32 on:** April 22, 2013, 17:39 »

Quote from: franky242 on April 22, 2013, 05:25

sorry, I've been offline over the weekend... :-)

.... Hence I guess we need to find a way for a less resource-hungry processing - since in my case (2,800 photos online now) the processing of hundred photos takes about 6 minutes, ...

Hence the approach of processing images that were not processed yet - that way if you start the script enough times, all images will be processed. Maybe add a second option "process all images" so that from time to time you can build the whole references from scratch?

each image should have a 'last updated for similars' datefield, defaulting to 12/12/2012 for new images. then when the update process is called, the function selects the oldest image available, and resets the date when done. each time in, it will process as many as it can

Leo then only needs to add the last update field, and change the SELECT query to use the date.
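
A minimal sketch of that idea, assuming a per-image 'last updated for similars' date and a fixed time budget per run. The names `process_oldest_first`, `update_similars` and `budget_seconds` are hypothetical, and a plain dict stands in for the database SELECT:

```python
import time
from datetime import datetime

# Default date for images that have never been processed (from the post above).
DEFAULT_DATE = datetime(2012, 12, 12)


def process_oldest_first(images, update_similars, budget_seconds,
                         now=datetime.now, clock=time.monotonic):
    """Process images oldest-first until the time budget runs out.

    images maps an image id to the datetime its similars were last updated.
    Returns the ids processed in this run.
    """
    deadline = clock() + budget_seconds
    done = []
    # Oldest 'last updated' date first, so an interrupted run resumes
    # with the images that have waited longest.
    for image_id in sorted(images, key=lambda i: images[i]):
        if clock() >= deadline:
            break
        update_similars(image_id)
        images[image_id] = now()  # reset the date when done
        done.append(image_id)
    return done


images = {"a": DEFAULT_DATE, "b": datetime(2013, 4, 1), "c": DEFAULT_DATE}
processed = process_oldest_first(images, lambda image_id: None, budget_seconds=60)
print(processed)
```

Because each run stops at its own budget instead of waiting for the server to kill it, calling it repeatedly should eventually cover the whole library.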

later refinements could include a report showing how many images might need to be updated, but that could be written by anyone

----------------------------
another approach for large sites would be something I did in online games - use every visitor to do a little bit of processing -- eg, go process the oldest image waiting in line. users are unlikely to notice the time it takes, and a well visited site will always be automatically up to date. (comparing last updated with last uploaded image will tell you if there are ANY images that need to be updated)

the key is to pick somewhere when the user is busy reading a screen -- eg, when search results are displayed -- the user will normally spend a small bit of time staring at the screen, so we can sneak in some extra calcs
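
A rough sketch of the per-visitor approach, under the assumption that an image counts as stale when its similars were last computed before the newest upload. `process_one_if_stale` and its parameters are hypothetical names, not part of the theme or of WordPress:

```python
from datetime import datetime


def process_one_if_stale(similars_updated, newest_upload, update_similars,
                         now=datetime.now):
    """Do at most one image's worth of work; meant to be called per page view.

    similars_updated maps an image id to the datetime its similars were last
    updated. Returns the id that was processed, or None if nothing was stale.
    """
    if not similars_updated:
        return None
    oldest_id = min(similars_updated, key=similars_updated.get)
    # If even the oldest entry is newer than the latest upload, all is current.
    if similars_updated[oldest_id] >= newest_upload:
        return None
    update_similars(oldest_id)
    similars_updated[oldest_id] = now()
    return oldest_id


state = {"a": datetime(2013, 4, 1), "b": datetime(2013, 4, 20)}
handled = []
first = process_one_if_stale(state, datetime(2013, 4, 10), handled.append,
                             now=lambda: datetime(2013, 4, 22))
second = process_one_if_stale(state, datetime(2013, 4, 10), handled.append,
                              now=lambda: datetime(2013, 4, 22))
print(first, second)
```

Each call handles a single image, so the extra delay per visitor stays small, and a busy site works through its backlog without anyone starting the update by hand.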

Leo Blanchette · « **Reply #33 on:** April 22, 2013, 18:00 »

Good idea. WordPress "cron jobs" are actually just like that - they rely on visitors to get checked, and then run. That's a good idea! I think I can work on that soon. Today I'm working on a search function that checks terms, title, and content, but I should get to that next.

cascoly · « **Reply #34 on:** April 22, 2013, 21:09 »

Quote from: Leo on April 22, 2013, 18:00

Good idea. WordPress "cron jobs" are actually just like that - they rely on visitors to get checked, and then run. That's a good idea! I think I can work on that soon. Today I'm working on a search function that checks terms, title, and content, but I should get to that next.

great, i'm still fairly new to what wp encompasses but there do seem to be a lot of useful plugins
