How to setup new "related images" feature?

Started by Chico, April 16, 2013, 13:03

cascoly

#25
Quote from: Pilens on April 20, 2013, 22:47


I'd be interested to see what similars your formula would dig up as opposed to the existing related images widget.

ok -- check out http://cascoly.com/symbiostock-related-search.asp

I was curious too, so I set up a little system in Excel that let me calculate the similarity tables for the 2 approaches. Happily my prediction seems to work, at least at this level. I purposely set it up so there'd be ambiguities like 'leeks from france', which would be selected by the simple algorithm for the image "skiing France" but is not selected by the weighted approach. Plus, the weighted model should perform even better in a larger database -- the main problem will be creating and calculating the matrices, but that might be incorporated in the process Leo uses now to set everything up.
Steve Estvanik 
travel & photo blog https://cascoly-images.com
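
The thread does not spell out the weighting formula, so the following is only a minimal sketch of the idea. It assumes that a keyword shared by many images (such as "france") counts for less than a rare one, using a log(N / count) weight. The keyword sets, the weight formula, and the thresholds are all hypothetical; only the "skiing France" and "leeks from france" titles come from the post above.

```python
import math
from collections import Counter

# Hypothetical keyword sets; only the first two titles appear in the thread.
IMAGES = {
    "skiing France": {"skiing", "france", "snow", "alps"},
    "leeks from france": {"leeks", "france", "vegetable", "food"},
    "skiing Austria": {"skiing", "austria", "snow", "alps"},
    "Paris cafe": {"paris", "france", "cafe", "food"},
    "Lyon market": {"lyon", "france", "market", "food"},
}


def keyword_weights(images):
    """Assumed weighting: log(N / number of images that use the keyword)."""
    counts = Counter(kw for kws in images.values() for kw in kws)
    total = len(images)
    return {kw: math.log(total / count) for kw, count in counts.items()}


def simple_score(a, b):
    """Simple approach: number of keywords two images have in common."""
    return len(a & b)


def weighted_score(a, b, weights):
    """Weighted approach: sum of the weights of the shared keywords."""
    return sum(weights[kw] for kw in a & b)


def related(name, images, score, min_score):
    """Return (other image, score) pairs at or above min_score, best first."""
    target = images[name]
    scored = [(other, score(target, kws))
              for other, kws in images.items() if other != name]
    kept = [(other, s) for other, s in scored if s >= min_score]
    return sorted(kept, key=lambda pair: -pair[1])


weights = keyword_weights(IMAGES)
print(related("skiing France", IMAGES, simple_score, 1))
print(related("skiing France", IMAGES,
              lambda a, b: weighted_score(a, b, weights), 0.5))
```

With these made-up keywords, the simple count accepts any image that shares "france", while the weighted score keeps only the image that shares the rarer skiing-related keywords.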

Leo Blanchette

You guys have been holding out on me! Looks like we've got some great techs on this project. I'll be on the big issues Monday.

Pilens

Quote from: cascoly on April 20, 2013, 23:46
Quote from: Pilens on April 20, 2013, 22:47


I'd be interested to see what similars your formula would dig up as opposed to the existing related images widget.

ok -- check out http://cascoly.com/symbiostock-related-search.asp

I was curious too, so I set up a little system in Excel that let me calculate the similarity tables for the 2 approaches. Happily my prediction seems to work, at least at this level. I purposely set it up so there'd be ambiguities like 'leeks from france', which would be selected by the simple algorithm for the image "skiing France" but is not selected by the weighted approach. Plus, the weighted model should perform even better in a larger database -- the main problem will be creating and calculating the matrices, but that might be incorporated in the process Leo uses now to set everything up.

Wow! You put quite an effort into this. I agree, at this level your approach seems to work. I also think this should work even better in a larger database. Still, I have a hard time imagining it won't produce any oddities at all. In any case it'd be great to see this going live some day...

Leo Blanchette

http://cascoly.com/symbiostock-related-search.asp
WOA

BTW - I've credited the source a few times, but the related images code comes from here:

http://wordpress.org/support/topic/custom-query-related-posts-by-common-tag-amount?replies=8

I simply had to modify it slightly to work with the 'image' custom post type.

cascoly

Quote from: Pilens on April 21, 2013, 07:37


Wow! You put quite an effort into this. I agree, at this level your approach seems to work. I also think this should work even better in a larger database. Still, I have a hard time imagining it won't produce any oddities at all. In any case it'd be great to see this going live some day...

Right, 100% isn't the goal - just a better fit most of the time. The one concern is that processing grows quadratically with database size - i.e., doubling the # of images takes 4 times the cycles. But there are a number of tricks to sidestep that, too.
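
The "double the images, 4 times the cycles" figure fits a process that compares every image with every other image. That all-pairs model is an assumption here, since the thread does not describe the actual calculation. A small sketch of the arithmetic:

```python
def pair_count(n_images: int) -> int:
    """Number of distinct image pairs when each image is compared with every other."""
    return n_images * (n_images - 1) // 2


# 2,800 is the collection size mentioned later in the thread;
# the other sizes are just half and double that.
for n in (1400, 2800, 5600):
    print(n, "images ->", pair_count(n), "pairs")

# Doubling the collection roughly quadruples the number of pairs.
print(pair_count(2800) / pair_count(1400))
```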

Kerioak~Christine

I have been away and without an internet connection for a few days, so coming back and finding this new feature is great.


franky242

Quote from: cascoly on April 20, 2013, 20:26
Quote from: franky242 on April 20, 2013, 09:40


It abandons Processing Files somewhere between 200 and 300 files - I receive no more mails after the second one (200 processed) and if I check back (even after 10 hours now) only the last 200 something submitted images display related ones on their image detail page.

So either we find a way to keep the script running, or it could check on start which images were processed recently (<24h should be sufficient) and continue with the next one. That way I could process all images by calling the script several times. It might also save resources for other users who upload frequently and do not need to process all images each time.

....

it really does sound like a script that times out, but without an error being displayed

a simple fix would be to process the images from most recently updated (this can be done in the sql select that sets up the processing). then, if the script fails, the next time it would process new items rather than the ones already done

sorry, I've been offline over the weekend... :-)

I tried it again this morning and again, I only received mails for 100 and 200 images being processed. After checking my images online it became obvious that only my latest approx. 260 images feature the "similar images" widget. Hence I guess we need to find a less resource-hungry way of processing. In my case (2,800 photos online now) the processing of a hundred photos takes about 6 minutes, so I figured a complete run (given it would work) would take about 168 minutes (about 3 hours!). Even if the server would allow a script to run for such a long time, I doubt this makes sense every time you upload new images.

Hence the approach of processing images that were not processed yet - that way if you start the script enough times, all images will be processed. Maybe add a second option "process all images" so that from time to time you can build the whole references from scratch?
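
The numbers in this post can be checked with a few lines. The sketch assumes a resumable script would get through about 260 images per call, which is the approximate count reported above before the script stops. The function names are made up for illustration.

```python
import math

TOTAL_IMAGES = 2800     # photos online, from the post
MINUTES_PER_100 = 6     # reported processing time per hundred photos
IMAGES_PER_RUN = 260    # approximate number processed before the script stops


def full_run_minutes(total: int, minutes_per_100: float) -> float:
    """Time for one uninterrupted pass over the whole collection."""
    return total / 100 * minutes_per_100


def runs_needed(total: int, per_run: int) -> int:
    """Calls needed if every call resumes where the previous one stopped."""
    return math.ceil(total / per_run)


print(full_run_minutes(TOTAL_IMAGES, MINUTES_PER_100), "minutes for a full pass")
print(runs_needed(TOTAL_IMAGES, IMAGES_PER_RUN), "calls at about 260 images each")
```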


cascoly

Quote from: franky242 on April 22, 2013, 11:25

sorry, I've been offline over the weekend... :-)

.... Hence I guess we need to find a less resource-hungry way of processing. In my case (2,800 photos online now) the processing of a hundred photos takes about 6 minutes, ...

Hence the approach of processing images that were not processed yet - that way if you start the script enough times, all images will be processed. Maybe add a second option "process all images" so that from time to time you can build the whole references from scratch?

Each image should have a 'last updated for similars' date field, defaulting to 12/12/2012 for new images. Then when the update process is called, the function selects the oldest image available and resets the date when done. Each time it is called, it will process as many as it can.

Leo then only needs to add the last update field, and change the SELECT query to use the date.

later refinements could include a report showing how many images might need to be updated, but that could be written by anyone

----------------------------
Another approach for large sites would be something I did in online games - use every visitor to do a little bit of processing -- e.g., go process the oldest image waiting in line. Users are unlikely to notice the time it takes, and a well visited site will always be automatically up to date. (Comparing last updated with last uploaded image will tell you if there are ANY images that need to be updated.)

The key is to pick a moment when the user is busy reading a screen -- e.g., when search results are displayed -- the user will normally spend a small bit of time staring at the screen, so we can sneak in some extra calcs.
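
The visitor-driven idea could be sketched roughly as below. The `SimilarsQueue` class and its method names are invented for illustration, and the recalculation itself is replaced by a stand-in that only records which image was handled.

```python
from collections import deque


class SimilarsQueue:
    """Hypothetical queue of images waiting for their similars to be recalculated."""

    def __init__(self):
        self.pending = deque()  # image ids waiting in line, oldest first
        self.processed = []     # stand-in record of finished work

    def image_uploaded(self, image_id):
        """A new upload joins the back of the line."""
        self.pending.append(image_id)

    def anything_pending(self):
        """True if ANY image still needs to be updated."""
        return bool(self.pending)

    def on_search_results_shown(self, per_visit=1):
        """Call while a visitor reads a results page; handles a few images."""
        for _ in range(per_visit):
            if not self.pending:
                break
            image_id = self.pending.popleft()
            self.processed.append(image_id)  # the real recalculation would go here


queue = SimilarsQueue()
for image_id in (1, 2, 3):
    queue.image_uploaded(image_id)
queue.on_search_results_shown()  # first visitor handles image 1
queue.on_search_results_shown()  # second visitor handles image 2
print(queue.processed, queue.anything_pending())
```

Keeping the work per visit small is the design choice that matters: each page view pays for one image at most, so no single visitor waits on a long batch.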

Leo Blanchette

Good idea. WordPress "cron jobs" are actually just like that - they rely on visitors to get checked, and then run. I think I can work on that soon. Today I'm working on a search function that checks terms, title, and content, but I should get to that next.

cascoly

Quote from: Leo on April 23, 2013, 00:00
Good idea. WordPress "cron jobs" are actually just like that - they rely on visitors to get checked, and then run. I think I can work on that soon. Today I'm working on a search function that checks terms, title, and content, but I should get to that next.

Great, I'm still fairly new to what WordPress encompasses, but there do seem to be a lot of useful plugins.