MicrostockGroup
Microstock Photography Forum - General => Symbiostock => Symbiostock - Technical Support => Topic started by: Chico on April 16, 2013, 07:03
-
OK, I ran the "related images process". It was fast, with no errors, and the mail was received.
So I went to the widgets area and I don't see any "related images" widget. What am I doing wrong? And where is the right place to put the widget?
Can some good soul help?
Running the latest update, 1.3.1.
THX
-
It is called there "Similar images". Drag it to "Image page bottom" and save.
-
Thanks a lot. Works fine!
-
I did some uploads a few minutes ago and the new images don't have "similar files" displayed.
I guess I have to run the "related images process" after each upload, right? Any way to make it permanent?
-
This is a fantastic addition!
A question though - the choice of similar images sometimes comes up with some strange similars. When I search Google for my "bengal cat licking lips" the first image in my similars is one of a pair of hands isolated against white, followed by the expected series of cat images. I may be rubbish at keywording, but I don't think there is a great deal of similarity between those sets of keywords. There will definitely be some similars - isolated, white background, pair, etc. but others are much closer.
How is the order established, and is there something we can do to impact the similar search as we hone our skills?
Steve
Edit - did another test with "Solid gold coins treasure chest" - the similars appear to be ordered the wrong way: the closest matches are at the bottom of the sidebar rather than at the top. Is it as simple as that?
-
One question: the generation of related images always stops at around 200 images for me. I gave it several tries, and each time I receive the confirmation mails for 100 and 200 images being processed, but that's it, no matter how long I wait. Site: franky242.net/shop/, about 2,800 photos available. Has anyone been able to process more than 200 images? Memory should not be a problem since I have plenty allocated (Bluehost), and everything else works fine, even processing large numbers of uploaded images.
Cheers, Frank
-
So just to confirm - it does complete the task, but it stops emailing you after so many?
It's possible your server takes measures to avoid spam.
-
I'm at 342 images now and I get 4 emails - 100, 200, 300 and 342. So it works for me, so far!
steve
-
Nope, it does NOT complete. The wheel keeps on turning forever (I waited about an hour each time), and when I check which images feature related images, each time it's only about the last 200 uploaded; older ones do not have the related images module on their page.
So just to confirm - it does complete the task, but it stops emailing you after so many?
It's possible your server takes measures to avoid spam.
-
Looking at the email timestamps, it takes 4 minutes to do 340 images and I normally have around 45 keywords per image.
Steve
-
Just ran "update related images" with 91 files online and the mail arrived OK.
-
Here's a little tidbit for you -
The more images you have, the more time it will take per image.
For instance, if you have 1000 images, each of those 1000 images is checked against those 1000 images.
If you have 3000 images, each of those 3000 images is checked against 3000 images.
See a pattern? It multiplies your time.
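To put numbers on that pattern, here is a minimal Python sketch that only counts keyword comparisons, assuming every image is compared once against every other image (the function names are made up for illustration; the plugin itself is PHP):

```python
def checks_per_image(n_images: int) -> int:
    # Each image is compared against every other image in the library.
    return n_images - 1


def total_checks(n_images: int) -> int:
    # A full "related images" run repeats that for every image: n * (n - 1).
    return n_images * checks_per_image(n_images)


for n in (1000, 2000, 3000):
    print(f"{n} images: {checks_per_image(n)} checks each, {total_checks(n)} in total")
```

Under that assumption, going from 1,000 to 3,000 images makes each image about 3 times slower and the whole run about 9 times longer.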
-
Hmm, the time between my 100-image and 200-image mails is 4 minutes, and I also have about 40-50 keywords. It seems my server is much slower, then, although it's also on Bluehost. Might this be part of the problem? Is the script simply taking too long to process more of my images?
Looking at the email timestamps, it takes 4 minutes to do 340 images and I normally have around 45 keywords per image.
Steve
-
Ah, this might be the difference from Steve: with my 2,700 images available, I guess this might be the problem. And now?
Here's a little tidbit for you -
The more images you have, the more time it will take per image.
For instance, if you have 1000 images, each of those 1000 images is checked against those 1000 images.
If you have 3000 images, each of those 3000 images is checked against 3000 images.
See a pattern? It multiplies your time.
-
I might be confused. Did you say it abandons the files after a while, or does it just stop sending emails? I.e., after a certain point, files do not get updated?
-
My 275 images run to completion.
I doubt there's a spam filter at so few emails; I'd send 20+ emails to myself within a minute or 2 when some of my online games were updating and I was debugging.
A more likely possibility is that it's running out of time. Not sure how WP reports such crashes: my ASP pages break and show whatever part of the page had been generated, followed by an error message saying the script timed out (these pages with script errors also show up in Google Webmaster Tools as 404 errors).
-
Not sure how the similars are supposed to work, so not sure if this is a bug or a feature:
I start with
http://cascoly-images.com/pix/image/a-group-of-skiers-have-a-leisurely-lunch-outdoors-2/
but none of the similars show other 'skier' images that I get if I search for
http://cascoly-images.com/pix/search-images/skier/
I then choose
http://cascoly-images.com/pix/image/skiers-prepare-for-their-next-run/
and one of the similars is
http://cascoly-images.com/pix/image/antique-map-of-merovingian-france/
but THAT image doesn't show any similars that are not also maps even though there are 45 France images
==========================
are there weights assigned to the detection of similars based on keywords?
does keyword position matter?
Added next day: I checked the code and it appears the answer is no to both of these; explained more below.
-
I might be confused. Did you say it abandons the files after a while, or does it just stop sending emails? I.e., after a certain point, files do not get updated?
It abandons processing somewhere between 200 and 300 files. I receive no more mails after the second one (200 processed), and when I check back (even after 10 hours now), only the last 200-something submitted images display related ones on their image detail page.
So either we find a way for the script to keep running, or maybe on start it could check which images were processed recently (<24h should be sufficient) and start with the next one. That way I could process all images by calling the script several times. This might also save resources for other users who upload frequently and do not need to process all images each time?
Sorry for not being more precise before, English is not my native language! :-)
-
I could never have guessed you were speaking English as a second language. I'm just now learning one myself, but half the world does this as a requirement.
I *think* it's one of two issues:
I have the script time limit set to infinity, but your server might not be allowing that. If you turn on error reporting in your functions.php (I'd have to insert the code again), I'm pretty sure it would say "memory" or "time" error on the page, assuming you don't leave it.
Remind me again what your hosting is? I'll check it tomorrow afternoon. Next week I'm looking forward to some big improvements.
-
It abandons processing somewhere between 200 and 300 files. I receive no more mails after the second one (200 processed), and when I check back (even after 10 hours now), only the last 200-something submitted images display related ones on their image detail page.
So either we find a way for the script to keep running, or maybe on start it could check which images were processed recently (<24h should be sufficient) and start with the next one. That way I could process all images by calling the script several times. This might also save resources for other users who upload frequently and do not need to process all images each time?
....
it really does sound like a script that times out, but without an error being displayed
a simple fix would be to process the images from most recently updated (this can be done in the sql select that sets up the processing) then, if the script fails, the next time it would process new items rather than the ones already done
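A minimal sketch of that suggestion in Python, assuming each image records when its similars were last computed (the `Image` class, `last_processed` field and time-budget parameter are hypothetical, not the plugin's code). Working from the least recently processed image and stopping before the host's limit means a run that gets cut short still leaves the next run with fresh work:

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Image:
    image_id: int
    last_processed: float = 0.0  # 0.0 = similars never computed


def process_batch(images: List[Image], work: Callable[[Image], None],
                  time_budget_s: float,
                  clock: Callable[[], float] = time.monotonic) -> List[int]:
    """Refresh images least-recently-processed first, stopping when the time budget is used up."""
    start = clock()
    done = []
    for img in sorted(images, key=lambda i: i.last_processed):
        if clock() - start >= time_budget_s:
            break  # stop cleanly instead of being cut off by the host's time limit
        work(img)
        img.last_processed = time.time()  # a real plugin would save this right away
        done.append(img.image_id)
    return done
```

Calling `process_batch` repeatedly, for example once after each upload, would eventually cover the whole library even if no single run can finish it.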
-
This is a fantastic addition!
A question though - the choice of similar images sometimes comes up with some strange similars. When I search Google for my "bengal cat licking lips" the first image in my similars is one of a pair of hands isolated against white, followed by the expected series of cat images. I may be rubbish at keywording, but I don't think there is a great deal of similarity between those sets of keywords. There will definitely be some similars - isolated, white background, pair, etc. but others are much closer.
How is the order established, and is there something we can do to impact the similar search as we hone our skills?
right now, it appears the matches are performed by giving 1 point for every keyword in common, then displaying the X highest scores; this distorts the results by unfairly weighting common keywords -- keywording style plays in here: if you use many, vague keywords you'll get less exact results than if you use fewer, specific ones
Common keywords like 'seattle' or 'blue' are less useful in matching than uncommon ones like 'totem' or 'tiger', so a better result would be to weight the matches, giving less relevance to common keywords. Shouldn't take Leo more than a day or 2 to do this!
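The scoring described above (as read from the code, not confirmed by the author) can be sketched in Python like this; the keyword lists are invented to mimic the cat-versus-hands case:

```python
def overlap_score(a, b):
    # One point for every keyword the two images have in common.
    return len(set(a) & set(b))


def top_similar(target_id, library, limit=3):
    scores = {
        image_id: overlap_score(library[target_id], keywords)
        for image_id, keywords in library.items()
        if image_id != target_id
    }
    # Highest score first; ties broken by id so the order is repeatable.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [image_id for image_id, score in ranked[:limit] if score > 0]


library = {
    "bengal-cat": ["bengal", "cat", "licking", "isolated", "white", "studio"],
    "hands": ["hands", "pair", "isolated", "white", "studio"],
    "tabby-cat": ["tabby", "cat", "sleeping"],
}
print(top_similar("bengal-cat", library))  # ['hands', 'tabby-cat']
```

Three shared generic keywords outscore one shared specific keyword, which is the kind of result described with the bengal cat.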
-
I could never have guessed you were speaking English as a second language. I'm just now learning one myself, but half the world does this as a requirement.
I *think* it's one of two issues:
I have the script time limit set to infinity, but your server might not be allowing that. If you turn on error reporting in your functions.php (I'd have to insert the code again), I'm pretty sure it would say "memory" or "time" error on the page, assuming you don't leave it.
Remind me again what your hosting is? I'll check it tomorrow afternoon. Next week I'm looking forward to some big improvements.
Most servers will not allow infinity, since it's too easy to miscode an infinite loop, so even if you set it in code, it'll be overridden. When I have processes like this, I set the timeout to a larger number at the start, then reset it after the process finishes to keep the host happy.
-
This is a fantastic addition!
A question though - the choice of similar images sometimes comes up with some strange similars. When I search Google for my "bengal cat licking lips" the first image in my similars is one of a pair of hands isolated against white, followed by the expected series of cat images. I may be rubbish at keywording, but I don't think there is a great deal of similarity between those sets of keywords. There will definitely be some similars - isolated, white background, pair, etc. but others are much closer.
How is the order established, and is there something we can do to impact the similar search as we hone our skills?
right now, it appears the matches are performed by giving 1 point for every keyword in common, then displaying the X highest scores; this distorts the results by unfairly weighting common keywords -- keywording style plays in here: if you use many, vague keywords you'll get less exact results than if you use fewer, specific ones
Common keywords like 'seattle' or 'blue' are less useful in matching than uncommon ones like 'totem' or 'tiger', so a better result would be to weight the matches, giving less relevance to common keywords. Shouldn't take Leo more than a day or 2 to do this!
It might actually be easier to implement than it sounds. Some sort of keyword statistic must already be available for the tag cloud widget, so maybe this can be tapped for refining the selection of similars...
-
Right, the count for each keyword is stored, so the weighting could be done as a normalization from 1 to 100: if the most-used keyword appears 237 times, then each keyword's weight would be 101 - 100 * ([count] / 237), rounded down:
237 --> 1
155 --> 35
1 --> 100
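The normalization above, written out as a small Python function (the function name is illustrative) that reproduces the three example rows:

```python
import math


def keyword_weight(count: int, max_count: int) -> int:
    # Most-used keyword -> 1, rarest keywords -> close to 100.
    return math.floor(101 - 100 * count / max_count)


for count in (237, 155, 1):
    print(f"{count} --> {keyword_weight(count, 237)}")
```

With 237 as the top count this prints 237 --> 1, 155 --> 35 and 1 --> 100.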
-
I'd be interested to see what similars your formula would dig up as opposed to the existing related images widget.
-
OK, check out http://cascoly.com/symbiostock-related-search.asp
I was curious too, so I set up a little system in Excel that let me calculate the similarity tables for the 2 approaches. Happily, my prediction seems to work, at least at this level. I purposely set it up so there'd be ambiguities like 'leeks from france', which would be selected by the simple algorithm for the image "skiing France" but is not selected by the weighted approach. Plus, the weighted model should perform even better in a larger database. The main problem will be creating and calculating the matrices, but that might be incorporated into the process Leo uses now to set everything up.
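A toy version of that comparison in Python, with invented keyword lists rather than the spreadsheet's actual data, shows how the two scorings can disagree: the shared generic keywords ("france", "europe") win under plain counting, while the weighted score favors the one shared specific keyword ("skiing"). The weights use the 101 - 100 * count / max formula from earlier in the thread; the function names are hypothetical:

```python
import math
from collections import Counter


def keyword_weights(library):
    # How many images use each keyword; rare keywords get weights near 100.
    counts = Counter(kw for keywords in library.values() for kw in set(keywords))
    max_count = max(counts.values())
    return {kw: math.floor(101 - 100 * c / max_count) for kw, c in counts.items()}


def rank(target_id, library, weights=None, limit=1):
    # weights=None gives the plain "one point per shared keyword" score.
    target = set(library[target_id])
    scores = {}
    for image_id, keywords in library.items():
        if image_id == target_id:
            continue
        shared = target & set(keywords)
        scores[image_id] = len(shared) if weights is None else sum(weights[kw] for kw in shared)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [image_id for image_id, score in ranked[:limit] if score > 0]


library = {
    "skiing-france": ["skiing", "france", "europe", "alps"],
    "leeks-france": ["leeks", "vegetable", "france", "europe"],
    "skier-jump": ["skiing", "jump", "snow"],
    "paris-street": ["paris", "street", "france", "europe"],
    "lyon-market": ["market", "vegetable", "france", "europe"],
}
print(rank("skiing-france", library))                           # ['leeks-france']
print(rank("skiing-france", library, keyword_weights(library)))  # ['skier-jump']
```

Here "france" and "europe" appear on 4 of 5 images and drop to weight 1, while "skiing" (2 images) gets 51, so the ski image overtakes the leeks.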
-
You guys have been holding out on me! Looks like we've got some great techs on this project. I'll be on the big issues Monday.
-
Wow! You put quite an effort into this. I agree, at this level your approach seems to work. I also think this should work even better in a larger database. Still, I have a hard time imagining it won't produce any oddities at all. In any case it'd be great to see this going live some day...
-
http://cascoly.com/symbiostock-related-search.asp
WOA
BTW, I've credited the source a few times, but the related images code comes from here:
http://wordpress.org/support/topic/custom-query-related-posts-by-common-tag-amount?replies=8
I simply had to modify it slightly to work with the 'image' custom post type.
-
Wow! You put quite an effort into this. I agree, at this level your approach seems to work. I also think this should work even better in a larger database. Still, I have a hard time imagining it won't produce any oddities at all. In any case it'd be great to see this going live some day...
Right, 100% isn't the goal, just a better fit most of the time. The one concern is that processing grows quadratically with size, i.e., doubling the number of images takes 4 times the cycles. But there are a number of tricks to sidestep that, too.
-
I have been away without an internet connection for a few days, so coming back and finding this new feature is great.
-
It abandons processing somewhere between 200 and 300 files. I receive no more mails after the second one (200 processed), and when I check back (even after 10 hours now), only the last 200-something submitted images display related ones on their image detail page.
So either we find a way for the script to keep running, or maybe on start it could check which images were processed recently (<24h should be sufficient) and start with the next one. That way I could process all images by calling the script several times. This might also save resources for other users who upload frequently and do not need to process all images each time?
....
it really does sound like a script that times out, but without an error being displayed
a simple fix would be to process the images from most recently updated (this can be done in the sql select that sets up the processing) then, if the script fails, the next time it would process new items rather than the ones already done
sorry, I've been offline over the weekend... :-)
I tried it again this morning and again, I only received mails for 100 and 200 images being processed. After checking my images online, it became obvious that only my latest approx. 260 images feature the "similar images" widget. So I guess we need to find a less resource-hungry way of processing: in my case (2,800 photos online now), processing a hundred photos takes about 6 minutes, so I figure a complete run (if it worked at all) would take about 168 minutes (almost 3 hours!). Even if the server allowed a script to run for such a long time, I doubt this makes sense every time you upload new images.
Hence the approach of processing images that were not processed yet - that way if you start the script enough times, all images will be processed. Maybe add a second option "process all images" so that from time to time you can build the whole references from scratch?
-
Each image should have a 'last updated for similars' date field, defaulting to 12/12/2012 for new images. Then, when the update process is called, the function selects the image with the oldest date and resets that date when done. Each time in, it will process as many as it can.
Leo then only needs to add the last update field, and change the SELECT query to use the date.
Later refinements could include a report showing how many images might need to be updated, but that could be written by anyone.
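That proposal can be sketched with Python's built-in `sqlite3` standing in for the WordPress database (the `images` table and `similars_updated` column are hypothetical names; the 12/12/2012 default is stored as ISO text so that plain string comparison sorts dates correctly). Each image's date is saved as soon as it is done, so a run that the host kills still keeps its progress:

```python
import sqlite3

SCHEMA = """
CREATE TABLE images (
    id INTEGER PRIMARY KEY,
    similars_updated TEXT NOT NULL DEFAULT '2012-12-12 00:00:00'
)
"""


def process_stale(conn, run_started, max_images, work):
    """Refresh up to max_images images whose similars predate run_started, oldest first."""
    done = []
    while len(done) < max_images:
        row = conn.execute(
            "SELECT id FROM images WHERE similars_updated < ? "
            "ORDER BY similars_updated, id LIMIT 1",
            (run_started,),
        ).fetchone()
        if row is None:
            break  # every image is up to date
        image_id = row[0]
        work(image_id)  # recompute this image's similars
        conn.execute(
            "UPDATE images SET similars_updated = ? WHERE id = ?",
            (run_started, image_id),
        )
        conn.commit()  # keep progress even if the script is killed later
        done.append(image_id)
    return done


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO images (id) VALUES (?)", [(i,) for i in range(1, 6)])
print(process_stale(conn, "2013-04-22 10:00:00", 2, lambda image_id: None))   # [1, 2]
print(process_stale(conn, "2013-04-22 11:00:00", 10, lambda image_id: None))  # [3, 4, 5, 1, 2]
```

The second run starts with the images still carrying the 2012 default before revisiting the two done in the first run.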
----------------------------
Another approach for large sites would be something I did in online games: use every visitor to do a little bit of processing, e.g., go process the oldest image waiting in line. Users are unlikely to notice the time it takes, and a well-visited site will always be automatically up to date. (Comparing the last-updated date with the last uploaded image will tell you if there are ANY images that need to be updated.)
The key is to pick a point where the user is busy reading a screen, e.g., when search results are displayed. The user will normally spend a little time staring at the screen, so we can sneak in some extra calcs.
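The per-visitor idea can be sketched in Python as a hook that each page view calls once (the `Site` class and `piggyback_one` function are hypothetical). It refreshes at most one image per call, and reprocesses nothing when no image's similars are older than the last upload:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Site:
    # image id -> time its similars were last computed (None = never)
    similars_updated: Dict[int, Optional[float]] = field(default_factory=dict)
    last_upload: float = 0.0


def piggyback_one(site: Site, now: float, work: Callable[[int], None]) -> Optional[int]:
    """Called once per page view: refresh the stalest image, if any, and return its id."""
    stale = [
        image_id
        for image_id, updated in site.similars_updated.items()
        if updated is None or updated < site.last_upload
    ]
    if not stale:
        return None  # nothing is older than the last upload, so no image is reprocessed
    oldest = min(stale, key=lambda i: (site.similars_updated[i] or 0.0, i))
    work(oldest)
    site.similars_updated[oldest] = now
    return oldest


site = Site({1: None, 2: None, 3: 5.0}, last_upload=10.0)
print([piggyback_one(site, t, lambda i: None) for t in (11.0, 12.0, 13.0, 14.0)])  # [1, 2, 3, None]
```

A real site would replace the list scan with an indexed database query; WordPress's own WP-Cron is likewise triggered by page loads rather than by a system scheduler.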
-
Good idea. WordPress "cron jobs" are actually just like that: they rely on visitors to get checked, and then run. That's a good idea! I think I can work on that soon. Today I'm working on a search function that checks terms, title, and content, but I should get to that next.
-
Great, I'm still fairly new to what WP encompasses, but there do seem to be a lot of useful plugins.