Hey folks,
Just wanted to share my experience with a little tool I've been tinkering with for image tagging with Ollama running locally. I've been at it for over a year now.
Getting the model and prompts right actually matters a lot. At first I kept getting echo replies (where the AI just repeats your question back) or totally off tags.
That's why I ended up adding a bunch of cleanup options to filter the garbage out, like setting up blocklists for unwanted words and phrases.
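To give an idea of what that cleanup can look like, here's a minimal sketch (not the actual TagFlux code). The blocklist entries and the function names `is_echo` and `clean_keywords` are made up for illustration.

```python
import re

# Hypothetical blocklists; tune them to whatever junk your model produces.
BLOCKED_WORDS = {"image", "photo", "picture"}
BLOCKED_PHRASES = ("here are", "here is", "keywords:")


def _normalize(text: str) -> str:
    """Lower-case the text and keep only word characters, single-spaced."""
    return " ".join(re.findall(r"\w+", text.lower()))


def is_echo(prompt: str, reply: str) -> bool:
    """True if the reply is empty or just contains the prompt repeated back."""
    normalized_reply = _normalize(reply)
    return not normalized_reply or _normalize(prompt) in normalized_reply


def clean_keywords(raw: str) -> list[str]:
    """Split a model reply into unique lower-cased tags, dropping blocked ones."""
    seen: set[str] = set()
    tags: list[str] = []
    for part in re.split(r"[,;\n]", raw):
        tag = part.strip(" .\"'*-").lower()
        if not tag or tag in seen or tag in BLOCKED_WORDS:
            continue
        if tag.startswith(BLOCKED_PHRASES):
            continue
        seen.add(tag)
        tags.append(tag)
    return tags
```

If `is_echo` returns True, the simplest reaction is to retry the query or leave the field empty for manual review.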
After trying different setups, I settled on three separate queries: one for the title, one for the description, and one for keywords.
Works more reliably than cramming everything into a single prompt.
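Roughly, the three-query idea looks like this. It's a sketch against Ollama's `/api/generate` endpoint on the default local port; the prompt wording and the helper names `build_payloads` and `ask` are my assumptions for the example, not what the tool ships with.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama address

# Hypothetical prompts, one per metadata field.
PROMPTS = {
    "title": "Write a short title for this photo. Reply with the title only.",
    "description": "Describe this photo in one or two sentences.",
    "keywords": "List up to 20 keywords for this photo, separated by commas.",
}


def build_payloads(image_bytes: bytes, model: str = "gemma3:12b") -> dict[str, dict]:
    """Build one request body per field, all sharing the same encoded image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        field: {"model": model, "prompt": prompt, "images": [encoded], "stream": False}
        for field, prompt in PROMPTS.items()
    }


def ask(payload: dict, url: str = OLLAMA_URL) -> str:
    """Send one request to a running Ollama server and return the reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"].strip()
```

With a server running, `{field: ask(p) for field, p in build_payloads(data).items()}` collects the three answers separately, so a bad keywords reply doesn't spoil a good title.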
Also, I don't send full-size images to Ollama - just a downscaled thumbnail. Speeds things up noticeably without hurting quality much.
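The size calculation behind that is simple. Here's a sketch that only computes the target dimensions (the actual resize would be done by whatever image library you use). The 1024-pixel limit is an assumed value for the example, not a recommendation from testing.

```python
def thumbnail_size(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Scale (width, height) so the longer side fits max_side, never upscaling."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))
```

A 6000x4000 frame comes out as 1024x683, while an 800x600 image is left alone.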
Right now I'm running it on an RTX 3060 (12 GB) with gemma3:12b.
I added a simple hint system using variables in prompts. Partly to steer the model toward what actually matters in the frame - but more importantly, to inject location context.
My main goal was getting editorial captions.
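As an illustration of the hint variables, here's a small sketch using Python's `string.Template`. The variable names `location_hint` and `subject_hint` and the prompt text are hypothetical; the real templates in the tool may look different.

```python
from string import Template
from typing import Optional

# Hypothetical template: hints are appended only when they are available.
CAPTION_TEMPLATE = Template(
    "Write a one-sentence editorial caption for this photo."
    "${location_hint}${subject_hint}"
)


def render_prompt(location: Optional[str] = None, subject: Optional[str] = None) -> str:
    """Fill the template, leaving out any hint that was not provided."""
    return CAPTION_TEMPLATE.substitute(
        location_hint=f" The photo was taken in {location}." if location else "",
        subject_hint=f" Focus on {subject}." if subject else "",
    )
```

Leaving a hint empty keeps the prompt clean instead of sending something like "taken in None" to the model.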
Location data can come from EXIF or from a personal database that you create and fill. You can feed that DB from:
- Phone photos (grabbing their GPS data)
- Google Takeout location history (JSON)
- Android location log exports
- Or just scan your own photo library for captions like "Milan, Italy – April 07, 2018"
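For the last option, parsing a caption in that "Place – Month DD, YYYY" shape could look something like this. It's a sketch that assumes English month names and that exact layout; `parse_caption` is a name I made up for the example.

```python
import re
from datetime import date, datetime
from typing import Optional

# Place, then an en dash or hyphen, then a date like "April 07, 2018".
CAPTION_RE = re.compile(r"^(?P<place>.+?)\s+[–-]\s+(?P<date>[A-Za-z]+ \d{1,2}, \d{4})$")


def parse_caption(caption: str) -> Optional[tuple[str, date]]:
    """Return (place, date) for a matching caption, or None if it does not fit."""
    match = CAPTION_RE.match(caption.strip())
    if not match:
        return None
    try:
        day = datetime.strptime(match["date"], "%B %d, %Y").date()
    except ValueError:  # e.g. a misspelled month name
        return None
    return match["place"], day
```

Captions that don't match are simply skipped rather than guessed at.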
It's not hyper-precise: it stores locations at least one hour apart, so occasional mismatches happen.
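To show why the gaps lead to mismatches, here's a sketch of a nearest-in-time lookup. The function name, the in-memory list, and the one-hour tolerance are assumptions for the example, not the tool's real storage or matching rule.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from typing import Optional


def nearest_location(
    history: list[tuple[datetime, str]],
    taken_at: datetime,
    max_gap: timedelta = timedelta(hours=1),
) -> Optional[str]:
    """Return the place logged closest to taken_at, or None if it is too far off.

    history must be sorted by timestamp.
    """
    times = [when for when, _ in history]
    index = bisect_left(times, taken_at)
    # Only the entries just before and just after the photo can be the nearest.
    candidates = history[max(0, index - 1): index + 1]
    if not candidates:
        return None
    when, place = min(candidates, key=lambda entry: abs(entry[0] - taken_at))
    return place if abs(when - taken_at) <= max_gap else None
```

A photo taken 40 minutes after a logged point still gets that point's place name, even if you had already moved on, which is exactly the kind of mismatch to watch for during review.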
I tweak the prompt templates themselves with AI help (tried Qwen and DeepSeek). Show them a bad output, describe what's wrong, and iterate. Works surprisingly well.
Everything runs locally except reverse geocoding (turning coordinates into place names).
You'll need a decent GPU and Ollama with a capable model.
An SSD helps a lot too, especially when importing locations: the JSON files can contain tens of thousands of entries, and inserting them all is much faster on an SSD.
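For anyone building something similar, a bulk import along these lines is one way to handle that many rows. This sketch assumes an SQLite database with a made-up `locations` table; I'm not claiming that's the schema the tool uses. Doing all inserts in a single transaction cuts down on disk writes, which matters most on slower drives.

```python
import sqlite3
from typing import Iterable


def import_locations(
    conn: sqlite3.Connection, rows: Iterable[tuple[str, float, float]]
) -> int:
    """Insert (timestamp, lat, lon) rows in one transaction; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS locations ("
        "taken_at TEXT NOT NULL, lat REAL NOT NULL, lon REAL NOT NULL)"
    )
    with conn:  # commits once at the end, or rolls back on error
        conn.executemany("INSERT INTO locations VALUES (?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM locations").fetchone()[0]
```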
Even with all this, I still review and tweak the results manually.
Automation speeds things up, but doesn't replace eyeballs.
The CSV export uses templates that allow you to change file extensions in the output, like replacing .jpg with .mov for video entries.
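The extension swap can be pictured like this. It's a sketch, not the tool's template syntax: the column names, the record layout, and the `extension_map` parameter are all hypothetical.

```python
import csv
import io
from pathlib import PurePath
from typing import Optional


def export_csv(records: list[dict], extension_map: Optional[dict[str, str]] = None) -> str:
    """Write records as CSV text, swapping file extensions listed in extension_map."""
    extension_map = extension_map or {}
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Filename", "Title", "Description", "Keywords"])
    for record in records:
        name = PurePath(record["filename"])
        new_suffix = extension_map.get(name.suffix.lower())
        if new_suffix:
            name = name.with_suffix(new_suffix)
        writer.writerow(
            [str(name), record["title"], record["description"], ", ".join(record["keywords"])]
        )
    return buffer.getvalue()
```

Calling it with `extension_map={".jpg": ".mov"}` turns a keyframe entry like `clip_0001.jpg` into `clip_0001.mov` in the output, so the metadata row points at the video file.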
Video sort of works too if you feed it a keyframe plus a hint, but results are less consistent.
No proper docs yet, but most UI elements have tooltips. I can record a quick screencast or write up a short guide if anyone's interested.
Important: it's still early testing. Always work on copies of your files; stuff can go sideways.
TagFlux: a local Ollama client for auto-tagging images and writing metadata
https://meshtonic.ru/en/extras/#tagflux-a-local-ollama-client-for-ai-powered-automated-image-metadata-tagging
