Eliminating duplicates: Simple solution can be found here
Good afternoon,
After having migrated from Aperture to Capture One, I have identified quite a few duplicates in my library. Many of them stem from me taking pictures, sharing them on an Apple photo stream and re-importing them later. I knew I had quite a few of them.
I have written a very simple program that looks for pictures taken at the same date/time and marks them (including all variants) with the keyword POTENTIALDUPLICATE (if there is aperture data as well as shutter speed, they are also included in the comparison). After running it, I can use a Smart Album to find the respective variants, sort them by date and start removing them. Not fully automated, but a start.
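The matching rule described above can be sketched roughly as follows. This is only an illustration, not the actual program: the record layout (`name`, `taken`, `aperture`, `shutter`) and the function name are hypothetical, and it assumes the exposure values are added to the comparison key only when both are present.

```python
from collections import defaultdict

KEYWORD = "POTENTIALDUPLICATE"  # keyword named in the post


def find_potential_duplicates(images):
    """Group image records that share a capture time (and exposure data, if present).

    `images` is a list of dicts with the hypothetical keys
    "name", "taken", "aperture" and "shutter".
    """
    groups = defaultdict(list)
    for img in images:
        key = [img["taken"]]
        # Assumption: aperture and shutter speed refine the key only when both exist.
        if img.get("aperture") is not None and img.get("shutter") is not None:
            key += [img["aperture"], img["shutter"]]
        groups[tuple(key)].append(img["name"])
    # Only groups with more than one member are potential duplicates.
    return [names for names in groups.values() if len(names) > 1]


def keywords_to_add(images):
    """Map every image in a duplicate group to the keyword it should receive."""
    return {
        name: KEYWORD
        for names in find_potential_duplicates(images)
        for name in names
    }
```

Because the key is built from metadata alone, a copy whose exposure data was stripped on export would land in a different group than its original under this assumption.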
If anyone is interested, please leave a message here and I will make it available under the non-liability clause of the MIT license, i.e. "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
To stay on the safe side: make a backup, run the program, eliminate the duplicates physically (i.e. delete them from disk), then load the backup and re-sync so that the removed duplicates disappear from the catalog.
Based on interest, I might start working on a "real duplicate finder" based on existing open source software.
Regards,
Mercator
Hi Mercator, this is awesome and a great tool for the power user who takes image management seriously.
Hi,
I will post it asap. Works quite well so far.
Cheers,
Mercator
Here we go: The Mac executable can be found here. You have to run the program sidufi (simple duplicate finder) from the command line.
To tag: sidufi ADD <file name of the catalog DB, e.g. /Users/XYZ/test.cocatalogdb>
To remove the tags: sidufi REMOVE <file name of the catalog DB, e.g. /Users/XYZ/test.cocatalogdb>
Note: Both file names without the < >
The first command will tag obvious potential duplicates with the keyword POTENTIALDUPLICATE. Create a smart album searching for this keyword and sort by Date. The second command removes this keyword.
Let me know if this is useful. I am in a programming mode at the moment. It is somehow relaxing.
Best,
Mercator
Just to add: I produced perceptual hash fingerprints from the CO thumbnails and scanned for images with a very short Hamming distance (<=2). It is just amazing to see that by just "looking" at the images and not the metadata I can now detect duplicates. Should turn this into a product 😄
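For readers unfamiliar with the idea, here is a minimal sketch of fingerprint comparison by Hamming distance. The post does not say which perceptual hash is used, so a simple "average hash" is assumed here, and the input is assumed to be a small grayscale pixel grid (a real pipeline would first scale a thumbnail down, e.g. to 8x8). All function names are hypothetical.

```python
def average_hash(pixels):
    """Return an integer fingerprint: one bit per pixel, set if brighter than the mean.

    `pixels` is a small grid (list of rows) of grayscale values.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_probable_duplicate(hash_a, hash_b, max_distance=2):
    """Apply the threshold from the post: a distance of at most 2 counts as a match."""
    return hamming_distance(hash_a, hash_b) <= max_distance


# A tiny 4x4 "thumbnail", a uniformly brightened copy, and a mirrored image.
original = [[10, 20, 200, 210], [15, 25, 205, 215]] * 2
brightened = [[p + 10 for p in row] for row in original]
mirrored = [list(reversed(row)) for row in original]

print(is_probable_duplicate(average_hash(original), average_hash(brightened)))
print(is_probable_duplicate(average_hash(original), average_hash(mirrored)))
```

Since each bit only records whether a pixel is above or below the image's own mean, a uniform brightness change leaves the fingerprint untouched, which is why this kind of hash tolerates exposure tweaks that defeat metadata or byte-level comparison.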
Hi Mercator,
great idea!
I tried it but get an error:
SQL error: no such table: ZIMAGE
Maybe you can help me?
Thanks!
Mike
Hi,
My error handling is not perfect - and it actually did not find your CO database. Use as follows:
To start tagging, you must point sidufi to the CO database, NOT the folder it is located in. Assume you see the CO database in Finder under /Users/XYZ/CapOne, the actual database is "inside" and named as /Users/XYZ/CapOne/CapOne.catalogdb (i.e. the file is named like the Finder directory plus .catalogdb added).
In the example above, use
sidufi ADD /Users/XYZ/CapOne/CapOne.catalogdb
and it should work fine.
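If you want to check the path before running sidufi, a small pre-flight test along these lines may help. It is only a sketch under two assumptions taken from this thread: that the database file sits inside the catalog folder and is named after it with .catalogdb appended, and that sidufi reads a table called ZIMAGE (the name from the error message). The helper names are hypothetical and not part of sidufi.

```python
import os
import sqlite3
from pathlib import Path


def resolve_catalog_db(path):
    """If given the catalog folder, return the database file assumed to be inside it."""
    if os.path.isdir(path):
        name = os.path.basename(os.path.normpath(path))
        return os.path.join(path, name + ".catalogdb")
    return path


def has_image_table(db_path, table="ZIMAGE"):
    """Return True if db_path is an SQLite file that contains the given table."""
    if not os.path.isfile(db_path):
        return False
    # Open read-only so the check can never modify the catalog.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    try:
        con = sqlite3.connect(uri, uri=True)
        try:
            row = con.execute(
                "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
                (table,),
            ).fetchone()
        finally:
            con.close()
    except sqlite3.DatabaseError:
        # Not an SQLite database (or unreadable).
        return False
    return row is not None
```

For the example above, `has_image_table(resolve_catalog_db("/Users/XYZ/CapOne"))` returning False would suggest the path does not point at the actual database file.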
Let me know if that helped. sidufi is really a simple program, but it helps find the most obvious duplicates. In the meantime, I have been working on a version that actually fingerprints the images without looking at the camera data. It already finds all the duplicates I generated by putting JPGs made from DSLR images on an iOS photo stream, which subsequently got downloaded and imported again. Cool: it already works much better than the professional programs I have used so far. I just need to work on the speed...
Please, drop me a line so I know sidufi finally worked for you.
Best,
Mercator
Hi Mercator: finding potential duplicates is tedious work, so if you can programme it for us, that will be SUPER GREAT! Thank you for your work. Not confident running things from Terminal, so if you could wrap it up nice, that would be wonderful. I'd pay for it too! Thanks.
Hi,
Took me some time... The fingerprinting of the images is working. If I take the Capture One thumbnails as the source for the fingerprint, I can do about 100 fingerprints per second, i.e. about 10 minutes to do my 45'000 image library. There is room for improvement if I do not write to the Capture One database but use a second database instead. While I am not yet at a point where I can write the info back to Capture One as albums for you to display, I verified that it works quite well with rotated and scaled images and with images that have, e.g., curves applied or were made B/W. If more people are interested, I might program the rest...
Good night,
Mercator
If you are working with relatively large files the file size is a pretty unique data value for duplicates.
This assumes you are looking for true duplicates rather than variants of an image.
In other words, it would work well for finding duplicate copies of raw files but less well for, say, their JPG variants.
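A rough sketch of the file-size approach: group files by size first, then confirm candidates with a content hash. The hashing step is an addition for illustration and not part of the suggestion above; the function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root):
    """Return groups of byte-identical files found under `root`."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot belong to an exact duplicate
        # Same size is only a hint; the digest confirms identical content.
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        duplicates.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return duplicates
```

Checking sizes first keeps the expensive hashing limited to the few files that could possibly match, which matters with large raw files.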
HTH.
Grant
Hi SFA,
Thanks for your mail. It made me check the thumbnails again... and they are too small. I did not realise the size depends on the zoom % when creating them. So, that was a good hint.
However, the new stuff I am writing looks at the actual image. Indeed, I wrote it to find JPG copies of my RAW files (I consolidate several iCloud streams and often get duplicates as well as copies that had filters applied). It works pretty well with the images on my file system, and now I want to use CO as the interface to effectively sort out the ones I want to keep/remove.
Regards,
Mercator