Performance re. keywording, an in-depth analysis
The common database layout re. keywords is
[Image] <----- [Image/keyword relations] -----> [Keyword]
Keywords are maintained in the [Keyword] table, with exactly one row per keyword. To search for a keyword, rename a keyword etc., only one lookup is necessary (in an index on keywords), so the operation is fast.
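A minimal sketch of such a layout, using Python's built-in sqlite3 module. The table and column names (image, keyword, image_keyword, filename, name) are made up for illustration and are not taken from any real catalog:

```python
import sqlite3

# Hypothetical schema illustrating the three-table layout; names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image (id INTEGER PRIMARY KEY, filename TEXT NOT NULL);
    -- UNIQUE makes SQLite create an index on keyword.name
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE image_keyword (
        image_id   INTEGER NOT NULL REFERENCES image(id),
        keyword_id INTEGER NOT NULL REFERENCES keyword(id),
        PRIMARY KEY (image_id, keyword_id)
    );
""")
conn.executemany("INSERT INTO image (id, filename) VALUES (?, ?)",
                 [(1, "a.raw"), (2, "b.raw"), (3, "c.raw")])
conn.executemany("INSERT INTO keyword (id, name) VALUES (?, ?)",
                 [(1, "beach"), (2, "sunset")])
conn.executemany("INSERT INTO image_keyword (image_id, keyword_id) VALUES (?, ?)",
                 [(1, 1), (2, 1), (2, 2), (3, 2)])

# Renaming a keyword touches exactly one row, however many images use it.
cur = conn.execute("UPDATE keyword SET name = ? WHERE name = ?",
                   ("seaside", "beach"))
renamed_rows = cur.rowcount

# Searching: one indexed lookup in keyword, then a join via the relation table.
seaside_images = [row[0] for row in conn.execute(
    """SELECT i.filename
         FROM keyword k
         JOIN image_keyword ik ON ik.keyword_id = k.id
         JOIN image i          ON i.id = ik.image_id
        WHERE k.name = ?
        ORDER BY i.filename""", ("seaside",))]
```

Here the rename updates a single row in the keyword table, and the two images tagged with it pick up the new name automatically through the relation table.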
In Capture One the database layout re. keywords (simplified) is
[Image] <----- [Metadata, incl. keywords]
In this layout, any keyword is repeated as many times as it is used in images, which may run into thousands of occurrences. To do the same as before (search for a keyword, rename it etc.), each and every metadata record must be checked, so these operations are slowed down considerably.
Exacerbating the problem is the fact that keywords are not kept individually in the metadata table; the design makes that impossible. Instead, all keywords are concatenated into one string, like
keyword1||keyword2||keyword3||...
Whenever filtering, searching etc., the concatenated string must be parsed, i.e. broken up into the individual keywords, before they are available for evaluation. Such string operations are seriously resource-demanding, and if they are repeated for thousands of records... performance goes down the drain. Indexing the concatenated string cannot help either, since an ordinary index on the whole string cannot locate a keyword sitting in the middle of it.
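To make the cost concrete, here is a rough sketch of what working against such a layout involves, again with Python's sqlite3. The metadata table, its columns and the sample data are assumptions for illustration; only the "||" separator comes from the analysis above:

```python
import sqlite3

SEPARATOR = "||"  # separator as described in the analysis above

# Hypothetical single-table layout; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata (image_id INTEGER PRIMARY KEY, keywords TEXT)")
conn.executemany("INSERT INTO metadata (image_id, keywords) VALUES (?, ?)", [
    (1, "beach||sunset"),
    (2, "beach"),
    (3, "sunset||beachfront"),
])

def keyword_counts(conn):
    """List all keywords with usage counts: every row is read and split."""
    counts = {}
    for (joined,) in conn.execute("SELECT keywords FROM metadata"):
        for kw in (joined or "").split(SEPARATOR):
            if kw:
                counts[kw] = counts.get(kw, 0) + 1
    return counts

def images_with_keyword(conn, keyword):
    """Find images carrying a keyword: again a full scan plus a split per row.

    A plain substring match (LIKE '%beach%') would also hit 'beachfront',
    so the string has to be broken up before comparing.
    """
    hits = []
    for image_id, joined in conn.execute(
            "SELECT image_id, keywords FROM metadata ORDER BY image_id"):
        if keyword in (joined or "").split(SEPARATOR):
            hits.append(image_id)
    return hits

counts = keyword_counts(conn)
beach_images = images_with_keyword(conn, "beach")
```

Both functions touch every metadata row, so the work grows with the number of records rather than with the number of distinct keywords.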
Kindly note that I have not had any COP documentation available to me in preparing this analysis, other than the database layout. So, it may not be the gospel truth, but hopefully can still contribute to a better understanding of the relevant design issues.
MLI
----------------------------------------------------------------------------------------------------------------------------------------------------
15 January 2019
I finally got around to checking the numbers re. the viability of the two database layouts re. keywording. For each of them I constructed the SQL query required to populate the Library/Filter/Keywords pane. I chose this query because it is very obvious how slowly this pane is populated when you open CO with "All Images" selected; it stutters and hesitates for minutes.
Let us check the numbers:
The common layout: [Image] <----- [Image/keyword relations] -----> [Keyword]
- 10000 images
- 2000 keywords
- 15000 metadata rows
Time to execute query: 40 ms
returning 1900 rows
The CO layout: [Image] <----- [Metadata, incl. keywords]
- 3800 images
- 350 keywords
- 12000 metadata rows
Time to execute query: 1540 ms
returning 99 rows
As you can see, even though the "common layout" database is considerably larger, the "CO layout" database query is slower by a factor of 38.5 (1540 ms vs. 40 ms).
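For anyone who wants to check the arithmetic, a small sketch using only the figures quoted above. The per-row normalisation is an extra, rough way of looking at the same numbers and is not part of the original measurement:

```python
# Figures as quoted in the post above.
common_ms, common_rows = 40, 15000   # common layout: query time, metadata rows
co_ms, co_rows = 1540, 12000         # CO layout: query time, metadata rows

# Raw slowdown of the CO layout query.
slowdown = co_ms / common_ms

# Rough normalisation (an assumption, not a measurement): time per row.
per_row_common_ms = common_ms / common_rows
per_row_co_ms = co_ms / co_rows
per_row_slowdown = per_row_co_ms / per_row_common_ms

print(f"raw slowdown:     {slowdown:.1f}x")
print(f"per-row slowdown: {per_row_slowdown:.1f}x")
```

Because the CO layout database is the smaller of the two, the gap gets wider rather than narrower once the row counts are taken into account.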
I'd suggest that a database redesign is required if one wishes to address the Capture One keywording performance problem.
MLI
-
That explains many things.
There have been other comments in these forums on the poor choice of database indexing in this SW.
Organisations are sometimes quite resistant to change, and then this becomes their Achilles heel.
-
What would be the relative performance effect of writing and maintaining the keywords compared to the benefits of using them in searches?
In a Catalog environment one could always keep an index of the relationship (by variant), assuming nothing will have changed when opening the catalogue, and do some checks in the background during use.
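One way to picture such an index is an in-memory map from keyword to the variants that use it, built once when the catalogue is opened. This is only a sketch under assumptions: the (variant id, concatenated keywords) rows and the function name are hypothetical, and the "||" separator is the one described earlier in the thread:

```python
from collections import defaultdict

SEPARATOR = "||"  # separator as described earlier in this thread

def build_keyword_index(rows):
    """Build keyword -> set of variant ids from (variant_id, joined) rows.

    The concatenated strings are parsed once here, instead of on every search.
    """
    index = defaultdict(set)
    for variant_id, joined in rows:
        for kw in (joined or "").split(SEPARATOR):
            if kw:
                index[kw].add(variant_id)
    return index

# Hypothetical sample rows standing in for the metadata records.
rows = [(1, "beach||sunset"), (2, "beach"), (3, "sunset")]
index = build_keyword_index(rows)

# Later searches become dictionary lookups.
beach_variants = sorted(index["beach"])
all_keywords = sorted(index)
```

The price is that the index has to be kept in step with every keyword edit, which is exactly the write-versus-search trade-off raised in the question above.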
This approach may not be so useful with session architectures, depending on how the session is being managed over time. (For example, whether the user sees any benefits in using the "favourites" concept or the potential benefits of using multiple sessions.)
-
Eric Nepean wrote:
There have been other comments in these forums on the poor choice of database indexing in this SW.
And on the issues with the database design. If only better indexing could salvage a poor design, but it can't.
Interestingly, it seems the catalog comprises a list of entities, some 50 of them. One would then expect the number of tables to be approximately the same... but the number is only 18.
MLI
-
I agree. As an ex-Oracle consultant, I find the database design... interesting! I had a look when I lost a load of folders from the folder tool.
It may explain why I keep losing folders from the folder tool: when I delete a folder in C1 and in Finder, others will randomly disappear from the folder tool in C1. I can just add them again, but this has happened to me in C1 11 and again in C1 12. All images were deleted first, and from the catalog trash too. It makes no difference whether the folder is deleted in C1 first and then in Finder, or vice versa. I am somewhat puzzled why C1 doesn't delete the folder in the filesystem too; that is inconsistent with how it handles images, renames, moves etc., which are supposed to be done inside C1.
I guess that something goes wrong during the import process that creates folders and adds each folder to the database. Instead of my import creating a YYYY/MM/DD structure, I've just gone to YYYY/MM. That way it is less likely to go wrong, easier to spot when it does, and quicker to fix. Luckily, when it loses folders, the collections are correct, so All Images contains the images, the edits are intact, and they can be found via metadata, but not in the folders tool.
-
Wow! Thank you.
I knew there was a database problem, but I had not looked into it. It is clear these fellows are all about adjustments, and they do a very fine job of it, even though I am not at a point where I can fully appreciate it. Although I am not a professional, I am planning to use a session for an upcoming vacation trip on my Surface Go. Then I will import the images to my catalog when I get home.
I moved to CO because of Media Pro, and my stronger need for tagging and finding images (of which I have thousands, many of them scanned film as JPEGs). I never warmed up to Lightroom's way of doing it. Since I had Media Pro, I did not spend much time on the cataloging functions in LR.
For the record, I have had several crashes that appear to have been related to tagging and moving images around. The system often freezes, or crashes without a hint before the Phase One crash dialogue box comes up. So far, CO has been able to fix the database itself, but I am a bit fearful at this point. Phooey!
Thanks again for your insight.
-
Eric Nepean wrote: Organisations are sometimes quite resistant to change, and then this becomes their achilles heel.
I've worked in IT for over 30 years. There usually is a budget for a project. If some wrong decisions were made during the project, it is difficult to find additional funds to spend time properly correcting the problem.
Phase One is receiving money every time someone purchases or upgrades the software. They should have the budget to fix simple problems like the one discussed in this thread.
Now that Media Pro is dead, what will all the loyal customers do?
In another thread I read a comment where a forum member was wondering why Apple Aperture users keep talking about Aperture. It appears to me that a four-year-old, dead-in-the-water piece of software still has the best DAM available to macOS users.
I've hesitated to switch to CO since v9. I've purchased v11 but still haven't migrated. Then v12 comes along, promising performance improvements, which end up being minor, but at the same cost as new software.
-
Holy shit, the fact they're parsing concatenated strings...per image...
It's one thing to blame poor performance on the hardware. It's another to do so whilst neglecting your code.
-
Wow. That does not inspire confidence in their strategy.
Does anyone have any clue whether Phase One is working to optimize the DAM aspect of Capture One?
I would believe that in a consumer market (as opposed to a professional one) the DAM aspect is very important for maintaining consumer loyalty, and thus recurring revenue.
-
We keep getting reminded here that this forum is user to user, so I suppose Capture One employees never read anything posted to the forum, and will never reply to counter the criticisms.
-
I've downloaded trial versions of at least 7 different DAM apps, and nothing handles keywords like iView, to the point that I'm still using it. Capture One doesn't show keywords in list view, let alone sort by column title.
-
MediaPro is dead. Apple Aperture is dead. Has anyone tried NeoFinder? How does an external DAM fit in a workflow with Capture One Pro?
-
Francesco Mariani wrote:
MediaPro is dead. Apple Aperture is dead. Has anyone tried NeoFinder? How does an external DAM fit in a workflow with Capture One Pro?
I got really excited reading about the product until I got to the specs. Somehow I thought NeoFinder supported Windows.
☹️
-
Please take discussions re. DAMs/NeoFinder to another topic.
The subject here is keywording performance in Capture One.
MLI
-
I am resurrecting this old thread. It is still a valid topic that needs addressing. Here is one interesting thing I discovered as far back as CO 9 or 10.
If you select, say, 2000 images and add a keyword by typing it into the Keyword tool, it takes a long, long time. If instead you drag those 2000 images onto a keyword in the Filters tool, it takes a fraction of the time. Why the difference in behavior?
-
It is nothing like you describe. The SOOO LONG time is taken AFTER I press enter, not while typing. The suggested keywords come quickly while typing. Once I press enter for just a single keyword, it takes an unreasonably long time via the Keyword tool vs. drag-and-drop onto the Filters tool (I mean several minutes vs. seconds at most). It clearly demonstrates that the two tools are not using the same logic and code to achieve the same task.