OpenCL benchmark: OpenCL (LuxMark) vs. DirectCompute?
In an earlier thread (C1 v7), Christian Gruner from Phase One linked to a DirectCompute (DirectX) benchmark from PassMark Software.
Quote from this site:
"Note: OpenCL and Direct compute are different programming interfaces for compiling and running code on video cards. Once the code is running on the card, the performance should be roughly comparable with either interface. So you should be able to use the Direct compute chart above as a reasonable indication of OpenCL performance. "
I have found a page that shows both benchmarks: DirectCompute (ComputeMark software) and OpenCL (LuxMark software).
Now, if you compare the NVIDIA GeForce GTX 780 Ti with the AMD Radeon R9 290X, you'll find the following:
- Both DirectCompute benchmarks indicate that the NVIDIA card is somewhat faster.
- The LuxMark results indicate that the AMD card is TWICE as fast.
So it is the opposite!
LuxMark tests OpenCL, and C1 uses the OpenCL interface, but Christian linked to the DirectCompute benchmark (and PassMark claims it is a reasonable indication of OpenCL performance as well). So which benchmark better reflects C1 performance?
It would be really great if Phase One could state which standard benchmark we should look at!
It might turn out that one or the other is better suited depending on the C1 task, but that information would also be really helpful! ❗️
Thanks and regards
BeO
Try this link: https://compubench.com/result.jsp?bench ... ase=device
Also note that, per money spent, AMD/ATI cards seem to be faster in Capture One.
Thanks, Christian, very fast reply, btw 😄
Which of these tests is meaningful for C1? What do you mean by "try"? Is this benchmark used at Phase One?
Thanks and regards
BeO
[quote="BeO"]
Which of these tests is meaningful for C1? What do you mean by "try"? Is this benchmark used at Phase One?
[/quote]
The link should lead to the "Video composition" part of the benchmark. This seems to match what we are seeing internally.
We don't use benchmark pages as such, as Capture One provides its own benchmark number in the logs.
Thanks, Christian.
So I read you as saying "the ranking of that particular benchmark is highly correlated with our internal experience / our internal benchmarks".
AMD/ATI cards give more C1 bang for the buck, that's understood. In your internal (and support-call) experience, are they as stable as NVIDIA cards with C1? (In terms of "please set hardware acceleration to Never".)
Thanks again,
BeO
[quote="BeO"]
So I read you as saying "the ranking of that particular benchmark is highly correlated with our internal experience / our internal benchmarks".
AMD/ATI cards give more C1 bang for the buck, that's understood. In your internal (and support-call) experience, are they as stable as NVIDIA cards with C1?
[/quote]
I didn't say "highly correlated", I said "seems to match" 😉
Stability-wise there is no obvious difference. It is also generally a good idea to keep your drivers up to date.
Thanks, Christian.
Please consider publishing Capture One's OpenCL benchmark figures, for example in the knowledge base.
Btw, do these figures scale linearly? E.g., is 0.2 twice as fast as 0.4?
Cheers,
BeO
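BeO's question can be made concrete with a small helper. This is a minimal sketch that *assumes* the benchmark figure in the Capture One log behaves like a processing time (lower is faster, and proportional to time); nobody in this thread has confirmed that, and `relative_speed` is a hypothetical name, not a Capture One API.

```python
def relative_speed(figure_a: float, figure_b: float) -> float:
    """How many times faster device A is than device B.

    ASSUMPTION (unconfirmed): the Capture One log figure behaves like a
    processing time, i.e. lower is faster and it scales linearly with time.
    """
    if figure_a <= 0 or figure_b <= 0:
        raise ValueError("benchmark figures must be positive")
    # A time-like figure means the speed ratio is the inverse time ratio.
    return figure_b / figure_a


# BeO's example: under the linear assumption, 0.2 is twice as fast as 0.4.
print(relative_speed(0.2, 0.4))  # 2.0
```

Under the same assumption, the two R9 280X figures reported later in this thread (0.081185 and 0.081334) would differ by less than 0.2%.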
Hi,
I don't see the effect of GPUs. When I test with 9.0.1 while creating local masks (simple brush with zero hardness), the CPU is clearly the bottleneck at 95-100%, while my GPU (AMD 7870) runs at 10-25% (and is not getting hotter, so this measurement seems about right).
My CPU is an older i5 2500K, but the newest consumer i7 6700K is only 50% to 100% faster, and enthusiast CPUs like the 5820K will be about 100% faster (if all 6 cores are fully used). That leaves a big margin on the GPU before it becomes the bottleneck.
Kind regards,
Alain
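Alain's margin argument is simple arithmetic. A rough sketch, assuming the masking workload stays CPU-bound and the GPU receives work in direct proportion to CPU speed (a simplification; real pipelines rarely scale this cleanly). The function name is hypothetical:

```python
def projected_gpu_utilization(gpu_util: float, cpu_speedup: float) -> float:
    """Estimate GPU utilization (%) after a CPU upgrade.

    Simplifying assumption: the task is CPU-bound, so the GPU receives work
    in proportion to CPU speed. Utilization cannot exceed 100 %.
    """
    return min(100.0, gpu_util * cpu_speedup)


# Alain's figures: GPU at 10-25 % today, new CPU roughly 1.5x to 2x faster.
for gpu_now in (10.0, 25.0):
    for speedup in (1.5, 2.0):
        after = projected_gpu_utilization(gpu_now, speedup)
        print(f"GPU {gpu_now:.0f}% with a {speedup}x CPU -> about {after:.1f}%")
```

Even the best case in this sketch (25% times 2) only reaches about 50% GPU load, which is the margin Alain describes.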
Hmm, I assume the OpenCL utilization balances the computing load between the CPU and GPU. Maybe it does exactly that, putting more load on the CPU because it might be the faster device?
The AMD 7870 has 2 GB of memory, so C1 should use it if your preference settings are right and the card's performance is sufficient. Do you know what the C1 benchmark figure for your card is (and where to find it)?
Edit:
It might also depend on which files (camera) you are using... (see Christian's answer)
cheers
[quote="Alain"]
When I test with 9.0.1 while creating local masks (simple brush with zero hardness), the CPU is clearly the bottleneck at 95-100%, while my GPU (AMD 7870) runs at 10-25%.
[/quote]
Getting a quicker CPU will help you a good bit (if your disk is fast enough to handle the increased throughput from the GPU, since the CPU will be able to feed it faster).
[quote="BeO"]
Hmm, I assume the OpenCL utilization balances the computing load between the CPU and GPU.
[/quote]
Nope, CO does not balance it, as the GPU and CPU are good at different things, and we utilize that when processing. So if one of the components is not fast enough, it will become a bottleneck.
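Christian's point, that the CPU and GPU handle different steps and the slower one limits the whole job, is the classic pipeline-bottleneck rule. A minimal sketch, assuming an idealized pipeline where every stage works on a different image at the same time; the stage names and per-image times are made up for illustration and are not Capture One measurements:

```python
def pipeline_throughput(stage_seconds: dict[str, float]) -> tuple[str, float]:
    """Return (bottleneck stage, images per second) for an ideal pipeline.

    Assumes all stages run concurrently on different images, so the
    steady-state rate is set by the slowest stage alone.
    """
    bottleneck = max(stage_seconds, key=stage_seconds.get)
    return bottleneck, 1.0 / stage_seconds[bottleneck]


# Hypothetical per-image costs in seconds (illustrative only).
stages = {"disk": 0.05, "cpu": 0.80, "gpu": 0.20}
print(pipeline_throughput(stages))                   # CPU-bound baseline
print(pipeline_throughput({**stages, "gpu": 0.10}))  # faster GPU: no change
print(pipeline_throughput({**stages, "cpu": 0.40}))  # faster CPU: rate doubles
```

Speeding up a stage that is not the bottleneck (here the GPU) changes nothing until the slowest stage gets faster, which matches the advice to upgrade the CPU first when the CPU is pegged.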
[quote="Christian Gruner"]
Getting a quicker CPU will help you a good bit (if your disk is fast enough to handle the increased throughput from the GPU, since the CPU will be able to feed it faster).
[/quote]
Thanks, but I suppose that the disk (an SSD in my case) won't change a thing while "brushing" one image.
I'm still curious what the speed difference between the i5 2500K and an i7 6700K would be.
[quote="Alain"]
Thanks, but I suppose that the disk (an SSD in my case) won't change a thing while "brushing" one image.
[/quote]
Correct, the disk is only really used to its full potential during processing.
I have a stock i7 920 @ 2.67 GHz and 18 GB of RAM, with two Asus R9 280X cards.
In the log I have 0.081185 and 0.081334 for the two GPUs. One runs at a 1000 MHz core clock and 1500 MHz memory, the other at a 1050 MHz core clock and 1600 MHz memory.
Exporting from an 850 EVO SSD to an 840 EVO, with the catalog on the 850, I get CPU activity between 80% and 100% (average 85%) and GPU activity between 20% and 65% (average 30%).
Exporting 111 D800 NEFs (14-bit, lossless compressed) to JPEG at 300 ppi, resized to 30 cm on the widest side, takes 92 seconds.
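To compare export runs like this across hardware or settings, it helps to normalize to per-image figures. A trivial sketch using the numbers above; the helper name is hypothetical:

```python
def export_rate(num_images: int, seconds: float) -> tuple[float, float]:
    """Return (seconds per image, images per second) for a batch export."""
    if num_images <= 0 or seconds <= 0:
        raise ValueError("need a positive image count and duration")
    return seconds / num_images, num_images / seconds


# 111 D800 NEFs exported in 92 seconds, as reported above.
per_image, per_second = export_rate(111, 92.0)
print(f"{per_image:.2f} s/image, {per_second:.2f} images/s")  # 0.83 s/image, 1.21 images/s
```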
I would like to know whether it scales well with more cores (upgrading to a 6- or 8-core CPU) and how efficient that is (Amdahl's law). I am thinking of an upgrade because of RAM... Capture One 9 uses more RAM than Capture One 8.
I got errors about all RAM being used, with only Capture One running.
I can't restrict Capture One's affinity to only 1, 2, 3, 4, 5, 6 or 7 cores instead of all of them to calculate how efficiently Capture One scales (with OpenCL disabled to keep the calculations comparable). I raised a support case to clarify.
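The scaling question can be framed with Amdahl's law, S(n) = 1 / ((1 - p) + p/n), where p is the fraction of the work that runs in parallel. If the same export could be timed with 1 core and with n cores (OpenCL disabled, as proposed above), p can be solved from the ratio of the two times. A sketch with made-up timings, not measurements; the function names are hypothetical. Hyper-Threading logical cores are not equivalent to physical cores, so counting them as full cores would skew the estimate.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: S(n) = 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0 or cores < 1:
        raise ValueError("need 0 <= p <= 1 and at least one core")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def parallel_fraction_from_timings(t_one_core: float, t_n_cores: float,
                                   cores: int) -> float:
    """Solve Amdahl's law for p from two timings of the same export.

    From t_n / t_1 = (1 - p) + p / n it follows that
    p = (1 - t_n / t_1) / (1 - 1 / n).
    """
    if cores < 2:
        raise ValueError("need at least two cores to estimate p")
    return (1.0 - t_n_cores / t_one_core) / (1.0 - 1.0 / cores)


# Hypothetical timings: 600 s on 1 core, 120 s on 8 physical cores.
p = parallel_fraction_from_timings(600.0, 120.0, 8)
print(f"estimated parallel fraction p = {p:.3f}")
for n in (4, 6, 8, 12):
    print(f"{n:2d} cores -> {amdahl_speedup(p, n):.2f}x")
```

With p estimated, the same function predicts how much a 6- or 8-core CPU could gain from core count alone, ignoring differences in clock speed, memory bandwidth and GPU load.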
[quote="fgimenezc"]
I would like to know whether it scales well with more cores (upgrading to a 6- or 8-core CPU) and how efficient that is (Amdahl's law).
I can't restrict Capture One's affinity to only 1, 2, 3, 4, 5, 6 or 7 cores instead of all of them to calculate how efficiently Capture One scales.
[/quote]
Hi, normally you can disable Hyper-Threading in the BIOS to see what difference it makes.
An i7 6700K will be a faster CPU, probably about 2 times faster. I don't know whether that would have a big impact on C1.