⚠️ Please note that this topic or post has been archived. The information contained here may no longer be accurate or up-to-date. ⚠️

C1 Pro 11, Intel 8th Gen and UHD 620 not working together

Philippe GAUDENS

July 07, 2018 21:36

[/size]Hello,

I just got a new laptop with Intel 8th Core i5 with its integrated UHD 620 GPU...

I just made export testings and noticed that when GPU acceleration is used for pictures export, I only see the GPU 100% loaded into the Windows 10 task manager (with only 20% CPU loaded), while when I disable GPU acceleration it is obviously the CPU that is 100% loaded...

On my home desktop, I have a bi-XEON 12 cores and a nVidia Titan Black (GTX 780 generation), and both are 100% loaded when I export my RAWs... I mean all my CPU cores AND my GPU are 100% loaded... (90% CPU and 70% GPU in reality). I only need 2s to export a 50mp RAW, and I LOOOVE that !

Why C1 Pro 11 is not able to load my Core i5 8th gen CPU AND its UHD 620 GPU in parallel ?

My benchmarks gave me about 8s to export a 50mp RAW from my Canon 5DsR for both: GPU alone, and CPU alone...

It would be a big benefit if both could work in parallel.

Here is a screenshot:

Regards,
Philippe.

Comments

18 comments

Wesley

July 08, 2018 04:23
Perhaps the integrated GPU isn't fast enough to keep up with the CPU so it can't pass data at a equal pace.

I seen mods say task manager is not accurate for GPU usage. I think they said to use GPU-Z or something,
0
Christian Gruner

July 08, 2018 08:12
The internal GPU is (ratio wise for the filetypes you are processing) slower than your cpu, compared to your other system. That means that the bottleneck is moved to your GPU, instead being at a somewhat equilibrium on your desktop machine.

Where the bottleneck is depends also on what kinda files are processed. Many 20 mp files will saturate the cpu more, as more files have to be loaded. But loading equal amount of megapixels from 100 mp files, the load on the cpu is less, as fewer files have to be loaded compared to how much the GPU has to do.

When judging GPU load, the graph in Windows task manager isnâ€™t optimal. It besically doesnâ€™t show OpenCL computations, which CO uses.
0
Stefan Hoffmeister

July 08, 2018 19:12
[quote="Christian Gruner" wrote:
The internal GPU is (ratio wise for the filetypes you are processing) slower than your cpu, compared to your other system. That means that the bottleneck is moved to your GPU, instead being at a somewhat equilibrium on your desktop machine.

I may be misunderstanding the above, but I would hope that C1 schedules all (batch) tasks across *all* compute resources *available*, CPU or GPU or whatever?

Examples:

For batch importing files, I would expect the CPU to collect whatever needs to be imported, with specific import activity scheduled across all compute resources (GPU and CPU) in parallel - the scheduler wouldn't even be that complex, under the assumption that GPU and CPU are both available to C1.

For batch exporting files, the actual export processing would be scheduled across all compute resources, too - e.g. 10 files on the GPU, 2 files on the CPU. After all, the export result has been QA'ed to be binary identical for CPU and GPU, right?

Even with the most naive scheduler, this is superior as long as the CPU is no less than 50% slower than the GPU (and batch size N > 1).

I do appreciate that for a *single* activity (responsiveness to user input), the optimal choice is to pick the one compute resource which is expected to be fastest.
0
Christian Gruner

July 09, 2018 13:15
[quote="daffy" wrote:
[quote="Christian Gruner" wrote:
The internal GPU is (ratio wise for the filetypes you are processing) slower than your cpu, compared to your other system. That means that the bottleneck is moved to your GPU, instead being at a somewhat equilibrium on your desktop machine.

I may be misunderstanding the above, but I would hope that C1 schedules all (batch) tasks across *all* compute resources *available*, CPU or GPU or whatever?

Examples:

For batch importing files, I would expect the CPU to collect whatever needs to be imported, with specific import activity scheduled across all compute resources (GPU and CPU) in parallel - the scheduler wouldn't even be that complex, under the assumption that GPU and CPU are both available to C1.

For batch exporting files, the actual export processing would be scheduled across all compute resources, too - e.g. 10 files on the GPU, 2 files on the CPU. After all, the export result has been QA'ed to be binary identical for CPU and GPU, right?

Even with the most naive scheduler, this is superior as long as the CPU is no less than 50% slower than the GPU (and batch size N > 1).

I do appreciate that for a *single* activity (responsiveness to user input), the optimal choice is to pick the one compute resource which is expected to be fastest.

It is not that simple 😉

Some operations are faster on CPU, some faster on GPU.
In the above case, we could spend CPU resources on processing, but likely that would also cause the loading of files to to be slower, which in turn would hand them over slower to the GPU, and this could cause the processing overall to be slower. Have in mind that you are comparing a high-end GPU with an on-board GPU. Your Titan card will offload data much much quicker than your internal 620.

Feel free to write our Support-team with a Feature Request.
0
Stefan Hoffmeister

July 10, 2018 13:35
[quote="Christian Gruner" wrote:
Feel free to write our Support-team with a Feature Request.

Ah, but such a Feature Request would be rather ... non-sensical: "Please maximize use of all compute resources present in a system". So much, so obvious 😊

This kind of approach goes deep into software architecture, sufficiently granular decomposition of activities, and scheduled (parallel push-pull) execution of these activities.

From the outside world, the set of diagnostic tools available, and the level of familiarity with the (closed) architecture of C1, it is next to impossible to make useful requests :/
0
Philippe GAUDENS

July 10, 2018 15:05
Daffy,

I agree with you regarding a ticket to support team... I just wanted to better understand the difference between integrated GPU and graphic card GPU, except performances...

Christan Gruner,

I'm not sure that you theory regarding CPU or GPU feeding with big or small files is right... My testings shows that when I disable GPU acceleration, my CPU is well fed, as the load is 100%...
When I enable GPU accelration, the CPU load is 20%... But it can't be due to the file size... I use the same in each case...

The task manager may be not accurate for GPU load, but in this case, this the CPU load which interests me... And Windows handle the CPU load monitoring for years and years... I think we can trust it.

I agree with Daffy, even if the CPU is slower than a GPU, it should be fed to be 100% loaded, whatever if it treats data slower...

And as far as I know, when I export just one picture, on my home computer, I can see that both CPU and GPU are loaded the same way if I would export many pictures... That means they work together, despite there different performances.

anyway, thank you very much for your inputs 😉
0
Stefan Hoffmeister

July 10, 2018 19:13
[quote="Philippe GAUDENS" wrote:
I agree with Daffy, even if the CPU is slower than a GPU, it should be fed to be 100% loaded, whatever if it treats data slower...

There's just the challenge to keep the GPU fully saturated - which requires a bit of anticipatory / pull / priority-based scheduling of activities on the CPU 😊

This is non-trivial if performance(GPU) is reasonably close to performance(CPU) - take my measly ultrabook with integrated Intel graphics as an example. In case the GPU slaughters the world, simply and only saturating the GPU won't be too far off optimal throughput, with minimal implementation complexity and effort.

So, where is my low-weight high-OpenCL-TFLOPS long battery-life ultrabook ...?!
0
Christian Gruner

July 11, 2018 13:27
[quote="Philippe GAUDENS" wrote:
I'm not sure that you theory regarding CPU or GPU feeding with big or small files is right...

This is not theory, it is fact.

If you are using GPU acceleration, image computation will happen only on the GPU. The CPU will handle loading of raw-files, saving to disk and similar tasks. They each do what they do best.
If you only using the CPU (HW Acceleration to Never), the CPU obviously does everything.

When you see the CPU not being loaded to 100%, but your GPU is, it is a sign that your GPU is the bottleneck. Add more GPU power, and you will also see the the CPU load go up, as the CPU has deliver things faster to the GPU. CO supports up to 4 GPU's.

I know that the CPU could be used for image processing alongside the GPU, however, for the performance gained by this, vs adding more GPU power and the cost of GPU power, it is not currently worth it.

What you see in regards of CPU load in the very beginning of a processing queue using GPU, is likely the CPU adding the images to the queue, with all their setting etc.
0
Stefan Hoffmeister

July 11, 2018 17:28
[quote="Christian Gruner" wrote:
I know that the CPU could be used for image processing alongside the GPU, however, for the performance gained by this, vs adding more GPU power and the cost of GPU power, it is not currently worth it.

This tallies nicely with my "non-trivial scheduling".

Fundamentally, efforts to make C1 execute tasks faster are apparently directed at systems which have at least one non-lame GPU (the current crop of Intel GPUs are lame, without doubt). Personally, as the user of such lame systems, I would not be displeased if C1 could run _even faster_ on those. But then I have to concede that C1 performs comparatively quite well on such lame systems.

And, alas, I am too cheap to add an eGPU (https://egpu.io/forums/pro-applications ... post-35572) to that one of my lame systems which has a plug for Thunderbolt ...
0
Permanently deleted user

July 13, 2018 10:53
[quote="daffy" wrote:
[quote="Christian Gruner" wrote:
I know that the CPU could be used for image processing alongside the GPU, however, for the performance gained by this, vs adding more GPU power and the cost of GPU power, it is not currently worth it.

This tallies nicely with my "non-trivial scheduling".

Fundamentally, efforts to make C1 execute tasks faster are apparently directed at systems which have at least one non-lame GPU (the current crop of Intel GPUs are lame, without doubt). Personally, as the user of such lame systems, I would not be displeased if C1 could run _even faster_ on those. But then I have to concede that C1 performs comparatively quite well on such lame systems.

And, alas, I am too cheap to add an eGPU (https://egpu.io/forums/pro-applications ... post-35572) to that one of my lame systems which has a plug for Thunderbolt ...

Where are you going to run this scheduler? On a USB stick?

In other words, what do you think the CPU is doing? Playing Tetris?
0
Permanently deleted user

July 13, 2018 11:17
[quote="Philippe GAUDENS" wrote:
[/size]Hello,

I just got a new laptop with Intel 8th Core i5 with its integrated UHD 620 GPU...

I just made export testings and noticed that when GPU acceleration is used for pictures export, I only see the GPU 100% loaded into the Windows 10 task manager (with only 20% CPU loaded), while when I disable GPU acceleration it is obviously the CPU that is 100% loaded...

On my home desktop, I have a bi-XEON 12 cores and a nVidia Titan Black (GTX 780 generation), and both are 100% loaded when I export my RAWs... I mean all my CPU cores AND my GPU are 100% loaded... (90% CPU and 70% GPU in reality). I only need 2s to export a 50mp RAW, and I LOOOVE that !

Why C1 Pro 11 is not able to load my Core i5 8th gen CPU AND its UHD 620 GPU in parallel ?

My benchmarks gave me about 8s to export a 50mp RAW from my Canon 5DsR for both: GPU alone, and CPU alone...

It would be a big benefit if both could work in parallel.

Here is a screenshot:

Regards,
Philippe.

That's just how it is. This CPU can handle a vastly larger dataset than the however powerful integrated GPU that's tethered to it via an even more limiting x1 gen 3 PCI-e link.

The Titan X has its own memory and is handled on an x16 gen 3 link.

No comparison and no fault of Capture One.

...and about the CPU and GPU being used in parallel...

In a way they already are. Concurrently perhaps. Like Christian alluded to, it's different architectures and different algorithms with different levels of computation and register precision that need to achieve the same result.

They already thought of that...
0
Stefan Hoffmeister

July 13, 2018 18:28
[quote="gusferlizi" wrote:
In other words, what do you think the CPU is doing? Playing Tetris?

The CPU uses its left middle finger to drill for oil in its nose.

I'd rather have it expend this effort on contributing to render image #296 out of the monster set C1 is importing - and that's because I am too cheap to buy a GPU and an eGPU enclosure.

FWIW - I have a pretty decent idea how hard decompisition and resource scheduling is, in general, and I have zero idea how to perform *technically* optimal scheduling is for the tasks that C1 needs to run. And, just to make it explicit, *technically optimal* is only weakly correlated with *economically optimal*. This make me use very passive wording such as "not displeased" as opposed going all-out aggro.
0
Permanently deleted user

July 13, 2018 20:08
[quote="daffy" wrote:
I'd rather have it expend this effort on contributing to render image #296 out of the monster set C1 is importing - and that's because I am too cheap to buy a GPU and an eGPU enclosure.

Don't get me wrong, I had a similar dream 10 years ago.

I wanted to implement a resource exchange subsystem in the Linux kernel, that would somewhat glue the FPU clusters of a GPU to the traditional CPU/FPU package with some sort of architecture emulation that would result in the ultimate computational load levelling technology.

Fcuk if I had a clue! It builds up really thick really quick. Layer and layer of powerhogging cultural translation. Sooner than later it's worse than socialism, and my god, poor little sodomized CPU crashing through every one of those mean interrupt cycles.

Fast forward to today, OpenCL is the practical compromise.

(I tried Stand-Up Comedy too. Didn't work either.)

Also, the CPU usage indicator you see is a load over time 'average' to give the user a bit of a clue that there's something going on, since you can't count to 4 billion every second, nor would it be realistic to refresh the screen that fast, if the resources were available to begin with.

Rest assured; that little CPU is hard at work turning the crank on that set of GPU rollers to render your precious #296, and many other routine threads.
0
Permanently deleted user

July 13, 2018 20:40
Maybe an analogy for what you mean, is for the CPU to allocate and hold nails for the GPU to hammer on using it's right hand, while it hammers on another nail with it's left hand and both feet; trying to not get hammered on a hand nor hammer a foot.

It is possible, but a slight paradox at the same time, with little gained benefit, since one or the other is often quicker than both at the same time.
0
Philippe GAUDENS

July 13, 2018 21:01
[quote="Christian Gruner" wrote:
[quote="Philippe GAUDENS" wrote:
I'm not sure that you theory regarding CPU or GPU feeding with big or small files is right...

This is not theory, it is fact.

If you are using GPU acceleration, image computation will happen only on the GPU. The CPU will handle loading of raw-files, saving to disk and similar tasks. They each do what they do best.
If you only using the CPU (HW Acceleration to Never), the CPU obviously does everything.

When you see the CPU not being loaded to 100%, but your GPU is, it is a sign that your GPU is the bottleneck. Add more GPU power, and you will also see the the CPU load go up, as the CPU has deliver things faster to the GPU. CO supports up to 4 GPU's.

I know that the CPU could be used for image processing alongside the GPU, however, for the performance gained by this, vs adding more GPU power and the cost of GPU power, it is not currently worth it.

What you see in regards of CPU load in the very beginning of a processing queue using GPU, is likely the CPU adding the images to the queue, with all their setting etc.

Absolutely wrong... GPU acceleration activated, the CPU will also participate to computation! On my home killing machine GPU acceleration is activated and when I export RAWs, my CPU is 90% loaded. I have two Xeon 12 cores, so 48 threads working together at 90% and my GPU at 70%.
I can't imagine that feeding my GPU and handling read/write operation on my SSD would require such a load on the CPU...
0
Permanently deleted user

July 13, 2018 21:29
[quote="Philippe GAUDENS" wrote:
[quote="Christian Gruner" wrote:
[quote="Philippe GAUDENS" wrote:
I'm not sure that you theory regarding CPU or GPU feeding with big or small files is right...

This is not theory, it is fact.

If you are using GPU acceleration, image computation will happen only on the GPU. The CPU will handle loading of raw-files, saving to disk and similar tasks. They each do what they do best.
If you only using the CPU (HW Acceleration to Never), the CPU obviously does everything.

When you see the CPU not being loaded to 100%, but your GPU is, it is a sign that your GPU is the bottleneck. Add more GPU power, and you will also see the the CPU load go up, as the CPU has deliver things faster to the GPU. CO supports up to 4 GPU's.

I know that the CPU could be used for image processing alongside the GPU, however, for the performance gained by this, vs adding more GPU power and the cost of GPU power, it is not currently worth it.

What you see in regards of CPU load in the very beginning of a processing queue using GPU, is likely the CPU adding the images to the queue, with all their setting etc.

Absolutely wrong... GPU acceleration activated, the CPU will also participate to computation! On my home killing machine GPU acceleration is activated and when I export RAWs, my CPU is 90% loaded. I have two Xeon 12 cores, so 48 threads working together at 90% and my GPU at 70%.
I can't imagine that feeding my GPU and handling read/write operation on my SSD would require such a load on the CPU...

You mean 'not in my experience'.

I could be talking slightly out of my butt, but consider that: Xeons have lower clocks. Only one CPU is hardwired to the GPU. There is a bottleneck between the two CPU's when working on the same dataset (ideal scenario is having them do different things, talking to memory more often than between each other). Hyperthreading can be detrimental on some types of loads (think of the torque converter of an automatic car transmission maybe). Etc.

I don't know how CO is engineered to perform in this situation, however.
0
Christian Gruner

July 16, 2018 09:05
I'm sorry, but unless you are not willing to listen to facts, I cannot help you.
I am telling you that when you use OpenCL for processing, there is no image processing, for the full-res image, being done by the CPU. This comes directly from the developers doing our Image Processing.

There are work being done by the CPU creating thumbnails, but that take only a few miliseconds.
The rest is from handling loading, saving etc.

What Xeon processors are you running ?

[quote="Philippe GAUDENS" wrote:
[quote="Christian Gruner" wrote:
[quote="Philippe GAUDENS" wrote:
I'm not sure that you theory regarding CPU or GPU feeding with big or small files is right...

This is not theory, it is fact.

If you are using GPU acceleration, image computation will happen only on the GPU. The CPU will handle loading of raw-files, saving to disk and similar tasks. They each do what they do best.
If you only using the CPU (HW Acceleration to Never), the CPU obviously does everything.

When you see the CPU not being loaded to 100%, but your GPU is, it is a sign that your GPU is the bottleneck. Add more GPU power, and you will also see the the CPU load go up, as the CPU has deliver things faster to the GPU. CO supports up to 4 GPU's.

I know that the CPU could be used for image processing alongside the GPU, however, for the performance gained by this, vs adding more GPU power and the cost of GPU power, it is not currently worth it.

What you see in regards of CPU load in the very beginning of a processing queue using GPU, is likely the CPU adding the images to the queue, with all their setting etc.

Absolutely wrong... GPU acceleration activated, the CPU will also participate to computation! On my home killing machine GPU acceleration is activated and when I export RAWs, my CPU is 90% loaded. I have two Xeon 12 cores, so 48 threads working together at 90% and my GPU at 70%.
I can't imagine that feeding my GPU and handling read/write operation on my SSD would require such a load on the CPU...
0
Christian Gruner

July 16, 2018 09:08
[quote="gusferlizi" wrote:
Hyperthreading can be detrimental on some types of loads (think of the torque converter of an automatic car transmission maybe). Etc.

I don't know how CO is engineered to perform in this situation, however.

Internal tests have shown that enabling/disabling doesn't make much of a change. You might win 0.5-1% in raw processing time with hyperthreading disabled, but then loose it again in other areas of operation. In my mind, not worth messing around with.
0

Post is closed for comments.