Linux command line render never uses all CPU threads

Blender's renderer and external renderer export

xorax
Posts: 8
Joined: Thu Oct 04, 2012 2:44 pm

Post by xorax » Sat Jan 26, 2013 6:33 pm

I'm running Blender on Debian 6 64-bit with CPU rendering on a Xeon E31220 CPU @ 3.1GHz (4 threads), and I render an animation of 128 frames to PNG files with 16 tiles (4*4).

The command line looks like this:

Code:

blender -b model.blend -o '//###.png' -F PNG -s 0 -e 127 -noaudio --render-anim


Rendering the 128 frames takes ~48 seconds.

If I assign more threads than were detected (16, where I suppose it detected 4):

Code:

blender -b model.blend -o '//###.png' -F PNG -s 0 -e 127 -t 16 -noaudio --render-anim


Same result (~45s).

In both cases, CPU usage stays between 110% and 200% at most, when it could reach 400%.
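For reference, percentages like these follow the convention that tools such as top use by default, where 100% means one fully busy core, so a 4-thread CPU tops out at 400%. Below is a minimal sketch of that calculation from user, system and wall-clock seconds. The function name `cpu_percent` and the sample values are made up for illustration and are not measurements from this render.

```sh
#!/bin/sh
# cpu_percent USER_SECONDS SYS_SECONDS REAL_SECONDS
# Prints CPU usage as a percentage of one core, rounded to the nearest integer.
# 100 means one fully busy core; 400 would be four fully busy cores.
cpu_percent() {
    awk -v usr="$1" -v sy="$2" -v wall="$3" \
        'BEGIN { printf "%d\n", (usr + sy) / wall * 100 + 0.5 }'
}

# Hypothetical values: 80 s of user time and 4 s of system time over a 48 s run.
cpu_percent 80 4 48
```

With these hypothetical inputs the function prints 175, meaning fewer than two of the four cores were busy on average.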

So I tried rendering the images with multiple processes:

Code:

#!/bin/sh
# Four background Blender processes, each rendering a quarter of the frames.
(date && blender -b model.blend -o '//###.png' -F PNG -s 0 -e 31 -t 4 -noaudio --render-anim && date) &
(date && blender -b model.blend -o '//###.png' -F PNG -s 32 -e 63 -t 4 -noaudio --render-anim && date) &
(date && blender -b model.blend -o '//###.png' -F PNG -s 64 -e 95 -t 4 -noaudio --render-anim && date) &
(date && blender -b model.blend -o '//###.png' -F PNG -s 96 -e 127 -t 4 -noaudio --render-anim && date) &


Success: rendering now takes ~22 seconds per process, ~23 seconds in total, and the whole CPU is used (~390%).
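The frame ranges in the script above are written out by hand. A minimal sketch of how they could be computed for any range and process count follows. The helper name `frame_chunks` is hypothetical, and the block only prints the ranges without starting Blender.

```sh
#!/bin/sh
# frame_chunks FIRST LAST COUNT
# Splits the inclusive frame range FIRST..LAST into COUNT contiguous chunks
# and prints one "start end" pair per line. Leftover frames are spread over
# the first chunks, so chunk sizes differ by at most one frame.
frame_chunks() {
    first=$1 last=$2 count=$3
    total=$((last - first + 1))
    base=$((total / count))
    extra=$((total % count))
    start=$first
    i=0
    while [ "$i" -lt "$count" ]; do
        size=$base
        [ "$i" -lt "$extra" ] && size=$((size + 1))
        if [ "$size" -gt 0 ]; then
            end=$((start + size - 1))
            echo "$start $end"
            start=$((end + 1))
        fi
        i=$((i + 1))
    done
}

# Prints the same four ranges as the hand-written script: 0 31, 32 63, 64 95, 96 127.
frame_chunks 0 127 4
```

Each pair can then be passed to the `-s` and `-e` options of a separate background `blender` process.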

So what's wrong with Blender's render engine? Did I do something wrong in my Blender model or in the first command line?

I suspect Blender waits for one frame to finish rendering before it starts on the next. That would explain why Blender is slow when there are few tiles per image: the render is threaded only within each image, not across the whole animation. Am I right?

Thanks

rty
Posts: 6
Joined: Fri Jan 25, 2013 6:31 pm

Post by rty » Sat Jan 26, 2013 8:16 pm

I think the bottleneck is your CPU, and you gain something with a multi-process approach because of the Linux kernel's high performance in process and memory management.

This is your CPU and, as you can see, it's not bleeding-edge technology. Most importantly, your CPU has no HT, meaning that if you want to use it for multi-threaded work you probably don't want to exceed 3 threads. You have to reserve 1 thread for the OS and the other daemons in the background, especially on a CPU like yours that is clearly not meant for hardcore multi-threaded work; overloading it can easily put your system stability in jeopardy.

I don't think there is much more to say about this: you have a bottleneck and it's your CPU, which really doesn't shine even in the memory management department.

My suggestion is to keep the number of threads at 3 or 4 for each process, or to buy a new and better CPU if you can.

xorax
Posts: 8
Joined: Thu Oct 04, 2012 2:44 pm

Post by xorax » Sun Jan 27, 2013 12:23 am

I'm not sure I understand. The CPU is fully used only if I run multiple renders (so multiple processes), and never when I let Blender create the threads. I don't really know the issues around shared memory between threads, but Blender should use all the cores and all of the CPU; it's not normal that the render takes less time with multiple processes than with a single one.

But as proof that the problem is not the CPU, I tested on a dual Xeon E5-2643 3.30GHz (2*4 cores, 2*8 threads, see the cpuinfo) and got exactly the same result. With the first command line, the Blender render uses only between 180% and 250% of the CPU and takes ~35s (in short, this processor has a higher frequency and is more powerful).


I know I should keep 1 thread for the system, but it is never overloaded because the CPU is never used that much in these cases (except with the last command, where I run 4*4 threads).

Another test with 3 processes of 1 thread each on the previous Xeon E31220:

Code:

#!/bin/sh
(date && blender -b model.blend -o '//###.png' -F PNG -s 0 -e 42 -t 1 -noaudio --render-anim && date) &
(date && blender -b model.blend -o '//###.png' -F PNG -s 43 -e 85 -t 1 -noaudio --render-anim && date) &
(date && blender -b model.blend -o '//###.png' -F PNG -s 86 -e 127 -t 1 -noaudio --render-anim && date) &


The render takes ~42s, a bit less than with my first command line, and the CPU is used at 3*~70%.

Another test with 4 processes of 1 thread each:

Code:

#!/bin/sh
(date && blender -b model.blend -o '//###.png' -F PNG -s 0 -e 31 -t 1 -noaudio --render-anim && date) &
(date && blender -b model.blend -o '//###.png' -F PNG -s 32 -e 63 -t 1 -noaudio --render-anim && date) &
(date && blender -b model.blend -o '//###.png' -F PNG -s 64 -e 95 -t 1 -noaudio --render-anim && date) &
(date && blender -b model.blend -o '//###.png' -F PNG -s 96 -e 127 -t 1 -noaudio --render-anim && date) &


The render takes ~33s and each process uses ~70% of the CPU. That is less time than the first, standard command line, which should fully use 4 threads.

This is why I think the problem comes not from the CPU but from Blender's rendering process. As I said, I think Blender waits for the current frame to finish rendering before it starts rendering the next one. Or maybe it's poor memory management, but I don't know of any tools to trace memory movements between threads...

Thanks

rty
Posts: 6
Joined: Fri Jan 25, 2013 6:31 pm

Post by rty » Sun Jan 27, 2013 3:54 am

Quick lesson: benchmarks and percentages are for the marketing department. Solving problems by looking only at these numbers (which mean nothing) would be extremely easy, and even a monkey could be a super-duper analyst.

Say you have 3 workers and we consider 1 hour of work. You assign tasks to them and they keep working for the entire hour without taking any break. Question: what does this mean to you as the boss? Exactly nothing. You only have a reference for the time and the fact that they never left the room or took a break; this says nothing about their productivity, which is the real number that you want.

The difference here is the difference between a number and a ratio. Productivity is the ratio between how much you produce and how much time you work; you are considering only the time, or only how much they produce, not both.

The point with multi-threaded programming is always the memory. The cost you pay, what you can optimize, and all the troubles in multi-threaded programming are about memory. As funny as it sounds, the CPU is just an executor; the real player is always the memory controller. You want fully loaded CPUs? Have fun with the dozens of benchmarks or math scripts that fire up your CPU, but productivity can't be determined by this percentage.

It's possible that Blender is not optimized for that, and it's possible that the memory on your setup is really slow for many reasons. The real problem here is that with threads, as you go higher, you need a really fast memory controller; memory matters more than the CPU most of the time.

In this regard your second option is not better either; it's as old as your first CPU. If you want real proof, try Blender on a really high-end machine, even one with the same clock, but with a better memory controller.

xorax
Posts: 8
Joined: Thu Oct 04, 2012 2:44 pm

Post by xorax » Wed Jan 30, 2013 2:02 am

This processor has been out for less than a year, and it runs with 64GB of ECC memory and SSDs on hardware RAID; the problem doesn't come from the hardware. The simple proof is that it can allocate and work with more memory in less time (4 processes vs. 4 threads). My 3 workers, as you put it, have the same memory capacity and speed as one, and they can use more of it without restriction because CPU usage is higher.

My goal is to render the animation as fast as possible. If I specify 4 threads, then I expect those 4 threads to be fully used. I thought Blender could do that, but it doesn't; maybe I misread the docs, or the marketing claims are wrong.

So I use Python's Thread to create one Blender process per CPU thread. It is less productive, but faster. Even so, the CPU is not used at 100% all the time: usage varies between 85% and 100% (* number of cores). That's why I think Blender waits for one image to be generated before generating the next (so it is threaded only over the tiles of one image).
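The one-process-per-thread idea can also be sketched in plain shell. This is only an illustration and not the actual Python script: `run_ranges` is a hypothetical helper that reads "start end" frame pairs from standard input, starts the given command once per pair in the background with `-s`, `-e` and `--render-anim` appended, and then waits for all of them to finish.

```sh
#!/bin/sh
# run_ranges COMMAND [ARGS...]
# Reads "start end" pairs from stdin. For each pair, starts
#   COMMAND ARGS... -s start -e end --render-anim
# as a background process, then blocks until every process has exited.
run_ranges() {
    while read -r first last; do
        "$@" -s "$first" -e "$last" --render-anim &
    done
    wait
}

# Dry run: prefixing the command with echo prints what would be started
# instead of launching Blender. The order of the printed lines may vary.
printf '%s\n' '0 31' '32 63' '64 95' '96 127' |
    run_ranges echo blender -b model.blend -o '//###.png' -F PNG -t 1 -noaudio
```

Dropping the `echo` would launch the real renders; the number of input pairs decides how many Blender processes run at once.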

psullie
Posts: 554
Joined: Thu Sep 11, 2008 11:09 am
Location: Ireland

Post by psullie » Fri Feb 01, 2013 9:57 pm

Have you tried increasing the tile count? 16 (4x4) could be your bottleneck: Blender won't open the next frame until all tiles are done. With large tiles this can mean that 3 of your cores sit idle while waiting for the last one to finish.
When you run 4 separate processes this effect is reduced, hence the speed increase. Try smaller tiles. I run an old quad Xeon (no HT) and find that ~520 tiles works best, but yours may differ.

xorax
Posts: 8
Joined: Thu Oct 04, 2012 2:44 pm

Post by xorax » Sat Feb 02, 2013 1:36 am

Thanks for the suggestion.

I tried increasing the tiles to 3600 (60x60) and running with 4 threads but, against all expectations, this did not decrease the render time (+300s...) and the processor is used at only ~7% (~15% of one core).

If I increase the threads to many more, like 16, the render time decreases but is still longer than normal (4*4 tiles).

Then I tried decreasing to only one tile (1*1): the render takes ~50s, so only 2s more than with 4*4 tiles.

I don't have an explanation for this bad performance. I definitely think it's bad memory management by Blender across multiple threads, or something wrong in the script.

The best way is still to run several Blender processes. Here is the script I made and use to do it: https://gist.github.com/4694898

Gode
Posts: 1
Joined: Sat Feb 02, 2013 3:30 am

Post by Gode » Sat Feb 02, 2013 3:48 am

I think the problem lies in the calculations that are done before every frame is rendered. So when you set up a process for each frame, what you're doing is running the computations for each of the frames (e.g. a raytree) before the rendering even begins, and then rendering a frame in each process. In the multi-threaded scenario the pre-computation is done only for the first frame, then comes the multi-threaded rendering, then computation, then multi-threaded rendering, etc. During this pre-computation I believe you are picking up idle CPU clocks. Also, it's possible that your operating system limits a single process to a maximum number of cycles per second in order to maintain system stability, whereas in the multi-process scenario each process uses a smaller number of cycles but together they add up to a higher total.

The multi-threading isn't really designed to benefit a 4 frame render, and honestly it's not even necessarily designed to boost speed. There are other aspects to consider during a render, as rty mentioned: one multi-threaded process will always use less memory than multiple processes. And in the long run, it will be more efficient without having to run multiple processes.
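To put rough numbers on this argument, here is a back-of-the-envelope sketch. It assumes a simplified model in which each frame costs a serial setup time p plus a render time r that divides evenly across T threads, so a frame takes p + r/T. The model and the function name `estimate_split` are assumptions for illustration. The inputs are the timings reported earlier in the thread (48 s for 128 frames, taken to be with 4 threads, and 33 s for 32 frames per single-threaded process). The model ignores tile imbalance, disk I/O and the fact that the single-threaded processes only reached ~70% CPU each.

```sh
#!/bin/sh
# estimate_split MULTI_SECONDS MULTI_FRAMES THREADS SINGLE_SECONDS SINGLE_FRAMES
# Solves the two-equation model
#   multi-threaded frame time  = p + r / THREADS
#   single-threaded frame time = p + r
# and prints "r p" in milliseconds per frame, rounded to the nearest integer.
estimate_split() {
    awk -v mt="$1" -v mf="$2" -v n="$3" -v st="$4" -v sf="$5" 'BEGIN {
        multi  = mt / mf                      # seconds per frame with n threads
        single = st / sf                      # seconds per frame with 1 thread
        r = (single - multi) * n / (n - 1)    # render time that can be split
        p = single - r                        # serial per-frame setup time
        printf "%d %d\n", r * 1000 + 0.5, p * 1000 + 0.5
    }'
}

estimate_split 48 128 4 33 32
```

Under these assumptions the output is `875 156`, i.e. roughly 0.88 s of parallelisable work and 0.16 s of serial work per frame. It is an estimate from a crude model, not a measurement of what Blender does internally.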

xorax
Posts: 8
Joined: Thu Oct 04, 2012 2:44 pm

Post by xorax » Sat Feb 02, 2013 5:18 am

It's a possibility; pre-computation may be limited to one thread. But the OS doesn't limit a process: it's Debian, and process priorities are managed with the "nice" command. I can easily freeze the system by running a loop at the highest process priority, and that's not what happens with a Blender render.

In any case, I think we can agree that the multi-threaded Blender render only uses multiple threads within a frame, and not across the entire animation.

I don't agree that multi-threading is not designed to boost speed. What is it designed for, then? Boosting speed is the only goal of multi-threading.

