Dual core rendering????

Blender's renderer and external renderer export

Moderators: jesterKing, stiv

stiv
Posts: 0
Joined: Tue Aug 05, 2003 7:58 am
Location: 45N 86W

Post by stiv »

3. Why doesn't it go to 11?
That was *my* first thought!

There are a couple of reasons. One is internal to Blender: the data structures needed to handle threads and merge tiles. The other is a point of diminishing returns.

Contrary to popular belief, threads are not magic pixie dust you can sprinkle on applications to make them run faster. Using threads actually creates extra computing work: you have the original work plus the overhead of creating and managing the threads.

The only time this will be faster is if the threads' work can be overlapped and the time saved is greater than the added thread overhead. This is the clue to why our Nexus-6 friend a couple of posts north is not seeing his expected speedup with a Hyperthreaded P4.

Hyperthreading is a sort of poor man's dual core that splits a single processor. HT can speed up tasks like compiling that mix computation and I/O, but it doesn't do much for compute-bound tasks like rendering.

As an actual example of the point of diminishing returns, someone was doing tests on a big Sun box and found that beyond a small number of threads (4 or 6, iirc) render time started to increase due to the threading overhead.

So, in a nutshell, 8 threads max is a reasonable compromise for now.
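
As a rough illustration of that trade-off, here is a toy model in Python. It is not Blender's code: the formula and every number in it (`work`, `overhead_per_thread`, `cores`) are invented for illustration. It assumes the render work divides evenly across threads up to the core count, and that each thread adds a fixed management cost.

```python
def render_time(threads, work=100.0, overhead_per_thread=4.0, cores=8):
    """Toy model: split work across overlapping threads, plus per-thread overhead.

    All parameters are hypothetical; none are measured from Blender.
    """
    # Threads beyond the core count cannot overlap any further.
    overlapping = min(threads, cores)
    return work / overlapping + overhead_per_thread * threads


def best_thread_count(max_threads=16, **model_args):
    """Return the thread count with the lowest modelled render time."""
    return min(range(1, max_threads + 1),
               key=lambda n: render_time(n, **model_args))


for n in (1, 2, 4, 5, 6, 8, 16):
    print(n, "threads ->", round(render_time(n), 2))
print("best:", best_thread_count())
```

With these invented numbers the modelled time falls from 104 at one thread to a minimum of 40 at five threads, then climbs again. Changing the overhead or the core count moves the optimum, which is why the real answer depends on the platform.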

sarah
Posts: 0
Joined: Sat Feb 10, 2007 11:29 pm

Post by sarah »

stiv wrote:The only time this will be faster is if the threads' work can be overlapped and the time saved is greater than the added thread overhead. This is the clue to why our Nexus-6 friend a couple of posts north is not seeing his expected speedup with a Hyperthreaded P4.

Hyperthreading is a sort of poor man's dual core that splits a single processor. HT can speed up tasks like compiling that mix computation and I/O, but it doesn't do much for compute-bound tasks like rendering.

As an actual example of the point of diminishing returns, someone was doing tests on a big Sun box and found that beyond a small number of threads (4 or 6, iirc) render time started to increase due to the threading overhead.

Yes, I've seen this myself. I also have a Mac Mini Core Duo in my office... rendering the same scene on my Mac Pro is significantly faster, even allowing for their relative clock speeds, but rendering with four threads on the Mac Pro isn't anywhere near twice as fast as rendering with two on the same machine. I assumed the reason was memory bandwidth; with dual memory buses and two threads, both can run at full speed, but at four threads, the bandwidth available to each goes down by half.

I also know that sometimes MacOS has threading/process scheduling issues... I've found things that run slightly faster in Linux on a Parallels VM than they do on the Mac natively!

Do you know if there is additional Blender-imposed threading overhead? I've only just started wandering through the Blender source...

LetterRip
Posts: 0
Joined: Thu Mar 25, 2004 7:03 am

Post by LetterRip »

Stiv,

If current CVS worked for 16+ CPUs, the current improvement per processor would probably still give better performance with up to 16 processors or more. However, above 8 processors the user is pretty likely to hit RAM limits for anything non-trivial, since each process currently allocates its own RAM for each tile.
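
To make the RAM point concrete, here is a back-of-the-envelope sketch in Python. The model (shared scene data plus one tile buffer per thread) and all the figures are assumptions chosen for illustration, not numbers taken from Blender.

```python
def peak_render_ram_mb(threads, scene_mb, per_thread_tile_mb):
    """Hypothetical model: shared scene data plus one tile allocation per thread."""
    return scene_mb + threads * per_thread_tile_mb


def max_threads_that_fit(ram_mb, scene_mb, per_thread_tile_mb):
    """Largest thread count whose modelled peak usage still fits in RAM."""
    return max(0, int((ram_mb - scene_mb) // per_thread_tile_mb))


# Invented example: 2048 MB machine, 1200 MB scene, 150 MB per thread.
print(peak_render_ram_mb(8, 1200, 150))       # 2400, more than 2048
print(max_threads_that_fit(2048, 1200, 150))  # 5
```

Under these made-up figures the machine runs out of memory well before it runs out of processors.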

LetterRip

stiv
Posts: 0
Joined: Tue Aug 05, 2003 7:58 am
Location: 45N 86W

Post by stiv »

Sorry, but the simple assumption that if a few are better than one then more is always faster is naive. There are a lot of variables besides the computing problem and how it is implemented that make a real-world answer platform-dependent: things like the OS, the threading model in use, the processors themselves, memory bandwidth, cache size, available memory and so on. However, given any set of circumstances, for a fixed amount of work like rendering, you are going to reach a point where adding more threads increases the time. A beefy machine like an SGI that is designed for problems like this will do far better than run-of-the-mill dual/quad/whatever servers, but you cannot ignore the administrative overhead of dividing up the work and managing the threads and their output. And as you point out, running out of memory can be a limiting factor.

The only real-world data I can point to at the moment is an experiment one of the developers did one afternoon on IRC. Memory (mine!) is a little sketchy, but she had access to a Sun box with 8 (iirc) processors. Increasing the thread count gave decreasing times, but not in a linear way. There was an optimum at around 4-6 (again iirc) threads, and adding more threads beyond that point led to increasing times.

I have seen similar results running multi-threaded server apps on 16 and 32 processor Sun Enterprise boxes at Enron, but that was a long time ago and I don't have any numbers to go with the anecdote.

_styken
Posts: 0
Joined: Sun Jul 13, 2003 10:32 pm
Location: Stockholm, Sweden

Post by _styken »

Strange, I've always assumed that if you set thread count = number of CPUs = number of tiles, you get maximum performance out of a multi-core/multi-CPU system.
The overhead can't be more than the time it takes one CPU to render one tile! Or can it? (We are of course talking about long render times here!)
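
One way to probe that assumption is a small scheduling simulation. This is a sketch, not how Blender hands out tiles: it assumes each free thread simply grabs the next tile in order, it ignores per-tile overhead, and it uses invented tile costs in which one quarter of the image is much more expensive than the rest.

```python
import heapq


def makespan(tile_costs, threads):
    """Time until the last thread finishes when free threads take tiles in order."""
    finish_times = [0.0] * threads  # when each thread next becomes free
    heapq.heapify(finish_times)
    for cost in tile_costs:
        earliest_free = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest_free + cost)
    return max(finish_times)


# Hypothetical costs: 4 tiles, one of them containing the heavy geometry.
one_tile_per_thread = [10, 10, 10, 50]
# Same total work, each tile cut into 4 equal pieces (16 tiles).
smaller_tiles = [2.5] * 12 + [12.5] * 4

print(makespan(one_tile_per_thread, threads=4))  # 50.0
print(makespan(smaller_tiles, threads=4))        # 20.0
```

In this example one tile per thread takes 50 time units, because three threads sit idle while the fourth works through the expensive tile, whereas sixteen smaller tiles finish in 20. Real renders also pay a cost per tile, which this sketch leaves out, so smaller tiles are not automatically better.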

advs89
Posts: 0
Joined: Wed Apr 26, 2006 10:22 pm
Location: Roanoke, VA, USA

Quad-Core CPU/nVidia SLI tech.

Post by advs89 »

Yeah, I'm saving up for my new PC, and I'm looking at this so far:
-Two AMD Athlon 64 bit FX 3.0 GHz processors (dual core), making it Quad Core.
-Two nVidia GeForce 7600GTs, SLI'd together.
-2GB of fast Corsair memory
-and a really good case (with good airflow) and a good power supply

Don't know if the SLI technology will help any with Blender, but it will definitely help in other areas...

However, I'm curious if anyone has tried a quad-core system and/or two nVidia GPUs SLI'd together with Blender yet... If you have, and could let me know of your results, I'd be much appreciative.

Thx,
advs89

advs89
Posts: 0
Joined: Wed Apr 26, 2006 10:22 pm
Location: Roanoke, VA, USA

Also...

Post by advs89 »

Also, I was wondering if there is ever any reason at all to select "threads" if you only have a single cpu...

(Actually, I'm really just posting this because I forgot to check the "notify me when a reply is posted" box on my last post)

LetterRip
Posts: 0
Joined: Thu Mar 25, 2004 7:03 am

Post by LetterRip »

I don't think there is.

Regarding your setup: personally I'd go with more RAM rather than dual SLI. 4GB of RAM, especially with Linux, would be of huge benefit to you.

LetterRip

advs89
Posts: 0
Joined: Wed Apr 26, 2006 10:22 pm
Location: Roanoke, VA, USA

Thanks

Post by advs89 »

Well, it's funny, because I was originally considering that, and now I might actually switch to it instead. I think I'll go with 4GB of dual-channel Corsair memory (I know it sounds like I'm advertising for them or something, but I just really, really like that brand). That's assuming my motherboard will support it; since very few motherboards support two CPUs, I have little choice in other areas.

Thanks for your advice,
advs89

indigomonkey
Posts: 0
Joined: Fri Oct 08, 2004 12:48 pm

Post by indigomonkey »

I have a single-core AMD Sempron Mobile, and enabling threads does often improve the render times, somehow. The number of threads to optimise render times seems to vary according to the file, but it does do it quicker.

advs89
Posts: 0
Joined: Wed Apr 26, 2006 10:22 pm
Location: Roanoke, VA, USA

Yeah

Post by advs89 »

That's what I was thinking... I think it helps it take higher priority over the other system processes. Probably the equivalent of changing the thread priority, but at least this way, it's only during rendering.

RoyBatty
Posts: 0
Joined: Sat Feb 10, 2007 4:43 pm
Location: Czech Republic

Post by RoyBatty »

Looks like it helps when the CPU is fully utilized.
I did a little test with Folding@Home and simple 800x600 frame rendering:

FAH running
threads off: 40.45
threads on: 25.30

FAH paused
threads off: 26.12
threads on: 22.63
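
For reference, the speedups implied by those four timings can be worked out directly. The sketch below assumes all the numbers are in the same unit (presumably seconds, which the post doesn't state).

```python
def speedup(threads_off, threads_on):
    """How many times faster the render is with threads enabled."""
    return threads_off / threads_on


timings = {
    "FAH running": (40.45, 25.30),
    "FAH paused": (26.12, 22.63),
}

for label, (off, on) in timings.items():
    print(f"{label}: {speedup(off, on):.2f}x faster with threads on")
# FAH running: 1.60x faster with threads on
# FAH paused: 1.15x faster with threads on
```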

advs89
Posts: 0
Joined: Wed Apr 26, 2006 10:22 pm
Location: Roanoke, VA, USA

Post by advs89 »

RoyBatty wrote:Looks like it helps when the CPU is fully utilized.
I did a little test with Folding@Home and simple 800x600 frame rendering:

FAH running
threads off: 40.45
threads on: 25.30

FAH paused
threads off: 26.12
threads on: 22.63
And my guess would be that if you were to try a test with threads off, and the process priority raised to "high", then the results would be very close to the results of "threads on", in both scenarios (FAH Running/Paused).

This would prove that "threads on" helps raise the priority of the process in the same way that the system "thread priority" setting does, thereby speeding up render times, even on a single-core CPU.

indigomonkey
Posts: 0
Joined: Fri Oct 08, 2004 12:48 pm

Post by indigomonkey »

Ah, right, that makes sense as I use BOINC - my CPU is constantly at 100%.

RoyBatty
Posts: 0
Joined: Sat Feb 10, 2007 4:43 pm
Location: Czech Republic

Post by RoyBatty »

advs89 wrote:And my guess would be that if you were to try a test with threads off, and the process priority raised to "high", then the results would be very close to the results of "threads on", in both scenarios (FAH Running/Paused).
I'm afraid I can't confirm your theory - I tried to increase the process priority, but it didn't help to decrease the rendering time at all:

FAH running
threads off, priority High/Realtime: 1:09.47
threads on, priority Normal: 0:44.30
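
The same comparison for these two timings, assuming they are written as minutes:seconds (the post doesn't say). `to_seconds` is a hypothetical helper written just for this example.

```python
def to_seconds(stamp):
    """Convert a 'm:ss.xx' string (assumed format) to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


high_priority_no_threads = to_seconds("1:09.47")
normal_priority_threads = to_seconds("0:44.30")

ratio = high_priority_no_threads / normal_priority_threads
print(f"threads on at Normal priority is {ratio:.2f}x faster")
# threads on at Normal priority is 1.57x faster
```

That ratio is close to the roughly 1.6x seen in the earlier "FAH running" test, so in this measurement raising the priority did not reproduce the benefit of enabling threads.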
