
Posted: Sun Feb 11, 2007 3:52 pm
by stiv
3. Why doesn't it go to 11?
That was *my* first thought!

There are a couple of reasons. One is internal to Blender: the data structures needed to handle threads and merge tiles. The other is the point of diminishing returns.

Contrary to popular belief, threads are not magic pixie dust you can sprinkle on applications to make them run faster. Using threads actually creates extra computing work: you have the original work plus the overhead of creating and managing the threads.

The only time this will be faster is if thread processes can be overlapped and the time saved is greater than the added thread overhead. This is the clue to why our Nexis-6 friend a couple posts north is not seeing his expected speedup with a Hyperthreaded P4.

Hyperthreading is a sort of poor man's dual core that splits a single processor. HT can speed up tasks like compiling that mix computation and I/O, but doesn't do much for compute-bound tasks like rendering.

As an actual example of the point of diminishing returns, someone was doing tests on a big Sun box and found that beyond a small number of threads (4 or 6, iirc) render time started to increase due to the threading overhead.
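
To make the trade-off concrete, here is a minimal toy model in Python. It is only a sketch: the function names, the work figure, the core count and the per-thread overhead are all made-up assumptions, not measurements from Blender or from the Sun box mentioned above.

```python
# Toy model only: every constant here is an illustrative assumption,
# not a measurement from Blender or any real machine.

def render_time(threads, work=100.0, cores=4, overhead_per_thread=1.5):
    """Modelled wall-clock time for a fixed amount of render work.

    work                -- total compute time on one core (arbitrary units)
    cores               -- cores that can genuinely run in parallel
    overhead_per_thread -- cost of creating and managing each thread
    """
    # Only as many threads as there are cores can actually overlap.
    parallel = min(threads, cores)
    compute = work / parallel
    # Every thread adds management cost, whether or not it helps.
    overhead = overhead_per_thread * threads
    return compute + overhead


def best_thread_count(max_threads=16, **kwargs):
    """Thread count with the lowest modelled time."""
    return min(range(1, max_threads + 1),
               key=lambda n: render_time(n, **kwargs))


if __name__ == "__main__":
    for n in (1, 2, 4, 6, 8, 16):
        print(f"{n:2d} threads: {render_time(n):6.1f}")
    print("best:", best_thread_count())
```

With these invented numbers the modelled time falls until the thread count reaches the core count and then climbs again, which is the general shape described above. Where the real optimum sits depends on the machine and the scene.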

So, in a nutshell, 8 threads max is a reasonable compromise for now.

Posted: Sun Feb 11, 2007 6:05 pm
by sarah
stiv wrote: The only time this will be faster is if thread processes can be overlapped and the time saved is greater than the added thread overhead. This is the clue to why our Nexis-6 friend a couple posts north is not seeing his expected speedup with a Hyperthreaded P4.

Hyperthreading is a sort of poor man's dual core that splits a single processor. HT can speed up tasks like compiling that mix computation and I/O, but doesn't do much for compute-bound tasks like rendering.

As an actual example of the point of diminishing returns, someone was doing tests on a big Sun box and found that beyond a small number of threads (4 or 6, iirc) render time started to increase due to the threading overhead.
Yes, I've seen this myself. I also have a Mac Mini Core Duo in my office... rendering the same scene on my Mac Pro is significantly faster, even allowing for their relative clock speeds, but rendering with four threads on the Mac Pro isn't anywhere near twice as fast as rendering with two on the same machine. I assumed the reason was memory bandwidth: with dual memory buses and two threads, both can run at full speed, but at four threads, the bandwidth available to each thread is halved.

I also know that sometimes MacOS has threading/process scheduling issues... I've found things that run slightly faster in Linux on a Parallels VM than they do on the Mac natively!

Do you know if there is additional Blender-imposed threading overhead? I've only just started wandering through the Blender source...

Posted: Sun Feb 11, 2007 8:15 pm
by LetterRip
Stivs,

If current CVS worked for 16+ CPUs, the current per-processor improvement would probably still give better performance up to 16 or more. However, above 8 processors the user is pretty likely to hit RAM limits for anything non-trivial, since each process currently allocates its own RAM for each tile.
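
A rough way to see where the RAM limit bites is the sketch below. It assumes (my assumption, not something confirmed from the Blender source) that memory splits into a part shared by all threads plus a fixed per-thread working set. The function names and all the megabyte figures are hypothetical placeholders.

```python
def peak_memory_mb(threads, shared_mb, per_thread_mb):
    """Modelled peak memory: shared data plus one working set per thread."""
    return shared_mb + threads * per_thread_mb


def max_threads_that_fit(ram_mb, shared_mb, per_thread_mb):
    """Largest thread count whose modelled peak memory fits in ram_mb."""
    if per_thread_mb <= 0:
        raise ValueError("per_thread_mb must be positive")
    return max(0, int((ram_mb - shared_mb) // per_thread_mb))


if __name__ == "__main__":
    # Placeholder figures: 2 GB machine, 800 MB shared, 150 MB per thread.
    ram, shared, per_thread = 2048, 800, 150
    n = max_threads_that_fit(ram, shared, per_thread)
    print("threads that fit:", n)
    print("peak at that count (MB):", peak_memory_mb(n, shared, per_thread))
```

The point is only that the per-thread term grows linearly, so a machine can run out of memory well before it runs out of processors.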

LetterRip

Posted: Sun Feb 11, 2007 11:35 pm
by stiv
Sorry, but the simple assumption that if a few are better than one, then more is always faster, is naive. There are a lot of variables besides the computing problem and how it is implemented that make a real-world answer platform dependent: things like the OS, the threading model in use, the processors themselves, memory bandwidth, cache size, available memory and so on. However, given any set of circumstances, for a fixed amount of work like rendering, you are going to reach the point where adding more threads increases the time. A beefy machine like an SGI that is designed for problems like this will do far better than run-of-the-mill dual/quad/whatever servers, but you cannot ignore the administrative overhead of dividing up the work and managing the threads and their output. And as you point out, running out of memory can be a limiting factor.

The only real-world data I can point to at the moment is an experiment one of the developers did one afternoon on IRC. Memory (mine!) is a little sketchy, but she had access to a Sun box with 8 (iirc) processors. Increasing the thread count gave decreasing times, but not in a linear way. At some point, around 4-6 (again iirc) threads, there was an optimum, and adding more threads beyond that point led to increasing times.

I have seen similar results running multi-threaded server apps on 16 and 32 processor Sun Enterprise boxes at Enron, but that was a long time ago and I don't have any numbers to go with the anecdote.

Posted: Mon Feb 12, 2007 9:07 am
by _styken
Strange, I've always assumed that if you set thread count = number of CPUs = number of tiles, you get maximum performance out of a multi-core/CPU system.
The overhead can't be more than the time it takes one CPU to render one tile! Or can it? (We're talking about long render times here, of course!)
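
One possible catch with threads = CPUs = tiles, sketched below as a toy simulation: if the tiles do not all cost the same, the render finishes only when the slowest tile does. This is not Blender's actual scheduler. The `makespan` function and the tile costs are hypothetical, and per-tile overhead is ignored.

```python
import heapq


def makespan(tile_costs, workers):
    """Finish time when each free worker grabs the next tile in order."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    finish = [0.0] * workers           # time at which each worker is free
    heapq.heapify(finish)
    for cost in tile_costs:
        free_at = heapq.heappop(finish)    # earliest available worker
        heapq.heappush(finish, free_at + cost)
    return max(finish)


if __name__ == "__main__":
    # Four tiles on four workers; one tile holds the expensive part.
    print(makespan([10, 2, 2, 2], workers=4))
    # Same total work cut into sixteen tiles: the heavy quarter becomes
    # four tiles of 2.5, each light quarter becomes four tiles of 0.5.
    print(makespan([2.5] * 4 + [0.5] * 12, workers=4))
```

In this made-up case the four-tile split takes as long as the one heavy tile while three workers sit idle, and the sixteen-tile split spreads the same work evenly. Real scenes add per-tile overhead, so smaller tiles are not automatically better.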

Quad-Core CPU/nVidia SLI tech.

Posted: Mon Feb 12, 2007 11:39 pm
by advs89
Yeah, I'm saving up for my new PC, and I'm looking at this so far:
-Two AMD Athlon 64 bit FX 3.0 GHz processors (dual core), making it Quad Core.
-Two nVidia GeForce 7600GTs, SLI'd together.
-2GB of fast Corsair memory
-and a really good case (with good airflow) and a good power supply

Don't know if the SLI technology will help any with Blender, but it will definitely help in other areas...

However, I'm curious whether anyone has tried a quad-core system and/or two nVidia GPUs SLI'd together with Blender yet... If you have, and could let me know your results, I'd be much obliged.

Thx,
advs89

Also...

Posted: Mon Feb 12, 2007 11:44 pm
by advs89
Also, I was wondering if there is ever any reason at all to select "threads" if you only have a single cpu...

(Actually, I'm really just posting this because I forgot to check the "notify me when a reply is posted" box on my last post)

Posted: Mon Feb 12, 2007 11:48 pm
by LetterRip
I don't think there is.

Regarding your setup: I'd go with more RAM rather than dual SLI, personally. 4GB of RAM, especially with Linux, would be of huge benefit to you.

LetterRip

Thanks

Posted: Tue Feb 13, 2007 12:29 am
by advs89
Well, it's funny, because I was originally considering that, and now I might actually switch to it instead. I think I'll go with 4GB of dual-channel Corsair memory (I know it sounds like I'm advertising for them or something, but I just really, really like that brand). That's assuming my MB will support it; since very few MBs support two CPUs, I have little choice in other areas.

Thanks for your advice,
advs89

Posted: Tue Feb 13, 2007 11:36 am
by indigomonkey
I have a single-core AMD Sempron Mobile, and enabling threads does often improve the render times, somehow. The number of threads to optimise render times seems to vary according to the file, but it does do it quicker.

Yeah

Posted: Tue Feb 13, 2007 9:43 pm
by advs89
That's what I was thinking... I think it helps it take higher priority over the other system processes. Probably the equivalent of changing the thread priority, but at least this way, it's only during rendering.

Posted: Wed Feb 14, 2007 9:13 pm
by RoyBatty
Looks like it helps when the CPU is fully utilized.
I did a little test with Folding@Home and simple 800x600 frame rendering:

FAH running
threads off: 40.45
threads on: 25.30

FAH paused
threads off: 26.12
threads on: 22.63
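
For anyone who wants the ratios rather than the raw times, here is a quick Python sketch. The units weren't noted, so it only computes ratios, which don't depend on them. The dictionary layout and the `speedup` helper are just illustrative names.

```python
# Times as posted above; units unstated, so only ratios are computed.
results = {
    "FAH running": {"threads_off": 40.45, "threads_on": 25.30},
    "FAH paused": {"threads_off": 26.12, "threads_on": 22.63},
}


def speedup(off, on):
    """How many times faster the 'on' run is than the 'off' run."""
    return off / on


for label, t in results.items():
    s = speedup(t["threads_off"], t["threads_on"])
    print(f"{label}: {s:.2f}x faster with threads on")
# FAH running: 1.60x faster with threads on
# FAH paused: 1.15x faster with threads on
```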

Posted: Wed Feb 14, 2007 9:30 pm
by advs89
RoyBatty wrote: Looks like it helps when the CPU is fully utilized.
I did a little test with Folding@Home and simple 800x600 frame rendering:

FAH running
threads off: 40.45
threads on: 25.30

FAH paused
threads off: 26.12
threads on: 22.63
And my guess would be that if you were to try a test with threads off, and the process priority raised to "high", then the results would be very close to the results of "threads on", in both scenarios (FAH Running/Paused).

This would prove that "threads on" helps raise the priority of the process in the same way that the system "thread priority" setting does, therefore speeding up render times, even on a single-core CPU.

Posted: Thu Feb 15, 2007 12:43 am
by indigomonkey
Ah, right, that makes sense as I use BOINC - my CPU is constantly at 100%.

Posted: Thu Feb 15, 2007 8:30 pm
by RoyBatty
advs89 wrote: And my guess would be that if you were to try a test with threads off, and the process priority raised to "high", then the results would be very close to the results of "threads on", in both scenarios (FAH Running/Paused).
I'm afraid I can't confirm your theory - I tried to increase the process priority, but it didn't help to decrease the rendering time at all:

FAH running
threads off, priority High/Realtime: 1:09.47
threads on, priority Normal: 0:44.30
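
As a ratio, assuming those two figures are in minutes:seconds form (an assumption based on how they are written), a small sketch with a hypothetical `to_seconds` helper:

```python
def to_seconds(stamp):
    """Convert 'M:SS.ss' to seconds (format assumed from the post)."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


high_priority_no_threads = to_seconds("1:09.47")
normal_priority_threads = to_seconds("0:44.30")
ratio = high_priority_no_threads / normal_priority_threads
print(f"threads on, normal priority: {ratio:.2f}x faster")
```

So on this one test, raising the priority with threads off still left the render clearly slower than normal priority with threads on.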