Showing posts with label multi-thread. Show all posts

Tuesday, June 3, 2008

OS scheduler and grilled beef skewers

Have you ever tried to cook beef skewers on a BBQ? A BBQ is not an even heat source, and skewers are difficult to turn. The result is that some pieces are burned while others are still raw.

OS makers are surely aware of this problem. The proof? Run an infinite loop in a shell, for example in bash: while true; do set x 1; done, and look at the task manager to see how busy your multi-core machine is.

The naive answer is that one core is 100% busy while the others are idle. The reality is different. On XP, the OS scheduler happily migrates the looping process from one core to another, making all cores partially busy with this single process.

Advantage: the only one I see is that load balancing avoids extra heat on one part of the CPU, exactly as if the skewer were regularly turned and moved around the grill to cook more evenly.

Drawback: the process migrates, which means that in addition to the context switch overhead, its data has to be copied from one L2 cache to another. The overall time is longer than on a single-core machine.

Workaround: you can pin a thread to one core and prevent it from migrating elsewhere. This is called "single core affinity".



Tuesday, May 13, 2008

Multi-threading: the mine field is ahead

You think you have mastered multi-thread programming? CPU makers are asked to improve CPU efficiency, without paying attention to the software behind it, which has to cope with the new features. To put it plainly, CPU makers are working on your next traps:
  1. memory access swapping
  2. instruction reordering
  3. asymmetric cores and unfair memory access

A read instruction is a lot faster than a write, and the memory bus cannot work in both directions at the same time. Memory access, even a read, is still slower than basic CPU instructions. To keep the CPU busy, it may be necessary to swap some memory accesses. For instance, if the CPU has enough data in the cache to run without further memory access, it is a good time to perform an expensive write, even if read accesses are scheduled first. Also, some instructions can be reordered to optimize the processing pipeline and take advantage of the co-processors.

These rearrangements inside the CPU are managed by a code analyzer which detects the commutativity of instruction sequences and the independence between variables. This analyzer works in a single-thread context, so it is up to the developer to identify multi-thread issues and to forbid the CPU from rearranging some parts of the code, by synchronizing code and protecting data access.

Examples are numerous. Here is a simple one, in a symbolic language:

boolean ready = false;
int result = 0;

thread1:
while (!ready) {
    Thread.yield();
}
print(result);

thread2:
result = 11;
ready = true;



A human quickly understands the purpose of this program. Thread2 produces a result and toggles a boolean when it is done. The other thread waits for the result to be ready before consuming it. But without any protection, this program can print 11, 0, or even nothing because thread1 loops forever!

That's because, if you imagine yourself as a single core executing thread2, you don't bother writing back the values of result and ready because you don't need them afterward. Only a context switch forces you to flush the cache and then update the memory locations of these variables.

The core running thread1 is even worse. ready and result are not bound by an expression. They are independent, so they can be fetched in any order. In particular, if result is read before ready, it prints 0. Also ready, once loaded into the cache, may never be refreshed from main memory, which leads to the infinite loop.

Sometimes it runs just as the human expected, and the program prints 11. But that's a lucky execution, actually...
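One possible way to protect this handshake is sketched below in Java, assuming the memory model of JDK 1.5 and later. Declaring the flag volatile forbids the reordering and the stale read described above. The class and method names (ReadyFlagDemo, runOnce) are made up for the illustration, and this is only one of several options (a lock would also work).

```java
// Illustrative sketch only: ReadyFlagDemo and runOnce are hypothetical names.
class ReadyFlagDemo {
    // Because 'ready' is volatile, a thread that reads true from it is also
    // guaranteed to see the write to 'result' that preceded the write to 'ready'.
    private static volatile boolean ready = false;
    private static int result = 0;

    // Runs the producer/consumer pair once and returns what the consumer read.
    static int runOnce() {
        ready = false;
        result = 0;
        final int[] seen = new int[1];

        // thread1: waits for the flag, then consumes the result.
        Thread consumer = new Thread(() -> {
            while (!ready) {
                Thread.yield();
            }
            seen[0] = result;
        });

        // thread2: produces the result, then raises the flag.
        Thread producer = new Thread(() -> {
            result = 11;
            ready = true;
        });

        consumer.start();
        producer.start();
        try {
            producer.join();
            consumer.join(); // join() makes seen[0] visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints 11
    }
}
```

With the volatile keyword removed, the same code falls back to the lucky-or-not behavior of the symbolic version.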

The last mine in our field, for the moment, is the asymmetric architecture: one core has some privileges, runs faster, has more cache, or gets higher priority on the memory bus. It's a nice idea: an important thread can run faster than the others. But once again the software is far behind. OS kernels are usually SMP (symmetric multiprocessing) and dispatch processes fairly across the processors, which is exactly what we don't want. The same goes for the higher levels.

Assuming we can pin an important thread to the best core, the developer still has to rethink the multi-thread logic to optimize the core assignments. There is plenty of development time still ahead...

Friday, March 7, 2008

The Next Programming Language



Everybody knows Moore's law: "Computer performance doubles every 18 months". But programming languages also have their own growth law: "A very new language appears every 10 years". "Very new" is of course relative, and should be understood as "a successful language with new features". Let's review the last ones:

  • 1972 : C was the first high-level language bound to an operating system (Unix). For the developer, it means very fine-grained control over the host machine and the ability to use a high-level language to program at a low level.
  • 1983 : C++: C with an object-oriented layer. Note that one of the most touted features was portability, which turned out to be a disaster (C++ libraries were less compatible than C ones). C++ went too far. Macros, operator overloading, and multiple inheritance eventually made applications a nightmare to maintain and to integrate.
  • 1996 : Java: object-oriented, native multi-threading, GC, beans, exception handling, no macros, no multiple inheritance, clean packaging, portability, applets as the very first browser plugin, etc. A lot of advantages.
  • 200* : Web scripting languages: JavaScript, PHP, Flex, XUL, etc. We can't say that one is the leader, but they all certainly contributed to empowering the web and making the web experience as sophisticated as a full-fledged application. Some people say it's rather .NET/C#. Sorry, I disagree. There is no revolution with C#. It stays in between.
  • 2010 : What is waiting for us? My guess is a new language that will support multi-threading as simply as Java managed memory for the developer.
Multi-threading is really calling for a new language. The multi-core architecture needs fine-grained control over thread dispatching across the cores. When two threads are expected to communicate a lot, they should run on two neighboring cores so they can share the same L2 cache.

Most multi-threaded languages, like Java, offer synchronization through locks, which is probably simple to implement on top of the OS but surely the most difficult option for the developer. I really believe there are other viable solutions, like the continuous transactions that already work for databases, which relieve the developer of the synchronization logic and of all the race condition, blocking, starvation, and CPU contention bugs. These bugs are a pain to track down and fix. We have almost no tools to help, and no background theory. Sigh.

Java is my favorite language, but I must admit it falls short on multi-threading. The new JDK 1.5 package java.util.concurrent helps a lot, but basically it doesn't get rid of the complexity.
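To give an idea of what java.util.concurrent offers, here is a minimal sketch that hands a value from a worker thread to the caller through a Future instead of a hand-written flag. The class and method names (FutureHandoffDemo, compute) are hypothetical, and the lambda syntax assumes Java 8 or later; the ExecutorService, Callable, and Future types themselves date from JDK 1.5.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only: FutureHandoffDemo and compute are hypothetical names.
class FutureHandoffDemo {
    // Computes a value on a worker thread and returns it to the calling thread.
    static int compute() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // The producer is expressed as a task that returns its result.
            Callable<Integer> producer = () -> 11;
            Future<Integer> future = pool.submit(producer);
            // get() blocks until the task is done and makes its result visible here.
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(compute()); // prints 11
    }
}
```

The library hides the flag and the memory visibility details, but the developer still has to decide what runs on which thread and handle the failure cases, which is the complexity that remains.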

Besides that, the next language will look very similar to Java, with smart packaging, an object-oriented layer, GC, etc., and the usual syntax for control statements.

Thursday, February 21, 2008

Multithread: the new era


Multi-threading is not new. It has, for instance, been natively supported in Java since its beginning in 1996.

What is new started with this very popular article that appeared in Dr. Dobb's Journal, 30(3), March 2005. Basically, the heat produced by the CPU roughly doubles when the clock is 20% faster. Standard fans and heat sinks reach their limit with CPUs around 4 GHz. It means the clock race is over. But processor manufacturers are smart. If they slow down the CPU by 20%, it produces about half the heat, so let's put two of them in the same chip! The dual core is born: same thermal envelope, 2 cores, each 20% slower than the single-core version. That is up to 2 x 0.8 = 1.6 times the throughput of the single core, provided the software keeps both cores busy. Moore's law still holds if we consider each core as a contributor to the total machine speed.

By the end of 2007, almost all new machines were equipped with multi-core processors.

From the software point of view, this has two major impacts:
  • Customers are now running your applications in a truly multi-threaded environment.
  • The multi-core hardware speeds up performance only if the application is multi-threaded.
We are still fixing bugs coming from the first point. Even Java made mistakes: for example, the famous single-thread rule of Swing turns out to fail on multi-core machines.

We are only starting to tackle the second point. And so far we have very few tools and little theoretical background to help us.
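As a minimal illustration of the second point, the sketch below splits a simple computation (the sum of 1..n) into one chunk per thread so that several cores can work on it at once. The names (ParallelSumDemo, sum) are hypothetical, and the actual speed-up depends on the machine and on the size of the work; a single-threaded loop would use only one core no matter how many are available.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only: ParallelSumDemo and sum are hypothetical names.
class ParallelSumDemo {
    // Returns 1 + 2 + ... + n, computed in 'threads' independent chunks.
    static long sum(long n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            long chunk = n / threads;
            for (int i = 0; i < threads; i++) {
                final long from = i * chunk + 1;
                // The last chunk also takes the remainder of the division.
                final long to = (i == threads - 1) ? n : (i + 1) * chunk;
                Callable<Long> task = () -> {
                    long s = 0;
                    for (long k = from; k <= to; k++) {
                        s += k;
                    }
                    return s;
                };
                parts.add(pool.submit(task));
            }
            // Each chunk works on its own local variable; results are merged here.
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(sum(1_000_000L, cores));
    }
}
```

The chunks share no mutable data, which is what keeps this example free of locks; most real applications are not that easy to split.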

After recursion and object-oriented languages, now comes multi-threading.