Tuesday, June 3, 2008

OS scheduler and grilled beef skewers

Have you ever tried to cook beef skewers on a BBQ? A BBQ is not a well-balanced heat source, and skewers are awkward to turn around. The result is that some pieces are burned while others are still raw.

OS makers are surely aware of this problem. The proof? Run an infinite loop in a shell, for instance in bash: while true; do set x 1; done, then look at the task manager to see how busy your multi-core machine is.

The naive answer is that one core is 100% busy while the others are idle. The reality is different: on XP, the OS scheduler happily migrates the infinite-loop process from one core to another, making all cores partially busy with this single process.

Advantage: the only one I see is that load balancing avoids extra heat on one part of the CPU, exactly as if the skewer were regularly turned and moved all over the grill to cook more evenly.

Drawback: the process migrates, which means that in addition to the context-switch overhead, its data are copied from one L2 cache to another. The overall time is longer than on a single-core machine.

Workaround: you can pin a thread to one core and prevent it from migrating elsewhere. This is called setting the CPU affinity to a single core.
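
A minimal sketch of that workaround, assuming a Linux box with the taskset tool from util-linux installed (the class name, the choice of core 0, and the pid-extraction trick are mine, purely illustrative):

import java.lang.management.ManagementFactory;

public class PinToCore {
    public static void main(String[] args) throws Exception {
        // On most JVMs the runtime name looks like "pid@hostname" (an assumption,
        // not guaranteed by the spec), so we extract the process id from it.
        String pid = ManagementFactory.getRuntimeMXBean().getName().split("@")[0];

        // "taskset -cp 0 <pid>" restricts the affinity mask of this process to CPU 0.
        new ProcessBuilder("taskset", "-cp", "0", pid).start().waitFor();

        // The infinite loop now stays on core 0 instead of migrating.
        while (true) { }
    }
}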



Tuesday, May 13, 2008

Multi-threading: the minefield is ahead

You think you have mastered multi-threaded programming? CPU makers are asked to improve CPU efficiency, without paying much attention to the software that has to cope with the new features. To put it clearly, CPU makers are working on your next traps:
  1. memory access reordering
  2. instruction reordering
  3. asymmetric cores and unfair memory access

A read instruction is a lot faster than a write, and the memory bus cannot work in both directions at the same time. Any memory access, even a read, is still slower than basic CPU instructions. To keep the CPU busy, it may be necessary to reorder some memory accesses. For instance, if the CPU has enough data in its cache to keep running without further memory access, it's a good time to perform an expensive write, even if some reads were scheduled first. Likewise, some instructions can be reordered to optimize the processing pipeline and take advantage of the co-processors.

These arrangements inside the CPU are managed by a code analyzer which detects when sequences of instructions commute and when variables are independent. This analyzer works in a single-thread context, so it's up to the developer to identify multi-thread issues and to prevent the CPU from reordering some parts of the code, by synchronizing the code and protecting data access.

Examples are numerous; here is a simple one, sketched in Java:

// Shared state, deliberately left unprotected: no volatile, no lock.
boolean ready = false;
int result = 0;

// thread1: spins until ready, then consumes the result.
while (!ready) {
    Thread.yield();
}
System.out.println(result);

// thread2: produces the result, then raises the flag.
result = 11;
ready = true;


A human quickly understands the purpose of this program: thread2 produces a result and toggles a boolean when it is done, while thread1 waits for the result to be ready before consuming it. But without any protection, this program can print 11, or 0, or even nothing at all because thread1 loops forever!

That's because if you imagine yourself as a single core executing thread2, you don't bother writing the values of result and ready back to main memory, because you don't need them afterwards. Only a context switch forces you to flush the cache and update the memory locations of these variables.

The core running thread1 is even worse off. ready and result are not bound by any expression; they are independent, so they can be fetched in any order. In particular, if result is read before ready, the program prints 0. Also, ready, once loaded into the cache, may never be refreshed from main memory, which leads to the infinite loop.

Sometimes it runs just as the human expected, and the program prints 11. But that's a lucky execution, actually...
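
One minimal fix in Java, just a sketch of one option among several (synchronized blocks or the java.util.concurrent utilities would also work): declare the flag volatile, which restores the visibility and ordering guarantees this program relies on.

// Declaring ready as volatile forbids the reorderings described above:
// the write to result performed before "ready = true" is guaranteed to be
// visible to the thread that later observes ready == true, and the spinning
// thread is guaranteed to eventually see the updated flag.
volatile boolean ready = false;
int result = 0;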

The last mine in our field, for the moment, is the asymmetric architecture: one core has some privileges, runs faster, has more cache, or gets higher priority on the memory bus. It's a nice idea: an important thread can run faster than the others. But once again the software is far behind. OS kernels are usually SMP (symmetric multiprocessing) and dispatch processes across the processors in a fair way, which is exactly what we don't want here. The same goes for the higher levels.

Assuming we get the possibility to pin an important thread to the best core, the developer still has to rethink the multi-threading logic to optimize the core assignments. There is still plenty of development time ahead...

Friday, April 18, 2008

RIA: is Java out?


I've just read Hybridizing Java by Bruce Eckel, the author of Thinking in Java. Bruce was an early adopter of Java, and now he opens fire on the language. How could he change his mind so radically?

Thinking in Java was my primary book for learning Java, 10 years ago. I still recommend it to my students and colleagues, as it is ideal for developers who already have a good knowledge of programming languages. And it's online, and free (all but the latest edition, though).

Hybridizing Java puzzles me. Some statements are obviously wrong or inflated (e.g. that more and more sites are not compatible with Firefox, or that Flex is the only plugin that nicely solves the UI question on the browser side without installation hiccups). Being a Flex evangelist (and an Adobe consultant) shouldn't allow him to be so harsh with what made him famous in the first place.

That said, some of his points are right. If a language hasn't correctly addressed some issues in ten years, it's normal that users lose confidence and look elsewhere. This is what is happening with RIA (Rich Internet Applications). Java applets should have dominated the domain, but they haven't, and Flex is taking over. Even if Flex is not as powerful as Java today, it may become so in the future. At least it makes it possible to create attractive RIAs. Flex is good enough, and the number of sites that use Flex is definitely a proof.

"make simple things easy, and difficult things possible" is a fundamental Java design guideline. Flex simply did it better than Java for RIA.

Friday, March 28, 2008

When guitars meet computers


I am always surprised by the proportion of amateur artists among engineers. For example, there are 5 guitarists among my 10 closest colleagues at the office.

I guess it comes from the need to balance rigid computer logic with forgiving art. Music is a frequent choice, and the guitar seems to win over the piano (maybe because the piano is yet another keyboard?).

But the interesting point is that one kind of guitar, the electric guitar, wakes up the computer geek the moment he plugs the guitar into the microphone input of a basic sound card: it works.

The geek has just entered a new kingdom: the land of DSP (Digital Signal Processing). There are tons of software packages, mainly VST plugins, that simulate effects and turn a common PC into the equivalent of hundreds of kilos of hardware racks and kilometers of cable. As described on this page (in French, but with a lot of images), the software is rich in attractive GUIs with buttons, sliders, and visualization gadgets. All you need is a computer, a basic sound card, decent speakers, and the software. With very little investment the result is impressive, because the sound is really great.

The geek is now ready for his first quest: the Perfect Sound.

Once he's satisfied with the sound, let's move on to the second quest: the content. Here is the second advantage of the computer: the internet is a huge repository of songs, guitar tablatures, guitar lessons, and even videos of guitar players.

Then, since it's not much fun to play alone, the computer helps here again: it provides an orchestra, playing mp3 or midi songs along with the guitar.

And here I take my revenge on the computer. It rejected my program because I forgot a semicolon; now I impose on it all my rehearsals, with the same mistakes always in the same places. And when it's finished: "play it again, Sam". For once, it does everything I want it to.

More precisely, I specialize in Pink Floyd solos, like Is There Anybody Out There?, Time, or Fat Old Sun. I'm very impressed by David Gilmour's playing. The solos are usually slow-paced, with few notes, but each one sounds great and contributes to a beautiful harmony... He uses a lot of bends, which consist in pushing the strings across the frets to apply extra tension and raise the pitch gradually. Add some very small tempo shifts, and you get a solo full of tension that grabs the brain's attention, followed by relief as the playing catches back up to the normal pitch and tempo.

Gilmour's solos look simple on paper, but believe me, they are very hard to work out with the same touch. Anyway, I'm having a lot of fun with the electric guitar plugged into my PC, the sound presets, the midi orchestra, ...

Thursday, March 20, 2008

The most expensive bug


On June 4th, 1996, the maiden flight of the European rocket Ariane V ended in an explosion about 40 seconds after launch. The payload alone cost about US$370 million. The cause? A bad cast in the software initiated a chain of dramatic errors and led to the destruction of Ariane.

The full report is worth reading; here is a summary.

The attitude of the rocket is given by an Inertial Reference System (SRI, from its French name), a combination of laser gyros and accelerometers. This critical piece of hardware sends a stream of data about position, altitude, speed and acceleration to the main computer, which controls the nozzles and drives the rocket along its expected trajectory.

Ten years earlier, on Ariane IV, a software function performed the pre-flight alignment of the SRI. This function was of no use once the rocket had lifted off, but it was kept running during the first seconds of flight. You know it's easier to leave harmless code in place than to remove it. The function used 7 variables; 3 of them were not protected against out-of-range values, which was not an issue because the Ariane IV trajectory kept those 3 variables within range.

No surprise, this function was still running on Ariane V. Unfortunately, the Ariane V trajectory was a bit different, and one of those variables, related to the horizontal velocity, converted from a 64-bit float to a 16-bit integer, went out of range and raised an unhandled exception.
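
The flight software was written in Ada, where such a conversion raises a runtime exception. Here is a rough Java analogue of the hazard, purely illustrative (the variable name and value are made up; Java wraps silently instead of throwing, but the converted value is just as meaningless):

// A 64-bit floating-point value that no 16-bit integer can represent.
double horizontalBias = 40000.0;          // made-up value, larger than 32767

// In Ada this conversion raises an exception; in Java the narrowing cast
// silently wraps around and the result no longer means anything physical.
short converted = (short) horizontalBias; // yields -25536
System.out.println(converted);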

So far, no big deal. A check function raised an exception. Let's forget the check function and resume the mission.

However, the assumption behind the Ariane design was that software is always right and only hardware may fail. The software reported an error, which was interpreted as meaning the SRI hardware was out of order. So the SRI was shut down.

That's probably the biggest mistake. A failing unit test is embarrassing enough, but it doesn't always mean the software is out of business. In this case the SRI was still delivering reliable information. Once shut down, it no longer could.

The backup SRI was supposed to take over, but it failed from the same bug within a fraction of a second. Once again the assumption "hardware may fail, software cannot" made the backup SRI totally useless in this case.

Without sensible guidance data, the rocket was doomed. To speed up the disaster, the SRI modules started sending diagnostic dumps instead of normal data to the main computer. The computer interpreted this data as if the rocket had suddenly flipped over and commanded an emergency turn. Under the aerodynamic loads the launcher started to tear apart, which triggered the self-destruction process.

The story is sad enough as it is; no need to add that a suitable test or a full simulation before the flight would have caught the bug.

By chance, I knew one of the members of the investigation team. He told me something that is not in the final report: what greatly contributed to killing Ariane V was the absence of experienced computer scientists in top management. The software components were simply split up and managed individually. A competent software supervisor with suitable authority could have caught one of the errors, and prevented the cast exception from eventually stopping the delivery of correct SRI data.

But Ariane was a physicists' toy, and they didn't want to share it with a software department...

PS: The lesson has been learned. Today Ariane V is very successful and has crashed only once more in 37 flights.

Friday, March 14, 2008

Marcel-Paul Schützenberger and complexity

If you are lucky, it may happen in your lifetime that you meet extraordinary people. MPS was one of them. I attended his lectures when I was a student at the university (Paris 7). He is not really famous; he never looked for fame. We were always fewer than 10 in the audience. So who was he?

MPS was a physician, a mathematician, and a computer scientist, and he was excellent in all three domains. This breadth of knowledge gave him a very realistic vision of computer science. Basically, computers were for him a tool to develop mathematics, with a great future in biology. The last part sounds obvious today, but he said so more than 20 years ago.

But what made his lectures so attractive was that he had stories to tell. As one of the founders of modern computer science, he had met all the other founders around the world (ok, mainly in the USA) and proved a very important theorem in language theory, so he had plenty of nice anecdotes about this pioneering period. I remember just a couple of them, which I keep for another blog entry.

For the moment I want to focus on one sentence he said, which is carved in my brain forever:
"There are two kinds of programs: the short ones and the long ones."
This can be understood in different ways. I think the basic idea is that we should keep some distance from computer power and program complexity: whatever we are trying to develop, in the end it's only a computer program, so let's just break it down into a sequence of instructions.

For example, in the 80's the hype was all about Artificial Intelligence. For most people, AI applications were the most complex one could even think of, so complex that nobody could complete them, by the way. Eventually AI died because it didn't fulfill its promises. Some blamed computer performance, but even though computer power doubles every 18 months, it will never catch up with AI complexity, which scales up to infinity. Others blamed the poor expressiveness of computer languages, too low-level. That's closer, but still no. What really killed AI was the lack of theory. Without a theory there is no suitable language and no proof of algorithm termination. Your computer program can then be as long as you want, it won't implement the specification correctly (or you won't be able to prove that it does). The conclusion is that the program doesn't do it all by itself; it's only a tool, and it won't make up for a hole in the theory.

Note that MPS didn't mean to underestimate the complexity of programming. For the developer, the complexity lies in the constraints he or she has to deal with: the programming language, the tools, the software architecture and design, and the available resources. But the program is only the implementation of an algorithm, and a correct and efficient algorithm relies on a strong theory.

Friday, March 7, 2008

The Next Programming Language



Everybody knows Moore's law, popularly stated as "computer performance doubles every 18 months". But programming languages also have their own growth law: "a very new language appears every 10 years". "Very new" is of course relative, and should be understood as "a successful language with new features". Let's review the last ones:

  • 1972 : C was the first high-level language bound to an operating system (Unix). For the developer, it meant very fine-grained control over the host machine and the ability to program at a low level with a high-level language.
  • 1983 : C++: C with an object-oriented layer. Note that one of its most touted features was portability, which turned out to be a disaster (C++ libraries were less compatible than C ones). C++ went too far: macros, operator overloading and multiple inheritance eventually made applications a nightmare to maintain and to integrate.
  • 1996 : Java: object-oriented, native multi-threading, GC, beans, exception handling, no macros, no multiple inheritance, clean packaging, portability, applets as the very first browser plugin, etc. A lot of advantages.
  • 200* : Web scripting languages: JavaScript, PHP, Flex, XUL, etc. We can't say that one of them leads, but they all contributed to empowering the web and making the web experience as sophisticated as a full-fledged application. Some people say the milestone is rather .NET/C#. Sorry, I disagree: there is no revolution in C#, it sits somewhere in between.
  • 2010 : What is coming next? My guess is a new language that will handle multi-threading as simply as Java managed memory for the developer.
Multi-threading is really calling for a new language. Multicore architectures need fine-grained control over thread dispatching across the cores: when two threads are expected to communicate a lot, they should run on two close cores so they can share the same L2 cache.

Most multi-threaded languages, like Java, offer synchronization through locks, which is probably the simplest model to implement in the OS, but surely the most difficult one for the developer. I really believe there are other viable solutions, such as transactions, which already work well for databases, and which would relieve the developer of the synchronization logic and of all the race condition/deadlock/starvation/CPU contention bugs. These bugs are a pain to track down and fix; we have almost no tools to help, and no background theory. Sigh.

Java is my favorite language, but I must admit it falls short with regard to multi-threading. The java.util.concurrent package introduced in JDK 1.5 helps a lot, but it doesn't fundamentally get rid of the complexity.
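
For instance, here is a sketch (the class name is mine, purely illustrative) of a simple hand-off between two threads using a java.util.concurrent primitive: the latch carries both the signal and the memory-visibility guarantee, so no locks and no spinning on a flag are needed. Still, the developer has to know the hazard exists in the first place.

import java.util.concurrent.CountDownLatch;

public class LatchDemo {
    static int result = 0;

    public static void main(String[] args) throws InterruptedException {
        final CountDownLatch ready = new CountDownLatch(1);

        new Thread() {
            public void run() {
                result = 11;        // produce the result
                ready.countDown();  // then signal that it is ready
            }
        }.start();

        ready.await();              // blocks instead of spinning on a flag
        System.out.println(result); // guaranteed to print 11
    }
}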

Besides that, the next language will probably look very similar to Java, with smart packaging, an object-oriented layer, GC, etc., and the usual syntax for control statements.