Friday, June 13, 2008

Software = Algorithms + User Interface + Bugs

This point of view holds for desktop applications, but also for embedded software and, more generally, everywhere there is a programmable chip.

The algorithm part is the most obvious one for developers. It covers the processing of data and, more recently, the integration of libraries at different levels to provide compatibility between the various operating systems, languages, and protocols (read for instance: the web is a mess). Algorithms are the engine of the software. When development resources are scarce, the temptation is high to focus only on this part, because it produces the main output.

The user interface is the steering wheel and the dashboard of the software. It becomes more and more of an issue, especially for portable devices like mobile phones or PDAs, where it makes the difference between a useful gadget and a piece of crap. A bad user interface cancels out all the effort put into the software and hardware behind it. UI is not trivial. The communication between machine and human goes mainly through a screen in one direction, and through keyboard, buttons, and mouse in the other. This is not always intuitive, and that's why UI becomes critical as machines grow more complex.

Unlike algorithms, which are a hard science backed by the power of mathematics, UI is a cognitive science. There is no formula for UI, only general guidelines for developers; at best, statistical studies provide the most reliable metrics.

Bugs are the dark side of software. Nobody wants them, yet they are there; they are just part of the process. In our car metaphor, the bugs are everything that doesn't work as expected, from the annoying interior light that stays off when the door is open, to whatever can cause or aggravate an accident.

UI is a cognitive science; bugs are no science at all... They are unpredictable, and apart from research projects on formal provers, which apply only under specific conditions, only intensive test suites can find the bugs in the general case. What remains under our control is the environment around bug fixing: if we accept that software has bugs (and we should), then let's set aside resources and prepare tools to handle them efficiently once they are found.

To take care of bugs, a dedicated team, QA, is needed. QA has a price, but the lack of QA has a much higher cost. For the most safety-critical applications, like embedded software in planes, QA can represent up to 70% of the product price...

The common point between algorithms, UI, and bugs is that they are much easier to handle when they are taken into account early in the product life cycle. It's not once the product is out that you should start taking care of UI or tests... or the algorithms!

Tuesday, June 3, 2008

OS scheduler and grilled beef skewers

Have you ever tried to cook beef skewers on a BBQ? A BBQ is not an evenly balanced heat source, and skewers are awkward to turn around. The result is that some pieces get burned while others are still raw.

OS makers are certainly aware of this problem. The proof? Run an infinite loop in a shell, for instance in bash: while true; do set x 1; done, and look at the task manager to see how busy the cores of your multi-core machine are.
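If you'd rather reproduce the experiment from Java than from bash, a minimal busy loop does the same thing (the BusyLoop class name is just mine, for illustration):

public class BusyLoop {
    public static void main(String[] args) {
        long counter = 0;
        // Pure CPU work: no I/O, no sleep, the thread never blocks.
        // Run it, open the task manager, and watch how the load moves across the cores.
        while (true) {
            counter++;
        }
    }
}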

The naive answer is that one core is 100% busy while the others are free. The reality is different: on XP, the OS scheduler happily migrates the infinite-loop process from one core to another, making all cores partially busy with this single process.

Advantage: the only one I see is that load balancing avoids concentrating heat on one part of the CPU, exactly as if the skewer were regularly turned and moved around the grill to cook more evenly.

Drawback: the process migrates, which means that on top of the context-switch overhead, its data has to be reloaded from one L2 cache into another. The overall time ends up longer than on a single-core machine.

Workaround: you can pin a thread to one core and prevent it from migrating elsewhere. This is called "single core affinity" (on Linux, for instance, the taskset command can pin a process to a core; Windows exposes SetThreadAffinityMask).



Tuesday, May 13, 2008

Multi-threading: the mine field is ahead

You think you master multi-threaded programming? CPU makers are asked to improve CPU efficiency, without paying much attention to the software behind, which has to cope with the new features. To put it clearly, CPU makers are working on your next traps:
  1. memory access swapping
  2. instruction reordering
  3. asymmetric cores and unfair memory access

A read instruction is a lot faster than a write, and the memory bus cannot operate in both directions at the same time. Memory access, even a read, is still slower than basic CPU instructions. To keep the CPU busy, it may be necessary to swap some memory accesses. For instance, if the CPU has enough data in its cache to keep running without further memory access, it's a good time to perform an expensive write, even if some reads were scheduled first. Likewise, instructions can be reordered to optimize the processing pipeline and take advantage of the co-processors.

These rearrangements inside the CPU are managed by a code analyzer which detects when sequences of instructions commute and when variables are independent. This analyzer works in a single-thread context, so it's up to the developer to identify multi-threading issues and to forbid the CPU from rearranging certain parts of the code, by synchronizing the code and protecting data accesses.

Examples are numerous; here is a simple one, written in a symbolic, Java-like language:

boolean ready = false;
int result = 0;

// thread1: waits for the flag, then consumes the result
thread1:
while (!ready) {
    Thread.yield();
}
print(result);

// thread2: produces the result, then signals completion
thread2:
result = 11;
ready = true;



A human quickly understands the purpose of this program: thread2 produces a result and toggles a boolean when it is done, while the other thread waits for the result to be ready before consuming it. But without any protection, this program can print 11, print 0, or even print nothing at all, because thread1 loops forever!

That's because, if you put yourself in the place of a single core executing thread2, you don't bother writing back the values of result and ready, since you don't need them afterwards. Only a context switch forces you to flush the cache and update the memory locations of these variables.

The core running thread1 is even worse off. ready and result are not bound by any expression; they are independent, so they can be fetched in any order. In particular, if result is read before ready, the program prints 0. Also, ready, once loaded into the cache, may never be refreshed from main memory, which leads to the infinite loop.

Sometimes it runs just as the human expected, and the program prints 11. But that's a lucky execution, actually...
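In real Java, one conventional fix, sketched below with thread2 played by the main thread, is to declare the flag volatile. Since the Java 5 memory model, a write to a volatile field followed by a read of it establishes a happens-before relation, so the consumer is guaranteed to see both the flag and the result written before it. The class and variable names are mine, not part of the original example.

public class Handshake {
    // 'volatile' forbids the caching and reordering tricks described above:
    // the write to 'ready' is published to main memory and every read re-fetches it,
    // so the waiting loop cannot spin forever on a stale value.
    static volatile boolean ready = false;
    static int result = 0;

    public static void main(String[] args) {
        // thread1: waits for the flag, then consumes the result
        Thread thread1 = new Thread(new Runnable() {
            public void run() {
                while (!ready) {
                    Thread.yield();
                }
                // The write to 'result' happens-before the volatile write to 'ready',
                // which happens-before this read: printing 11 is guaranteed.
                System.out.println(result);
            }
        });
        thread1.start();

        // thread2 (here, the main thread): produces the value, then publishes the flag
        result = 11;
        ready = true;
    }
}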

The last mine in our field, for the moment, is the asymmetric architecture: one core has privileges, runs faster, has more cache, or gets higher priority on the memory bus. It's a nice idea: an important thread can run faster than the others. But once again the software lags far behind. OS kernels are usually SMP (symmetric multi-processing) and dispatch processes fairly across the processors, which is exactly what we don't want here. The same goes for the higher levels.

Even assuming we can pin an important thread to the best core, the developer still has to rethink the multi-threading logic to optimize the core assignments. There is still plenty of development time ahead...

Friday, April 18, 2008

RIA: is Java out?


I've just read Hybridizing Java by Bruce Eckel, the author of Thinking in Java. Bruce was an early adopter of Java, and now he opens fire on the language. How could he change his mind so radically?

Thinking in Java was my primary book for learning Java, 10 years ago.
I still recommend it to my students and colleagues, as it is ideal for developers who already have a good knowledge of programming languages. And it's online and free (all but the latest edition, though).

Hybridizing Java puzzles me. Some statements are obviously wrong or inflated (e.g. that more and more sites are not compatible with Firefox, or that Flex is the only plugin that nicely solves the UI question on the browser side without installation hiccups). Being a Flex evangelist (and Adobe consultant) shouldn't allow him to be so harsh with what made him famous in the first place.

That said, some of his points are right. If a language hasn't correctly addressed some issues in ten years, it's normal that users lose confidence and look elsewhere. This is what is happening with RIA (Rich Internet Applications). Java applets should have dominated the domain, but they haven't; Flex is taking it over. Even if Flex is not as powerful as Java today, it may become so in the future. At least it allows attractive RIAs to be created. Flex is good enough, and the number of sites that use Flex is definitely proof of that.

"make simple things easy, and difficult things possible" is a fundamental Java design guideline. Flex simply did it better than Java for RIA.

Friday, March 28, 2008

When guitars meet computers


I am always surprised by the proportion of amateur artists among engineers. For example, there are 5 guitarists among my 10 closest colleagues at the office.

I guess it comes from the need to balance rigid computer logic with forgiving art. Music is a frequent choice, and the guitar seems to win over the piano (because of yet another keyboard?).

But the interesting point is that one kind of guitar, the electric guitar, wakes up the computer geek the moment he plugs it into the microphone input of a basic sound card: it works.

The geek has just entered a new kingdom: the land of DSP (Digital Signal Processing). There is a ton of software, mainly VST plugins, that simulates digital effects and turns a common PC into the equivalent of hundreds of kilos of hardware racks and kilometers of cable. As described on this page (in French, but with lots of pictures), the software is rich in attractive GUIs with buttons, sliders, and visualization gadgets. All you need is a computer, a basic sound card, decent speakers, and the software. With very little investment the result is impressive, because the sound is really great.

The geek is now ready for his first quest: the Perfect Sound.

Once he's satisfied with the sound, let's aim for the second quest: the content. Here is the second advantage of the computer: the internet is a huge repository of songs, guitar tablatures, guitar lessons, and even videos of guitar players.

Then again, it's not much fun to play alone, and here too the computer helps: it provides an orchestra, playing mp3 or MIDI songs along with the guitar.

And here I take my revenge on the computer. It rejected my program because I forgot a semicolon; now I impose on it all my rehearsals, with the same mistakes in the same places every time. And when it's over: "play it again, Sam". For once, it does everything I want it to do.

More precisely, I specialize in Pink Floyd solos, like Is There Anybody Out There?, Time, or Fat Old Sun. I'm very impressed by David Gilmour's playing. The solos are usually slow-paced, with few notes, but each one sounds great and contributes to a beautiful harmony... He uses a lot of bends, which consist of pushing the strings across the fret to apply extra tension and gradually raise the pitch of the note. Combined with very small tempo shifts, this gives a solo full of tension that grabs the brain's attention, followed by relief as the playing catches back up to normal pitch and tempo.

Gilmour's solos look simple on paper, but believe me, they are very hard to work out with the same touch. Anyway, I'm having a lot of fun with the electric guitar plugged into my PC, the sound presets, the MIDI orchestra, ...

Thursday, March 20, 2008

The most expensive bug


On June 4th, 1996, the maiden flight of the European rocket Ariane V ended with the rocket exploding about 40 seconds after launch. The payload alone cost about US$370 million. The cause? A bad cast in the software initiated a chain of dramatic errors and led to the rocket's destruction.

The full report is worth reading; here is a summary.

The attitude of the rocket is given by an Inertial Reference System (SRI, from its French name), which is a combination of laser gyroscopes and accelerometers. This critical piece of hardware sends a stream of data about position, altitude, speed, and acceleration to the main computer, which controls the nozzles and keeps the rocket on its expected trajectory.

Ten years earlier, on Ariane III, a software function performed pre-flight checks to test the alignment of the SRI. This function was no longer needed on Ariane IV, but it still ran during take-off; you know how it is, it's easier to leave harmless code in place than to remove it. The function used 8 variables, 3 of which were not protected against out-of-range values. This was not an issue at the time, because the rocket's trajectory kept these 3 variables within range.

No surprise, this function was still running on Ariane V. Unfortunately, the Ariane V trajectory was a bit different, and this time one of the variables, related to the horizontal velocity and cast from a 64-bit float to a 16-bit integer, went out of range and raised an uncaught exception.
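To make the failure mode concrete, here is a rough sketch in Java rather than in Ada, the language of the actual flight software; the class, the method, and the sample values are invented for illustration. In Ada the range check is built into the conversion and raised an Operand Error; a bare (short) cast in Java would silently wrap instead, so the guard is spelled out explicitly here.

public class ConversionSketch {
    // A 64-bit float squeezed into a 16-bit signed integer.
    // If the value does not fit, we raise an exception, as the unprotected
    // Ada conversion did on board; if nobody catches it, the program dies.
    static short toInt16(double value) {
        if (value > Short.MAX_VALUE || value < Short.MIN_VALUE) {
            throw new ArithmeticException("out of 16-bit range: " + value);
        }
        return (short) value;
    }

    public static void main(String[] args) {
        System.out.println(toInt16(1234.5));   // fits: prints 1234
        System.out.println(toInt16(40000.0));  // out of range: throws, uncaught here
    }
}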

So far, no big deal. A check function raised an exception. Let's forget the check function and resume the mission.

However, the design assumption on Ariane was that software is always right and only hardware may fail. The software error was therefore interpreted as meaning the SRI was out of order, and the SRI was shut down.

That's probably the biggest mistake. A failing unit test is embarrassing enough, but it doesn't always mean the software is out of business. In this case the SRI was still delivering reliable information; once shut down, it no longer could.

The backup SRI started providing replacement data, and was shut down 0.05 seconds later because of the same bug. Once again, the assumption "hardware may fail, software cannot" made the backup SRI totally useless in this case.

Without sensible guidance data, the rocket was doomed. To accelerate the disaster, the SRI modules started sending stack traces instead of normal data to the main computer. The computer interpreted this data just as if the rocket were upside down and commanded an emergency half-turn. The rocket started to tear apart under the physical stress and initiated the self-destruction process.

The story is sad enough as it is; no need to add that a suitable test or a full simulation before the flight would have caught the bug.

As it happens, I knew one of the members of the investigation team. He told me something that is not in the final report: what greatly contributed to killing Ariane V was the absence of experienced computer scientists in top management. The software components were simply split up and managed individually. A competent software supervisor with suitable authority could have caught one of these errors, and prevented the cast exception from eventually stopping the delivery of correct SRI data.

But Ariane was a physicists' toy; they didn't share it with a software department...

PS: the lesson was learned. Today Ariane V is very successful and has crashed only once more in 37 flights.

Friday, March 14, 2008

Marcel-Paul Schützenberger and complexity

If you are lucky, you may get to meet a few extraordinary people in your lifetime. MPS is one of them. I attended his lectures when I was a student at university (Paris 7). He is not really famous; he never looked for fame. There were always fewer than 10 of us in the audience. So who is he?

MPS is a physician, a mathematician, and a computer scientist, and obviously excellent in all three domains. His breadth of knowledge gave him a very realistic vision of computer science: for him, computers were basically a tool for developing mathematics, with a great future in biology. The last part sounds obvious today, but he said it more than 20 years ago.

But what made his lectures so captivating is that he had stories to tell. As one of the founders of modern computer science, he met all the other founders around the world (OK, mainly in the USA), proved a very important theorem in language theory, and so had plenty of nice anecdotes about that pioneering period. I can barely remember a couple of them, which I'll keep for another blog entry.

For the moment I want to focus on a sentence he said, which is carved in my brain forever:
"There are 2 kinds of program. The short ones and the long ones."
This can be understood in different ways. I think the basic idea is that we should keep some distance from computer power and program complexity: whatever we are trying to develop, it is after all only a computer program, so let's just break it down into a sequence of instructions.

For example, in the 80's the hype was about Artificial Intelligence. For most people, AI was the most complex application imaginable, so complex, in fact, that nobody could ever complete it. Eventually AI died because it didn't fulfill its promises. Some blamed computer performance, but even though computer power doubles every 18 months, it will never catch up with AI complexity, which scales up to infinity. Others blamed the poor expressiveness of computer languages, too low-level. Closer, but still no. The real reason AI was killed is the lack of theory. Without theory, there is no suitable language and no proof of algorithm termination. Your computer program can then be as long as you want, it won't correctly implement the specification (or you won't be able to prove that it does). The conclusion is that the program doesn't do it all by itself; it's only a tool, and it cannot make up for a hole in the theory.

Note that MPS didn't mean to underestimate the complexity of programming. For the developer, complexity is relative to the constraints he or she has to deal with: the programming language, the tools, the software architecture and design, and the available resources. But the program is only the implementation of an algorithm, and a correct and efficient algorithm relies on a strong theory.