Wednesday, November 5, 2008
Who cares about algorithms?
The real question is: what gives an application its value? Is it the algorithm behind it, or the user environment the application provides? For most applications, performance and memory footprint only need to be "good enough". What makes the difference is the user's perception of the application: the documentation, the online support, automatic upgrades, ergonomics, etc. In brief: how well the application takes care of the user.
From this perspective, the algorithm is secondary. It needs to be "good enough", since the user won't feel it directly. Years ago, it was different. Machine resources were the main concern, and only a good algorithm could cope with them. User interfaces were in their infancy and expectations in this domain stayed quite low. Today it's different: web technology demands "instant customer care" and an "attractive interface". For example, what makes a good web browser? A fast and efficient HTML renderer or an optimized user interface? Personally, I'm satisfied once the page can be rendered in less than half a second, and then I start looking at how easy it is to manage and use bookmarks and plugins, and at the overall ergonomics of the browser.
Note also that creating an algorithm is usually a job for one smart person, whereas it takes a full team to build a good user environment, including tech writers, ergonomics specialists, QA, support, etc. Maybe it's difficult to find the person who is able to invent the best algorithm, but it's also very difficult to hire such a heterogeneous team and end up with a successful product.
Friday, October 24, 2008
The lost Beautiful Programming Language
There was a time when a new programming language would come out, trying to be the best of all existing languages, designed by a guru, aiming to fill a hole, etc. I think the last one was Java. Today, when we see what a developer can do with an IDE like Eclipse on Java, we really wonder what the next language will be and how it can be better.
However, this does not take into account what mankind can do to its own detriment. When we can't do better, we do worse.
Web pressure made plenty of "web" languages pop up in the landscape. One of them, Flex, looks very popular. But after playing with it, it became clear to me that Flex is a step backward.
Flex comes with Flex Builder, an Eclipse plugin. However, the integration in Eclipse is very minimal. No refactoring, no code assist, no automatic indentation, complex project settings. Only the editor, the debugger, a profiler, and the expensive WYSIWYG designer that generates XML are available. I'm not sure it is worth the 400 MB of my RAM that Flex Builder is eating. All in all, I miss the Java support in Eclipse.
But IDE support is not everything. Flex syntax is confusing, especially when it mixes XML and ActionScript. Even the semantics are misleading. Let's take an example:
var t:Object = { foo:bar } ;
The right-hand side creates an object, actually an "association list". It contains one key/value property, where the key is "foo" and the value is "bar". "foo" is a String literal, but "bar" is supposed to be an object. The value is stored in the association list, but is eventually converted to a string. To use it:
t.foo ;
// or
t["foo"] ;
Note the very unusual use of "." and "[ ]". Anyway, this construct is very handy. Basically, an association list is a hashtable.
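For Java readers, here is a rough sketch of the same idea with a HashMap. The class name AssociationList and the helper build are made up for this illustration, and a Map is only an approximation of an ActionScript object. It just shows one key, the text "foo", mapped to whatever object bar refers to.

```java
import java.util.HashMap;
import java.util.Map;

public class AssociationList {
    // Hypothetical helper: rough Java counterpart of  var t:Object = { foo:bar } ;
    static Map<String, Object> build(Object bar) {
        Map<String, Object> t = new HashMap<>();
        // The key is the literal text "foo"; the value is the object held by bar.
        t.put("foo", bar);
        return t;
    }

    public static void main(String[] args) {
        Object bar = Integer.valueOf(42); // made-up value for the example
        Map<String, Object> t = build(bar);

        // Java has a single lookup syntax where ActionScript offers t.foo and t["foo"].
        System.out.println(t.get("foo"));        // prints 42
        System.out.println(t.get("foo") == bar); // prints true: the same object is stored
    }
}
```

In Java the key has to be written as a quoted string, so there is less room to confuse the key name with a variable.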
I'll let you imagine how much time it takes to figure out this simple feature from the documentation, and how error-prone a language with plenty of this kind of counter-intuitive syntax can be. I know, this is common with scripting languages. But can we still speak of a scripting language when it has packages and an object-oriented layer?
So yes, Flex is the royal language for RIA, but no, it's not as slick as Java.
Tuesday, October 14, 2008
I fixed a bug in 37 minutes
Obviously, this is not an outstanding performance. The statement looks very ordinary, yet it assumes a lot of things:
- I was able to measure the time it really took. I use the Mylyn extension for Eclipse to help me organize my work by task. This time, I don't know why, I paid attention to the elapsed time to completion, a built-in Mylyn feature I had never used before.
- I was not disturbed or distracted for 37 minutes in a row. It seems to me it has been years since I could focus that long on a single task.
- The bug itself wasn't a big deal, but I think I am the only one on earth who could fix it in such a short time. It's not because I'm a super-developer, but because I know this part of the code best, and my development environment is ready to tackle this kind of bug. Even a genius challenger would need a couple of dozen minutes just to set up the right environment. The real difficulty of this bug was finding the right place to fix it, and I used the debugger to find this place quickly and accurately, instead of navigating through the sources. The fix itself was simple.
- I searched Google's code search for an implementation of a method that was part of the solution. I could have implemented it myself, but it saved me a bit of time.
It's only unfortunate that the time I saved fixing this bug was wasted writing this post!
Friday, September 19, 2008
Software equations
I've already talked about:
Software = Algorithms + User Interface + Bugs
Here is another one:
Programming Language = syntax + semantics + doc + libraries + IDE
The first two, syntax and semantics, are mandatory to define the language. The others simply make it, respectively, understandable, useful, and usable.
The documentation contains the reference manual, the user manual, tutorials, and sometimes other kinds of documents that help the transition from another language to this one. Usually, a language does not come from scratch. It reuses known concepts (e.g. object orientation, multithreading) and can be very close to an existing language. For instance, Java syntax is similar to C, so dedicated books explaining Java to C programmers are very popular.
Libraries are what make the difference between languages today. The object-oriented paradigm spread the concept of a library driven by an API. And web technology is a great producer of layers and libraries. As more and more libraries are integrated, the language becomes more and more useful.
The IDE is now unavoidable. Once a developer has tried a serious IDE like Eclipse or Visual C++, it's very difficult to go back to the ancient era where code was written in a text editor, compiled, debugged with an external debugging tool (when one existed), and deployed through a shell script. The IDE provides the editor, compiler, and debugger, but it also offers code patterns, completion, syntax highlighting, and online documentation, and provides powerful navigation, packaging, etc.
Wednesday, July 2, 2008
Flex / Silverlight comparison
I've just attended a presentation comparing Flex and Silverlight. In short, both are equivalent in terms of functionality and libraries, which is not really a surprise... However, the noticeable differences are:
- IDE: Flex Builder exploits 10% of Eclipse's capabilities and is far behind Visual Studio (but is far less expensive). There is not even automatic indentation!
- performance: the current Silverlight version (2 beta) is 3 times slower than Flex 3! Remember that Flex is itself 3 times slower than Java.
- deployment: the Flex plugin has a 97% deployment rate, Silverlight is probably closer to 3%.
Of course these differences only hold for now. Both platforms are trying to catch up with each other to become the leader, but no doubt Flex is ahead for the moment.
Friday, June 13, 2008
Software = Algorithms + User Interface + Bugs
This point of view holds for desktop applications, but also for any embedded software or, more generally, everywhere there is a programmable chip.
The algorithm part is the most obvious for developers. It represents the processing of data and, more recently, the integration of libraries at different levels to provide compatibility between the various operating systems, languages, and protocols (read for instance: the web is a mess). Algorithms are the engine of the software. When development resources are low, the temptation is high to focus only on this part, because it produces the main output.
The user interface is the steering wheel and the dashboard of the software. It is becoming more and more of an issue, especially for portable devices like mobile phones or PDAs, where it really makes the difference between a useful gadget and a piece of crap. A bad user interface cancels out all the effort put into the software and hardware behind it. UI is not trivial. Communication between machine and human goes mainly through a screen in one direction, and a keyboard, buttons, and a mouse in the other. This is not always intuitive, and that's why UI becomes critical as machines become more complex.
Unlike algorithms, which are a hard science backed by the power of mathematics, UI is a cognitive science. There is no formula for UI, only general guidelines for developers, and in the end statistical studies provide the most reliable metrics.
Bugs are the dark side of the software. Nobody wants them, but they are here. They are just part of the process. In our car metaphor, bugs are everything that doesn't work as expected, from the annoying interior light that doesn't come on when the door is open to whatever can cause or aggravate an accident.
UI is a cognitive science, bugs are no science at all... They are unpredictable, and apart from research projects on formal provers that are applicable under specific conditions, only intensive test suites can find bugs in the general case. But there remains the environment around bug fixing. I mean, if we accept that software has bugs (and we should), then let's set aside resources and prepare tools to handle them efficiently once they are found.
To take care of bugs, a dedicated team named QA is needed. QA has a price, but the lack of QA has a much higher cost. For the most safety-critical applications, like embedded software in planes, QA can represent up to 70% of the product price...
The common point between algorithms, UI, and bugs is that they are much easier to handle when they are taken into account early in the product life cycle. You don't wait until the product is out to start taking care of UI or tests... or the algorithms!
Tuesday, June 3, 2008
OS scheduler and grilled beef skewers
Have you ever tried to cook beef skewers on a BBQ? A BBQ is not an even heat source, and skewers are difficult to turn. The result is that some pieces are burned while others are still raw.
For sure OS makers are aware of this problem. The proof? Try to run an infinite loop in a shell, like in bash: while true; do
The naive answer is that one core is 100% busy while the others are free. However, the reality is different. On XP, the OS scheduler happily migrates the infinite-loop process from one core to another, making all cores partially busy with this one process.
Advantages: the only one I see is that load balancing avoids extra heat on one part of the CPU, exactly as if the skewer were regularly turned and moved around all over the grill to be cooked more evenly.
Drawback: the process migrates, which means that in addition to the context switch overhead, the data are copied from one L2 cache to another. Overall time is longer than on a single-core machine.
Workaround: you can pin a thread to one core and prevent it from migrating elsewhere. This is called "single core affinity".
Tuesday, May 13, 2008
Multi-threading: the minefield is ahead
You think you have mastered multi-threaded programming? CPU makers are asked to improve CPU efficiency, without paying attention to the software behind it, which has to cope with the new features. To put it clearly, CPU makers are working on your next traps.
- memory access swapping
- instruction reordering
- asymmetric cores and unfair memory access
A read is a lot faster than a write, and the memory bus cannot work in both directions at the same time. A memory access, even a read, is still slower than basic CPU instructions. To keep the CPU busy, it may be necessary to swap some memory accesses. For instance, if the CPU has enough data in the cache to run without further memory access, it's a good time to perform an expensive write, even if reads are scheduled first. Also, some instructions can be reordered to optimize the processing pipeline and take advantage of the co-processors.
These rearrangements inside the CPU are managed by a code analyzer which detects the commutativity of sequences of instructions and the independence between variables. This analyzer works in a single-thread context, so it's up to the developer to identify multi-threading issues and to forbid the CPU from rearranging some parts of the code, by synchronizing code and protecting data access.
Examples are numerous. Here is a simple one, in a symbolic language:
boolean ready = false;
int result = 0;

thread1:
    while (!ready) {
        Thread.yield();
    }
    print(result);

thread2:
    result = 11;
    ready = true;
A human quickly understands the purpose of this program. Thread2 produces a result, and sets a boolean when done. The other thread waits for the result to be ready before consuming it. But without any protection, this program can print 11, 0, or even nothing, because thread1 loops forever!
That's because if you imagine yourself as a single core executing thread2, you don't bother writing back the values of result and ready, because you don't need them afterward. Only a context switch forces you to flush the cache and then update the memory locations of these variables.
The core running thread1 is even worse. ready and result are not bound by an expression. They are independent, so they can be fetched in any order. In particular, if result is read before ready, it prints 0. Also ready, once loaded into the cache, may never be refreshed from main memory, which leads to the infinite loop.
Sometimes it runs just as the human expected, and the program prints 11. But that's actually a lucky execution...
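For what it's worth, here is how the same handoff could be protected in Java. This is only a sketch under assumptions of mine: the class name SafeHandoff and the method runOnce are invented, and it relies on the Java memory model, where declaring ready as volatile forbids the reordering and stale reads described above. Other languages need their own tools (locks, memory barriers, atomics).

```java
public class SafeHandoff {
    // volatile: a write to 'ready' happens-before any later read that sees it,
    // so the earlier write to 'result' is visible to the reader as well.
    private static volatile boolean ready = false;
    private static int result = 0;

    // Hypothetical helper: runs the producer/consumer pair once and
    // returns the value the consumer observed.
    static int runOnce() {
        ready = false;
        result = 0;
        final int[] seen = new int[1];

        Thread consumer = new Thread(() -> {
            while (!ready) {      // volatile read: always re-checked
                Thread.yield();
            }
            seen[0] = result;     // guaranteed to see 11
        });
        Thread producer = new Thread(() -> {
            result = 11;          // plain write, published by the volatile write below
            ready = true;
        });

        consumer.start();
        producer.start();
        try {
            consumer.join();
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints 11
    }
}
```

Only ready needs to be volatile here, because result is always written before ready and read after it. Remove the keyword and the program is back to the lucky-execution lottery.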
The last mine in our field, for the moment, is the asymmetric architecture: one core has some privileges, runs faster, has more cache, or gets higher priority on the memory bus. It's a nice idea. An important thread can run faster than the others. But once again the software is far behind. OS kernels are usually SMP (symmetric multiprocessing), and dispatch processes fairly across the processors, which is exactly what we don't want. Same for the higher levels.
Assuming we are able to pin an important thread to the best core, the developer still has to rethink the multi-thread logic to optimize the core assignments. Plenty of development time still ahead...
Friday, April 18, 2008
RIA: is Java out?
I've just read Hybridizing Java by Bruce Eckel, the author of Thinking in Java. Bruce was an early adopter of Java and now he opens fire on the language. How could he change his mind so radically?
Thinking in Java was my primary book for learning Java, 10 years ago.
I still recommend it to my students and colleagues, as it is ideal for developers who already have a good knowledge of programming languages. And it's online, and free (all but the latest edition, though).
Hybridizing Java puzzles me. Some statements are obviously wrong or inflated (e.g. more and more sites are not compatible with Firefox; Flex is the only plugin that nicely solves the UI question on the browser side, without installation hiccups). Being a Flex evangelist (and Adobe consultant) shouldn't allow him to be so rude about what made him so famous in the first place.
Besides that, some of his thoughts are right. If a language hasn't correctly addressed some issues in ten years, it's normal that users become less confident and look elsewhere. This is what is happening with RIA (Rich Internet Applications). Java applets should have dominated the domain, but haven't. Flex is taking over. Even if Flex is not as powerful as Java today, it may become so in the future. At least it makes it possible to create attractive RIAs. Flex is good enough, and the number of sites that use Flex is definite proof.
"Make simple things easy, and difficult things possible" is a fundamental Java design guideline. Flex simply did it better than Java for RIA.
Friday, March 28, 2008
When guitars meet computers
I am always surprised by the proportion of amateur artists among engineers. For example, there are 5 of us guitarists among my 10 closest colleagues in the office.
I guess it's the need to balance rigid computer logic with forgiving art. Music is a frequent choice, and guitar seems to win over piano (because it's yet another keyboard?).
But the interesting point is that one kind of guitar, the electric guitar, wakes up the computer geek when he plugs the guitar into the microphone input of a basic sound card: it works.
The geek has just entered a new kingdom, the land of DSP (Digital Signal Processing). There are tons of software, mainly VST plugins, that simulate digital effects and turn a common PC into the equivalent of hundreds of kilos of hardware racks and kilometers of cable. As described in this page, in French but with a lot of images, the software is rich in attractive GUIs with buttons, sliders, and visualization gadgets. All you need is a computer, a basic sound card, decent speakers, and the software. With very little investment the result is impressive, because the sound is really great.
The geek is now ready for his first quest: the Perfect Sound.
Once he's satisfied with the sound, on to the second quest: the content. Here is the second advantage of the computer: the internet is a huge repository of songs, guitar tablatures, guitar lessons, and even videos of guitar players.
Then, it's not much fun to play alone, and here again the computer helps: it provides an orchestra, playing MP3 or MIDI songs along with the guitar.
And here I take my revenge on the computer. It rejected my program because I forgot a semicolon; I inflict on it all my rehearsals, always with the same mistakes in the same places. And when it's finished, "play it again, Sam". For once, it does everything I want it to do.
More precisely, I specialize in Pink Floyd solos, like "Is There Anybody Out There?", "Time", or "Fat Old Sun". I'm very impressed by David Gilmour's playing. The solos are usually slow-paced, with few notes, but each one sounds great and contributes to a beautiful harmony... He uses a lot of bends, which consist of pushing the string across the fret to apply extra tension and raise the pitch of the note gradually. Combined with very small tempo shifts, this gives a solo full of tension that grabs the brain's attention, followed by relief as the playing comes back to normal pitch and tempo.
Gilmour's solos look simple on paper, but believe me, they are very hard to play with the same touch. Anyway, I'm having a lot of fun with the electric guitar plugged into my PC, the sound presets, the MIDI orchestra, ...
Thursday, March 20, 2008
The most expensive bug
On June 4th, 1996, the first Ariane V European rocket exploded 40 seconds after launch. The payload alone cost about US$370 million. The cause? A bad cast in the software initiated a chain of dramatic errors and led to Ariane's destruction.
The full report is worth reading; here is a summary.
The attitude of the rocket is given by an Inertial Reference System (IRS, or SRI in the French acronym used below), which is a combination of laser gyros and accelerometers. This critical piece of hardware sends a stream of data about position, height, speed and acceleration to the main computer, which controls the engine nozzles and drives the rocket along its expected trajectory.
Ten years earlier, on Ariane III, a software function performed pre-flight checks to test the alignment of the IRS. This function was no longer used on Ariane IV, but still ran during take-off. You know how it is: it's easier to leave harmless code in than to remove it. This function used 8 variables, 3 of which were not properly protected, although it was not an issue because the rocket's trajectory kept these 3 variables within range.
No surprise, this function was still running on Ariane V. Unfortunately, the Ariane V trajectory was a bit different, and now one of the variables, the horizontal velocity, cast from a 64-bit float to a 16-bit integer, went out of range and raised an uncaught exception.
So far, no big deal. A check function raised an exception. Let's forget the check function and resume the mission.
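To make the cast problem concrete, here is a small Java sketch. It is not the flight code, and every name and value in it is invented for the illustration. Note one difference with the story: in Java a plain cast from double to short does not raise an exception, it silently produces a wrong value, so the range check has to be written by hand.

```java
public class NarrowingCast {
    // Hypothetical helper: converts a 64-bit float to a 16-bit signed integer
    // and refuses values that do not fit, instead of wrapping silently.
    static short toInt16Checked(double value) {
        if (Double.isNaN(value) || value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
            throw new ArithmeticException("value out of 16-bit range: " + value);
        }
        return (short) value;
    }

    // Alternative policy (also an assumption): clamp to the nearest representable value.
    static short toInt16Saturated(double value) {
        if (Double.isNaN(value)) {
            return 0;
        }
        return (short) Math.max(Short.MIN_VALUE, Math.min(Short.MAX_VALUE, value));
    }

    public static void main(String[] args) {
        double inRange = 12000.0;    // made-up reading that fits in 16 bits
        double outOfRange = 40000.0; // made-up reading that does not

        System.out.println(toInt16Checked(inRange));      // prints 12000
        System.out.println((short) outOfRange);           // prints -25536: silent wrap-around
        System.out.println(toInt16Saturated(outOfRange)); // prints 32767

        try {
            toInt16Checked(outOfRange);
        } catch (ArithmeticException e) {
            // The caller decides what an out-of-range value means for the mission.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Whichever policy is chosen, the point is that the decision is made explicitly at the conversion, and not left to whatever the runtime does by default.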
However, the assumption in the Ariane design was that software is always right and hardware may fail. The software reported an error, which was interpreted as the SRI being out of order. So the SRI was shut down.
That's probably the biggest mistake. A failing unit test is embarrassing enough, but it doesn't always mean the software is out of business. In this case the SRI still delivered reliable information. Once shut down, it couldn't any more.
The backup SRI started providing replacement data, and was shut down 0.05 seconds later because of the same bug. Once again the assumption "hardware may fail, software does not" made the backup SRI totally useless in this case.
Without sensible guidance, the rocket was doomed. But to accelerate the disaster, the SRI modules started to send stack traces instead of normal data to the main computer. The computer interpreted this data as if the rocket were upside down and commanded an emergency half turn. The rocket started to tear apart under the mechanical stress, which triggered the self-destruct process.
The story is sad enough as it is; no need to add that a suitable test or a full simulation before the flight would have found the bug.
As it happens, I knew one of the members of the investigation team. He told me something that is not in the final report: what greatly contributed to killing Ariane V was the absence of experienced computer scientists in top management. The software components were simply split up and managed individually. A competent software supervisor with suitable authority could have found one of the errors and prevented the cast exception from eventually stopping the delivery of correct IRS data.
But Ariane was a physicists' toy, and they didn't share it with a software department...
PS: The lesson was well learned. Today Ariane V is very successful and has crashed only once more in 37 flights.
Friday, March 14, 2008
Marcel-Paul Schützenberger and complexity
It may happen in your lifetime that you have the luck to meet extraordinary people. MPS is one of them. I attended his lectures when I was a student at the university (Paris 7). This guy is not really famous; he didn't look for fame. There were always fewer than 10 of us in the audience. So who is he?
MPS is a physician, a mathematician, and a computer scientist. Obviously he is excellent in all three domains. His broad knowledge gives him a very realistic vision of computer science. Basically, computers were for him a tool to develop mathematics, with a great future in biology. The last part sounds obvious today, but he said so more than 20 years ago.
But what makes him so engaging during the lectures is that he has stories to tell. As a founder of modern computer science, he met all the other founders all over the world (ok, mainly in the USA) and proved a very important theorem in language theory, so he has a lot of nice anecdotes to tell about this pioneering period. I only remember a couple of them, which I'll keep for another blog entry.
For the moment I want to focus on a sentence he said, which is carved in my brain forever:
"There are 2 kinds of program. The short ones and the long ones."
This can be understood in different ways. I think the basic idea is that we should keep our distance from computer power and program complexity: whatever we are trying to develop, after all it's only a computer program, so let's just break it down into a sequence of instructions.
For example in the 80's, the hype was about Artificial Intelligence. For a majority of people, AI applications were the most complex we could even think of; so complex, by the way, that nobody could complete them. Eventually AI died because it didn't fulfill its promises. Some blamed computer performance: even though computer power doubles every 18 months, it will never catch up with AI complexity, which scales up to infinity. Others blamed the poor expressivity of computer languages, which were too low level. That's closer, but no. The real reason that killed AI is the lack of theory. Without theory, there is no suitable language and no proof of algorithm termination. Then your computer program can be as long as you want, it won't correctly implement the specifications (or you won't be able to prove that it does). The conclusion is that the program doesn't do it all by itself: it's only a tool, and it can't make up for a hole in the theory.
Note that MPS didn't want to underestimate complexity in programming. For the developer, the complexity is relative to the constraints s/he has to deal with: the programming language, the tools, the software architecture and design, and the available resources. But the program is only the implementation of an algorithm, and a correct and efficient algorithm relies on a strong theory.
Friday, March 7, 2008
The Next Programming Language
Everybody knows Moore's law: "Computer performance doubles every 18 months". But programming languages also have their own growth law: "A very new language appears every 10 years". "Very new" is of course relative, and should be understood as "a successful language with new features". Let's review the latest ones:
- 1972: C was the first high-level language bound to an operating system (Unix). For the developer, it meant very fine-grained control over the host machine and the ability to use a high-level language to program at a low level.
- 1983: C++: C with an object-oriented layer. Note that one of the most touted features was portability, which turned out to be a disaster (C++ libraries were less compatible than C ones). C++ went too far. Macros, operator overloading and multiple inheritance eventually made applications a nightmare to maintain and to integrate.
- 1996: Java: object-oriented, native multi-threading, GC, beans, exception handling, no macros, no multiple inheritance, clean packaging, portability, applets as the very first browser plugin, etc. A lot of advantages.
- 200*: Web scripting languages: JavaScript, PHP, Flex, XUL, etc. We can't say that one of them is the leader, but for sure they all contributed to empowering the web and to making the web experience as sophisticated as a full-fledged application. Some people say it's rather .NET/C#. Sorry, I disagree. There is no revolution with C#. It stays in between.
- 2010: What is waiting for us? My guess is a new language that will support multi-threading as simply as Java manages memory for the developer.
Multi-threading is really calling for a new language. The multi-core architecture needs fine-grained control over thread dispatching across the cores. When 2 threads are expected to communicate a lot, they should run on two close cores so that they can share the same L2 cache.
Most multi-threaded languages, like Java, offer synchronization by locks, which is probably simple to implement on top of the OS, but is surely the most difficult option for the developer. I really believe there are other viable solutions, like the continuous transactions that already work for databases, which would relieve the developer of the synchronization logic and of all the race condition, deadlock, starvation and CPU contention bugs. These bugs are a pain to track down and fix. We have almost no tools to help, and no background theory. Sigh.
Java is my favorite language, but I must admit it's outdated when it comes to multi-threading. The new JDK 1.5 package java.util.concurrent helps a lot, but basically it doesn't get rid of the complexity.
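As a small illustration of the two styles, here is a sketch that increments a shared counter from several threads, once with a lock (synchronized) and once with AtomicLong from java.util.concurrent.atomic. The Counters class, LockedCounter and the hammer helper are names made up for this example, and the lambda syntax needs a JDK more recent than 1.5.

```java
import java.util.concurrent.atomic.AtomicLong;

final class Counters {
    // Lock-based: correct, but every access has to go through the lock.
    static final class LockedCounter {
        private long value;
        synchronized void increment() { value++; }
        synchronized long get() { return value; }
    }

    // Runs task `times` times on each of `threads` threads and waits for all of them.
    static void hammer(int threads, int times, Runnable task) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < times; n++) task.run();
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        LockedCounter locked = new LockedCounter();
        AtomicLong atomic = new AtomicLong(); // lock-free counter
        hammer(4, 100_000, locked::increment);
        hammer(4, 100_000, atomic::incrementAndGet);
        System.out.println(locked.get() + " " + atomic.get()); // prints 400000 400000
    }
}
```

Both versions give the right count. The atomic one just hides the synchronization inside the library, which is helpful for a counter but does not scale to arbitrary shared state. That is the complexity that remains.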
Besides that, the next language will look very similar to Java, with smart packaging, an object-oriented layer, GC, etc., and the usual syntax for control statements.
Thursday, February 21, 2008
Multithread: the new era
Multi-threading is not new. It has, for instance, been natively supported in Java since its beginning in 1996.
What is new started with a very popular article that appeared in Dr. Dobb's Journal, 30(3), March 2005. Basically, the heat produced by the CPU doubles when the clock runs 20% faster. Standard fans and heat sinks reach their limit with CPUs around 4 GHz, which means the clock race is over. But processor manufacturers are smart. If they slow the CPU down by 20%, it produces half the heat, so let's put two of them on the same chip! The dual core is born: same thermal envelope, 2 cores, each 20% slower than the single-core version. Moore's law still holds if we count each core as a contributor to the total machine speed.
By the end of 2007, almost all new machines were equipped with multi-core processors.
From the software point of view, it has 2 major impacts:
- Customers are now running your applications in a truly multi-threaded environment.
- The multi-core hardware speeds up performance only if the application is multi-threaded.
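To illustrate the second point, here is a minimal sketch of a computation split into one slice per core with an ExecutorService from java.util.concurrent. The ParallelSum class and its methods are made up for the example, and no timing is claimed here: the actual gain depends on the machine and on the workload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelSum {
    // Sums the integers in [from, to) on the calling thread.
    static long sumRange(long from, long to) {
        long total = 0;
        for (long i = from; i < to; i++) total += i;
        return total;
    }

    // Splits [0, n) into one slice per worker and adds up the partial sums.
    static long parallelSum(long n, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            long slice = (n + workers - 1) / workers; // ceiling division
            for (int w = 0; w < workers; w++) {
                final long from = Math.min(n, w * slice);
                final long to = Math.min(n, from + slice);
                parts.add(pool.submit(() -> sumRange(from, to)));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get();
            return total;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(parallelSum(1_000_000L, cores)); // prints 499999500000
    }
}
```

A single-threaded call to sumRange(0, n) gives the same result but keeps only one core busy, however many the machine has.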
We are still correcting bugs coming from the first point. Even Java made mistakes: for example, the famous single-thread rule of Swing happens to fail on multi-core machines.
We are only starting to tackle the second point. And we have very few tools and little theoretical background to help us so far.
After recursion and object-oriented languages, now comes multi-threading.
Thursday, February 14, 2008
Function as parameter
All decent programming languages allow function calls with parameters.
Level 1 of parameter is a literal. It's safe: the literal is read-only and has no side effect outside the scope of the function.
Level 2 of parameter is a variable. More useful, but it may have side effects if the variable is modified in the body of the function. Even when variables are copied before the function call, they can still point to data that may be modified.
Level 3 is a function. Even more power, but more risks. Some programming languages allow passing functions as parameters, directly or indirectly (like a parameter of type Runnable in Java).
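Levels 2 and 3 can be sketched in a few lines of Java. The class and method names below are invented for the example. The first method receives a copy of a reference, yet still modifies the caller's data; the second one receives a function through a Runnable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ParameterLevels {
    // Level 2: the reference is copied, but it still points to the caller's list.
    static void appendZero(List<Integer> values) {
        values.add(0);               // visible to the caller
        values = new ArrayList<>();  // only rebinds the local copy
        values.add(99);              // invisible to the caller
    }

    // Level 3: the function receives behaviour and decides when to run it.
    static void runTwice(Runnable action) {
        action.run();
        action.run();
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>(Arrays.asList(1, 2));
        appendZero(data);
        System.out.println(data); // prints [1, 2, 0]

        int[] calls = {0};
        runTwice(() -> calls[0]++);
        System.out.println(calls[0]); // prints 2
    }
}
```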
Why is it more dangerous than variables?
Let's take a symbolic function, name it delta, with the following body:
void delta(x) { x(x); }
which reads "delta of x does: x applied to x". In lambda-calculus, it is written D = (\x.xx). I'll stick with this notation as it is more compact. Delta looks harmless. I mean, there is no obvious bug in the body of delta.
However, a problem occurs when you apply, for instance, delta to itself:
DD = (\x.xx)(\x.xx), and when you apply D to D, you substitute the second D for x in the first D, and you obtain (\x.xx)(\x.xx), i.e. DD.
So there is no way to escape: when the machine computes DD, the result is still DD, and we enter an infinite loop.
A slight variation, D' = (\x.xxx), is even worse because D'D' -> D'D'D' -> D'D'D'D' -> etc. D'D' is then a growing infinite loop.
So you see the hazard. The body of D (or D') looks safe. There is no explicit infinite loop inside. Actually D explodes only with a suitable parameter that behaves as the second half of the bomb. That's something that couldn't occur with a level 1 or level 2 parameter. Careful with that level 3 parameter, Eugene.
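For the curious, delta can be written in real Java, as long as the parameter type refers to itself. This is only a sketch: the Fn interface and the terminates helper are invented for the example. Also, where the lambda-calculus term loops forever, the Java version stops with a StackOverflowError, because each application uses a stack frame.

```java
final class Delta {
    // A function that takes a function of the same kind: the type has to be recursive.
    interface Fn {
        void apply(Fn x);
    }

    // D = (\x.xx): apply the argument to itself.
    static final Fn DELTA = x -> x.apply(x);

    // A harmless argument that ignores whatever it is given.
    static final Fn IGNORE = x -> { };

    // Returns true if f(arg) finishes, false if it exhausts the call stack.
    static boolean terminates(Fn f, Fn arg) {
        try {
            f.apply(arg);
            return true;
        } catch (StackOverflowError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(terminates(DELTA, IGNORE)); // prints true
        System.out.println(terminates(DELTA, DELTA));  // prints false
    }
}
```

The same DELTA is fine with one argument and fatal with another, which is exactly the two halves of the bomb.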
World Wide Web: The Bootstrap Age
Back in 1990, Tim Berners-Lee started the Web with the first web server. At that time the Internet wasn't mainstream, and I was lucky enough to work in a research institute, probably the only one in France to be connected.
At that time the Internet was used to send email (no spam at all), download files through FTP and read news on newsgroups.
I met TBL several times. He's a pure geek. No surprise that his major invention, the URL, has such an ugly syntax. Ugly, but functional. At least the advantage is that today everybody recognizes a URL and knows what it means, and even basic text editors turn it into a hyperlink connected to a web browser.
For me, the URL is what made the WWW possible. It contains everything: the protocol, the server, the server port, the document path, the anchor, and the query. Mixing all this into one "word" is a masterpiece. For example, when the Web is made of a single server (actually the one you see in the picture), you need good intuition to make the server address a mandatory item of the URL. And it happens that the URL scaled up to today's structure, with billions of servers and web pages...
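To see all these pieces packed into one "word", here is a small sketch that takes a URL apart with the standard java.net.URI class. The UrlParts class and the address are invented for the illustration.

```java
import java.net.URI;

final class UrlParts {
    // Lists the components of a URL, one per line.
    static String describe(String url) {
        URI uri = URI.create(url);
        return "protocol=" + uri.getScheme() + "\n"
             + "server=" + uri.getHost() + "\n"
             + "port=" + uri.getPort() + "\n"
             + "path=" + uri.getPath() + "\n"
             + "query=" + uri.getQuery() + "\n"
             + "anchor=" + uri.getFragment();
    }

    public static void main(String[] args) {
        // Hypothetical address, for illustration only.
        System.out.println(describe("http://www.example.org:8080/docs/index.html?lang=en#intro"));
    }
}
```

Six pieces of information, and only the punctuation (://, :, /, ?, #) to tell them apart: ugly, but functional.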
The next major breakthrough was Mosaic, in 1993. Mosaic is yet another proof that the UI is essential. The Web as we know it today started with Mosaic (which became Netscape, then Firefox), because it attracted people to the Web technology and started the snowball effect of ever more servers, more documents, and then more reasons to join the Web community.
The next one was the AltaVista search engine (1995), because the Web is anarchy and it became very hard to find something without its URL. Other search engines already existed, but AltaVista used Scooter, a web crawler, to index all reachable web pages. 10 years ago, Google swept away AltaVista with the added value of PageRank, which sorts the results of a search by popularity.
No doubt that these three software milestones marked the history of the Web.
Tuesday, February 12, 2008
User Interface
There are several domains in computer science: databases, algorithms, networks, etc. I'm rather specialized in User Interface.
Usually, UI is shorthand for Graphical User Interface. However, UI in general means the relationship between users and computers and, as computers are connected together, the interfaces between users.
This encompasses software tools that help users communicate and share data through the computer.
Note that a particular case of user-to-user communication is when users communicate with themselves, i.e. when using a PIM (Personal Information Manager) and other productivity tools that power up the user experience on a computer.
Hello World
I don't know how many blogs start with such a post. I could have set the title to "Yet Another Hello World Initial Post", which is level 2 of the same kind of computer joke...
So why did I start this blog? Well, I have things to tell, like everybody else. I'm modest enough to realize I may catch the attention of 1 person in a million. That's not many. But there are now several million people accessing the web, so I can expect several readers from now on...
The next posts will be more informative. I promise. Just let me get comfortable with the interface.