gmarceau.qc.ca/blog/2009/05/speed-size-and-dependability-of.html
Saturday, May 30, 2009 The speed, size and dependability of programming languages Note: This article has been updated. The earlier version was based on 2005 data, and included data from benchmarks that should not have been included.
The Computer Language Benchmarks Game is a collection of 429 programs, consisting of 13 benchmark reimplemented across 33 programming languages. It is a fantastic resource if you are trying to compare programming languages quantitatively. Which, oddly, very few people seems to be interested in doing.
benchmarks are always flawed and that the whole exercise is pointless. In fact, I've found that The Game is remarkably effective at predicting which forum hosts programmers annoyed at the slowness of their language, and that's good enough for me.
source-code-size metric for each benchmark programs in each language. Thanks to this The Game let us at explore a fascinating aspect of programming language design: the tension that exist between expressiveness and performance. It is this tension that gives the expression "higher-level programming language" a pejorative connotation. When you are coding this high, you might be writing beautiful code, but you are so far away from the hardware you can't possibly get good performance, right? If you drew the benchmark results on an XY chart you could name the four corners. The fast but verbose languages would cluster at the top left. The elegantly concise but sluggish languages would cluster at the bottom right. That is, languages which have since been outclassed by newer languages, unless they offer some quirky attraction that is not captured by the data here. And finally, in the bottom left corner you would find probably nothing, since this is the space of the ideal language, the one which is at the same time fast and short and a joy to use. Each pinkish dot in this chart comes from one language implementing one benchmark solution, so there are 429 dots, minus a few missing implementations. That is, if a particular solution is not the best one, the axis show how many times worse it is when compared to the best. The barrier of dots on the left side means that it is common to have many solutions near the best performer. On the right side and beyond it, there are a number of distant points which are clipped out of view by the edge. The distribution of pink points is more uniform along the Y axis (verbosity) than along the X (slowness), suggesting that the world has not hit a wall in the progression of the expressiveness of programming languages the way it has with performance. Like many scientific datasets, the data coming from The Computer Language Benchmark Game is rich in shapes, insight and stories. In order to retain as much of the shape as possible, it is critical to avoid calculating averages, as averages tend to smooth over the data and hide interesting sources of variation. The average function does to numbers what Gaussian blur does to pictures. One such source of variation that attracted my curiosity was dependability: how well does the language performs across a variety of tasks, such as those composing the benchmark suite? A language might be concise most of the time, but if once a month a quirk of the language forces the code to be five times as large as what it ought to be, it's a problem. In order to show dependability, and to avoid relying on averages and standard deviations, I drew star charts in the following manner.
Scala, which is a mix of functional programming and Java that runs on the JVM. Starting with the previous chart and its 429 dots, I added a gray line from the XY position of each Scala benchmark to the position of the overall average of all the Scala programs. The center of the star is Scala's average performance, and the branches shoot out to the individual benchmarks. On the X axis (slowness), the points often come close to the left wall, showing that Scala can take advantage of the optimizations done by the JVM. But the performance is not consistent, and in one case the performance is all the way to the right. On the Y axis (code size), we see that most of its scores are amongst the background crowd, but some of the faster benchmarks might have needed convolutions to reach the speed they have, including the one data point off the chart high above. The next chart arranges the entire collection of the 33 programming languages available at The Computer Language Benchmark Game into a 6x6 grid. The chart is a so-called 'small multiples' design: each swatch in the grid has the same axes in the same scales as each other. It's the same setup as the one for Scala that we just saw. The intent is to make it easy to compare the shape of the star between languages (across the page), and against the general trend (in the background). The swatch of the languages are grouped into columns according to their overall performance. Thus the fastest languages are in the first column on the left and the slowest are on the right. Within each column the swatches are sorted by average code size, with the best one at the bottom. In this way, the disposition of the grid mimics the axes within the swatches. The languages in the first column all have tall thin pogo-stick stars.
Java stands proudly among that group, having earned its place after 10 years of intense research in run-time optimization. Their code sizes, on the other hand, are spread all over. In the rightmost two columns we find many bushy stars, flat and wide. These are the scripting languages whose communities have not invested as much effort into building optimizing compilers for their language as they have spent tweaking its expressiveness.
Lua, which has always been noted for its good performance among scripting languages, shows a much rounder star in the swatch at (4, 1), counting from the bottom left.
I am told that writing high-performance programs in Haskell is a bit of a black art, and that the tweaks introduced to boost the performance occupy a lot of code space.
Haskell star at (2, 5) is extremely tall, reaching from the very top the very bottom, while having decent performance over all. Clean is a lazy language just like Haskell, but its star is much more compact, especially in code size, as if a huge effort of optimization had paid off and that it is now possible to write performance code naturally in Clean.
V8 implementation of JavaScript (4, 1) can make a claim at having reliable good performance. It will remain to be seen whether it can maintain its good profile as more benchmarks gets implemented. In the following chart, the ordering is the same as in the large chart. Languages which include functional features such as lambda, map, and tail call optimization are highlighted in green. The greens are spread all over, with more presence in the left (top and bottom) than on the right. Ultimately the first factor of performance is the maturity of the implementation.
Sergio : May 31, 2009 11:23 AM YARV and JRuby aren't marked green, but they're exactly the same as Ruby. Actually YARV is basically the forerunner to the official Ruby now. Also, when mentioning Ruby, YARV is a fairer comparison. The uber-slow benchmarks on your charts are from Ruby 18 which is still in common usage but is not representative of Ruby as a whole.
Peter Cooper : May 31, 2009 11:27 AM Thank you for the superb "tabula rasa" of computer program solutions expressed in different languages. Personally I feel a taint of morbid attraction when I pitch a language against another considering that they exist to complement each other, not to deprecate each other away.
Javier : May 31, 2009 11:37 AM I'd like to mention that of the functional features you mentioned, lua has them all except a native map function. But it does have lambda expressions and tail calls, and map can be implemented in like 10 lines.
were made in 2005 and none of which were made later than mid-2008. Some language implementations have changed quite a bit since then. "Flaw" has a range of meaning from "imperfection, blemish" through to "invalidating defect".
Skorgu : May 31, 2009 12:13 PM What, no FORTRAN (or COBOL) ? Obviously a hole in the original data, no...
|