Here are the results of a little benchmark I ran a while back, trying to figure out how expensive are the different primitive operations in Flash 9 (ie. bytecode compiled ActionScript 3).
Quick note on methodology: I ran these primitive operations in a tight loop of a few million iterations, to get average runtime per iteration (subtracting out the runtime of an empty loop, of course), and then repeated this a handful of times. I’m reporting mean numbers below, and standard deviations only when they’re unusual.
The raw times aren’t very important (benchmarks were done on an old Athlon 2400+ that’s living out its retirement years under my kitchen table). But the proportional differences between them can be surprising - especially the order-of-magnitude ones.
Now, on to some delicious results!
Numbers
Some obligatory numeric operation benchmarks:
Signed integer addition: 6.3ns
Signed integer multiplication: 13.1ns
Signed integer division: 15.3nsUnsigned integer addition: 9.6ns
Unsigned integer multiplication: 28.3ns
Unsigned integer division: 32.4nsDouble addition: 9.4ns
Double multiplication: 7.4ns
Double division: 7.1ns
There’s really no reason not to use doubles for virtually all math operations; they’re much cheaper (possibly because they don’t try to do any bounds checks).
The one exception would be array indexing, where one really wants to guarantee that the index is an integer. Fortunately, conversions are pretty inexpensive as well:
Integer -> Number conversion: 7.3ns
Number -> integer conversion: 5.0ns
Now for some math library operations:
Math.sqrt: 137.9ns
Math.sin: 183.3ns
Math.round: 197.9ns
Math.floor: 160.2ns
That’s pretty good, much faster than regular function calls (see below) - probably because these are native, instead of bottoming out in bytecode like everything else. Still, a square root is on the order of about 20 times more expensive than arithmetic operations.
Function Calls
Speaking of which, here are the regular function calls - just the call, with no body:
Function call (no arguments): 239.8ns
Function call (unary): 257.6ns
Function call (binary): 269.0ns
Pretty standard stuff, with cost increasing (linearly?) as a function of the number of arguments. It’s more expensive than calling native functions, but not significantly higher. Still, it does put an upper limit of about 4 million function calls a second, or 100,000 calls per frame, on my test machine.
By the way, this was completely unexpected:
Anonymous function declaration: 11368.8ns (s = 105.3)
Creating an instance of an anonymous function costs 11 microseconds? Ouch. I’m a big fan of treating everything as lists of data, and map-filtering them with ad-hoc anonymous functions. But now I have to be more careful.
Arrays
For array access, I’ve only checked access of a single array element, many times over. Some numbers:
Writing from variable to array: 69.0ns, s = 1.9
Reading from array to variable: 106.3ns, s = 105.5
Reading from object with integer keys: 72.0ns, s = 1.9
Reading from object with string keys: 242.8ns, s = 1.8
First, I thought it was interesting that an object with integer keys is just as fast as an array with integer keys, suggesting a shared underlying implementation. An object with string keys is slower, however.
Two, the array reading measurement is not only pretty high, it also has a very high standard deviation, as if the measurements included a few really expensive outliers. Am I running into intermittent delays when accessing array data; perhaps cache stalls?
Finally, some results trying to iterate over a large array:
Iteration over array using ‘for each’ special syntax: 8576.8ns, s = 2.4
Iteration over array using int: 20483.5ns, s = 4.6
Iteration over array using uint: 19665.3ns, s = 2.8
Iteration over array using forEach: 31416.8ns, s = 47.7
I’m not susprised that code inside a for loop is faster than code inside a function called via forEach. What I am surprised about, is that the “for each ( … )” special form is much faster than the standard “for ( ; ; )” iteration with array lookup. That’s not what you’d expect in a language like C; I’m curious what’s going on under the hood.
Jono Spiro | 15-Aug-08 at 4:20 pm | Permalink
Anonymous functions/closures have tricky semantics in AS3, which is why you saw such painful numbers.
AS3 is not block-scoped, so all variable definitions are hoisted to the top-level of their scope — in this case, the function body of the surrounding function, not the anonymous function. This is why you cannot have two for loops with the same ‘var i:Number’ definition — you would get a ‘Duplicate name i’ error.
So, when you introduce an anonymous function, the cost of *every* variable (within the outer function down to the deepest variable definition) is increased considerably. You have to do a more expensive lookup. This is why the Flex SDK doesn’t use them, ever
Sometimes, you *need* a closure for an event listener… My solution is to create a simple factory function that takes the values you want to close over, and returns a Function object. This doesn’t get around the cost of creating a closure, and it adds the cost of the factory function call, but it isolates the lookup costs to the factory function.
In small functions, it may be more expensive to use this technique, but I never tested it.
—
As for for-each loops: The special form returns values from an array in unspecified order because the backing object is a hashtable (it sometimes begins life as a dense array, but often becomes a hashtable). Assuming that your for(;;) test was iterating over indexes from 0..len-1, each iteration does a hash lookup. We’d like lookup to be O(1) amortized, but there is considerable hidden overhead in doing the lookup.
The hash function is a quadratic probe, and we have a high load factor (~80%), so a large array has a high probability of collision. To calculate the address, integers indexes must be cast to Number (not terrible, but this is why the Flex SDK uses Number instead of Integer). If you index on a string or the cast fails, we stringify the index, intern it, and use that (!).
You *might* be able to tell if you have a dense array if the ‘for each’ loop returns items in ascending order, but that’s just a guess. You could try it with a 10 element array versus a 1000 element array, perhaps.
Player 10 should improve some of the casting costs and differences in performance between numerical types.
robert | 15-Aug-08 at 9:42 pm | Permalink
This is *fantastic* feedback, and greatly illuminates the inner workings of the language. Thank you for posting this comment!
Jono Spiro | 16-Aug-08 at 2:05 pm | Permalink
Just remembered another gotcha. Though no professional uses them, ‘with’ blocks introduce the same overhead that anonymous functions do, without any of usefulness. You push the with object onto the scope chain, and then have to search the scope chain for every call within that block.
Too bad we didn’t implement them the way VB did, where you could tell which statements used the with object because they all started with a “.” (of course, there would be limitations).
Many equivalent expressions in AS3 are not equivalent in byte size or performance. Many of the convenience mechanisms, like with, are useless because their performance is so poor. I always meant to explore this and actually write posts in my blog, flexamphetamine.
By the way: Have you tested different kinds of variable accesses? I forget what my results were (two years ago…), but this might have been the order of expense (from least): argument variables (strange, huh?), local variables, class variables. I don’t think I tested statics.