JavaScript Array Performance: Initialize to Optimize

After delving into the issue of JavaScript array performance, I came upon a pair of blog entries on MSDN addressing the topic. These posts describe situations where the then-forthcoming IE8 could provide more performant arrays. The blog also partially confirms, and partially refutes, my thoughts on how JavaScript interpreters handle arrays. The blog’s focus, of course, is the JScript engine in IE, but the advice that it offers seems to work well with other major browsers.

The blog confirms that IE’s JavaScript interpreter manages arrays using a nonlinear structure, essentially a hash table. I’d guessed at this after seeing lower performance in trimOOWB when its lookup table was implemented as a large, and sparse, array. But contrary to my guess, this isn’t about avoiding re-allocation bottlenecks. It’s for dealing with sparse arrays without consuming unreasonable amounts of memory1. (I’m feeling a little sheepish about this: Even though I’d noted the sparseness of the lookup table, I hadn’t made the connection. In fact, I’d been somewhat dismissive of that possibility. Live and learn.)

The blog also says that as of IE8, the JScript engine has heuristics for determining if an array is “dense”. It implements a dense array with a linear, array-like index in addition to its standard hash-like data structure; this index can make random access into the array much faster. The blog indicates that IE8 considers an array to be dense if either of the following conditions is true:

  • You construct the array with an explicit size, and you don’t grow the array past that initial size.
  • After creating the array, you initialize a continuous range of indices, starting from 0, up to and including the highest index that you expect to use. You must do this before writing into random indices within the array.2

As the writer says, using either of these two techniques should ensure that the array heuristics in IE 8 will treat your array as a dense array. The interpreter is free to decide to treat other arrays as dense, too. However, the testing I’ve done suggests that it’s not quite as simple as this. Contrary to the blog’s guidelines, I found that:

  • Creating the array with an explicit size (whether or not followed by initializing the elements) had no measurable impact on the speed of the trim method.
  • Initializing the lookup table’s full range of 12,289 elements (the vast majority of which are not whitespace) did improve performance.

This finding is specific to the trim method, and so these results should not be used as general performance guidelines. The usage pattern for a particular array can greatly affect its performance profile. I’ve found unrelated cases where specifying the array’s size could help or hurt performance. (I’ll write more about this in my next post.)

Following up on these hints led to a new version of trim that’s simpler and faster than my previous effort. This version is actually more closely related to the trim17 method from Yesudeep Mangalapilly than it is to trimOOWB. Hence I’ve named this newer one trim18 to reaffirm its heritage. (In Steve Levithan’s original post on JavaScript trim methods, he assigned numbers to each of the trim implementations that he examined. Yesudeep took up that naming scheme, and now I’m doing so as well.)

An Intermediary Thought

Remember that premature optimization is folly, and all optimization has its costs. In this case, we’re buying performance by fully initializing a 12K JavaScript array. If the interpreter stores this using a hash table as well as an array, then the table will likely consume over 36K of memory. Is the benefit worth the cost? It could be, especially if we’re not running on a cell phone. But please don’t take away from this that you should initialize every array that you create. That would ultimately be self-defeating: One of the worst things you can do for performance is to consume more memory than you really need.

Performance

Since the last post, I’ve revised the benchmark for better precision. The structure is the same—both benchmarks call the three trim methods repeatedly with a fixed set of input data—but in the newer benchmark, the input strings are much longer, and the number of iterations is lower. This should yield more precise measurements of execution time. All the same, these results should be used with care; as is often said in the U.S.A, your mileage will vary.

The benchmark results are shown below. Please remember that these tables are not directly comparable to the numbers in my previous post.

ASCII data (only spaces and tabs used for whitespace)

trim17 trimOOWB % saved vs. trim17 trim18 % saved vs. trim17
Internet Explorer 8 74,453 69,609 6.5 69,922 6.1
Firefox 3.5 6,776 3,732 44.9 3,003 55.7
Google Chrome 3.0 2,530 824 67.4 754 70.2

Unicode data (using all ASCII and Unicode whitespace characters)

trim17 trimOOWB % saved vs. trim17 trim18 % saved vs. trim17
Internet Explorer 8 76,188 74,468 2.3 72,484 4.9
Firefox 3.5 6,779 5,025 25.6 3,029 55.3
Google Chrome 3.0 2,780 940 66.2 810 70.9

trim18 implementation


I left the explicit size in the array constructor call, even though I’d found no benefit from using it. It seems unlikely to cause any harm, and there may be environments where this method’s performance might benefit from it.

var trim18 = (function() {

    var tableSize = 0x3000 + 1;
    var whiteSpace = new Array(tableSize);

    // Initialize the array elements before populating the data.
    // (This may help performance, by hinting to the interpreter that
    //  the array should not be managed as a sparse array.)

    for (var i = 0; i < tableSize; i++) {
        whiteSpace[i] = false;
    }

    whiteSpace[0x0009] = true;  whiteSpace[0x000a] = true;
    whiteSpace[0x000b] = true;  whiteSpace[0x000c] = true;
    whiteSpace[0x000d] = true;  whiteSpace[0x0020] = true;
    whiteSpace[0x0085] = true;  whiteSpace[0x00a0] = true;
    whiteSpace[0x1680] = true;  whiteSpace[0x180e] = true;
    whiteSpace[0x2000] = true;  whiteSpace[0x2001] = true;
    whiteSpace[0x2002] = true;  whiteSpace[0x2003] = true;
    whiteSpace[0x2004] = true;  whiteSpace[0x2005] = true;
    whiteSpace[0x2006] = true;  whiteSpace[0x2007] = true;
    whiteSpace[0x2008] = true;  whiteSpace[0x2009] = true;
    whiteSpace[0x200a] = true;  whiteSpace[0x200b] = true;
    whiteSpace[0x2028] = true;  whiteSpace[0x2029] = true;
    whiteSpace[0x202f] = true;  whiteSpace[0x205f] = true;
    whiteSpace[0x3000] = true;

    function trim18(str) {
        var len = str.length, ws = whiteSpace, i = 0;
        while (ws[str.charCodeAt(--len)]);
        if (++len){
            while (ws[str.charCodeAt(i)]){ ++i; }
        }
        return str.substring(i, len);
    }

    return trim18;
})();

1 It’s also relevant that every JavaScript object needs a hash table to manage properties. Hence sparse arrays can be implemented easily by using this hash table for the same purpose. (Considering the design of JavaScript’s for...in loop statement, it looks as if the language designers intended this.)
2 That is, you would need to initialize the array if you otherwise won’t be populating all of its elements, or if you won’t be populating them in strict order starting from 0. On the other hand, if your script would normally add data to the array starting from index 0 and working up from there, leaving no gaps, then you’re already squared away with IE8’s heuristics.
, ,

One Comment

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>