<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Out Of What Box? &#187; JavaScript arrays</title>
	<atom:link href="http://www.outofwhatbox.com/blog/tag/javascript-arrays/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.outofwhatbox.com/blog</link>
	<description>Ruminations on software and other impossible things</description>
	<lastBuildDate>Thu, 08 Sep 2011 15:57:30 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>JavaScript Array Performance, And Why It Matters</title>
		<link>http://www.outofwhatbox.com/blog/2009/12/javascript-array-performance-and-why-it-matters/</link>
		<comments>http://www.outofwhatbox.com/blog/2009/12/javascript-array-performance-and-why-it-matters/#comments</comments>
		<pubDate>Tue, 01 Dec 2009 15:21:13 +0000</pubDate>
		<dc:creator>Dan Breslau</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[JavaScript arrays]]></category>
		<category><![CDATA[JavaScript performance]]></category>

		<guid isPermaLink="false">http://www.outofwhatbox.com/blog/?p=588</guid>
		<description><![CDATA[Arrays can present an unexpected performance bottleneck in JavaScript. Here I show that array performance is influenced by some surprising details of the array and how it's used.]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.outofwhatbox.com/blog/2009/11/javascript-array-performance-initialize-to-optimize/">My last post</a> described how a JavaScript array had become a performance bottleneck. Here, I&#8217;ll delve further into how some JavaScript programs become array-bound, and how to break that bind when you need to.</p>
<p>Let&#8217;s start with this:</p>
<div class="oowbcenter" ><img title="JavaScript array elements can be much slower than scalar variables" src="http://img.outofwhatbox.com/JSArrayOptimizationII/ArraysVsScalars.png" alt="JavaScript array elements can be much slower than scalar variables" /></div>
<p>As the graph shows, JavaScript array performance ranges from OK (nearly equivalent to scalars) to awful. The reasons for poor performance vary, but most can be boiled down to this: The JavaScript interpreter is essentially left guessing about how an array will be used. And guess it does. When an array is <a href="http://www.outofwhatbox.com/blog/2009/11/javascript-array-performance-initialize-to-optimize/">constructed and initialized</a>, the interpreter can observe where data is stored into a new array. Certain kinds of patterns may nudge it to optimize for better speed, or lower memory consumption. So, performance suffers when the interpreter makes the wrong interpretation.</p>
<p>I looked into performance of JavaScript arrays with three popular Windows browsers: Internet Explorer 8, Google Chrome 3.0, and Firefox 3.5. Broadly speaking, these all seem to look for two or three types of arrays:</p>
<ul>
<li><em>Dense arrays</em>: These provide faster access to individual elements of the array.</li>
<li><em>Sparse arrays</em>: These are typically optimized for lower memory use.
</li>
<li><em>Sized arrays</em>: A variant of sparse arrays that are optimized for speed for certain cases.</li>
</ul>
<p>The interpreters optimize for dense arrays if the array&#8217;s elements are initialized in a continuous range, starting at index 0. This initialization can be done before there&#8217;s any data for the array, by using placeholders such as <code>0</code>, <code>null</code>, or even <code>undefined</code>. (There are other ways to get this optimization; this is simply the most reliable approach.)</p>
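<p>As a minimal sketch of that approach: the helper name <code>makeClearedArray</code> and its parameters are mine, for illustration only. It writes a placeholder to every index, in order from 0, before any real data is stored. Whether a given interpreter then picks its dense representation is still up to the interpreter.</p>

```javascript
// Hypothetical helper: build an array whose elements are all initialized,
// in order, starting at index 0.
function makeClearedArray(length, placeholder) {
    var arr = [];
    for (var i = 0; i < length; i++) {
        arr[i] = placeholder;   // contiguous writes: 0, 1, 2, ...
    }
    return arr;
}

var table = makeClearedArray(1000, 0);
table[512] = 42;   // real data is stored only after initialization
```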
<p>Passing the array&#8217;s size in the constructor may also bring better performance. In my testing, I saw this only with sparse arrays in IE8 and (perhaps) in Chrome. I&#8217;ll refer to these as <em>sized arrays</em>; see <a href="#jsarraysII-sized">below</a> for some additional notes.</p>
<p>If an array is created as neither a dense nor a sized array, it&#8217;s treated as a sparse array.</p>
<p>As a general rule, the array&#8217;s behavior is established shortly after construction. If you sparsely populate an array, then assign other values to the remaining elements, you&#8217;re left with a densely-populated sparse array.<a href="#jsarraysII-note-1" name="jsarraysII-ref-1"><sup>1</sup></a></p>
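<p>To make that concrete, here is a small sketch (variable names are mine). The array starts out sparsely populated, is filled in afterwards, and is then copied with <code>slice(0)</code> as suggested in footnote 1. Nothing here guarantees how any interpreter will store either array.</p>

```javascript
var arr = [];
arr[9] = "j";                      // first writes land at scattered indices,
arr[3] = "d";                      // so the array begins life as sparse

for (var i = 0; i < 10; i++) {     // later, the gaps are filled in
    if (!(i in arr)) { arr[i] = "-"; }
}

// arr is now fully populated, but may still be managed as a sparse array.
// A fresh copy gives the interpreter a chance to start over (footnote 1).
var copy = arr.slice(0);
```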
<div class="oowbsidebar">
<p>Hold it there. A <em>densely populated sparse array</em>? Holy <a href="http://www.joelonsoftware.com/articles/LeakyAbstractions.html">leaky abstractions</a>, Batman!</p>
<p>Perhaps we&#8217;re lacking some terms. If we&#8217;re trying to coax better performance from a <em>sparse</em> array by filling it up with placeholder values, that doesn&#8217;t make it a <em>dense</em> array, at least not from your program&#8217;s perspective. To describe it from the interpreter&#8217;s perspective, I&#8217;ll refer to an array that&#8217;s initialized after creation (with live or dead data) as a <em>cleared</em> array. An uninitialized array that is populated at random indices is a <em>default</em> array, regardless of how dense it eventually becomes. Finally, a <em>sized</em> array is, well, a sized array. So, that &#8220;densely populated sparse array&#8221; is now a &#8220;densely populated default array.&#8221; That&#8217;s still an awkward phrase, but at least it&#8217;s not an oxymoron.</p></div>
<p>Now that we can finally get to some data, let&#8217;s discuss scalability. This graph represents the time taken to read about 30,000,000 values from arrays of various sizes, using three Windows browsers. Seven data sets are presented: Cleared and default arrays for each of IE8, Google Chrome 3.0, and Firefox 3.5, plus sized arrays for IE8.</p>
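<p>For readers who want to try something similar, here is a rough sketch of a read-timing harness. It is <em>not</em> the benchmark behind these graphs; the function names, the stride of 10, and the read count (kept far below 30,000,000 so that it finishes quickly) are all assumptions of mine.</p>

```javascript
// Cleared array: every index initialized in order from 0.
function makeCleared(n) {
    var a = [];
    for (var i = 0; i < n; i++) { a[i] = 0; }
    return a;
}

// Default array: every `step`-th index populated, from the top down,
// so the array is never initialized contiguously from index 0.
function makeDefault(n, step) {
    var a = [];
    for (var i = n - 1; i >= 0; i -= step) { a[i] = 1; }
    return a;
}

// Reads `totalReads` elements, cycling through the indices, and reports
// elapsed milliseconds plus the number of truthy values seen.
function timeReads(arr, totalReads) {
    var n = arr.length, hits = 0, i = 0;
    var start = new Date().getTime();
    for (var r = 0; r < totalReads; r++) {
        if (arr[i]) { hits++; }
        if (++i === n) { i = 0; }
    }
    return { elapsedMs: new Date().getTime() - start, hits: hits };
}

var sizes = [1000, 10000, 100000];
var results = [];
for (var s = 0; s < sizes.length; s++) {
    results.push({
        size: sizes[s],
        clearedMs: timeReads(makeCleared(sizes[s]), 300000).elapsedMs,
        defaultMs: timeReads(makeDefault(sizes[s], 10), 300000).elapsedMs
    });
}
```

<p>Comparing <code>clearedMs</code> and <code>defaultMs</code> across the sizes gives a crude version of the curves above. Timer resolution and JIT warm-up can easily swamp small differences, so treat any single run with suspicion.</p>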
<div class="oowbcenter" ><a href="http://img.outofwhatbox.com/JSArrayOptimizationII/ArraySizeVsRunTimeLarge.png" target="blank"><img title="As array size increases, run time may increase as well. (Click for larger graph with legend)" src="http://img.outofwhatbox.com/JSArrayOptimizationII/ArraySizeVsRunTimeSmall.png" alt="As array size increases, run time may increase as well. (Click for larger graph with legend)" /></a>
<p class="oowb-caption-text">Array Size Vs. Access Time. (Click for larger graph with legend)</p>
</div>
<p>Note that in three of the seven cases, time grows linearly as size increases<a href="#jsarraysII-note-2" name="jsarraysII-ref-2"><sup>2</sup></a>. This growth effectively multiplies the program&#8217;s <a href="http://en.wikipedia.org/wiki/Big_O_notation">complexity</a> by O(N). That is, an algorithm that might normally have O(N) performance instead shows O(N<sup>2</sup>) performance, and so on. This can have a sizable impact on scalability.</p>
<p>The graph makes it clear that the fastest arrays are cleared. However, making a very sparse cleared array would gobble up a large swath of memory for a small number of values. For such cases, you may want to stick with sparse or sized arrays. Here&#8217;s a closer look at their performance.</p>
<p>The following tests all used arrays of 180,000 elements. Each data point represents performance with a different &#8220;hit ratio&#8221;; that is, the ratio of defined values read from the array. While array density also varied, its effect appears in only one case, which I&#8217;ll show separately<a href="#jsarraysII-note-3" name="jsarraysII-ref-3"><sup>3</sup></a>. For reference, these graphs also show the performance with cleared arrays.</p>
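<p>The distinction between density and hit ratio can be sketched in a few lines. The two helper functions below are hypothetical (they are not taken from the test code); they only show that the same array, with a fixed density, can be read with very different hit ratios.</p>

```javascript
// Density: fraction of indices in [0, length) that hold a value.
function density(arr) {
    var defined = 0;
    for (var i = 0; i < arr.length; i++) {
        if (i in arr) { defined++; }
    }
    return arr.length ? defined / arr.length : 0;
}

// Hit ratio: fraction of reads that land on a defined element.
function hitRatio(arr, indices) {
    var hits = 0;
    for (var k = 0; k < indices.length; k++) {
        var value = arr[indices[k]];
        if (value !== undefined) { hits++; }
    }
    return indices.length ? hits / indices.length : 0;
}

var sample = [];
sample[9] = "x";
sample[4] = "y";                             // length 10, two values

var d = density(sample);                     // 2 of 10 indices defined
var allHits = hitRatio(sample, [4, 9, 4, 9]);   // every read finds a value
var halfHits = hitRatio(sample, [0, 1, 4, 9]);  // half the reads find a value
```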
<p>This graph shows the performance of sized, default, and cleared arrays in IE8. Notice that the &#8220;sized&#8221; trendline (in blue) begins significantly above the &#8220;default&#8221; trendline, and ends slightly above the &#8220;cleared&#8221; trendline. Clearly, IE optimizes sized arrays to work better when reading defined values.</p>
<div class="oowbcenter" ><a href="http://img.outofwhatbox.com/JSArrayOptimizationII/IEHitsLarge.png" target="blank"><img title="Performance of defined and undefined array elements in IE8. (Click for larger graph with legend)" src="http://img.outofwhatbox.com/JSArrayOptimizationII/IEHitsSmall.png" alt="Performance of defined and undefined array elements in IE8. (Click for larger graph with legend)" /></a>
<p class="oowb-caption-text">Array Hits Vs. Access Time in IE8. (Click for larger graph with legend)</p>
</div>
<p>Firefox 3.5 also looks slower when accessing undefined elements, but there&#8217;s no real gain from passing a size parameter to the array constructor:</p>
<div class="oowbcenter" ><a href="http://img.outofwhatbox.com/JSArrayOptimizationII/FFHitsLarge.png" target="blank"><img title="Performance of defined and undefined array elements in Firefox 3.5. (Click for larger graph with legend)" src="http://img.outofwhatbox.com/JSArrayOptimizationII/FFHitsSmall.png" alt="Performance of defined and undefined array elements in Firefox 3.5. (Click for larger graph with legend)" /></a>
<p class="oowb-caption-text">Array Hits Vs. Access Time in Firefox 3.5. (Click for larger graph with legend)</p>
</div>
<p>In Chrome, array access times are clustered into high and low ranges of values, based on array density. Chrome performs markedly better with arrays having a density of 10% or higher, vs. arrays of lower density. The data sets below have been split accordingly. Since there&#8217;s much more variation <em>between</em> these two ranges than there is <em>within</em> them, it&#8217;s a good bet that the 10% threshold is hard-coded somewhere in Chrome&#8217;s V8 JavaScript interpreter.</p>
<div class="oowbcenter" ><a href="http://img.outofwhatbox.com/JSArrayOptimizationII/ChromeSplitLarge.png" target="blank"><img title="Array Hits Vs. Access Time in Chrome, by array density. (Click for larger graph with legend)" src="http://img.outofwhatbox.com/JSArrayOptimizationII/ChromeSplitSmall.png" alt="Array Hits Vs. Access Time in Chrome, by array density. (Click for larger graph with legend)" /></a>
<p class="oowb-caption-text">Array Hits Vs. Access Time in Chrome, by array density. (Click for larger graph with legend)</p>
</div>
<p><a name="jsarraysII-sized"></a><br />
<h3>Notes on sized arrays</h3>
<p>It makes sense that <i>if</i> a size is passed to the array constructor, <i>and</i> the size value is accurate, then the interpreter can optimize the array for that size. The problem is that the size value won&#8217;t always be accurate; nor does it suggest how dense the array might become. Hence the interpreter may not be able to rely on the size value even when it&#8217;s present.</p>
<p>Sized array behavior in IE8 isn&#8217;t what I&#8217;d expected. I&#8217;d found a note written prior to IE8&#8217;s release by one of its developers, which I had taken to mean that <a href="http://blogs.msdn.com/jscript/archive/2008/03/25/performance-optimization-of-arrays-part-i.aspx">these arrays should perform like cleared arrays</a>, but that&#8217;s clearly not the case. Sized arrays in IE8 actually incur a small <em>penalty</em> when accessing undefined array elements, to the point where if you access <em>only</em> undefined elements, a sized array may be slower than a default array. On a closer reading, the note refers to &#8220;any <em>indexed</em> entry&#8221; within the array&#8217;s range <em>(emphasis mine)</em>. Of course, undefined entries wouldn&#8217;t be indexed. The note was accurate, but arguably didn&#8217;t go far enough.</p>
<p>In Chrome, using an explicit size for a sparse array does seem to make a small but measurable difference in some cases. But on the whole, there&#8217;s little reason to use this for improved performance, especially since Chrome is currently the browser <em>least in need</em> of a performance boost in these tests. </p>
<p>The performance of Firefox 3.5 suggests that it ignores the constructor&#8217;s size parameter. However, a quick trip through the <a href="http://hg.mozilla.org/releases/mozilla-1.9.1">browser&#8217;s source code</a> indicates that the size <i>should</i> make a difference. Perhaps other kinds of tests would be able to draw this out, or perhaps there&#8217;s an opportunity for improvement in the code.</p>
<h3>Memory Consumption</h3>
<p>Measuring the physical size of JavaScript data is a sketchy undertaking. About the best one can do is to compare the virtual memory size of the browser process before and after creating a large array. This isn&#8217;t going to be very accurate. All the same, where there&#8217;s a large difference between values, it&#8217;s probably significant.</p>
<table style="" border="0" cellspacing="0" cellpadding="2" >
<tbody>
<tr class="oowbfirstRow">
<th style="text-align: left;" ><strong><code>Browser</code></strong></th>
<th style="text-align: center;" ><strong><code>Size Estimate, Default / Sized Arrays (Bytes)</code></strong></th>
<th style="text-align: center;" ><strong><code>Size Estimate, Cleared Arrays (Bytes)</code></strong></th>
</tr>
<tr>
<td><strong>Internet Explorer 8</strong></td>
<td style="text-align: left;"><em>(# of values)</em> x (approximately 76)</td>
<td style="text-align: left;"><em>(Length of array)</em> x (approx. 46)</td>
</tr>
<tr>
<td><strong>Firefox 3.5</strong></td>
<td style="text-align: left;"><em>(# of values)</em> x (approx. 63)</td>
<td style="text-align: left;"><em>(Length of array)</em> x (approx. 5)</td>
</tr>
<tr>
<td><strong>Google Chrome 3.0</strong></td>
<td style="text-align: left;"><em>(Length of array)</em> x (30–70)</td>
<td style="text-align: left;"><em>(Length of array)</em> x (approx. 14)</td>
</tr>
</tbody>
</table>
<p>In my last post, I wrote that a sparse 12K array of integers, implemented as a hash table, would likely consume over 36K of memory. If these measurements are reasonable, that was a gross understatement; a sparse 12K array could actually consume more than 70 bytes per element, or upwards of 860K.</p>
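<p>As a quick check on that arithmetic (the variable names are mine, and the 70-byte figure is simply the upper end of the rough estimates in the table):</p>

```javascript
var tableLength = 0x3000 + 1;       // 12,289 entries in the lookup table
var bytesPerElement = 70;           // rough per-element estimate from the table
var estimatedBytes = tableLength * bytesPerElement;   // 860,230 bytes
```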
<p>At the extremes, for two arrays with the same length, one that&#8217;s very dense may use <em>less</em> memory than one of lower density—even though the lower density array, by definition, contains fewer values. Chrome and Firefox seem to account for this internally as they organize the array structure, but I&#8217;m not sure whether IE8 does.</p>
<h3>Recommendations</h3>
<div class="oowbbtw">Remember that premature optimization is folly, and all optimization has its costs. If you use the hints that I&#8217;ve described, keep in mind that the interpreter isn&#8217;t obliged to obey your intentions. Consider creating a factory method for arrays, so that <del>if</del> <ins>as</ins> the rules change, you can most easily adapt your code to suit.</div>
<ul>
<li>Use cleared arrays if speed is critical, <strong>or</strong> if the array reaches around 50% density. But don&#8217;t use them habitually, especially not for sparse arrays.</li>
<li>Because of IE8&#8217;s quirks, you should think twice before creating sized arrays. They&#8217;re helpful <strong>only if</strong> you have sparse data <strong>and</strong> you won&#8217;t often access an undefined array element.</li>
</ul>
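<p>The factory idea mentioned above could look something like the sketch below. <code>ArrayFactory</code> and its method names are hypothetical; the point is only that the construction hints live in one place, so they can be revised when the interpreters change their rules.</p>

```javascript
// Hypothetical factory: one place to encode whichever construction hints
// currently seem to help.
var ArrayFactory = {
    // Every index initialized from 0 up: favors speed, costs memory.
    cleared: function (length, placeholder) {
        var arr = [];
        for (var i = 0; i < length; i++) { arr[i] = placeholder; }
        return arr;
    },
    // Size passed to the constructor; elements are left undefined.
    sized: function (length) {
        return new Array(length);
    },
    // No hints at all.
    plain: function () {
        return [];
    }
};

var lookup = ArrayFactory.cleared(256, false);   // speed-critical table
var sparse = ArrayFactory.sized(180000);         // sparse data, few misses expected
sparse[1234] = "value";
```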
<h3>Looking ahead</h3>
<p>JavaScript&#8217;s arrays are pleasant enough in normal use; but like all abstractions, they have their leaks. As JavaScript is used for increasingly sophisticated applications, it might be worthwhile for its designers to take a fresh look at scalability. There are ways to get the desired results, but negotiation via secret handshake doesn&#8217;t scale terribly well.</p>
<p>Without marring JavaScript&#8217;s simplicity, it should be possible to extend the language so that, <em>when necessary</em>, the developer can make plain to the interpreter what it should expect for a particular array. For example, this could mean adding APIs or syntax that let the developer declare the array&#8217;s expected size, density, and so forth. Or the problem could be addressed from the other direction: Simply allow the developer to request a structure that favors higher speed, or more compactness, for a particular array.</p>
<p>And although read-only objects may be a special case, I started down this path by looking at an <a href="http://www.outofwhatbox.com/blog/2009/11/trimming-trim-via-razing-arrays-javascript/">array used as a lookup table</a> which is effectively read-only after initialization. It would be nice if I could let the interpreter know this. A version of <a href="http://ruby-doc.org/core/classes/Object.html#M000356">Ruby&#8217;s <code>freeze</code> method</a> would fit the bill. This would give the interpreter a hint to optimize the array for read-only access, though it wouldn&#8217;t be required to do so.</p>
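<p>For comparison, ECMAScript 5 defines <code>Object.freeze</code>, which can be applied to arrays. The sketch below assumes an ES5-capable environment. Freezing only prevents modification; nothing in the specification promises that an interpreter will read a frozen array any faster.</p>

```javascript
var lookup = [false, true, false, true];
Object.freeze(lookup);

try {
    lookup[0] = true;    // silently ignored in sloppy mode
} catch (e) {
    // in strict mode the assignment throws a TypeError instead
}

var stillFalse = lookup[0];               // the write did not take effect
var frozen = Object.isFrozen(lookup);     // reports the frozen state
```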
<p>Dreaming further, I&#8217;m also holding out hope that closures can be better optimized. After initialization, the lookup table in my version of <code>String.trim()</code> was only accessible <a href="http://www.outofwhatbox.com/blog/2009/11/trimming-trim-via-razing-arrays-javascript/">to a single, read-only method</a>. If the interpreter can verify that nothing&#8217;s going to change the array, it could move it to faster storage. Yes, this brings us back around to secret handshakes. But since closures have their own merits, I see this more as the icing on the cake.</p>
<p />
<hr />
<div><p><a name="jsarraysII-note-1" href="#jsarraysII-ref-1"><sup>1</sup></a> Making a fresh copy using <code>slice(0)</code> should get you better performance. (Well, it&#8217;s worked for me, but I make no promises.)</p>
<p><a name="jsarraysII-note-2" href="#jsarraysII-ref-2"><sup>2</sup></a> There are really four cases, but the fourth one (cleared arrays in IE8) shows <em>very</em> slow growth in runtime. Also note that the numbers suggest that Chrome&#8217;s performance <em>improves</em> slightly as array size increases. This may be an artifact of the benchmark.</p>
<p><a name="jsarraysII-note-3" href="#jsarraysII-ref-3"><sup>3</sup></a> Density and hit ratio can be covariant, but here they aren&#8217;t. That is, these tests were designed so that, as long as the density was under 100%, the hit ratio could vary independently.</p></div>
]]></content:encoded>
			<wfw:commentRss>http://www.outofwhatbox.com/blog/2009/12/javascript-array-performance-and-why-it-matters/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>JavaScript Array Performance: Initialize to Optimize</title>
		<link>http://www.outofwhatbox.com/blog/2009/11/javascript-array-performance-initialize-to-optimize/</link>
		<comments>http://www.outofwhatbox.com/blog/2009/11/javascript-array-performance-initialize-to-optimize/#comments</comments>
		<pubDate>Tue, 17 Nov 2009 16:39:22 +0000</pubDate>
		<dc:creator>Dan Breslau</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[JavaScript arrays]]></category>
		<category><![CDATA[JavaScript performance]]></category>

		<guid isPermaLink="false">http://www.outofwhatbox.com/blog/?p=584</guid>
		<description><![CDATA[The JavaScript interpreters in the most popular browsers distinguish between sparse and dense arrays. Hence, you may get more  "array-like" performance if you initialize an array before using it. But this trades memory for speed, so it may not always be the best choice.]]></description>
			<content:encoded><![CDATA[<p>After delving into the issue of <a href="http://www.outofwhatbox.com/blog/2009/11/trimming-trim-via-razing-arrays-javascript/">JavaScript array performance</a>, I came upon a <a href="http://blogs.msdn.com/jscript/archive/2008/04/08/performance-optimization-of-arrays-part-ii.aspx">pair of</a> <a href="http://blogs.msdn.com/jscript/archive/2008/03/25/performance-optimization-of-arrays-part-i.aspx">blog entries</a> on MSDN addressing the topic. These posts describe situations where the then-forthcoming IE8 could provide more performant arrays. The blog also partially confirms, and partially refutes, my thoughts on how JavaScript interpreters handle arrays. The blog&#8217;s focus, of course, is the JScript engine in IE, but the advice that it offers seems to work well with other major browsers.</p>
<p>The blog confirms that IE&#8217;s JavaScript interpreter manages arrays using a nonlinear structure, essentially a hash table. I&#8217;d guessed at this after seeing lower performance in <code><a href="http://www.outofwhatbox.com/blog/2009/11/trimming-trim-via-razing-arrays-javascript/">trimOOWB</a></code> when its lookup table was implemented as a large, and sparse, array. But contrary to my guess, this isn&#8217;t about avoiding re-allocation bottlenecks. It&#8217;s for dealing with <a href="http://www.dragonthoughts.biz/technical/sparsearray.html">sparse arrays</a> without consuming unreasonable amounts of memory<sup><a href="#jsarray-init-note-1" name="jsarray-init-ref-1">1</a></sup>. (I&#8217;m feeling a little sheepish about this: Even though I&#8217;d noted the sparseness of the lookup table, I hadn&#8217;t made the connection. In fact, I&#8217;d been somewhat dismissive of that possibility. <a href="http://www.outofwhatbox.com/blog/2009/04/how_to_make_mistakes/">Live and learn</a>.)</p>
<p>The blog also says that as of IE8, the JScript engine has heuristics for determining if an array is &#8220;dense&#8221;. It implements a dense array with a linear, array-like index in addition to its standard hash-like data structure; this index can make random access into the array much faster. The blog indicates that IE8 considers an array to be dense if <strong>either</strong> of the following conditions is true:</p>
<ul>
<li>You construct the array with an explicit size, and you don&#8217;t grow the array past that initial size.</li>
<li>After creating the array, you initialize a continuous range of indices, starting from 0, up to and including the highest index that you expect to use. You must do this before writing into random indices within the array.<sup><a href="#jsarray-init-note-2" name="jsarray-init-ref-2">2</a></sup></li>
</ul>
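<p>In code, the two conditions might look like this. The third case shows the situation described in footnote 2, where data is appended in order and no extra step is needed. The variable names and the size of 100 are arbitrary choices of mine.</p>

```javascript
// (a) Explicit size in the constructor, never grown past that size.
var sized = new Array(100);
sized[57] = "x";

// (b) Contiguous initialization from index 0, before any random writes.
var initialized = [];
for (var i = 0; i < 100; i++) { initialized[i] = null; }
initialized[57] = "x";

// (c) Data appended in order from index 0 already satisfies (b).
var sequential = [];
for (var j = 0; j < 100; j++) { sequential.push(j * j); }
```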
<p>As the writer says, using either of these two techniques should <em>ensure</em> that the array heuristics in IE8 will treat your array as a dense array. The interpreter is free to decide to treat other arrays as dense, too. However, the testing I&#8217;ve done suggests that it&#8217;s not quite as simple as this. Contrary to the blog&#8217;s guidelines, I found that:</p>
<ul>
<li>Creating the array with an explicit size (whether or not followed by initializing the elements) <em><strong>had no measurable impact</strong></em> on the speed of the <code>trim</code> method.
</li>
<li>Initializing the lookup table&#8217;s full range of 12,289 elements (the vast majority of which are not whitespace) <em><strong>did</strong></em> improve performance.
</li>
</ul>
<p>This finding is specific to the trim method, and so these results <strong>should not be used as general performance guidelines.</strong> The usage pattern for a particular array can greatly affect its performance profile. I&#8217;ve found unrelated cases where specifying the array&#8217;s size could help <em><strong>or</strong></em> hurt performance. (I&#8217;ll write more about this in my next post.)</p>
<p>Following up on these hints led to a <a href="#jsarray-trim18-implementation">new version of </a><a href="#jsarray-trim18-implementation"><code>trim</code></a> that&#8217;s simpler and faster than my previous effort. This version is actually more closely related to the <code><a href="http://yesudeep.wordpress.com/2009/07/31/even-faster-string-prototype-trim-implementation-in-javascript/">trim17</a></code> method from Yesudeep Mangalapilly than it is to <code>trimOOWB</code>. Hence I&#8217;ve named this newer one <code>trim18</code> to reaffirm its heritage. (In Steve Levithan&#8217;s original post on <a href="http://blog.stevenlevithan.com/archives/faster-trim-javascript">JavaScript trim methods</a>, he assigned numbers to each of the <code>trim</code> implementations that he examined. Yesudeep took up that naming scheme, and now I&#8217;m doing so as well.)</p>
<div class="oowbbtw">
<h3>An Intermediary Thought</h3>
<p>Remember that premature optimization is folly, and all optimization has its costs. In this case, we&#8217;re buying performance by fully initializing a 12K JavaScript array. If the interpreter stores this using a hash table <em>as well as</em> an array, then the table will likely consume over 36K of memory. Is the benefit worth the cost? It could be, especially if we&#8217;re not running on a cell phone. But <strong>please</strong> don&#8217;t take away from this that you should initialize <strong>every</strong> array that you create. That would ultimately be self-defeating: One of the worst things you can do for performance is to consume more memory than you really need.</div>
<h2>Performance</h2>
<p>Since the last post, I&#8217;ve revised the benchmark for better precision. The structure is the same (both benchmarks call the three <code>trim</code> methods repeatedly with a fixed set of input data), but in the newer benchmark, the input strings are much longer, and the number of iterations is lower. This should yield more precise measurements of execution time. All the same, these results should be used with care; as is often said in the U.S.A., your mileage will vary.</p>
<p>The benchmark results are shown below. Please remember that these tables are not directly comparable to the numbers in my previous post.</p>
<h3>ASCII data (only spaces and tabs used for whitespace)</h3>
<table style="" border="0" cellspacing="0" cellpadding="2" >
<tbody>
<tr class="oowbfirstRow">
<th style="text-align: right;"></th>
<th style="text-align: right;" ><strong><code>trim17</code></strong></th>
<th style="text-align: right;" ><strong><code>trimOOWB</code></strong></th>
<th style="text-align: right;" ><strong><code>% saved vs. trim17</code></strong></th>
<th style="text-align: right;" ><strong><code>trim18</code></strong></th>
<th style="text-align: right;" ><strong><code>% saved vs. trim17</code></strong></th>
</tr>
<tr>
<td><strong>Internet Explorer 8</strong></td>
<td style="text-align: right;">74,453</td>
<td style="text-align: right;">69,609</td>
<td style="text-align: right;">6.5</td>
<td style="text-align: right;" >69,922</td>
<td style="text-align: right;" >6.1</td>
</tr>
<tr>
<td><strong>Firefox 3.5</strong></td>
<td style="text-align: right;">6,776</td>
<td style="text-align: right;">3,732</td>
<td style="text-align: right;">44.9</td>
<td style="text-align: right;" >3,003</td>
<td style="text-align: right;" >55.7</td>
</tr>
<tr>
<td><strong>Google Chrome 3.0</strong></td>
<td style="text-align: right;">2,530</td>
<td style="text-align: right;">824</td>
<td style="text-align: right;">67.4</td>
<td style="text-align: right;" >754</td>
<td style="text-align: right;" >70.2</td>
</tr>
</tbody>
</table>
<h3>Unicode data (using all ASCII and Unicode whitespace characters)</h3>
<table style="" border="0" cellspacing="0" cellpadding="2" >
<tbody>
<tr class="oowbfirstRow">
<th style="text-align: right;"></th>
<th style="text-align: right;" ><strong><code>trim17</code></strong></th>
<th style="text-align: right;" ><strong><code>trimOOWB</code></strong></th>
<th style="text-align: right;" ><strong><code>% saved vs. trim17</code></strong></th>
<th style="text-align: right;" ><strong><code>trim18</code></strong></th>
<th style="text-align: right;" ><strong><code>% saved vs. trim17</code></strong></th>
</tr>
<tr>
<td><strong>Internet Explorer 8</strong></td>
<td style="text-align: right;">76,188</td>
<td style="text-align: right;">74,468</td>
<td style="text-align: right;">2.3</td>
<td style="text-align: right;" >72,484</td>
<td style="text-align: right;" >4.9</td>
</tr>
<tr>
<td><strong>Firefox 3.5</strong></td>
<td style="text-align: right;">6,779</td>
<td style="text-align: right;">5,025</td>
<td style="text-align: right;">25.6</td>
<td style="text-align: right;" >3,029</td>
<td style="text-align: right;" >55.3</td>
</tr>
<tr>
<td><strong>Google Chrome 3.0</strong></td>
<td style="text-align: right;">2,780</td>
<td style="text-align: right;">940</td>
<td style="text-align: right;">66.2</td>
<td style="text-align: right;" >810</td>
<td style="text-align: right;" >70.9</td>
</tr>
</tbody>
</table>
<h2><a name="jsarray-trim18-implementation"></a>trim18 implementation</h2>
<p><em>I left the explicit size in the array constructor call, even though I&#8217;d found no benefit from using it. It seems unlikely to cause any harm, and there may be environments where this method&#8217;s performance might benefit from it.</em></p>
<pre class="brush: jscript;">
var trim18 = (function() {

    var tableSize = 0x3000 + 1;
    var whiteSpace = new Array(tableSize);

    // Initialize the array elements before populating the data.
    // (This may help performance, by hinting to the interpreter that
    //  the array should not be managed as a sparse array.)

    for (var i = 0; i &lt; tableSize; i++) {
        whiteSpace[i] = false;
    }

    whiteSpace[0x0009] = true;  whiteSpace[0x000a] = true;
    whiteSpace[0x000b] = true;  whiteSpace[0x000c] = true;
    whiteSpace[0x000d] = true;  whiteSpace[0x0020] = true;
    whiteSpace[0x0085] = true;  whiteSpace[0x00a0] = true;
    whiteSpace[0x1680] = true;  whiteSpace[0x180e] = true;
    whiteSpace[0x2000] = true;  whiteSpace[0x2001] = true;
    whiteSpace[0x2002] = true;  whiteSpace[0x2003] = true;
    whiteSpace[0x2004] = true;  whiteSpace[0x2005] = true;
    whiteSpace[0x2006] = true;  whiteSpace[0x2007] = true;
    whiteSpace[0x2008] = true;  whiteSpace[0x2009] = true;
    whiteSpace[0x200a] = true;  whiteSpace[0x200b] = true;
    whiteSpace[0x2028] = true;  whiteSpace[0x2029] = true;
    whiteSpace[0x202f] = true;  whiteSpace[0x205f] = true;
    whiteSpace[0x3000] = true;

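    function trim18(str) {
        var len = str.length, ws = whiteSpace, i = 0;
        // Scan backward past trailing whitespace. charCodeAt(-1) yields NaN,
        // so the loop also stops on empty or all-whitespace strings.
        while (ws[str.charCodeAt(--len)]);
        // ++len restores the end index; 0 means there is nothing left to keep.
        if (++len){
            // Scan forward past leading whitespace.
            while (ws[str.charCodeAt(i)]){ ++i; }
        }
        return str.substring(i, len);
    }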

    return trim18;
})();
</pre>
<hr />
<div><sup><a name="jsarray-init-note-1" href="#jsarray-init-ref-1">1</a></sup> It&#8217;s also relevant that <em>every</em> JavaScript object needs a hash table to manage properties. Hence sparse arrays can be implemented easily by using this hash table for the same purpose. (Considering the design of JavaScript&#8217;s <code>for...in</code> loop statement, it looks as if the language designers intended this.)</div>
<div><sup><a name="jsarray-init-note-2" href="#jsarray-init-ref-2">2</a></sup> That is, you would need to initialize the array if you otherwise won&#8217;t be populating all of its elements, or if you won&#8217;t be populating them in strict order starting from 0. On the other hand, if your script would normally add data to the array starting from index 0 and working up from there, leaving no gaps, then you&#8217;re already squared away with IE8&#8217;s heuristics.</div>
]]></content:encoded>
			<wfw:commentRss>http://www.outofwhatbox.com/blog/2009/11/javascript-array-performance-initialize-to-optimize/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Trimming trim via razing arrays (JavaScript)</title>
		<link>http://www.outofwhatbox.com/blog/2009/11/trimming-trim-via-razing-arrays-javascript/</link>
		<comments>http://www.outofwhatbox.com/blog/2009/11/trimming-trim-via-razing-arrays-javascript/#comments</comments>
		<pubDate>Thu, 05 Nov 2009 01:08:53 +0000</pubDate>
		<dc:creator>Dan Breslau</dc:creator>
				<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[JavaScript arrays]]></category>
		<category><![CDATA[JavaScript performance]]></category>

		<guid isPermaLink="false">http://www.outofwhatbox.com/blog/?p=576</guid>
		<description><![CDATA[It may well be that for better JavaScript performance, large arrays should be avoided where possible. That seems fairly clear in this case, where an already fast implementation of the missing String.trim() method was made even faster by using much smaller lookup table arrays.]]></description>
			<content:encoded><![CDATA[<p>Back in 2007, Steve Levithan <a href="http://blog.stevenlevithan.com/archives/faster-trim-javascript">compared the speed</a> of different implementations for the missing JavaScript <code>String.trim()</code> function. Steve&#8217;s blog post has launched The Comment Thread That Will Not Die, as a number of folks have been tempted to try their hand at writing their own implementation.</p>
<p>Count me in.</p>
<p>It started when <a href="http://yesudeep.wordpress.com/2009/07/31/even-faster-string-prototype-trim-implementation-in-javascript/">Yesudeep Mangalapilly&#8217;s version</a> caught my attention. Yesudeep, working with an idea from <a href="http://blog.stevenlevithan.com/archives/faster-trim-javascript#comment-25052">Michael Lee Finney</a>, had a fast implementation that didn&#8217;t use regular expressions. Instead of regexps, Yesudeep&#8217;s and Michael&#8217;s versions scan the string one character at a time, from the front and back ends, checking each character against a lookup table to determine if it&#8217;s whitespace.</p>
<p>However: The largest Unicode code point that&#8217;s counted as whitespace is <a href="http://unicode.org/charts/PDF/U3000.pdf">U+3000 <em>(pdf)</em></a> (12288 in decimal), the <a href="http://en.wikipedia.org/wiki/Space_%28punctuation%29">Ideographic Space</a> character. Hence, the lookup table array in Michael&#8217;s and Yesudeep&#8217;s implementations has a length of 12289, with most entries undefined. That&#8217;s a pretty large array, and a pretty sparse one.</p>
<p>Even though these were already among the fastest of the <code>trim</code>s, I wondered if using a large array as a lookup table might carry any performance penalty. My concern stemmed from the fact that <a href="https://developer.mozilla.org/en/Core_JavaScript_1.5_Reference/Objects/Array#Increasing_the_array_length_indirectly">JavaScript arrays grow dynamically</a>, adjusting in size to hold the highest index assigned into them. This poses a challenge to the interpreter: If it always places an array in a linear block of memory (as in C++), then accommodating array growth is likely to be a problem. So, to allow for reasonable performance as the array grows, the interpreter might not use a linear storage model for arrays. Non-linear models (trees or linked lists, for example) may make random access to the array slower, but would allow for reasonable performance when growing the array, while exhibiting reasonable memory consumption<sup><a name="trim-ref-mem" href="#trim-note-mem">1</a></sup>.</p>
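<p>To make the growth behavior concrete, here&#8217;s a minimal sketch. The variable names are mine and the table holds only two entries for brevity; it is not the actual <code>trim17</code> table. It shows that assigning to a single high index is enough to give the array a length of 12289, even though almost every slot is a hole.</p>

```javascript
// A lookup table populated only at two whitespace code points.
var table = [];
table[0x0020] = true;   // space
table[0x3000] = true;   // ideographic space

// length is always one more than the highest index assigned.
var tableLength = table.length;   // 0x3000 + 1 = 12289

// Count the entries that actually exist; everything else is a hole.
var definedCount = 0;
for (var index in table) {
    if (table.hasOwnProperty(index)) {
        definedCount++;
    }
}
// definedCount is 2, although tableLength is 12289.
```

<p>How an engine stores those holes internally is up to the engine; the sketch only shows what the script itself can observe.</p>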
<p>Through testing in three popular browsers, I found reason to be concerned about large arrays. I profiled the <code>trim17</code> method (Yesudeep&#8217;s implementation), using input strings that contained only spaces and tabs for whitespace. After this profiling run, <a name="ReducedArraySize">I trimmed <code>trim17</code>&#8217;s lookup table</a> (hacked it, really) by removing all entries above U+0020, and so limiting it to recognizing only ASCII whitespace chars. Then I profiled it again.</p>
<p>The table below shows the milliseconds spent within the two versions of <code>trim17</code>; the difference between their runtimes is most likely due to the change in size of the lookup table array.</p>
<table style="height: 77px;" border="0" cellspacing="0" cellpadding="0" width="517">
<tbody>
<tr class="oowbfirstRow">
<th></th>
<th style="text-align: right;" ><strong>Original <code>trim17</code></strong></th>
<th style="text-align: right;" ><strong>Reduced <code>trim17</code></strong></th>
<th style="text-align: right;" ><strong>% Saved</strong></th>
</tr>
<tr>
<td><strong>Internet Explorer 8</strong></td>
<td>
<p align="right">27,328</p>
</td>
<td>
<p align="right">20,281</p>
</td>
<td>
<p align="right">26</p>
</td>
</tr>
<tr>
<td><strong>Firefox 3.5</strong></td>
<td>
<p align="right">3,689</p>
</td>
<td>
<p align="right">2,978</p>
</td>
<td>
<p align="right">19</p>
</td>
</tr>
<tr>
<td><strong>Chrome 3.0</strong></td>
<td>
<p align="right">610</p>
</td>
<td>
<p align="right">191</p>
</td>
<td>
<p align="right">69</p>
</td>
</tr>
</tbody>
</table>
<p>These results confirm that, in JavaScript, accessing larger arrays can be slower than accessing smaller arrays.</p>
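<p>For anyone who wants to repeat this kind of comparison without a profiler, here&#8217;s a rough sketch of a timing loop. Everything in it is an assumption for illustration: <code>timeCalls</code>, <code>makeTable</code>, and <code>makeScanner</code> are hypothetical helpers, the tables hold only a few entries, and wall-clock timing with <code>Date</code> is much cruder than the profiling used for the table above.</p>

```javascript
// Hypothetical helper: time `iterations` calls of fn(input), in milliseconds.
function timeCalls(fn, input, iterations) {
    var start = new Date().getTime();
    for (var n = 0; n < iterations; n++) {
        fn(input);
    }
    return new Date().getTime() - start;
}

// Build a lookup table with `true` at each of the given code points.
function makeTable(codePoints) {
    var table = [];
    for (var k = 0; k < codePoints.length; k++) {
        table[codePoints[k]] = true;
    }
    return table;
}

var largeTable = makeTable([0x0009, 0x0020, 0x3000]);  // length 12289
var smallTable = makeTable([0x0009, 0x0020]);          // length 33

// Stand-in for a trim loop: returns the string length once trailing
// whitespace (as defined by `table`) has been skipped.
function makeScanner(table) {
    return function (str) {
        var len = str.length;
        while (table[str.charCodeAt(--len)]) {}
        return len + 1;
    };
}

// 'text' followed by 100 characters of spaces and tabs.
var sample = 'text' + new Array(51).join(' \t');
var largeMs = timeCalls(makeScanner(largeTable), sample, 100000);
var smallMs = timeCalls(makeScanner(smallTable), sample, 100000);
```

<p>Comparing <code>largeMs</code> with <code>smallMs</code> gives a rough read on the cost of the larger table in a given engine. The numbers will vary from run to run and from browser to browser, so treat any single measurement with suspicion.</p>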
<hr style="margin:1em 0"/>
<p>But, perhaps applying these results to all arrays is an overgeneralization. Ideally, at least, it should be possible for an interpreter to recognize a &#8220;read-only&#8221; array, and use a more efficient layout for it. That is, if the interpreter can verify that an array isn&#8217;t modified after its initial construction, then perhaps it can safely flatten out the array into a linear block of memory.</p>
<p>The array in <code>trim17</code> was constructed as a property of the <code>String</code> prototype. An array couldn&#8217;t be more modifiable than that, and so I wouldn&#8217;t expect it to be flattened by the interpreter. But suppose the array were accessible only from within a single function (a <a href="https://developer.mozilla.org/en/Core_JavaScript_1.5_Guide/Working_with_Closures">closure</a>). In that case, if that single function isn&#8217;t modifying the array, we know that nothing will. Depending on the JavaScript interpreter, that might allow for better performance.</p>
<p>I changed the code accordingly, but the actual improvement in speed was&#8230; unremarkable. Nonexistent, even. It may have eked out around a 5% gain in some tests, but there&#8217;s enough noise in the measurements that it&#8217;s hard to be sure. Still, I dislike globals (and, especially, modifiable globals), so I decided to stick with the closure. (Besides, maybe someday, somewhere, an optimizing interpreter will know how to make use of it.)</p>
<p>The next approach was to try using a smaller array. Or, rather, two smaller arrays: One to represent the whitespace characters at and below U+00A0, and another to represent the whitespace characters between U+2000 and U+205f. To keep the second array small, its indices are offset by <code>-0x2000</code>; this gives it a size of <code>0x0060</code> (96 decimal) entries. (There are three whitespace values that aren&#8217;t in either array: U+1680, U+180e, and U+3000. The new code checks for these explicitly.)</p>
<p>Even with smaller arrays, I found that random access into them is still slower than making a few comparisons on scalar variables. Hence the code is written so that, for any character value, it consults no more than one of the two lookup tables, and then only if the character is in a reasonable range for that table.</p>
<p>Here&#8217;s the new method:</p>
<pre class="brush: jscript;">
var trimOOWB = (function() {

    // Lookup table for the whitespace code points at or below U+00A0.
    var whiteSpace = new Array(0x00a0 + 1);
    whiteSpace[0x0009] = true;    whiteSpace[0x000a] = true;
    whiteSpace[0x000b] = true;    whiteSpace[0x000c] = true;
    whiteSpace[0x000d] = true;    whiteSpace[0x0020] = true;
    whiteSpace[0x0085] = true;    whiteSpace[0x00a0] = true;

    // Lookup table for U+2000 through U+205F; indices are offset by 0x2000.
    var whiteSpace2 = new Array(0x005f + 1);
    var base = 0x2000;
    whiteSpace2[0x2000 - base] = true;  whiteSpace2[0x2001 - base] = true;
    whiteSpace2[0x2002 - base] = true;  whiteSpace2[0x2003 - base] = true;
    whiteSpace2[0x2004 - base] = true;  whiteSpace2[0x2005 - base] = true;
    whiteSpace2[0x2006 - base] = true;  whiteSpace2[0x2007 - base] = true;
    whiteSpace2[0x2008 - base] = true;  whiteSpace2[0x2009 - base] = true;
    whiteSpace2[0x200a - base] = true;  whiteSpace2[0x200b - base] = true;
    whiteSpace2[0x2028 - base] = true;  whiteSpace2[0x2029 - base] = true;
    whiteSpace2[0x202f - base] = true;  whiteSpace2[0x205f - base] = true;

    function trimOOWB2(str) {
        var ws = whiteSpace, ws2 = whiteSpace2;
        var i = 0, len = str.length, ch;

        // Scan backward from the end while the character is whitespace.
        // charCodeAt() returns NaN past either end of the string, which stops the loop.
        while ((ch = str.charCodeAt(--len)) &amp;&amp;
               (ch &lt;= 0x00A0 ? ws[ch] :
                (ch &gt;= 0x2000 ? (ch === 0x3000 || ws2[ch - 0x2000])
                 : (ch === 0x1680 || ch === 0x180e))))
            ;

        // If anything other than whitespace remains, scan forward from the start.
        if (++len) {
            while ((ch = str.charCodeAt(i)) &amp;&amp;
                   (ch &lt;= 0x00A0 ? ws[ch] :
                    (ch &gt;= 0x2000 ? (ch === 0x3000 || ws2[ch - 0x2000])
                     : (ch === 0x1680 || ch === 0x180e)))) {
                ++i;
            }
        }
        return str.substring(i, len);
    }

    return trimOOWB2;
})();
</pre>
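<p>With this many special cases, it&#8217;s worth cross-checking a hand-rolled trim against something slow but easy to audit. The sketch below is one way to do that; <code>referenceTrim</code> and <code>checkTrim</code> are hypothetical helpers of my own, and the sample strings are arbitrary. The character class lists the same code points that the lookup tables and explicit comparisons above treat as whitespace.</p>

```javascript
// The code points treated as whitespace by the lookup tables above.
var WS_CLASS = '[\\u0009-\\u000d\\u0020\\u0085\\u00a0\\u1680\\u180e' +
               '\\u2000-\\u200b\\u2028\\u2029\\u202f\\u205f\\u3000]';
var LEADING = new RegExp('^' + WS_CLASS + '+');
var TRAILING = new RegExp(WS_CLASS + '+$');

// Slow but easy-to-audit reference implementation.
function referenceTrim(str) {
    return str.replace(LEADING, '').replace(TRAILING, '');
}

// Hypothetical checker: returns the sample inputs on which trimFn
// disagrees with referenceTrim (an empty array means they agree).
function checkTrim(trimFn) {
    var samples = [
        '', ' ', 'abc', '  abc\t\n', '\u3000abc\u1680',
        '\u2003a b\u205f', '\u180e\u00a0x\u0085', 'a\u200bb', '\u2028\u2029'
    ];
    var failures = [];
    for (var s = 0; s < samples.length; s++) {
        if (trimFn(samples[s]) !== referenceTrim(samples[s])) {
            failures.push(samples[s]);
        }
    }
    return failures;
}
```

<p>Calling <code>checkTrim(trimOOWB)</code> should return an empty array if the two implementations agree on every sample; any strings that come back are the inputs to investigate.</p>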
<p>I benchmarked the new function (<code>trimOOWB</code>) and Yesudeep&#8217;s <code>trim17</code> function, using two sets of input strings. In the first test, only legacy ASCII whitespace characters (e.g., spaces and tabs) were used, as in the test above. The second test data set used the full set of whitespace in the Unicode character set. The numbers shown are milliseconds spent within the trim functions; lower is better.</p>
<table style="" border="0" cellspacing="0" cellpadding="2" >
<tbody>
<tr class="oowbfirstRow">
<th style="text-align: right;"></th>
<th style="text-align: right;" ><strong><code>trim17</code> (ASCII)</strong></th>
<th style="text-align: right;" ><strong><code>trimOOWB</code> (ASCII)</strong></th>
<th style="text-align: right;" ><strong>% Saved</strong></th>
<th style="text-align: right;" ><strong><code>trim17</code> (Unicode)</strong></th>
<th style="text-align: right;" ><strong><code>trimOOWB</code> (Unicode)</strong></th>
<th style="text-align: right;" ><strong>% Saved</strong></th>
</tr>
<tr>
<td><strong>Internet Explorer 8</strong></td>
<td style="text-align: right;">25,953</td>
<td style="text-align: right;">22,359</td>
<td style="text-align: right;">14</td>
<td style="text-align: right;" >27,469</td>
<td style="text-align: right;" >26,500</td>
<td style="text-align: right;" >4</td>
</tr>
<tr>
<td><strong>Firefox 3.5</strong></td>
<td style="text-align: right;">3,706</td>
<td style="text-align: right;">3,089</td>
<td style="text-align: right;">17</td>
<td style="text-align: right;" >3,831</td>
<td style="text-align: right;" >3,439</td>
<td style="text-align: right;" >10</td>
</tr>
<tr>
<td><strong>Google Chrome 3.0</strong></td>
<td style="text-align: right;">604</td>
<td style="text-align: right;">139</td>
<td style="text-align: right;">77</td>
<td style="text-align: right;" >664</td>
<td style="text-align: right;" >166</td>
<td style="text-align: right;" >75</td>
</tr>
</tbody>
</table>
<h3>A final thought</h3>
<p>It&#8217;s interesting that there&#8217;s an inverse relationship between the base time for the browser to execute <code>trim17</code>, and the percentage of time saved by <code>trimOOWB</code>. In other words, the <em>percentage</em> of performance boost from using smaller arrays increases as the browser&#8217;s speed increases: IE showed the highest time and lowest gain, followed by Firefox on both counts, and finishing with Chrome, which had the lowest base time and the highest percentage gain.</p>
<p>I&#8217;m guessing here, but I think the easiest way to explain this is that all three JavaScript interpreters are using roughly equivalent strategies for managing arrays. The percentages gained are different because the same <em>absolute</em> time savings in Chrome results in a higher <em>relative</em> performance boost when compared to IE or Firefox.</p>
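<p>The absolute and relative savings are easy to put side by side. Here&#8217;s a small sketch that does the arithmetic for the ASCII columns of the table above; the timings are copied from the table, while <code>asciiRuns</code> and <code>summarize</code> are names I made up for the example.</p>

```javascript
// Timings (ms) copied from the ASCII columns of the benchmark table.
var asciiRuns = [
    { browser: 'Internet Explorer 8', trim17: 25953, trimOOWB: 22359 },
    { browser: 'Firefox 3.5',         trim17: 3706,  trimOOWB: 3089 },
    { browser: 'Google Chrome 3.0',   trim17: 604,   trimOOWB: 139 }
];

// Absolute savings in ms, and savings as a rounded percentage of the base time.
function summarize(runs) {
    var rows = [];
    for (var r = 0; r < runs.length; r++) {
        var saved = runs[r].trim17 - runs[r].trimOOWB;
        rows.push({
            browser: runs[r].browser,
            savedMs: saved,
            savedPercent: Math.round(100 * saved / runs[r].trim17)
        });
    }
    return rows;
}

var summary = summarize(asciiRuns);
// summary[0]: 3594 ms saved, 14%
// summary[1]:  617 ms saved, 17%
// summary[2]:  465 ms saved, 77%
```

<p>The percentages match the table. The absolute figures are what the guess above is really about, so they&#8217;re worth keeping in view when weighing it.</p>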
<p>That raises a question, though: Is it possible that the structure of <code>trimOOWB</code> gives it any performance advantage over <code>trim17</code> <em>aside from</em> the savings generated through reducing the array size? I&#8217;ve looked for such artifacts in the tests. In short, and skipping the details for now, I think it&#8217;s likely that such artifacts couldn&#8217;t account for more than one quarter of the overall speed boost; it&#8217;s probably much less than that. It&#8217;s at least as likely that the overhead added by <code>trimOOWB</code> is <em>obscuring</em> part of the overall performance boost.</p>
<h3>Another final thought</h3>
<p>The multiple comparisons made in <code>trimOOWB</code> might raise the question: Why not try using a <code>switch</code> statement instead of all those conditionals? Well, I <em>did</em> try this, with mixed results. On one hand, Firefox showed a significant speedup, around 25%. On the other hand, IE may have been a little slower, and Chrome was more than twice as slow. (Besides, it was a <code>switch</code> statement. We&#8217;re looking for speed, but a fellow&#8217;s got to have <em>some</em> standards.)</p>
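<p>For the curious, a <code>switch</code>-based variant might look something like the sketch below. This is a reconstruction for illustration, not the code I actually benchmarked: <code>isWhiteSpaceSwitch</code> and <code>trimSwitch</code> are hypothetical names, and factoring the test into a separate function adds a call per character that an inlined <code>switch</code> would avoid.</p>

```javascript
// Hypothetical switch-based whitespace test covering the same code points
// as the lookup tables. NaN (from charCodeAt past either end of the string)
// matches no case and falls through to the default.
function isWhiteSpaceSwitch(ch) {
    switch (ch) {
    case 0x0009: case 0x000a: case 0x000b: case 0x000c: case 0x000d:
    case 0x0020: case 0x0085: case 0x00a0:
    case 0x1680: case 0x180e:
    case 0x2000: case 0x2001: case 0x2002: case 0x2003:
    case 0x2004: case 0x2005: case 0x2006: case 0x2007:
    case 0x2008: case 0x2009: case 0x200a: case 0x200b:
    case 0x2028: case 0x2029: case 0x202f: case 0x205f:
    case 0x3000:
        return true;
    default:
        return false;
    }
}

// Same two-ended scan as the table-based versions, using the switch test.
function trimSwitch(str) {
    var i = 0, len = str.length;
    while (isWhiteSpaceSwitch(str.charCodeAt(--len))) {}
    if (++len) {
        while (isWhiteSpaceSwitch(str.charCodeAt(i))) { ++i; }
    }
    return str.substring(i, len);
}
```

<p>No lookup table is needed at all here, which is presumably why it&#8217;s tempting; whether it&#8217;s faster depends entirely on how the engine compiles the <code>switch</code>.</p>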
<h3>A final final thought</h3>
<p>Using a closure offers another advantage: It&#8217;s simple to redirect calls to <code>trim</code> to the native <code>String.trim()</code> function, if the browser supports it. All that&#8217;s required is to change the <code>return</code> in the outer (anonymous) function from this:</p>
<pre class="brush: jscript;">
    return trimOOWB2;
</pre>
<p>to:</p>
<pre class="brush: jscript;">
    return String.trim || trimOOWB2;
</pre>
<p><a href="http://ejohn.org/blog/ecmascript-5-strict-mode-json-and-more/">ECMAScript 5.0 is slated</a> to include a native <code>trim</code> (specified as <code>String.prototype.trim</code>), so it&#8217;s probably worth thinking ahead for this. In Firefox 3.5 (the only current browser that I know of that supports <code>String.trim</code>), calls to the native <code>String.trim</code> run in a fraction of the time of any of the JavaScript implementations.</p>
<div class="oowbbtw"><strong>Note</strong>: Firefox&#8217;s native implementation of <code>String.trim</code> does not count U+1680 or U+180E as whitespace. It does treat U+3000 as whitespace.</div>
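<p>One caveat: the static form <code>String.trim(str)</code> is, as far as I know, specific to Firefox, whereas the form that ECMAScript 5 specifies is the method <code>String.prototype.trim</code>. Here&#8217;s a hedged sketch of a fallback that relies only on the method form. <code>fallbackTrim</code> is a simple stand-in for whichever hand-written function you prefer, and the names are mine.</p>

```javascript
// Stand-in for the hand-written trim; any function taking a string would do.
function fallbackTrim(str) {
    return str.replace(/^\s+/, '').replace(/\s+$/, '');
}

// Prefer the native method when it exists; otherwise use the fallback.
var nativeTrim = String.prototype.trim;
var trim = (typeof nativeTrim === 'function')
    ? function (str) { return nativeTrim.call(str); }
    : fallbackTrim;
```

<p>Inside the closure, the same test could decide what the outer function returns. Keep in mind that a native implementation may not agree with the hand-written one on which code points count as whitespace.</p>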
<hr />
<div><sup><a name="trim-note-mem" href="#trim-ref-mem">1</a></sup> Yes, I wrote &#8220;JavaScript&#8221; and &#8220;reasonable memory consumption&#8221; in the same article. Go ahead and snicker.</div>
<p />
]]></content:encoded>
			<wfw:commentRss>http://www.outofwhatbox.com/blog/2009/11/trimming-trim-via-razing-arrays-javascript/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

