Riff is a recent spin-off project from Jesse Sielaff’s work to bring a Ruby-like language interpreter to the browser. Riff is a plugin for Racc (an LALR(1) parser generator for Ruby). Riff extends the output generation portions of Racc and writes javascript instead of Ruby. If you’re familiar with LALR grammars this should add a handy tool to your javascript arsenal.
Jesse’s been using Riff in his latest stab at Red, but we were eager to throw it at a smaller text format and see how it compared to a solid hand-written parser. We took a weekend and paired on rewriting Jan Lehnardt’s javascript implementation of Chris Wanstrath’s mustache template language.
Our mustache.js has three little differences from Jan’s (we wanted to keep as close to Chris’ Ruby/Python mustache implementations as possible):
- no explicit iterator {{.}} (use {{ toString }})
- partial context is stored in Mustache.partials instead of passed as an argument. You can switch this context to a different object when you need to.
- we have Mustache.render(template, object) instead of Mustache.to_html(template, object, partials, callback)
Other than those, the two parsers are roughly identical in functionality but quite distinct in implementation. Jan’s mustache.js is a handwritten parser; ours is generated from Riff and Racc.
A generated parser is broken up into two main parts: grammar definition and semantic action. Take a look at Mustache’s grammar. The first part defines the grammar of the language (it’s pretty small), from which a finite state machine is generated. The part of the grammar enclosed in { and } defines the semantic rule conversion (e.g. what to do when a token is found).
For example, line 13 of our grammar looks like this:
"{{{" "text" "}}}" { $$ = Array(4, $2); }
This means: when you find “{{{”, some text, and then “}}}”, create a new Array object with 4 (which is the arbitrary code number we gave to “unescaped text” nodes) as the first element and the string value of the “text” node as the second element. What you choose to do in your parser depends on how you’d like to handle evaluation. You might create javascript literals, use a jquery class, etc. We’ve opted to create a tree of nodes using Array objects in javascript, but a different solution might have been:
"{{{" "text" "}}}" { $$ = { node_type:'unescaped-text', content: $2 } }
The grammar file is run through Riff and converted to a generated parser. Everything up to line 214 is generated; the remainder is copied over from the ---- footer section of the grammar file.
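To make the semantic actions concrete: when the generated parser matches the triple-mustache rule shown earlier, the action body runs with $2 bound to the matched text. A rough hand-written equivalent of the two variants might look like the sketch below. The function names and the `values` array are hypothetical stand-ins for illustration, not part of Riff's output.

```javascript
// Hypothetical stand-ins for the two semantic actions discussed above.
// `values` plays the role of $1..$3: the matched "{{{", "text" and "}}}" tokens.
var TRIPLE_MUSTACHE = 4; // the arbitrary node code used in the grammar rule

// Equivalent of: $$ = Array(4, $2);
function arrayNode(values) {
  return [TRIPLE_MUSTACHE, values[1]];
}

// Equivalent of: $$ = { node_type:'unescaped-text', content: $2 }
function objectNode(values) {
  return { node_type: 'unescaped-text', content: values[1] };
}

var matched = ['{{{', 'name', '}}}'];
var asArray = arrayNode(matched);   // [4, 'name']
var asObject = objectNode(matched); // { node_type: 'unescaped-text', content: 'name' }
```

Either shape works; the Array form is simply more compact when serialized.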
Initially the footer for Mustache contained every function necessary to parse, evaluate, and render a mustache formatted file. Jesse later separated this into two separate sections – a parser and an evaluator – and the end results are pretty awesome (big reveal later, I promise).
Starting on line 214, we have two major functions of Mustache: compile and next_token. compile sets up a state for parsing and calls do_parse (which is generated by Riff from the grammar, not written by the programmer). next_token is called from inside do_parse and walks through the input string character by character, searching for semantically meaningful elements (e.g. {{#something}}).
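As a rough illustration of the idea behind next_token (not the actual code from the grammar footer), here is a minimal tokenizer that keeps a pointer into the input and hands back one token per call. The token names and the [type, value] shape are assumptions made for this sketch, and it deliberately ignores triple mustaches, partials, and comments.

```javascript
// Minimal sketch of a pointer-walking tokenizer (not Riff's generated code).
// Assumed token shape: [type, value]; the type names are hypothetical.
function makeTokenizer(input) {
  var pos = 0; // the "pointer" into the input string

  return function nextToken() {
    if (pos >= input.length) return null; // end of input

    // A tag starts at "{{": read up to the next "}}".
    if (input.slice(pos, pos + 2) === '{{') {
      var close = input.indexOf('}}', pos + 2);
      if (close === -1) throw new Error('Unclosed tag at position ' + pos);
      var body = input.slice(pos + 2, close);
      pos = close + 2; // move the pointer past the closing braces
      var sigil = body.charAt(0);
      if (sigil === '#') return ['open_block', body.slice(1).trim()];
      if (sigil === '^') return ['open_inverse', body.slice(1).trim()];
      if (sigil === '/') return ['close_block', body.slice(1).trim()];
      return ['variable', body.trim()];
    }

    // Otherwise walk forward one character at a time until the next "{{".
    var start = pos;
    while (pos < input.length && input.slice(pos, pos + 2) !== '{{') pos++;
    return ['text', input.slice(start, pos)];
  };
}

// Convenience helper: drain the tokenizer into an array.
function tokenize(input) {
  var next = makeTokenizer(input), tokens = [], tok;
  while ((tok = next()) !== null) tokens.push(tok);
  return tokens;
}
```

In a generated parser the loop lives inside do_parse, which asks for the next token whenever it needs more input to decide on a shift or a reduce.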
This part can be a little dense if you’re unfamiliar with look-ahead parsing. If you can’t visualize the pointer moving through a string, see the StringScanner branch, where Jesse uses his javascript port of Ruby’s string scanner class and regular expressions to manipulate the string pointer.
So, this code gets you a tree of Array objects. The evaluator part of the library takes this tree and renders it into a string with the appropriate values inserted. The main mojo in the evaluator is the render function. This is the only function you’ll call in your code with Mustache.render(template, object). The evaluator is organized into one function per semantic type (e.g. evaluate_inverse_block_node for {{^foo}} inverse blocks).
Internally, render checks the type of template object you passed in. Up until now, we’ve talked about templates only as strings in mustache format. If you pass in a String template, it runs through Mustache.compile and comes out the other end as the series of Arrays mentioned above, is evaluated, and finally is converted back into a String with data inserted in appropriate locations.
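The flow described here can be sketched in a few lines. This is not the real evaluator: the node codes (0 for "concatenate two children", 3 for an escaped variable, 4 for an unescaped one) are inferred from the examples in this post, blocks and partials are left out, and `compile` is a hypothetical parameter standing in for Mustache.compile.

```javascript
// Sketch only: node codes are inferred from the examples in this post
// (0 = concatenation, 3 = escaped variable, 4 = unescaped variable).
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Missing values render as an empty string.
function lookup(context, name) {
  return context[name] == null ? '' : context[name];
}

// One small function per node type, keyed by the node's numeric code.
var evaluators = {
  0: function (node, ctx) { return evaluate(node[1], ctx) + evaluate(node[2], ctx); },
  3: function (node, ctx) { return escapeHtml(lookup(ctx, node[1])); },
  4: function (node, ctx) { return String(lookup(ctx, node[1])); }
};

function evaluate(node, ctx) {
  if (typeof node === 'string') return node; // literal text
  var fn = evaluators[node[0]];
  if (!fn) throw new Error('Unknown node type: ' + node[0]);
  return fn(node, ctx);
}

// Accepts either a template string or an already-compiled tree.
// `compile` is a hypothetical stand-in for Mustache.compile.
function render(template, object, compile) {
  var tree = typeof template === 'string' ? compile(template) : template;
  return evaluate(tree, object);
}
```

Called with a tree, for example `render([0, 'Hello, ', [3, 'name']], { name: '<World>' })`, the parsing step is skipped entirely and only the tree walk runs.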
Now, imagine we could start with a series of Arrays and skip the parsing stage. Obviously, nobody wants to develop their templates like this:
[0,[0,[0,[0,[0,[0,[0,[0,'<img src="',[3,'src']],'" width="'],[3,'width']],'" height="'],[3,'height']],'" alt="'],[3,'name']],'">']
We want to write templates like this:
<img src="{{src}}" width="{{width}}" height="{{height}}" alt="{{name}}">
In production, of course, the computer doesn’t care what your templates look like. If you’re working with javascript on the server, you can pre-compile your templates with Mustache.compile. Jesse also whipped up an example pre-compiler for mustache in Ruby in case you have different server needs.
This lets us use mustache.js, mustache.evaluator.js, and String templates while developing, then move to just mustache.evaluator.js (which is a tiny, tiny file) and compiled templates when we’re ready to go to production and want a speed boost.
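A build step for that workflow could be as small as the sketch below. It assumes compiled templates are plain nested Arrays of strings and numbers (as in the example above), so they survive JSON serialization. The `precompile` function and the `TEMPLATES` variable name are hypothetical, and `compile` is passed in rather than hard-coded so that Mustache.compile or any other compiler can be used.

```javascript
// Build-step sketch: compile every template once and emit a .js snippet
// that can be shipped alongside the evaluator.
// `compile` is passed in; in practice it might be Mustache.compile.
function precompile(templates, compile) {
  var compiled = {};
  Object.keys(templates).forEach(function (name) {
    compiled[name] = compile(templates[name]);
  });
  // Nested Arrays of strings and numbers serialize cleanly as JSON,
  // which is also a valid javascript literal.
  return 'var TEMPLATES = ' + JSON.stringify(compiled) + ';';
}
```

The production page would then load the generated snippet and the evaluator, and render from `TEMPLATES` without ever loading the parser.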
How’d Riff do?
Not bad for a weekend pairing project. We went from nothing to a spec’ed, running format parser in two afternoons. Adjusting for comments and code style (Jesse is more liberal with his white space around curly braces), the generated parser has 20% fewer lines of code than the handwritten one. Run through a javascript minifier (Dean Edwards’ in this example), Jan’s parser is 8,747 bytes and Jesse’s is 6,292 bytes; in production, you can get by with just the evaluator at 2,244 bytes.
Organizationally, I prefer separating out language definition, parsing, and evaluation. I personally find it reads more easily than putting everything in a single structure. Plus, it gives us the benefit of using compiled templates.
Speedwise, Jan’s Mustache ranges from identical up to 4x faster (depending on the example), averaging 1.3x faster than ours. But as Jan kindly noted, code size and simplicity are probably more important than incremental speed gains. Plus, if you’re desperate for cycles, using compiled templates is 14x faster than our uncompiled version (9x faster than Jan’s).
Here are the speed test results (run in Firefox, on my last-gen Macbook):
| template | janl | jesse | jesse compiled |
|---|---|---|---|
| basic | 207ms | 356ms | 25ms |
| complex | 676ms | 885ms | 74ms |
| friend | 202ms | 227ms | 22ms |
| long | 192ms | 450ms | 29ms |
| looping | 278ms | 354ms | 25ms |
| partial | 892ms | 945ms | 81ms |
| total | 2447ms | 3217ms | 256ms |
If you find yourself hand-parsing text in javascript, give Riff a try. We think you’ll like it.