Dowemo
0 0 0 0


Question:

I was trying to write a simple html template engine (for fun), and wanna parse a structure like this

A. normal lines are HTML

B. if a line starts with $ then view it as a java code line

$ if (isSuper) {
    <span>Are you wearing red underwear?</span>
$ }

C. if ${} wraps multiple lines, all code in it should be java code.

D. if a line starts with $include then do some trick on the line (call another template)

$include anotherTemplate(id, name)

this will create a new instance of anotherTemplate, and call it's render() method

E. and there would be more "commands" other than $include, such as $def, $val.

How can I express this in parser combinators? In effect it is a conditional fork

for 1. and 2., I got something like this:

'$' ~> ( '{' ~> upto('}') <~ '}' |  not('{') <~ newline )

where upto is borrowed from Scalate Scamel parser (which I just start to read and can't quite understand)

I used not('{') to distinguish $.... code line with ${...} block. But this is cumbersome, and won't extend to other "commands"

So How can I do this?


Best Answer:


Your use of not is redundant. The | method implements ordered choice; the second thing is tried only if the first has failed. This should do the trick:

def directive: Parser[Directive] =
  ( '$' ~>
    ( '{' ~> javaStuff <~ '}'
    | "include" ~> includeDirective
    | "def"     ~> defDirective
    | "val"     ~> valDirective
    | javaDirective
    )
  | htmlDirective
  )
def templateFile: Parser[List[Directive]] = (directive <~ 'n').*

For faster parsing and better error messages, you should "commit" your parsers as often as possible. I think this is what you were trying to get at when you used not('{').

Right now, if the above parser sees a '$' followed by a '{' and then doesn't see javaStuff, it'll backtrack and consider each of the four remaining '$'-alternatives in order (include, def, val, and finally javaDirective), and then backtrack to before '$' to try htmlDirective, before failing with a baffling error message. But if we see a '{', we know that none of the other alternatives could possibly succeed, so why should we check them? Likewise, a line that starts with '$' can never be an htmlDirective.

We want things like '{' to be points of no backtrack; if the after-'{' parser fails and wants to backtrack, we should stop it in its tracks and propagate the backtrack-causing failure directly to the user as an error.

The way to do this is with commit. This function/combinator, when applied to a parser p, looks at the ParseResult coming out of p and changes it to an Error (the give-up-entirely signal) if it was originally a Failure (the backtrack signal), leaving it unchanged otherwise. With appropriate use of commit, the directive parser becomes:

def directive: Parser[Directive] =
  ( '$' ~> commit( '{' ~> commit(javaStuff <~ '}')
                 | "include" ~> commit(includeDirective)
                 | "def"     ~> commit(defDirective)
                 | "val"     ~> commit(valDirective
                 | javaDirective
                 )
  | htmlDirective
  )

When I first learned to use the parsing library, I found it really helpful to look at the source code for Parsers; it makes some of this stuff a bit more clear.

(Some other tips: The purpose of append and ParseResult#append is to decide which failure from a sequence of parse-alternatives should be propagated to the user. Just ignore those for now. Also, I wouldn't worry too much about >>/flatMap/into until you've gotten some more practice; when it's time, read Daniel Sobral's explanation. Finally, I've never had to use |||, and you probably won't either. Happy parsing!)

Hope this helps.




Copyright © 2011 Dowemo All rights reserved.    Creative Commons   AboutUs