4. Parsing Expressions (Recursive Descent)
In the previous chapter, we built the foundation of our parser and successfully parsed primary expressions: literals and groupings. However, programming languages are mostly composed of operations combining these primary expressions.
In this chapter, we will implement the rest of our expression grammar. We will deal with operator precedence and associativity, rewrite our grammar to be unambiguous, and implement a full Recursive Descent parser. Finally, we will add robust error recovery so our parser does not crash at the first sign of invalid code.
1. Ambiguity and the Parsing Game
At the end of Chapter 3, we looked at a simple grammar for expressions. It looked something like this:
expression → primary
| unary
| binary ;
primary → NUMBER | STRING | "true" | "false" | "nil"
| "(" expression ")" ;
unary → ( "-" | "!" ) expression ;
binary → expression operator expression ;
operator → "==" | "!=" | "<" | "<=" | ">" | ">="
| "+" | "-" | "*" | "/" ;
This grammar is ambiguous. Consider the expression 5 + 3 * 2. If we play the "parsing game" with these rules, we could generate two entirely different Abstract Syntax Trees (ASTs):
Wrong Tree (Ignores Precedence) Right Tree (Follows Precedence)
* +
/ \ / \
+ 2 5 *
/ \ / \
5 3 3 2
Evaluates to: (5 + 3) * 2 = 16 Evaluates to: 5 + (3 * 2) = 11
Math dictates that multiplication has higher precedence than addition, so the second tree is the correct one. However, our flat grammar has no way of knowing this. To fix this, we must define rules for precedence and associativity.
- Precedence determines which operator binds tighter. Higher precedence operators (like
*) are evaluated first, meaning they appear lower in the syntax tree. - Associativity determines the evaluation order for a series of the same operator. Left-associative operators evaluate left-to-right (
5 - 3 - 1becomes(5 - 3) - 1). Right-associative operators evaluate right-to-left.
The Stratified Grammar
We fix ambiguity by stratifying our grammar. We define a separate rule for each precedence level. Each rule only matches expressions at its precedence level or higher.
Notice that in our new grammar, the generic binary and operator rules from the previous chapter have been completely eliminated. Instead of one ambiguous catch-all rule, we split the binary operations up into four distinct rules (equality, comparison, term, and factor), one for each level of precedence.
Here is the unambiguous, stratified grammar for Snek expressions, ordered from lowest to highest precedence:
expression → equality ;
equality → comparison ( ( "!=" | "==" ) comparison )* ;
comparison → term ( ( ">" | ">=" | "<" | "<=" ) term )* ;
term → factor ( ( "-" | "+" ) factor )* ;
factor → unary ( ( "/" | "*" ) unary )* ;
unary → ( "!" | "-" ) unary | primary ;
primary → NUMBER | STRING | "true" | "false" | "nil" | "(" expression ")" ;
Notice how expression simply routes to equality. Then equality looks for == or !=, but its operands must be comparison expressions (or higher). This forces the parser to descend to the bottom of the precedence ladder before it can match a lower-precedence operator, perfectly mirroring mathematical rules.
2. Top-Down Recursive Descent
A recursive descent parser translates these grammar rules directly into code. We start at the top (expression) and work our way down to the leaves (primary).
Let us update our entry point. Open parser.py and modify the expression() method:
# --- Parser Entry Point ---
def expression(self) -> expr.Expr:
return self.equality()
Now, we just need to implement the missing methods down the chain.
3. Parsing Binary Operators
We will start with equality(). Look at the grammar rule: equality → comparison ( ( "!=" | "==" ) comparison )* ;
The * in BNF means "zero or more times". In imperative code, a repeating sequence translates perfectly to a while loop. Add this method to your Parser class (above primary()):
def equality(self) -> expr.Expr:
e = self.comparison()
while self.match(TokenType.BANG_EQUAL, TokenType.EQUAL_EQUAL):
operator = self.previous()
right = self.comparison()
e = expr.Binary(e, operator, right)
return e
Technical Note: Left-Associativity
The
whileloop is the secret to left-associativity. Consider parsing the expression1 + 2 + 3. We want the AST to lean to the left, representing(1 + 2) + 3. Here is how thewhileloop builds it step-by-step:
- We parse the first operand (
e = self.factor()).eis nowLiteral(1).- We enter the
whileloop because we see a+. We parse the right operand (Literal(2)).- We construct a new
Binarynode and assign it back toe.- The loop runs again because we see the second
+. We parse the right operand (Literal(3)).- We construct another
Binarynode, using our entire existing tree (e) as the new left operand.Step 1: Step 2 (First Loop): Step 3 (Second Loop): 1 + + / \ / \ 1 2 + 3 / \ 1 2Because the old
econstantly gets pushed down to become the left child of the new node, the tree naturally grows up and to the right, leaving the leftmost operations at the bottom to be evaluated first.
The remaining binary operators follow the exact same pattern. Implement comparison(), term(), and factor():
def comparison(self) -> expr.Expr:
e = self.term()
while self.match(TokenType.GREATER, TokenType.GREATER_EQUAL, TokenType.LESS, TokenType.LESS_EQUAL):
operator = self.previous()
right = self.term()
e = expr.Binary(e, operator, right)
return e
def term(self) -> expr.Expr:
e = self.factor()
while self.match(TokenType.MINUS, TokenType.PLUS):
operator = self.previous()
right = self.factor()
e = expr.Binary(e, operator, right)
return e
def factor(self) -> expr.Expr:
e = self.unary()
while self.match(TokenType.SLASH, TokenType.STAR):
operator = self.previous()
right = self.unary()
e = expr.Binary(e, operator, right)
return e
You might notice that these four methods look nearly identical. In a recursive descent parser, this repetition is standard. It keeps the precedence explicitly mapped to the call stack, making the parser robust and easy to read.
4. Parsing Unary Operators
Next, we reach the unary rule: unary → ( "!" | "-" ) unary | primary ;
If you completed the challenge at the end of the previous chapter, you might already have this! Notice that unary calls itself on the right side of the operator. Add this method:
def unary(self) -> expr.Expr:
if self.match(TokenType.BANG, TokenType.MINUS):
operator = self.previous()
right = self.unary()
return expr.Unary(operator, right)
return self.primary()
Because unary() calls self.unary() to parse its operand, it evaluates right-to-left. If you type !!true, it parses the first !, then recursively calls unary() to parse the second !, making the second one the child of the first. This naturally creates right-associativity.
If the current token is not a ! or -, we simply fall through to self.primary(), completing the chain.
5. Syntax Errors and Synchronization
Our parser can now build complex trees. However, we must consider what happens when the user types invalid code.
A parser has two main jobs:
- Produce an AST for valid code.
- Detect errors, report them, and recover quickly so it can find subsequent errors.
If we encounter a syntax error, we raise a ParseError. But if we simply let the exception crash the program, the user only sees one error at a time. We want to catch that error, realign our parser's internal state, and keep going. This technique is called Panic Mode recovery.
When a parser goes into panic mode, it knows its state is corrupted. It must discard tokens until it finds a boundary where it knows it can safely resume parsing. In C-like languages, statement boundaries are marked by semicolons or keywords that begin a new statement (like class, fun, var, or if).
Add the following synchronize() method to your Parser class under the Error Handling section:
def synchronize(self):
"""Discards tokens until a statement boundary is found."""
self.advance()
while not self.is_at_end():
if self.previous().type == TokenType.SEMICOLON:
return
# If the next token is a keyword that starts a statement, we are safe.
next_type = self.peek().type
if next_type in (TokenType.CLASS, TokenType.FUN, TokenType.VAR,
TokenType.FOR, TokenType.IF, TokenType.WHILE,
TokenType.PRINT, TokenType.RETURN):
return
self.advance()
Technical Note: Exception Unwinding
Right now, we catch the
ParseErrorin our top-levelparse()method and simply returnNone. Because we only parse single expressions at this stage, we have no statement boundaries to synchronize to.In Chapter 7, when we add statements and blocks, we will update
parse()to catchParseError, callself.synchronize(), and continue parsing the next line. We write this method now to lay down the correct architectural foundation.
6. Wiring it Up
Your parser is now fully capable of handling mathematical precedence. Let us prove it.
Ensure snek.py is unchanged from the end of Chapter 3, where it calls scanner.scan_tokens(), passes them to parser.parse(), and prints the result using AstPrinter.
Test it out: Run your REPL: python snek.py Type the following expression: -123 * (45.67 + 8) == 10;
Press Enter. The AstPrinter should output the exact hierarchical structure of the AST: (== (* (- 123.0) (group (+ 45.67 8.0))) 10.0)
If your output matches, congratulations! You have successfully implemented a recursive descent parser. Your interpreter now possesses a rigorous understanding of mathematics and logic. In the next chapter, we will bring these trees to life by writing the actual evaluation logic.
7. Challenges
1. The Exponentiation Operator (**)
Add an exponentiation operator (**) to Snek.
Requirements:
- It should have higher precedence than multiplication and division (
factor), but lower precedence than unary operators like-(unary). - Unlike addition and multiplication, exponentiation is right-associative. Mathematical rules state that
2 ** 3 ** 2must be evaluated as2 ** (3 ** 2)(which is2 ** 9 = 512), NOT(2 ** 3) ** 2(which is8 ** 2 = 64). - Your updated grammar rules for this section of the precedence ladder should look like this (notice the
?in thepowerrule, which means the operator and right-hand side are optional - can appear zero or one time before recursing):factor → power ( ( "/" | "*" ) power )* ; power → unary ( "**" power )? ; unary → ( "!" | "-" ) unary | primary ;
Implementation Hints
To implement this successfully, you will need to touch a few different parts of the pipeline:
- The Scanner:
- Add a new
STAR_STARenum value toTokenType. - In your scanner, modify the logic that handles
*. Useself.match('*')to check if a second asterisk follows the first. If it does, emit aSTAR_STARtoken; otherwise, emit a standardSTAR.
- Add a new
- The AST:
- You do not need a new AST node. Exponentiation is just another
Binaryexpression. YourAstPrinterwill automatically handle it using the operator's lexeme.
- You do not need a new AST node. Exponentiation is just another
- The Parser (Precedence):
- Create a new method named
power(). - To slot it into the correct precedence level, update your
factor()method to callself.power()instead ofself.unary(). - Your new
power()method will then be the one to callself.unary().
- Create a new method named
- The Parser (Associativity):
- Because exponentiation evaluates right-to-left, you cannot parse it using a
whileloop like the other binary operators. - Instead, inside
power(), first callself.unary()to get the left operand. - Then, use an
if self.match(TokenType.STAR_STAR):block. If you find the operator, grab the previous token, and recursively callself.power()to get the right operand. Finally, wrap them in anexpr.Binarynode and return it.
- Because exponentiation evaluates right-to-left, you cannot parse it using a
2. The Ternary Operator (? :)
Many languages feature a C-style conditional operator, also known as the ternary operator: condition ? true_expr : false_expr.
Add this to your AST and parser.
Requirements:
- It should have the lowest precedence, right below
equality. - Ternary operators are right-associative. The expression
a ? b : c ? d : eevaluates asa ? b : (c ? d : e). - Your updated grammar rules for this section of the precedence ladder should look like this:
expression → ternary ; ternary → equality ( "?" expression ":" ternary )? ; equality → comparison ( ( "!=" | "==" ) comparison )* ;
Implementation Hints
- The Scanner:
- Add
QUESTIONandCOLONtoTokenType. - Update
scanner.pyto recognize?and:as single-character tokens.
- Add
- The AST Node:
- You will need a new node in
expr.py:@dataclass class Ternary(Expr): condition: Expr then_branch: Expr else_branch: Expr - Update
ast_printer.pywith a newcaseto format this node (e.g., returning something like(? condition then_branch else_branch)).
- You will need a new node in
- Precedence:
- The ternary operator should have the lowest precedence, sitting right below
equality. - Update your
expression()entry point to call a newternary()method instead ofequality().
- The ternary operator should have the lowest precedence, sitting right below
- Associativity and Parsing Logic:
- Ternary operators are right-associative. The expression
a ? b : c ? d : eevaluates asa ? b : (c ? d : e). - In your new
ternary()method, start by parsing the condition usingself.equality(). - Then, use
if self.match(TokenType.QUESTION):to check for the operator. - If it matches, parse the
then_branch. You can useself.expression()here to allow any valid expression. - Next, use
self.consume(TokenType.COLON, "...")to ensure the colon is present. - Finally, for the
else_branch, recursively callself.ternary()to ensure right-associativity, and return your newexpr.Ternarynode.
- Ternary operators are right-associative. The expression
- Verification:
- If you run the REPL and type
1 == 2 ? 3 : 4 == 4 ? 5 : 6;, yourAstPrintershould output exactly:(? (== 1.0 2.0) 3.0 (? (== 4.0 4.0) 5.0 6.0))
- If you run the REPL and type