Scribbling

Recursive Descent Parsing - Python Implementation 본문

Computer Science/Algorithms & Data Structures

Recursive Descent Parsing - Python Implementation

focalpoint 2022. 11. 12. 05:19

 

Video I recommend: 

https://www.youtube.com/watch?v=SToUyjAsaFk 

 

In this post, I will cover "Recursive Descent Parsing". Details about it are well elaborated in the above video. This post is to summarize core concepts and implement the algorithm with an actual code.

 

Goal: Implement a parser that parse an arithmetic experssion.

- ( 3 + 5) * -2

 

Elements:

Numbers

Infix Operators (+, -, *, /)

Prefix Operator (+, -)

Parenthesis ( )

 

Precedence

3 + 5 * 6

should be 3 + (5 * 6) not (3 + 5) * 6

 

Associativity

7 - 5 + 1

= (7 - 5) + 1 not 7 - ( 5 + 1)

 

Grammar

Grammar (Non-Terminal) Symbols: E: Expression, T: Term, F: Factor

Terminal Symbols: *, -, 37, 62, +, /, (, )

T+E, T-E, T --> E

F * T, F / T, F --> F

Integer, (E), -F --> F

 

Tokens

Integers {digits}

 

Lexical Analysis

function scanToken(): scans the input and moves nextToken

variable nextToken

(optional) OOP class for data types like integers

 

Class

We will use a tree structure to represent the expression. Thanks to Python's versatility, we don't need to implement different class for each operator or data types.

class Node(object):
    def __init__(self, val="", left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

 

Print Method

Recursively call to print the expression.

 

Eval Method

Recursively call to evaluate the expression.

 

Parsing Algorithm

Functions: each function will return a node (or pointer in other languages) of the tree it builds

parseExpression()

parseTerm()

parseFactor()

Details:

- at any moment, nextToken will contain the next unscanned thing from the input.

- scanToken() to advance.

- return Null if error.

 

Python Implementation

To be very technical, it is very important to handle errors at each step. However, for simplicity, I won't handle errors as for now. If I have some time in the future, I might update the code.

 

from collections import deque


class Node(object):
    def __init__(self, val=" ", left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


class Parser:

    @staticmethod
    def expTree(expression):
        tokens = deque(list(expression))
        return Parser.parse_expression(tokens)

    @staticmethod
    def parse_expression(tokens):
        lhs = Parser.parse_term(tokens)
        while tokens and tokens[0] in ['+', '-']:
            op = tokens.popleft()
            rhs = Parser.parse_term(tokens)
            lhs = Node(val=op, left=lhs, right=rhs)
        return lhs

    @staticmethod
    def parse_term(tokens):
        lhs = Parser.parse_factor(tokens)
        while tokens and tokens[0] in ['*', '/']:
            op = tokens.popleft()
            rhs = Parser.parse_factor(tokens)
            lhs = Node(val=op, left=lhs, right=rhs)
        return lhs

    @staticmethod
    def parse_factor(tokens):
        # if integer
        if tokens[0].isnumeric():
            num = 0
            while tokens and tokens[0].isnumeric():
                num = num * 10 + int(tokens.popleft())
            return Node(val=num)

        elif tokens[0] == '(':
            # consume '('
            tokens.popleft()
            # if node is none: raise error
            node = Parser.parse_expression(tokens)
            # consume ')'
            # if token is not ')': raise error
            tokens.popleft()
            return node

        # if negate
        elif tokens[0] == '-':
            # consume '-'
            tokens.popleft()
            # if numNode is none: raise error
            numNode = Parser.parse_factor(tokens)
            return Node(val='-', left=None, right=numNode)

        # invalid token: raise error
        else:
            return None


    @staticmethod
    def print_tree(root):
        def _print_node(node):
            # leaf node with number value
            if node.left is None and node.right is None:
                print(node.val, end='')
                return
            # negate case: - (3 + 5)
            if node.left is None and node.right is not None:
                print('(', end='')
                # node.val should be '-' (negate)
                print(node.val, end='')
                _print_node(node.right)
                print(')', end='')
                return
            # operator case with both children
            # if node.left is None: raise error
            print('(', end='')
            _print_node(node.left)
            # node.val should be an operator
            print(node.val, end='')
            _print_node(node.right)
            print(')', end='')
        _print_node(root)
        print()

    @staticmethod
    def eval_tree(root):
        def _eval_node(node):
            # leaf node with number value
            if node.left is None and node.right is None:
                return node.val
            # negate case: - (3 + 5)
            if node.left is None and node.right is not None:
                # node.val should be '-' (negate)
                return -_eval_node(node.right)
            # operator case with both children
            # if node.left is None: raise error
            lhs, rhs = _eval_node(node.left), _eval_node(node.right)
            if node.val == '+':
                return lhs + rhs
            elif node.val == '-':
                return lhs - rhs
            elif node.val == '*':
                return lhs * rhs
            elif node.val == '/':
                return int(lhs/rhs)
            # invalid operator: raise error
            else:
                return None
        return _eval_node(root)




'''
Example
'''
root = Parser.expTree('2-3/(5*2)+1')
Parser.print_tree(root)
print(Parser.eval_tree(root))

 

 

 

Relevant LeetCode Problem:

1597. Build Binary Expression Tree From Infix Expression

https://leetcode.com/problems/build-binary-expression-tree-from-infix-expression/description/

 

We can basically use the same code for the solution !

# Definition for a binary tree node.
class Node(object):
    def __init__(self, val=" ", left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

from collections import deque
class Solution:
    def expTree(self, s: str) -> 'Node':
        tokens = deque(list(s))
        return self.parse_expression(tokens)

    def parse_expression(self, tokens):
        lhs = self.parse_term(tokens)
        while tokens and tokens[0] in ['+', '-']:
            op = tokens.popleft()
            rhs = self.parse_term(tokens)
            lhs = Node(val=op, left=lhs, right=rhs)
        return lhs

    def parse_term(self, tokens):
        lhs = self.parse_factor(tokens)
        while tokens and tokens[0] in ['*', '/']:
            op = tokens.popleft()
            rhs = self.parse_factor(tokens)
            lhs = Node(val=op, left=lhs, right=rhs)
        return lhs

    def parse_factor(self, tokens):
        # if integer
        if tokens[0].isnumeric():
            num = 0
            while tokens and tokens[0].isnumeric():
                num = num * 10 + int(tokens.popleft())
            return Node(val=str(num))

        elif tokens[0] == '(':
            # consume '('
            tokens.popleft()
            # if node is none: should raise error
            node = self.parse_expression(tokens)
            # consume ')'
            # if token is not ')': should raise error
            tokens.popleft()
            return node