Skip to content Skip to sidebar Skip to footer

Parsing Pseudo-algebraic String Into Command

I have a dictionary containing a list of object as objects = {'A1': obj_1, 'A2': obj_2, } I then have a string as cmd = '(1.3A1 + 2(A2 + 0.7A3)) or 2(A4 to A6)' I want to

Solution 1:

Regular expressions are inherently unsuitable for a task involving parentheses for nested grouping – your pseudo-algebraic language (PAL) is not a regular language. An actual parser such as PyParsing (a PEG parser) should be used instead.

While this still requires translating from source code to operations, this can be performed directly during parsing.


We need a few language elements that directly translate to Python primitives:

  • Number literals, such as 1.3, as int/float literals or fractions.Fraction.
  • Name references, such as A3, as keys to the objects namespace.
  • Parentheses, such as (...), as grouping via parentheses for:
    • Variants, such as (1.3 or A3), as max calls.
    • Name ranges, such as A4 to A6, as max calls
    • The + binary operator, as the + binary operator.
  • Implicit multiplication, such as 2(...), as 2 * (...).

Such a simple language is equally suitable for a transpiler or interpreter – there are no side-effects or introspection, so a naive translation without first-class objects, intermediate representation or AST is fine.


For a transpiler, we need to transform from PAL source code to Python source code. We can use pyparsing to directly read PAL and use a parse action to emit Python.

Primitive Expressions

The simplest case are numbers – both PAL and Python source are identical. This is ideal to look at the general structure of transpiling:

import pyparsing as pp

# PAL grammar rule: one "word" of sign, digits, dot, digits
NUMBER = pp.Regex(r"-?\d+\.?\d*")

# PAL -> Python transformation: Compute appropriate Python code@NUMBER.setParseActiondeftranslate(result: pp.ParseResults) -> str:
    return result[0]

Note that setParseAction is commonly used with a lambda, instead of decorating a def. The longer variant is easier to comment/annotate, however.

A name reference is similar to parse, but needs some minor translation to Python. We can still use regular expressions, since there is no nesting here either. All names will be keys to a single, global namespace that we arbitrarily call objects.

NAME = pp.Regex(r"\w+\d+")

@NAME.setParseActiondeftranslate(result: pp.ParseResults) -> str:
    returnf'objects["{result[0]}"]'# interpolate key into namespace

Both grammar parts work independently for transpiling already. For example, NAME.parseString("A3") provides the source code objects["A3"].

Compound Expressions

Unlike terminal/primitive grammar expressions, compound expressions must refer to other expressions, possibly themselves (at this point, regular expressions fail). PyParsing makes this simple with Forward expressions – these are placeholders which are defined later on.

# placeholder for any valid PAL grammar element
EXPRESSION = pp.Forward()

Without operator precedence and just grouping via (...), all of +, or and to work similar. We pick or as a demonstrator.

The grammar gets more complicated now: We use pp.Suppress to match but discard the purely syntactical (/) and or. We use +/- to combine several grammar expressions (- means there are no alternatives when parsing). Finally, we use the forward reference EXPRESSION to refer to every other and this expression.

SOME_OR = pp.Suppress("(") + EXPRESSION + pp.OneOrMore(pp.Suppress("or") - EXPRESSION) - pp.Suppress(")")

@SOME_OR.setParseActiondeftranslate(result: pp.ParseResults) -> str:
    elements = ', '.join(result)
    returnf"max({elements})"

Name ranges and addition work fundamentally the same, only the delimiter and output formatting changes. Implicit multiplication is simpler in that it works only on a pair of expressions.


At this point, we have a transpiler for each kind of language element. The missing rules can be created with the same approach. Now, we need to actually read source code and run the transpiled code.

We start by putting together the pieces that we have: inserting all grammar elements into the forward reference. We also provide a convenience function to abstract away PyParsing.

EXPRESSION << (NAME | NUMBER | SOME_OR)

deftranspile(pal: str) -> str:
    """Transpile PAL source code to Python source code"""return EXPRESSION.parseString(pal, parseAll=True)[0]

In order to run some code, we need to transpile the PAL code and evaluate the Python code with some namespace. Since our grammar only allows safe input, we can use eval directly:

defexecute(pal, **objects):
    """Execute PAL source code given some object values"""
    code = transpile(pal)
    returneval(code, {"objects": objects})

This function can be run with a given PAL source and name values to evaluate the equivalent Python value:

>>> execute("(A4 or A3 or 13)", A3=42, A4=7)
42

For complete support of PAL, define the missing compound rules and add them alongside the others to EXPRESSION.

Post a Comment for "Parsing Pseudo-algebraic String Into Command"