Parsing Pseudo-algebraic String Into Command
Solution 1:
Regular expressions are inherently unsuitable for a task involving parentheses for nested grouping – your pseudo-algebraic language (PAL) is not a regular language. An actual parser such as PyParsing (a PEG parser) should be used instead.
While this still requires translating from source code to operations, this can be performed directly during parsing.
We need a few language elements that directly translate to Python primitives:
- Number literals, such as
1.3, asint/floatliterals orfractions.Fraction. - Name references, such as
A3, as keys to theobjectsnamespace. - Parentheses, such as
(...), as grouping via parentheses for:- Variants, such as
(1.3 or A3), asmaxcalls. - Name ranges, such as
A4 to A6, asmaxcalls - The
+binary operator, as the+binary operator.
- Variants, such as
- Implicit multiplication, such as
2(...), as2 * (...).
Such a simple language is equally suitable for a transpiler or interpreter – there are no side-effects or introspection, so a naive translation without first-class objects, intermediate representation or AST is fine.
For a transpiler, we need to transform from PAL source code to Python source code. We can use pyparsing to directly read PAL and use a parse action to emit Python.
Primitive Expressions
The simplest case are numbers – both PAL and Python source are identical. This is ideal to look at the general structure of transpiling:
import pyparsing as pp
# PAL grammar rule: one "word" of sign, digits, dot, digits
NUMBER = pp.Regex(r"-?\d+\.?\d*")
# PAL -> Python transformation: Compute appropriate Python code@NUMBER.setParseActiondeftranslate(result: pp.ParseResults) -> str:
return result[0]
Note that setParseAction is commonly used with a lambda, instead of decorating a def. The longer variant is easier to comment/annotate, however.
A name reference is similar to parse, but needs some minor translation to Python. We can still use regular expressions, since there is no nesting here either. All names will be keys to a single, global namespace that we arbitrarily call objects.
NAME = pp.Regex(r"\w+\d+")
@NAME.setParseActiondeftranslate(result: pp.ParseResults) -> str:
returnf'objects["{result[0]}"]'# interpolate key into namespaceBoth grammar parts work independently for transpiling already. For example, NAME.parseString("A3") provides the source code objects["A3"].
Compound Expressions
Unlike terminal/primitive grammar expressions, compound expressions must refer to other expressions, possibly themselves (at this point, regular expressions fail). PyParsing makes this simple with Forward expressions – these are placeholders which are defined later on.
# placeholder for any valid PAL grammar element
EXPRESSION = pp.Forward()
Without operator precedence and just grouping via (...), all of +, or and to work similar. We pick or as a demonstrator.
The grammar gets more complicated now: We use pp.Suppress to match but discard the purely syntactical (/) and or. We use +/- to combine several grammar expressions (- means there are no alternatives when parsing). Finally, we use the forward reference EXPRESSION to refer to every other and this expression.
SOME_OR = pp.Suppress("(") + EXPRESSION + pp.OneOrMore(pp.Suppress("or") - EXPRESSION) - pp.Suppress(")")
@SOME_OR.setParseActiondeftranslate(result: pp.ParseResults) -> str:
elements = ', '.join(result)
returnf"max({elements})"Name ranges and addition work fundamentally the same, only the delimiter and output formatting changes. Implicit multiplication is simpler in that it works only on a pair of expressions.
At this point, we have a transpiler for each kind of language element. The missing rules can be created with the same approach. Now, we need to actually read source code and run the transpiled code.
We start by putting together the pieces that we have: inserting all grammar elements into the forward reference. We also provide a convenience function to abstract away PyParsing.
EXPRESSION << (NAME | NUMBER | SOME_OR)
deftranspile(pal: str) -> str:
"""Transpile PAL source code to Python source code"""return EXPRESSION.parseString(pal, parseAll=True)[0]
In order to run some code, we need to transpile the PAL code and evaluate the Python code with some namespace. Since our grammar only allows safe input, we can use eval directly:
defexecute(pal, **objects):
"""Execute PAL source code given some object values"""
code = transpile(pal)
returneval(code, {"objects": objects})
This function can be run with a given PAL source and name values to evaluate the equivalent Python value:
>>> execute("(A4 or A3 or 13)", A3=42, A4=7)
42For complete support of PAL, define the missing compound rules and add them alongside the others to EXPRESSION.
Post a Comment for "Parsing Pseudo-algebraic String Into Command"