Parsing Pseudo-algebraic String Into Command
Solution 1:
Regular expressions are inherently unsuitable for a task involving parentheses for nested grouping – your pseudo-algebraic language (PAL) is not a regular language. An actual parser such as PyParsing (a PEG parser) should be used instead.
While this still requires translating from source code to operations, this can be performed directly during parsing.
We need a few language elements that directly translate to Python primitives:
- Number literals, such as
1.3
, asint
/float
literals orfractions.Fraction
. - Name references, such as
A3
, as keys to theobjects
namespace. - Parentheses, such as
(...)
, as grouping via parentheses for:- Variants, such as
(1.3 or A3)
, asmax
calls. - Name ranges, such as
A4 to A6
, asmax
calls - The
+
binary operator, as the+
binary operator.
- Variants, such as
- Implicit multiplication, such as
2(...)
, as2 * (...)
.
Such a simple language is equally suitable for a transpiler or interpreter – there are no side-effects or introspection, so a naive translation without first-class objects, intermediate representation or AST is fine.
For a transpiler, we need to transform from PAL source code to Python source code. We can use pyparsing
to directly read PAL and use a parse action to emit Python.
Primitive Expressions
The simplest case are numbers – both PAL and Python source are identical. This is ideal to look at the general structure of transpiling:
import pyparsing as pp
# PAL grammar rule: one "word" of sign, digits, dot, digits
NUMBER = pp.Regex(r"-?\d+\.?\d*")
# PAL -> Python transformation: Compute appropriate Python code@NUMBER.setParseActiondeftranslate(result: pp.ParseResults) -> str:
return result[0]
Note that setParseAction
is commonly used with a lambda
, instead of decorating a def
. The longer variant is easier to comment/annotate, however.
A name reference is similar to parse, but needs some minor translation to Python. We can still use regular expressions, since there is no nesting here either. All names will be keys to a single, global namespace that we arbitrarily call objects
.
NAME = pp.Regex(r"\w+\d+")
@NAME.setParseActiondeftranslate(result: pp.ParseResults) -> str:
returnf'objects["{result[0]}"]'# interpolate key into namespace
Both grammar parts work independently for transpiling already. For example, NAME.parseString("A3")
provides the source code objects["A3"]
.
Compound Expressions
Unlike terminal/primitive grammar expressions, compound expressions must refer to other expressions, possibly themselves (at this point, regular expressions fail). PyParsing makes this simple with Forward
expressions – these are placeholders which are defined later on.
# placeholder for any valid PAL grammar element
EXPRESSION = pp.Forward()
Without operator precedence and just grouping via (...)
, all of +
, or
and to
work similar. We pick or
as a demonstrator.
The grammar gets more complicated now: We use pp.Suppress
to match but discard the purely syntactical (
/)
and or
. We use +
/-
to combine several grammar expressions (-
means there are no alternatives when parsing). Finally, we use the forward reference EXPRESSION
to refer to every other and this expression.
SOME_OR = pp.Suppress("(") + EXPRESSION + pp.OneOrMore(pp.Suppress("or") - EXPRESSION) - pp.Suppress(")")
@SOME_OR.setParseActiondeftranslate(result: pp.ParseResults) -> str:
elements = ', '.join(result)
returnf"max({elements})"
Name ranges and addition work fundamentally the same, only the delimiter and output formatting changes. Implicit multiplication is simpler in that it works only on a pair of expressions.
At this point, we have a transpiler for each kind of language element. The missing rules can be created with the same approach. Now, we need to actually read source code and run the transpiled code.
We start by putting together the pieces that we have: inserting all grammar elements into the forward reference. We also provide a convenience function to abstract away PyParsing.
EXPRESSION << (NAME | NUMBER | SOME_OR)
deftranspile(pal: str) -> str:
"""Transpile PAL source code to Python source code"""return EXPRESSION.parseString(pal, parseAll=True)[0]
In order to run some code, we need to transpile the PAL code and evaluate the Python code with some namespace. Since our grammar only allows safe input, we can use eval
directly:
defexecute(pal, **objects):
"""Execute PAL source code given some object values"""
code = transpile(pal)
returneval(code, {"objects": objects})
This function can be run with a given PAL source and name values to evaluate the equivalent Python value:
>>> execute("(A4 or A3 or 13)", A3=42, A4=7)
42
For complete support of PAL, define the missing compound rules and add them alongside the others to EXPRESSION
.
Post a Comment for "Parsing Pseudo-algebraic String Into Command"