
Python - How To Write Empty Tree Node As Empty String To Xml File

I want to remove elements of a certain tag value and then write out the .xml file WITHOUT any tags for those deleted elements; is my only option to create a new tree? There are two

Solution 1:

import lxml.etree as et

xml  = et.parse("test.xml")

for node in xml.xpath("//neighbor"):
    node.getparent().remove(node)


xml.write("out.xml",encoding="utf-8",xml_declaration=True)

Using ElementTree, we first find the parents of the neighbor nodes, then find the neighbor nodes inside each parent and remove them:

from xml.etree import ElementTree as et

xml  = et.parse("test.xml")


for parent in xml.getroot().findall(".//neighbor/.."):
    for child in parent.findall("./neighbor"):
        parent.remove(child)


xml.write("out.xml",encoding="utf-8",xml_declaration=True)

Both will give you:

<?xml version='1.0' encoding='utf-8'?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
    </country>
</data>
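
For anyone who wants to try the ElementTree version without a test.xml on disk, here is a minimal self-contained sketch of the same parent-lookup idea. The `SAMPLE` data and the `remove_tag` helper are made up for illustration; they are not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, shaped like the country/neighbor file in the question.
SAMPLE = """<data>
  <country name="Liechtenstein">
    <rank>1</rank>
    <neighbor name="Austria" direction="E"/>
    <neighbor name="Switzerland" direction="W"/>
  </country>
  <country name="Singapore">
    <rank>4</rank>
    <neighbor name="Malaysia" direction="N"/>
  </country>
</data>"""


def remove_tag(root, tag):
    """Remove every element named `tag` below root, in place; return the count."""
    removed = 0
    # ".//tag/.." selects each element that has a direct child named `tag`.
    for parent in root.findall(f".//{tag}/.."):
        # findall returns a list, so removing while looping over it is safe.
        for child in parent.findall(tag):
            parent.remove(child)
            removed += 1
    return removed


root = ET.fromstring(SAMPLE)
removed_count = remove_tag(root, "neighbor")
result = ET.tostring(root, encoding="unicode")
print(removed_count)
print(result)
```

The tree is modified in place, so there is no need to build a new tree just to drop elements.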

Using your attribute logic and modifying the xml a bit like below:

x = """<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>"""

Using lxml:

import lxml.etree as et

xml = et.fromstring(x)

for node in xml.xpath("//neighbor[not(@make) and not(@job) and not(@build)]"):
    node.getparent().remove(node)
print(et.tostring(xml))

Would give you:

<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
    </country>
</data>

The same logic in ElementTree:

from xml.etree import ElementTree as et

xml = et.parse("test.xml").getroot()

atts = {"build", "job", "make"}

for parent in xml.findall(".//neighbor/.."):
    # "./neighbor" matches direct children only; remove() fails on deeper descendants
    for child in parent.findall("./neighbor"):
        if not atts.issubset(child.attrib):
            parent.remove(child)
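
To make the attribute check concrete, here is a small sketch that runs on an inline string. The sample data (including the "Indonesia" entry, which has only one of the three attributes) and the names `REQUIRED` and `kept` are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one neighbor has all three attributes, one has none,
# and one has only "make".
SAMPLE = """<country name="Singapore">
  <neighbor name="Costa Rica" direction="W" make="foo" build="bar" job="blah"/>
  <neighbor name="Malaysia" direction="N"/>
  <neighbor name="Indonesia" direction="S" make="foo"/>
</country>"""

REQUIRED = {"build", "job", "make"}

country = ET.fromstring(SAMPLE)
for neighbor in country.findall("neighbor"):
    # neighbor.attrib is a dict; issubset() checks REQUIRED against its keys,
    # so any neighbor missing at least one required attribute is removed.
    if not REQUIRED.issubset(neighbor.attrib):
        country.remove(neighbor)

kept = [n.get("name") for n in country.findall("neighbor")]
print(kept)
```

Note that the subset test removes a neighbor that is missing any one of the attributes, which is stricter than an XPath predicate that only matches neighbors with none of them.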

If you are using iter:

from xml.etree import ElementTree as et

xml = et.parse("test.xml")

for parent in xml.getroot().iter("*"):
    parent[:] = (child for child in parent if child.tag != "neighbor")

You can see we get the exact same output:

In [30]: !cat /home/padraic/untitled6/test.xml
<?xml version="1.0"?><data><country name="Liechtenstein">#
      <neighbor name="Austria" direction="E"/><rank>1</rank><neighbor name="Austria" direction="E"/><year>2008</year><neighbor name="Austria" direction="E"/><gdppc>141100</gdppc><neighbor name="Austria" direction="E"/><neighbor name="Switzerland" direction="W"/></country><country name="Singapore"><rank>4</rank><year>2011</year><gdppc>59900</gdppc><neighbor name="Malaysia" direction="N"/></country><country name="Panama"><rank>68</rank><year>2011</year><gdppc>13600</gdppc><neighbor name="Costa Rica" direction="W"/><neighbor name="Colombia" direction="E"/></country></data>
In [31]: paste
def test():
    import lxml.etree as et
    xml = et.parse("/home/padraic/untitled6/test.xml")
    for node in xml.xpath("//neighbor"):
        node.getparent().remove(node)
    a = et.tostring(xml)
    from xml.etree import ElementTree as et
    xml = et.parse("/home/padraic/untitled6/test.xml")
    for parent in xml.getroot().iter("*"):
        parent[:] = (child for child in parent if child.tag != "neighbor")
    b = et.tostring(xml.getroot())
    assert  a == b

## -- End pasted text --

In [32]: test()
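
The slice assignment used with iter above rebuilds each element's child list in one step, which avoids calling remove() while looping over the same element. A minimal stdlib-only sketch on a made-up inline document (the `SAMPLE` string and `remaining_tags` name are illustrative):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with neighbor elements interleaved between other children.
SAMPLE = (
    "<data><country>"
    "<rank>1</rank><neighbor/><neighbor/><year>2008</year>"
    "</country></data>"
)

root = ET.fromstring(SAMPLE)
for parent in root.iter():
    # Build the filtered child list first, then replace all children at once.
    parent[:] = [child for child in parent if child.tag != "neighbor"]

remaining_tags = [el.tag for el in root.iter()]
print(remaining_tags)
```

This visits every element, including leaves, so it does a little more work than looking up only the parents of neighbor nodes, but it needs no path expression.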

Solution 2:

Whenever you need to modify XML documents, also consider XSLT, the special-purpose transformation language in the XSL family (which also includes XPath). XSLT is designed specifically to transform XML files. Python developers are not quick to recommend it, but it avoids the loops and nested if/then logic that general-purpose code needs. Python's lxml module can run XSLT 1.0 scripts using the libxslt processor.

The transformation below runs the identity transform to copy the document as is, then applies an empty template that matches <neighbor> to remove it:

XSLT Script (save as an .xsl file to be loaded just like source .xml, both of which are well-formed xml files)

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output version="1.0" encoding="UTF-8" indent="yes" />
  <xsl:strip-space elements="*"/>

  <!-- IDENTITY TRANSFORM TO COPY XML AS IS -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- EMPTY TEMPLATE TO REMOVE NEIGHBOR WHEREVER IT EXISTS -->
  <xsl:template match="neighbor"/>

</xsl:transform>

Python Script

import lxml.etree as et

# LOAD XML AND XSL DOCUMENTS
xml  = et.parse("Input.xml")
xslt = et.parse("Script.xsl")

# TRANSFORM TO NEW TREE
transform = et.XSLT(xslt)
newdom = transform(xml)

# CONVERT TO STRING
tree_out = et.tostring(newdom, encoding='UTF-8', pretty_print=True,  xml_declaration=True)

# OUTPUT TO FILE
with open('Output.xml', 'wb') as xmlfile:
    xmlfile.write(tree_out)

Solution 3:

The trick here is to find the parent (the country node), and delete the neighbor from there. In this example, I am using ElementTree because I am somewhat familiar with it:

import xml.etree.ElementTree as ET

if __name__ == '__main__':
    with open('debug.log') as f:
        doc = ET.parse(f)

        for country in doc.findall('.//country'):
            for neighbor in country.findall('neighbor'):
                country.remove(neighbor)

        ET.dump(doc)  # Display
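
ET.dump only prints the tree. Since the question asks about writing the result out, here is a sketch that does the same removal and then saves with ElementTree.write. The inline `SAMPLE` data and the temporary output path are assumptions for illustration; in practice you would pass your own file name.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the input file.
SAMPLE = """<data>
  <country name="Panama">
    <rank>68</rank>
    <neighbor name="Costa Rica" direction="W"/>
    <neighbor name="Colombia" direction="E"/>
  </country>
</data>"""

tree = ET.ElementTree(ET.fromstring(SAMPLE))

# Same approach as above: remove each neighbor through its parent country.
for country in tree.findall(".//country"):
    for neighbor in country.findall("neighbor"):
        country.remove(neighbor)

# Write to a throwaway location (illustrative path only).
out_path = os.path.join(tempfile.mkdtemp(), "out.xml")
tree.write(out_path, encoding="utf-8", xml_declaration=True)

with open(out_path, encoding="utf-8") as fh:
    written = fh.read()
print(written)
```

Removed elements leave no trace in the written file: no empty tags are emitted for them, because they are no longer part of the tree.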
