Python - How To Write Empty Tree Node As Empty String To Xml File
Solution 1:
import lxml.etree as et
xml = et.parse("test.xml")
for node in xml.xpath("//neighbor"):
node.getparent().remove(node)
xml.write("out.xml",encoding="utf-8",xml_declaration=True)
Using elementTree, we need to find the parents of the neighbor nodes
then find the neighbor nodes inside that parent
and remove them:
from xml.etree import ElementTree as et
xml = et.parse("test.xml")
for parent in xml.getroot().findall(".//neighbor/.."):
for child in parent.findall("./neighbor"):
parent.remove(child)
xml.write("out.xml",encoding="utf-8",xml_declaration=True)
Both will give you:
<?xml version='1.0' encoding='utf-8'?><data><countryname="Liechtenstein"><rank>1</rank><year>2008</year><gdppc>141100</gdppc></country><countryname="Singapore"><rank>4</rank><year>2011</year><gdppc>59900</gdppc></country><countryname="Panama"><rank>68</rank><year>2011</year><gdppc>13600</gdppc></country></data>
Using your attribute logic and modifying the xml a bit like below:
x = """<?xml version="1.0"?><data><countryname="Liechtenstein"><rank>1</rank><year>2008</year><gdppc>141100</gdppc><neighborname="Austria"direction="E"/><neighborname="Switzerland"direction="W"/></country><countryname="Singapore"><rank>4</rank><year>2011</year><gdppc>59900</gdppc><neighborname="Costa Rica"direction="W"make="foo"build="bar"job="blah"/><neighborname="Malaysia"direction="N"/></country><countryname="Panama"><rank>68</rank><year>2011</year><gdppc>13600</gdppc><neighborname="Costa Rica"direction="W"make="foo"build="bar"job="blah"/><neighborname="Colombia"direction="E"/></country></data>"""
Using lxml:
import lxml.etree as et
xml = et.fromstring(x)
for node in xml.xpath("//neighbor[not(@make) and not(@job) and not(@make)]"):
node.getparent().remove(node)
print(et.tostring(xml))
Would give you:
<data><countryname="Liechtenstein"><rank>1</rank><year>2008</year><gdppc>141100</gdppc></country><countryname="Singapore"><rank>4</rank><year>2011</year><gdppc>59900</gdppc><neighborname="Costa Rica"direction="W"make="foo"build="bar"job="blah"/></country><countryname="Panama"><rank>68</rank><year>2011</year><gdppc>13600</gdppc><neighborname="Costa Rica"direction="W"make="foo"build="bar"job="blah"/></country></data>
The same logic in ElementTree:
from xml.etree import ElementTree as et
xml = et.parse("test.xml").getroot()
atts = {"build", "job", "make"}
for parent in xml.findall(".//neighbor/.."):
for child in parent.findall(".//neighbor")[:]:
ifnot atts.issubset(child.attrib):
parent.remove(child)
If you are using iter:
from xml.etree import ElementTree as et
xml = et.parse("test.xml")
for parent in xml.getroot().iter("*"):
parent[:] = (child for child in parent if child.tag != "neighbor")
You can see we get the exact same output:
In [30]: !cat /home/padraic/untitled6/test.xml
<?xml version="1.0"?><data><countryname="Liechtenstein">#
<neighborname="Austria"direction="E"/><rank>1</rank><neighborname="Austria"direction="E"/><year>2008</year><neighborname="Austria"direction="E"/><gdppc>141100</gdppc><neighborname="Austria"direction="E"/><neighborname="Switzerland"direction="W"/></country><countryname="Singapore"><rank>4</rank><year>2011</year><gdppc>59900</gdppc><neighborname="Malaysia"direction="N"/></country><countryname="Panama"><rank>68</rank><year>2011</year><gdppc>13600</gdppc><neighborname="Costa Rica"direction="W"/><neighborname="Colombia"direction="E"/></country></data>
In [31]: paste
def test():
import lxml.etree as et
xml = et.parse("/home/padraic/untitled6/test.xml")
for node in xml.xpath("//neighbor"):
node.getparent().remove(node)
a = et.tostring(xml)
from xml.etree import ElementTree as et
xml = et.parse("/home/padraic/untitled6/test.xml")
for parent in xml.getroot().iter("*"):
parent[:] = (child for child in parent if child.tag != "neighbor")
b = et.tostring(xml.getroot())
assert a == b
## -- End pasted text --
In [32]: test()
Solution 2:
Whenever modifying XML documents is needed, consider also XSLT, the special-purpose language part of the XSL family which includes XPath. XSLT is designed specifically to transform XML files. Pythoners are not quick to recommend it but it avoids the need of loops or nested if/then logic in general purpose code. Python's lxml
module can run XSLT 1.0 scripts using the libxslt processor.
Below transformation runs the identity transform to copy document as is and then runs an empty template match on <neighbor>
to remove it:
XSLT Script (save as an .xsl file to be loaded just like source .xml, both of which are well-formed xml files)
<xsl:transformxmlns:xsl="http://www.w3.org/1999/XSL/Transform"version="1.0"><xsl:outputversion="1.0"encoding="UTF-8"indent="yes" /><xsl:strip-spaceelements="*"/><!-- IDENTITY TRANSFORM TO COPY XML AS IS --><xsl:templatematch="@*|node()"><xsl:copy><xsl:apply-templatesselect="@*|node()"/></xsl:copy></xsl:template><!-- EMPTY TEMPLATE TO REMOVE NEIGHBOR WHEREVER IT EXISTS --><xsl:templatematch="neighbor"/></xsl:transform>
Python Script
import lxml.etree as et
# LOAD XML AND XSL DOCUMENTS
xml = et.parse("Input.xml")
xslt = et.parse("Script.xsl")
# TRANSFORM TO NEW TREE
transform = et.XSLT(xslt)
newdom = transform(xml)
# CONVERT TO STRING
tree_out = et.tostring(newdom, encoding='UTF-8', pretty_print=True, xml_declaration=True)
# OUTPUT TO FILE
xmlfile = open('Output.xml'),'wb')
xmlfile.write(tree_out)
xmlfile.close()
Solution 3:
The trick here is to find the parent (the country node), and delete the neighbor from there. In this example, I am using ElementTree because I am somewhat familiar with it:
import xml.etree.ElementTree as ET
if __name__ == '__main__':
withopen('debug.log') as f:
doc = ET.parse(f)
for country in doc.findall('.//country'):
for neighbor in country.findall('neighbor'):
country.remove(neighbor)
ET.dump(doc) # Display
Post a Comment for "Python - How To Write Empty Tree Node As Empty String To Xml File"