Reading A Spreadsheet Like .xml With ElementTree
I am reading an XML file using ElementTree, but there is a Cell whose data I cannot read. I adapted my file to make a reproducible example, which I present next: from xml.etre
Solution 1:
Documentation: The lxml.etree Tutorial - Namespaces
Define the namespaces used:
ns = {'ss': "urn:schemas-microsoft-com:office:spreadsheet", 'html': "http://www.w3.org/TR/REC-html40"}
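To see what the prefix map does, the sketch below looks up the same element twice: once through the ns mapping and once with the fully qualified {uri}tag form that ElementTree stores internally. The sample document is an assumption made up for illustration, not the asker's file.

```python
from xml.etree import ElementTree

# Hypothetical sample, NOT the asker's file: a minimal SpreadsheetML-like document.
SAMPLE = """<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:html="http://www.w3.org/TR/REC-html40">
  <Worksheet ss:Name="Sheet1">
    <Row>
      <Cell><Data ss:Type="String">A</Data></Cell>
    </Row>
  </Worksheet>
</Workbook>"""

ns = {
    "ss": "urn:schemas-microsoft-com:office:spreadsheet",
    "html": "http://www.w3.org/TR/REC-html40",
}

root = ElementTree.fromstring(SAMPLE)

# Prefixed lookup: 'ss:' is expanded to the URI found in the ns mapping.
by_prefix = root.findall("ss:Worksheet", ns)

# Equivalent lookup with the namespace URI written out in braces.
by_uri = root.findall("{urn:schemas-microsoft-com:office:spreadsheet}Worksheet")

# An unqualified tag name matches nothing, because every element is namespaced.
unqualified = root.findall("Worksheet")

print(len(by_prefix), len(by_uri), len(unqualified))
```

The prefixes in the mapping are local to the Python code; only the URIs have to match those declared in the document.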
Use the namespaces with find(...) / findall(...):
import io
from xml.etree import ElementTree

# xmlf is the XML string defined in the question
tree = ElementTree.parse(io.StringIO(xmlf))
root = tree.getroot()

for ws in root.findall('ss:Worksheet', ns):
    for row in ws.findall('ss:Row', ns):
        for c in row.findall('ss:Cell', ns):
            data = c.find('ss:Data', ns)
            if data.text is None:
                # No direct text: collect it from the nested html:Font elements
                text = []
                fonts = data.findall('html:Font', ns)
                for element in fonts:
                    text.append(element.text)
                data_text = ''.join(text)
                print(data_text)
            else:
                print(data.text)
Output:
A
B
C
CAN'T READ THIS
D
Tested with Python: 3.5
Solution 2:
The text content of the fourth cell belongs to the two Font
subelements, which are bound to another namespace. Demo:
for e in root.iter():
    text = e.text.strip() if e.text else None
    if text:
        print(e, text)
Output:
<Element {urn:schemas-microsoft-com:office:spreadsheet}Data at 0x7f8013d01dc8> A
<Element {urn:schemas-microsoft-com:office:spreadsheet}Data at 0x7f8013d01dc8> B
<Element {urn:schemas-microsoft-com:office:spreadsheet}Data at 0x7f8013d01dc8> C
<Element {http://www.w3.org/TR/REC-html40}Font at 0x7f8013d01e08> CAN'T READ
<Element {http://www.w3.org/TR/REC-html40}Font at 0x7f8013d01e48> THIS
<Element {urn:schemas-microsoft-com:office:spreadsheet}Data at 0x7f8013d01e48> D
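Since the missing text lives in nested Font elements, one alternative is to gather all text below each Data element with itertext(), which walks the element and its descendants whatever their namespace. This is only a sketch under assumptions: the sample XML and the cell_texts helper are hypothetical, made up to mirror the structure described above, and Data elements are searched at any depth because the layout of the asker's file is not shown.

```python
from xml.etree import ElementTree

# Fully qualified tag name of the Data element (namespace URI in braces).
DATA_TAG = "{urn:schemas-microsoft-com:office:spreadsheet}Data"

# Hypothetical sample, NOT the asker's file.
SAMPLE = """<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:html="http://www.w3.org/TR/REC-html40">
  <Worksheet ss:Name="Sheet1">
    <Table>
      <Row>
        <Cell><Data ss:Type="String">A</Data></Cell>
        <Cell><ss:Data ss:Type="String"><html:Font>CAN'T READ</html:Font><html:Font> THIS</html:Font></ss:Data></Cell>
        <Cell><Data ss:Type="String">D</Data></Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>"""


def cell_texts(xml_text):
    """Return the text of every Data element, including text in nested elements."""
    root = ElementTree.fromstring(xml_text)
    texts = []
    for data in root.iter(DATA_TAG):
        # itertext() yields the element's own text plus that of all descendants.
        texts.append("".join(data.itertext()))
    return texts


print(cell_texts(SAMPLE))  # ['A', "CAN'T READ THIS", 'D']
```

With this approach there is no need to special-case cells whose text is None, although any formatting carried by the Font elements is discarded.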