Splitting A String With A Unicode Delimiter?

Given the string: str = 'Led Zeppelin — Blackdog', how do I split it at —, ending up with: ['Led Zeppelin', 'Blackdog']? Note that — is not a hyphen; it is encoded as u'\u2014' ho

Solution 1:

You can split on exactly the character you were given. Writing it as the escape u'\u2014' makes it clear that the delimiter is an em dash rather than a hyphen. Include the surrounding spaces in the delimiter if they always accompany the character. Also, don't shadow the built-in str by using it as a variable name.

>>> s = 'Led Zeppelin — Blackdog'
>>> s.split(u' \u2014 ')
['Led Zeppelin', 'Blackdog']
>>> s.split(' — ') # perhaps less explicit
['Led Zeppelin', 'Blackdog']
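
If the spaces around the dash are not guaranteed to be there, one option is to split on the bare character and strip each piece afterwards. The following is a minimal sketch, assuming Python 3; the names EM_DASH and split_on_em_dash are made up for illustration and are not part of the original answer.

```python
EM_DASH = "\u2014"  # the em dash (U+2014), not an ASCII hyphen


def split_on_em_dash(text):
    """Split text at each em dash and trim whitespace around the pieces."""
    return [part.strip() for part in text.split(EM_DASH)]


title = "Led Zeppelin \u2014 Blackdog"
print(split_on_em_dash(title))  # ['Led Zeppelin', 'Blackdog']
```

This gives the same result whether the input is written with spaces around the dash or without them, at the cost of also trimming any leading or trailing whitespace on each piece.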
