
Bytes() Initializer Adding An Additional Byte?

I create a UTF-8 encoded bytes object in Python 3 with bytes('\xc2', encoding='utf-8', errors='strict'), but when I write it out I get two bytes! >>> s = bytes('\xc2', encoding='u

Solution 1:

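The string "\xc2" is the single Unicode code point U+00C2 (which can also be written as "Â"), not a raw byte. UTF-8 encodes every code point above U+007F as more than one byte, so this character becomes the two bytes b'\xc3\x82'. If you were expecting the single byte b'\xc2', you probably want a different encoding, such as "latin-1", which maps code points U+0000 through U+00FF directly to the byte with the same value: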

>>> s = bytes("\xc2", encoding="latin-1", errors="strict")
>>> s
b'\xc2'
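
As a quick side-by-side check, the sketch below encodes the same one-character string with both codecs. The variable names are only illustrative and are not from the original question:

```python
# "\xc2" in a str literal is the code point U+00C2, a single character.
text = "\xc2"

utf8_bytes = text.encode("utf-8")      # same result as bytes(text, encoding="utf-8")
latin1_bytes = text.encode("latin-1")  # same result as bytes(text, encoding="latin-1")

print(len(text))      # 1
print(utf8_bytes)     # b'\xc3\x82'
print(latin1_bytes)   # b'\xc2'
```

The string has length 1 in both cases; only the encoded byte sequence differs.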

If you are really creating "\xc2" directly with a literal, though, there's no need to go through the bytes constructor to turn it into a bytes instance. Just use the b prefix on the literal to create the bytes directly:

s = b"\xc2"
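
A small sketch to confirm that the literal holds exactly one byte, and that it matches the latin-1 result. The decode check at the end is an extra illustration, not part of the original answer: a lone 0xC2 byte is an incomplete UTF-8 sequence, so decoding it as UTF-8 fails.

```python
s = b"\xc2"

# The bytes literal is a single byte with value 0xC2.
same_as_int_list = (s == bytes([0xC2]))
same_as_latin1 = (s == "\xc2".encode("latin-1"))
print(len(s), same_as_int_list, same_as_latin1)  # 1 True True

# 0xC2 on its own is only the lead byte of a two-byte UTF-8 sequence.
try:
    s.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
print(decodes_as_utf8)  # False
```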
