Python Selenium ().text Returns "â€™" Instead Of Apostrophe (')
Solution 1:
Your issue is that the apostrophe being misinterpreted is not a normal apostrophe character (') but the Unicode character for a right single quotation mark (’, U+2019). It turns into mojibake because the content is being decoded incorrectly. It's in UTF-8 (so ’ is represented by the three bytes \xe2\x80\x99), but you're decoding it with code page 1252 (where the three bytes \xe2\x80\x99 represent three separate characters: â, €, and ™).
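The mis-decoding described above can be reproduced in a few lines. This is only a minimal sketch, and the variable names are illustrative rather than taken from the question:

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, the "curly" apostrophe
right_quote = '\u2019'

# In UTF-8 this character is encoded as three bytes
utf8_bytes = right_quote.encode('utf-8')
print(utf8_bytes)  # b'\xe2\x80\x99'

# Decoding those same bytes as cp1252 yields three separate characters
mojibake = utf8_bytes.decode('cp1252')
print(ascii(mojibake))  # '\xe2\u20ac\u2122', i.e. the characters â, €, ™
print(len(mojibake))    # 3
```

The output is printed with ascii() so that the demonstration does not itself depend on the console's encoding.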
Since you haven't shown much code, I can't offer any suggestions on how to fix the decoding issue, but there is probably a way to request Selenium to use UTF-8 (I'm frankly surprised it's not the default). Alternatively, you might be able to get the raw bytes and decode the text yourself.
While it would be best to avoid the mis-decoding, if you really need to fix up your strings after they've been turned to mojibake, the best approach is probably to re-encode them the same way they were mis-decoded, then decode again, correctly this time:
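badtext = 'Americaâ€™s'  # mojibake: UTF-8 bytes that were mis-decoded as cp1252
encoded = badtext.encode('cp1252')  # recovers the original bytes b'America\xe2\x80\x99s'
goodtext = encoded.decode('utf-8')  # 'America’s'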
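If the repair has to run over many strings, some of which may already be correct, the round trip can be wrapped in a small helper. The sketch below is an assumption about how one might do that: fix_mojibake and its parameters are hypothetical names, not part of Selenium or the standard library. It returns the input unchanged whenever the round trip fails.

```python
def fix_mojibake(text, wrong='cp1252', right='utf-8'):
    """Undo a decode done with the wrong codec.

    Hypothetical helper: returns text unchanged if the round trip fails.
    """
    try:
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text cannot be the result of this particular mis-decoding,
        # so leave it alone rather than damage it.
        return text


bad = 'America\u00e2\u20ac\u2122s'   # 'Americaâ€™s' written with escapes
print(ascii(fix_mojibake(bad)))               # 'America\u2019s'
print(ascii(fix_mojibake('America\u2019s')))  # already correct, unchanged
print(fix_mojibake('plain ASCII'))            # plain ASCII
```

This is a heuristic, not a guarantee: a string that legitimately contains a sequence such as "Ã©" would also be rewritten, so it is safest to apply the helper only to text known to have been mis-decoded.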