
Python Regex Replacing String That Should Not Match

Update: This issue was caused by a bug in the regex module, which the developer resolved in commit be893e9. If you encounter a similar problem, update your regex module.
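Before upgrading, it can help to see which version of the regex module is actually installed. The following is a minimal sketch using only the standard library's importlib.metadata; the helper name installed_regex_version is hypothetical, and since the update above does not say which release contains the fix, the sketch only reports the version rather than judging it.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_regex_version():
    """Return the installed version of the third-party regex package, or None."""
    try:
        return version("regex")
    except PackageNotFoundError:
        # The package is not installed in this environment.
        return None


print(installed_regex_version())
```

If the reported version looks old, `pip install --upgrade regex` fetches the latest release.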

Solution 1:

The original script uses the alternative regex module instead of the standard library re module, importing it under the name re:

import regex as re

The two modules clearly behave differently in this case. My guess is that it has something to do with nested groups: the expression puts a capturing group inside a repeated non-capturing group and then back-references it with \1, which is way too magical for my taste.

import re     # standard library
import regex  # completely different implementation

content = '"Erm....yes. Thank you for that."'
pattern = r"(?i)(?<=\b)(?:(\w{1,3})(?:-|\.{2,10})[\t ]?)+(\1\w{2,})"
substitute = r"\1-\2"

print(re.sub(pattern, substitute, content))
print(regex.sub(pattern, substitute, content))

Output:

"Erm....yes. Thank you for that."
"-yes. Thank you for that."

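The first output line is the expected one: the back-reference \1 requires the word after the dots to start with the fragment captured before them, and "yes" does not start with "Erm". As a rough check, the sketch below uses only the standard library re module to confirm that the pattern finds nothing in the sample and to show what it is meant to match. The stutter string is a made-up example, not taken from the original question.

```python
import re

pattern = r"(?i)(?<=\b)(?:(\w{1,3})(?:-|\.{2,10})[\t ]?)+(\1\w{2,})"
substitute = r"\1-\2"

content = '"Erm....yes. Thank you for that."'
stutter = "W-w-what was that?"  # hypothetical input, for illustration only

# "yes" does not begin with "Erm", so \1 cannot match and nothing is found.
no_match = re.search(pattern, content)
print(no_match)   # None

# Here the repeated fragment "w" does begin the following word "what",
# so the stutter collapses to the last captured fragment plus the word.
collapsed = re.sub(pattern, substitute, stutter)
print(collapsed)  # w-what was that?
```

If regex.sub produces a different result for the first string, as in the second output line above, the installed regex module likely still has the bug.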