Filter Dataframe If Column Is In Any Part Of List
I'm trying to filter a dataframe to rows with column values that are in a list. However, the value in the column will not be an exact match to the list. Can I do some sort of wildc
Solution 1:
This is a more complicated string matching problem than usual, but you can use a list comprehension for performance:
lst = ["123 ABC", "456 DEF", "789 GHI"]
df['match'] = [any(x in l for l in lst) for x in df['idlist']]
df
id idlist match
0 0 ABC True
1 1 XYZ False
To simply filter, use
df[[any(x in l for l in lst) for x in df['idlist']]]
id idlist
0 0 ABC
List comprehensions are my to-go syntax for many string operations. I've written a detailed writeup about their advantages in For loops with pandas - When should I care?.
If you need to handle NaNs, use a function with try-catch handling.
def search(x, lst):
try:
return any(x in l for l in lst)
except TypeError:
return False
df[[search(x, lst) for x in df['idlist']]]
id idlist
0 0 ABC
Solution 2:
You can use the operator library to check if the string is partly inlcuded:
import operator
operator.contains('ABC','123 ABC')
Post a Comment for "Filter Dataframe If Column Is In Any Part Of List"