
Filter Dataframe If Column Is In Any Part Of List

I'm trying to filter a dataframe to rows with column values that are in a list. However, the value in the column will not be an exact match to the list. Can I do some sort of wildc

Solution 1:

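This is trickier than the usual string matching problem, because each column value has to be tested as a substring of the list entries rather than the other way round. A list comprehension handles this and performs well: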

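import pandas as pd
df = pd.DataFrame({'id': [0, 1], 'idlist': ['ABC', 'XYZ']})
lst = ["123 ABC", "456 DEF", "789 GHI"]
# True when the column value is a substring of any list entry
df['match'] = [any(x in l for l in lst) for x in df['idlist']]
df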

   id idlist  match
0   0    ABC   True
1   1    XYZ  False

To filter the rows directly, pass the same boolean list as a mask:

df[[any(x in l for l in lst) for x in df['idlist']]]

   id idlist
0   0    ABC

List comprehensions are my go-to syntax for many string operations. I've written a detailed writeup about their advantages in "For loops with pandas - When should I care?".

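If you need to handle NaNs, wrap the check in a function with try/except handling. NaN is a float, so testing it with `in` against a string raises a TypeError, which the function turns into False.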

def search(x, lst):
    try:
        return any(x in l for l in lst)
    except TypeError:
        return False

df[[search(x, lst) for x in df['idlist']]]

   id idlist
0   0    ABC
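
To see the NaN handling in action, here is a minimal, self-contained sketch. The three-row frame is a hypothetical sample (the original data has no missing values); only `search` and `lst` come from the answer above.

```python
import numpy as np
import pandas as pd

lst = ["123 ABC", "456 DEF", "789 GHI"]

# Hypothetical sample data: same columns as above, plus one missing value
df = pd.DataFrame({"id": [0, 1, 2], "idlist": ["ABC", np.nan, "XYZ"]})

def search(x, lst):
    # `x in l` raises TypeError when x is NaN (a float), so treat that as no match
    try:
        return any(x in l for l in lst)
    except TypeError:
        return False

# Boolean mask: one entry per row of df
mask = [search(x, lst) for x in df["idlist"]]
result = df[mask]
print(result)
```

Only the row with id 0 should survive: the NaN row is treated as a non-match instead of raising, and "XYZ" does not occur in any list entry.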

Solution 2:

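You can use the operator module to check whether one string is contained in another. Note the argument order: operator.contains(a, b) is equivalent to b in a, so the longer string goes first.

import operator
operator.contains('123 ABC', 'ABC')  # same as 'ABC' in '123 ABC'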
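
As a rough sketch of how this could be applied to the original question, `operator.contains` can stand in for the `in` test inside a list comprehension. The sample frame is rebuilt from the output shown in Solution 1; `mask` and `filtered` are illustrative names, not part of the original answer.

```python
import operator
import pandas as pd

lst = ["123 ABC", "456 DEF", "789 GHI"]
df = pd.DataFrame({"id": [0, 1], "idlist": ["ABC", "XYZ"]})

# operator.contains(a, b) evaluates `b in a`, so the list entry is the
# first argument and the column value is the second
mask = [any(operator.contains(l, x) for l in lst) for x in df["idlist"]]
filtered = df[mask]
print(filtered)
```

This behaves the same as the plain `x in l` comprehension; the function form is mainly useful when a callable is needed, for example with `functools.partial` or `map`.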
