
Filtering Products Based On Description Scenarios And Status In Python Pandas

Let's say I have the following product descriptions in a Pandas DataFrame. I would like to keep all product descriptions of products that satisfy the following condition: For ever

Solution 1:

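Check each scenario separately with groupby and a custom function, then keep the ids that satisfy at least one scenario: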

import pandas as pd

# map scenario names to their description lists
d = {'scenario{}'.format(k): v for k, v in enumerate(scenario_descriptions, 1)}

# unique ids, used to reindex each per-scenario result
uniq_id = df['id'].unique()

def f(x, v):
    # True if the group contains every description of scenario v
    has_all = set(x['description']) >= set(v)
    # True if every remaining row has status 4, 5 or 6
    status_ok = x['status'].isin([4, 5, 6]).all()
    return bool(has_all and status_ok)

d1 = {}
for k, v in d.items():
    # keep only the rows whose description belongs to this scenario
    a = df[df['description'].isin(v)]
    # run the check per id, passing the scenario explicitly instead of relying on a global
    b = a.groupby('id').apply(f, v)
    # add the ids that were filtered out and mark them False
    d1[k] = b.reindex(uniq_id, fill_value=False)

print(d1)
{'scenario1': id
1    False
2    False
dtype: bool, 'scenario4': id
1    False
2    False
dtype: bool, 'scenario5': id
1     True
2    False
dtype: bool, 'scenario3': id
1     True
2    False
dtype: bool, 'scenario2': id
1     True
2    False
dtype: bool}

# combine the per-scenario results into a DataFrame and keep ids with at least one True
m = pd.concat(d1, axis=1).any(axis=1)
print (m)
id
1     True
2    False

# final filtering: keep only the rows of the matching ids
df = df[df['id'].isin(m.index[m])]
print (df)
    id description  status
0    1      world1       1
1    1      world2       4
2    1      world3       1
3    1      world4       4
4    1      world5       4
5    1      world6       4
6    1      world7       1
7    1      world8       4
8    1      world9       4
9    1     world10       4
10   1     world11       4
11   1     world12       4
12   1     world13       4
13   1     world14       4
14   1     world15       1
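
The original DataFrame and scenario lists are not shown in full, so the steps above can be tried on made-up data. The following is a minimal sketch under that assumption: the sample rows, the scenario lists, and the helper name matching_ids are all hypothetical, and the status check applies only to the rows that belong to the scenario, as in the code above.

```python
import pandas as pd

# Hypothetical sample data (not the data from the question).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3],
    "description": ["world1", "world2", "world3", "world1", "world2", "world9"],
    "status": [1, 4, 5, 4, 1, 4],
})
# Hypothetical scenarios: each inner list is one set of required descriptions.
scenario_descriptions = [["world1", "world2"], ["world2", "world3"]]


def matching_ids(frame, scenarios, allowed_status=(4, 5, 6)):
    """Return the ids that fully satisfy at least one scenario."""
    matched = set()
    for wanted in scenarios:
        # Keep only the rows that belong to this scenario.
        relevant = frame[frame["description"].isin(wanted)]
        for key, group in relevant.groupby("id"):
            # Every description of the scenario must be present ...
            has_all = set(wanted) <= set(group["description"])
            # ... and every one of those rows must have an allowed status.
            status_ok = group["status"].isin(allowed_status).all()
            if has_all and status_ok:
                matched.add(int(key))
    return matched


ids = matching_ids(df, scenario_descriptions)
result = df[df["id"].isin(sorted(ids))]
print(result)
```

With this sample data only id 1 passes, through the second scenario: its world2 and world3 rows both have an allowed status, while its world1 row (status 1) makes the first scenario fail.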

Solution 2:

Use groupby with filter, which keeps every group for which the lambda returns True:

In [260]: product_descriptions.groupby('id').filter(
     ...:   lambda x: all(any(w in x.description.values for w in L)
     ...:                 for L in scenario_descriptions))
Out[260]:
    id description  status
0    1      world1       1
1    1      world2       4
2    1      world3       1
3    1      world4       4
4    1      world5       4
5    1      world6       4
6    1      world7       1
7    1      world8       4
8    1      world9       4
9    1     world10       4
10   1     world11       4
11   1     world12       4
12   1     world13       4
13   1     world14       4
14   1     world15       1
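
To see what the filter call does, here is a small self-contained sketch on made-up data; the sample rows, the scenario lists, and the helper name covers_every_scenario are assumptions for illustration. Note that this check keeps an id when at least one description from every scenario is present, and it does not look at the status column, so it may not select the same ids as Solution 1 on other data.

```python
import pandas as pd

# Hypothetical sample data (not the data from the question).
product_descriptions = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "description": ["world1", "world3", "world1", "world9"],
    "status": [4, 4, 4, 1],
})
# Hypothetical scenarios.
scenario_descriptions = [["world1", "world2"], ["world3", "world4"]]


def covers_every_scenario(group):
    """True if the group has at least one description from each scenario."""
    present = set(group["description"])
    return all(
        any(word in present for word in scenario)
        for scenario in scenario_descriptions
    )


# filter() keeps all rows of the groups for which the function returns True.
kept = product_descriptions.groupby("id").filter(covers_every_scenario)
print(kept)
```

Here id 1 is kept because world1 covers the first scenario and world3 covers the second; id 2 is dropped because none of its descriptions appear in the second scenario.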
