Filtering Products Based On Description Scenarios And Status In Python Pandas
Let's say I have the following product descriptions in a Pandas DataFrame. I would like to keep all product descriptions of products that satisfy the following condition: For ever
Solution 1:
Use:
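import pandas as pd

# build a dictionary mapping 'scenario1', 'scenario2', ... to each list of descriptions
d = {'scenario{}'.format(k): v for k, v in enumerate(scenario_descriptions, 1)}
# unique ids, used later to reindex the per-scenario results
uniq_id = df['id'].unique()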
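def f(x):
    # check that the group contains every description of the current scenario
    # (v is the loop variable of the for loop that calls this function)
    c = set(x['description']) >= set(v)
    # check that every status in the group is 4, 5 or 6
    d = x['status'].isin([4, 5, 6]).all()
    return c & d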
d1 = {}
for k, v in d.items():
    # keep only the rows whose description belongs to this scenario
    a = df[df['description'].isin(v)]
    # apply the custom check to each id group
    b = a.groupby('id').apply(f)
    # add the ids that were filtered out, filled with False,
    # and store the result in the dictionary
    d1[k] = b.reindex(uniq_id, fill_value=False)
print(d1)
{'scenario1': id
1 False
2 False
dtype: bool, 'scenario4': id
1 False
2 False
dtype: bool, 'scenario5': id
1 True
2 False
dtype: bool, 'scenario3': id
1 True
2 False
dtype: bool, 'scenario2': id
1 True
2 False
dtype: bool}
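Since the original DataFrame and scenario lists are not shown here, the following is only a minimal self-contained sketch of the same per-scenario check. The sample data, the helper name scenario_mask and its ok_status parameter are assumptions made for illustration. It also replaces the groupby/apply call with nunique() and a grouped all(), so it is a variation on the approach above rather than the answer's exact code.

```python
import pandas as pd

# Hypothetical sample data; the original DataFrame is not shown in the post.
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 3],
    'description': ['world1', 'world2', 'world3', 'world1', 'world2', 'world9'],
    'status': [4, 5, 1, 4, 1, 4],
})
# Hypothetical scenarios: each inner list is one scenario.
scenario_descriptions = [['world1', 'world2'], ['world3']]


def scenario_mask(frame, wanted, ok_status=(4, 5, 6)):
    """Per-id flag: the id has every description in `wanted`, all with an accepted status."""
    wanted = set(wanted)
    relevant = frame[frame['description'].isin(wanted)]
    # Only wanted descriptions remain, so the distinct count equals len(wanted)
    # exactly when the id has all of them.
    has_all = relevant.groupby('id')['description'].nunique() == len(wanted)
    # Every remaining row of the id must carry an accepted status.
    status_ok = relevant['status'].isin(ok_status).groupby(relevant['id']).all()
    # Ids with no relevant rows are missing here, so add them back as False.
    return (has_all & status_ok).reindex(frame['id'].unique(), fill_value=False)


masks = {
    'scenario{}'.format(k): scenario_mask(df, v)
    for k, v in enumerate(scenario_descriptions, 1)
}
print(masks['scenario1'])
```

Passing the scenario's descriptions as an argument avoids relying on a loop variable being visible inside the function, which makes the check easier to reuse and test.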
#reduce dict to DataFrame and check at least one True per row
m = pd.concat(d1, axis=1).any(axis=1)
print (m)
id
1 True
2 False
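To see what the concat/any step does in isolation, here is a small sketch with made-up per-scenario results (the values in d1 below are hypothetical, not the output shown above). Concatenating the dictionary along axis=1 gives one column per scenario, and any(axis=1) marks the ids that pass at least one of them.

```python
import pandas as pd

# Hypothetical per-scenario flags, indexed by product id.
d1 = {
    'scenario1': pd.Series([False, False, True], index=[1, 2, 3]),
    'scenario2': pd.Series([True, False, False], index=[1, 2, 3]),
}

# One column per scenario, one row per id.
flags = pd.concat(d1, axis=1)
# True where the id satisfies at least one scenario.
m = flags.any(axis=1)
# Boolean indexing on the index returns the ids to keep.
kept_ids = m.index[m].tolist()
print(kept_ids)
```

If a product had to satisfy every scenario instead, all(axis=1) would be the corresponding reduction.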
#last filtering
df = df[df['id'].isin(m.index[m])]
print (df)
id description status
0 1 world1 1
1 1 world2 4
2 1 world3 1
3 1 world4 4
4 1 world5 4
5 1 world6 4
6 1 world7 1
7 1 world8 4
8 1 world9 4
9 1 world10 4
10 1 world11 4
11 1 world12 4
12 1 world13 4
13 1 world14 4
14 1 world15 1
Solution 2:
Use:
In [260]: product_descriptions.groupby('id').filter(
     ...:     lambda x: all(any(w in x.description.values for w in L)
     ...:                   for L in scenario_descriptions))
Out[260]:
id description status
0 1 world1 1
1 1 world2 4
2 1 world3 1
3 1 world4 4
4 1 world5 4
5 1 world6 4
6 1 world7 1
7 1 world8 4
8 1 world9 4
9 1 world10 4
10 1 world11 4
11 1 world12 4
12 1 world13 4
13 1 world14 4
14 1 world15 1
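As a rough illustration of how groupby(...).filter(...) behaves, here is a small sketch on made-up data (the DataFrame contents, the scenario lists and the function name covers_every_scenario are assumptions). The predicate mirrors the lambda above: a group is kept when, for every scenario list, at least one of its descriptions occurs in the group. Note that written this way the status column is not checked at all.

```python
import pandas as pd

# Hypothetical sample data; the original DataFrame is not shown in the post.
product_descriptions = pd.DataFrame({
    'id': [1, 1, 1, 2, 2],
    'description': ['world1', 'world2', 'world3', 'world1', 'world4'],
    'status': [4, 4, 1, 4, 4],
})
# Hypothetical scenarios: each inner list is one scenario.
scenario_descriptions = [['world1', 'world9'], ['world2', 'world3']]


def covers_every_scenario(group):
    # Membership is tested against the values, not the index, of the column.
    present = set(group['description'])
    return all(any(w in present for w in words)
               for words in scenario_descriptions)


# filter() keeps all rows of every group for which the predicate is True.
kept = product_descriptions.groupby('id').filter(covers_every_scenario)
print(kept)
```

Building a set once per group keeps the membership tests cheap. The lambda above uses x.description.values for the same reason a set is used here: the in operator on a Series looks at the index rather than the values.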