Attempting To Find The 5 Largest Values Per Month Using Groupby

I am attempting to show the top three values of nc_type for each month. I tried using nlargest, but that doesn't group by date. Original Data: area

Solution 1:

Scenario 1: MultiIndex Series

occurred_date  nc_type
1.0            x           3
               y           4
               z          13
               w          24
               f          34
12.0           d          18
               g          10
               w          44
               a          27
               g          42
Name: test, dtype: int64

Call sort_values + groupby + head:

df.sort_values(ascending=False).groupby(level=0).head(2)

occurred_date  nc_type
12.0           w          44
               g          42
1.0            f          34
               w          24
Name: test, dtype: int64

Change head(2) to head(5) for your situation.
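As a minimal, self-contained sketch of the sort_values + groupby + head approach, the snippet below rebuilds the sample Series from the listing above. The variable name s and the helper top_n_per_group are illustrative names, not part of the original answer.

```python
import pandas as pd

# Rebuild the sample MultiIndex Series shown above.
index = pd.MultiIndex.from_arrays(
    [
        [1.0] * 5 + [12.0] * 5,
        ["x", "y", "z", "w", "f", "d", "g", "w", "a", "g"],
    ],
    names=["occurred_date", "nc_type"],
)
s = pd.Series([3, 4, 13, 24, 34, 18, 10, 44, 27, 42], index=index, name="test")


def top_n_per_group(series, n):
    """Sort every value in descending order, then keep the first n rows
    of each group defined by index level 0 (occurred_date)."""
    return series.sort_values(ascending=False).groupby(level=0).head(n)


top2 = top_n_per_group(s, 2)
print(top2)
```

Because head keeps rows in the order they appear after sorting, the result is ordered by value across all groups rather than by occurred_date.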

Or, expanding upon my comment with nlargest, you could do:

df.groupby(level=0).nlargest(2).reset_index(level=0, drop=True)

occurred_date  nc_type
1.0            f          34
               w          24
12.0           w          44
               g          42
Name: test, dtype: int64
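The nlargest variant can be sketched the same way. This example assumes the same sample Series (rebuilt here as s, an illustrative name) and shows why the reset_index call is needed: nlargest on a grouped Series prepends the group key as an extra index level.

```python
import pandas as pd

# Rebuild the sample MultiIndex Series shown above.
index = pd.MultiIndex.from_arrays(
    [
        [1.0] * 5 + [12.0] * 5,
        ["x", "y", "z", "w", "f", "d", "g", "w", "a", "g"],
    ],
    names=["occurred_date", "nc_type"],
)
s = pd.Series([3, 4, 13, 24, 34, 18, 10, 44, 27, 42], index=index, name="test")

# nlargest adds the group key in front of the original index,
# so the raw result has three index levels.
raw = s.groupby(level=0).nlargest(2)

# Dropping the first level restores the (occurred_date, nc_type) index.
top2 = raw.reset_index(level=0, drop=True)
print(top2)
```

Unlike the head approach, the groups here come back ordered by occurred_date, with values descending inside each group.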

Scenario 2: 3-column DataFrame

   occurred_date nc_type  value
0            1.0       x      3
1            1.0       y      4
2            1.0       z     13
3            1.0       w     24
4            1.0       f     34
5           12.0       d     18
6           12.0       g     10
7           12.0       w     44
8           12.0       a     27
9           12.0       g     42

You can use sort_values + groupby + head:

df.sort_values(['occurred_date', 'value'], 
        ascending=[True, False]).groupby('occurred_date').head(2)

   occurred_date nc_type  value
4            1.0       f     34
3            1.0       w     24
7           12.0       w     44
9           12.0       g     42

Change head(2) to head(5) for your scenario.
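A runnable sketch of the flat-DataFrame case, assuming the ten sample rows listed above, could look like this. Sorting by the date ascending and the value descending puts each group's largest rows first, so head picks them up.

```python
import pandas as pd

# Rebuild the 3-column sample frame shown above.
df = pd.DataFrame(
    {
        "occurred_date": [1.0] * 5 + [12.0] * 5,
        "nc_type": ["x", "y", "z", "w", "f", "d", "g", "w", "a", "g"],
        "value": [3, 4, 13, 24, 34, 18, 10, 44, 27, 42],
    }
)

# Within each occurred_date, rows are ordered from largest to smallest value;
# head(2) then keeps the first two rows of every group.
top2 = (
    df.sort_values(["occurred_date", "value"], ascending=[True, False])
    .groupby("occurred_date")
    .head(2)
)
print(top2)
```

The original row labels (4, 3, 7, 9) are preserved, which makes it easy to trace each result back to its source row.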


Scenario 3: MultiIndex DataFrame

                       test
occurred_date nc_type      
1.0           x           3
              y           4
              z          13
              w          24
              f          34
12.0          d          18
              g          10
              w          44
              a          27
              g          42

Here, select the test column and use nlargest:

df.groupby(level=0).test.nlargest(2)\
              .reset_index(level=0, drop=True)

occurred_date  nc_type
1.0            f          34
               w          24
12.0           w          44
               g          42
Name: test, dtype: int64
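For the MultiIndex DataFrame case, a hedged sketch follows. It assumes the single-column frame shown above and uses bracket selection (["test"]) instead of attribute access, which is an equivalent spelling chosen here for clarity. The final to_frame call is an optional extra, not part of the original answer, for when a DataFrame is wanted back.

```python
import pandas as pd

# Rebuild the sample MultiIndex DataFrame shown above.
index = pd.MultiIndex.from_arrays(
    [
        [1.0] * 5 + [12.0] * 5,
        ["x", "y", "z", "w", "f", "d", "g", "w", "a", "g"],
    ],
    names=["occurred_date", "nc_type"],
)
df = pd.DataFrame({"test": [3, 4, 13, 24, 34, 18, 10, 44, 27, 42]}, index=index)

# Select the column first: nlargest is then applied to each group's Series.
top2 = (
    df.groupby(level=0)["test"]
    .nlargest(2)
    .reset_index(level=0, drop=True)  # drop the duplicated group-key level
)

# Optional: convert the resulting Series back into a one-column DataFrame.
top2_frame = top2.to_frame()
print(top2_frame)
```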

Solution 2:

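I'd include group_keys=False, which keeps the group key from being prepended to the result's index, so no reset_index is needed afterwards (here df is a Series named value with the MultiIndex shown in the output):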

df.groupby('occurred_date', group_keys=False).nlargest(3)

occurred_date  nc_type
1.0            f          34
               w          24
               z          13
12.0           w          44
               g          42
               a          27
Name: value, dtype: int64
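To illustrate the effect of group_keys=False, the sketch below compares the index with and without it. It assumes a Series named value carrying the (occurred_date, nc_type) MultiIndex, as the output above suggests. The variable names are illustrative.

```python
import pandas as pd

# Sample Series named "value" with an (occurred_date, nc_type) MultiIndex.
index = pd.MultiIndex.from_arrays(
    [
        [1.0] * 5 + [12.0] * 5,
        ["x", "y", "z", "w", "f", "d", "g", "w", "a", "g"],
    ],
    names=["occurred_date", "nc_type"],
)
s = pd.Series([3, 4, 13, 24, 34, 18, 10, 44, 27, 42], index=index, name="value")

# Default behaviour: the group key is added as an extra leading index level.
with_keys = s.groupby("occurred_date").nlargest(3)

# group_keys=False: the original two-level index is kept as is.
without_keys = s.groupby("occurred_date", group_keys=False).nlargest(3)

print(with_keys.index.nlevels, without_keys.index.nlevels)
print(without_keys)
```

Grouping by the string "occurred_date" works here because it names an index level of the Series.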