
Format Problem Categorizing Time In Pandas

I'm trying to convert hours to a categorical format. The column looks like this, with hundreds of entries:

hr_animals
21:25:00
10:36:00
23:17:00
01:23:00
NA
13:30:00
NA

And I want the

Solution 1:

That's probably because your data is not of datetime type, so you cannot use the .dt accessor. To fix your code, the 3rd option should be:

pd.to_datetime(pamdf['hr_animals'], format='%H:%M:%S', errors='coerce').dt.hour
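As a quick check of the line above, here is a minimal sketch run on the sample values from the question. The frame name pamdf is taken from the answer; storing the missing entries as the literal string 'NA' is an assumption about how the data was loaded.

```python
import pandas as pd

# Sample values from the question. Keeping the missing entries as the
# string 'NA' is an assumption about how the column was read in.
pamdf = pd.DataFrame({'hr_animals': ['21:25:00', '10:36:00', '23:17:00',
                                     '01:23:00', 'NA', '13:30:00', 'NA']})

# errors='coerce' turns anything that does not match the format into NaT
parsed = pd.to_datetime(pamdf['hr_animals'], format='%H:%M:%S', errors='coerce')

# .dt is available now because the parsed column has a datetime dtype
hours = parsed.dt.hour
print(hours)
```

Because of the NaT rows, the hour column comes back as floats with NaN in the missing positions.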

That said, time-of-day values with no date component are better suited to timedelta than to datetime. Try:

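# Parse the strings as durations since midnight; entries such as NA become NaT.
# Lowercase 'h' is used because the uppercase 'H' alias is deprecated in recent pandas.
df['hr_animals'] = ((pd.to_timedelta(df['hr_animals'], errors='coerce')
                      // pd.Timedelta('4h'))      # 4-hour bucket index, 0 to 5
                     .add(1)                       # shift to 1 to 6
                     .replace({1: 'Dawn',
                               2: 'Early Morning',
                               3: 'Morning',
                               4: 'Noon',
                               5: 'Evening',
                               6: 'Night'})
                    )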

Output:

  hr_animals
0      Night
1    Morning
2      Night
3       Dawn
4        NaN
5       Noon
6        NaN
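To make the intermediate step visible, here is a minimal self-contained sketch of the same bucketing idea. It is a variant rather than the exact code above: it looks the label up in a list instead of using replace, and the names bucket, labels and period are illustrative only. The 'NA' strings are again an assumption about the stored data.

```python
import pandas as pd

# Sample values from the question; 'NA' as a string is an assumption.
df = pd.DataFrame({'hr_animals': ['21:25:00', '10:36:00', '23:17:00',
                                  '01:23:00', 'NA', '13:30:00', 'NA']})

labels = ['Dawn', 'Early Morning', 'Morning', 'Noon', 'Evening', 'Night']

# Floor-divide the time since midnight by 4 hours: 0 for 00:00 to 03:59,
# 1 for 04:00 to 07:59, and so on up to 5. NaT rows become NaN.
bucket = pd.to_timedelta(df['hr_animals'], errors='coerce') // pd.Timedelta('4h')

# Look up the label for each bucket, leaving missing rows untouched
df['period'] = bucket.map(lambda b: labels[int(b)], na_action='ignore')
print(df)
```

Printing bucket on its own shows which 4-hour slot each row fell into, which helps when adjusting the boundaries or the label names.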

Another option is to use pd.cut, which returns a categorical column. This can be helpful because the labels are ordered, i.e. Dawn < Early Morning < ...:

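import numpy as np

# right=False closes each bin on the left, so 00:00:00 lands in 'Dawn' and
# 04:00:00 in 'Early Morning', matching the floor-division version above
df['hr_animals'] = pd.cut(pd.to_timedelta(df['hr_animals'], errors='coerce'),
                          bins=pd.to_timedelta(np.arange(0, 25, 4), unit='h'),
                          labels=['Dawn', 'Early Morning', 'Morning',
                                  'Noon', 'Evening', 'Night'],
                          right=False
                   )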
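The ordering is what makes the categorical result convenient. The sketch below, run on the sample values from the question, shows one way it might be used: comparing against a label and sorting by time of day. The names times, period and afternoon_or_later are illustrative, and the 'NA' strings and the right=False bin edges are assumptions of this example.

```python
import numpy as np
import pandas as pd

# Sample values from the question; 'NA' as a string is an assumption.
times = pd.to_timedelta(pd.Series(['21:25:00', '10:36:00', '23:17:00',
                                   '01:23:00', 'NA', '13:30:00', 'NA']),
                        errors='coerce')

labels = ['Dawn', 'Early Morning', 'Morning', 'Noon', 'Evening', 'Night']

# Six 4-hour bins covering 0 h to 24 h, each closed on the left
period = pd.cut(times,
                bins=pd.to_timedelta(np.arange(0, 25, 4), unit='h'),
                labels=labels,
                right=False)

# Ordered categories support comparisons against a label;
# rows with a missing value compare as False
afternoon_or_later = period >= 'Noon'

# Sorting follows the category order, not the alphabet
print(period.sort_values())
```

With plain strings, the same sort would put Dawn, Evening, Morning in alphabetical order, which is rarely what is wanted for times of day.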
