Format Problem Categorizing Time In Pandas
I'm trying to convert hours to a categorical format. The column looks like this, with hundreds of entries:
hr_animals
21:25:00
10:36:00
23:17:00
01:23:00
NA
13:30:00
NA
And I want the
Solution 1:
That's probably because your data is not of datetime type, so you cannot use the .dt accessor. To fix your code, the 3rd option should be:
pd.to_datetime(pamdf['hr_animals'], format='%H:%M:%S', errors='coerce').dt.hour
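As a rough illustration of the point above, here is a minimal sketch that rebuilds the sample column from the question (the name pamdf and the "NA" strings are assumptions about how the data was loaded). It shows that .dt is unavailable on a string column and that parsing first makes the hour accessible:

```python
import pandas as pd

# Sample values copied from the question; "NA" stands in for missing entries.
pamdf = pd.DataFrame({"hr_animals": ["21:25:00", "10:36:00", "23:17:00",
                                     "01:23:00", "NA", "13:30:00", "NA"]})

# A column of plain strings has no .dt accessor, so this raises AttributeError.
try:
    pamdf["hr_animals"].dt.hour
    has_dt = True
except AttributeError:
    has_dt = False

# Parse first; entries that do not match the format become NaT with errors="coerce".
hours = pd.to_datetime(pamdf["hr_animals"], format="%H:%M:%S",
                       errors="coerce").dt.hour
print(has_dt)
print(hours)
```

Because the missing entries turn into NaT, the resulting hour column is floating point with NaN in those rows.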
That said, your data is better suited to timedelta than to datetime. Try:
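# lowercase 'h' is used because the 'H' alias is deprecated in recent pandas
df['hr_animals'] = ((pd.to_timedelta(df['hr_animals'], errors='coerce')
                     // pd.Timedelta('4h'))  # 4-hour bucket index 0..5, NaN for missing
                    .add(1)
                    .replace({1: 'Dawn',
                              2: 'Early Morning',
                              3: 'Morning',
                              4: 'Noon',
                              5: 'Evening',
                              6: 'Night'})
                    )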
Output:
  hr_animals
0      Night
1    Morning
2      Night
3       Dawn
4        NaN
5       Noon
6        NaN
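To make the arithmetic behind the floor division easier to follow, here is a small sketch that keeps the intermediate bucket numbers. The variable names are illustrative, and Series.map is swapped in for the add(1)/replace step, so treat it as a variation under those assumptions rather than the exact code above:

```python
import pandas as pd

times = pd.Series(["21:25:00", "10:36:00", "23:17:00",
                   "01:23:00", "NA", "13:30:00", "NA"])

# "NA" cannot be parsed as a duration, so errors="coerce" turns it into NaT.
td = pd.to_timedelta(times, errors="coerce")

# Floor-divide by 4 hours: 00:00-03:59 -> 0, 04:00-07:59 -> 1, ..., 20:00-23:59 -> 5.
# Rows holding NaT come out as NaN.
buckets = td // pd.Timedelta(hours=4)

names = {0: "Dawn", 1: "Early Morning", 2: "Morning",
         3: "Noon", 4: "Evening", 5: "Night"}
labels = buckets.map(names)  # NaN buckets stay NaN

print(pd.DataFrame({"time": times, "bucket": buckets, "label": labels}))
```

For example, 21:25:00 is 21.4 hours, which floor-divides to bucket 5 and is labelled Night.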
Another option is to use pd.cut, which returns a categorical column. This might be helpful because the labels will be ordered, i.e. Dawn < Early Morning < ... :
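import numpy as np

df['hr_animals'] = pd.cut(pd.to_timedelta(df['hr_animals'], errors='coerce'),
                          bins=pd.to_timedelta(np.arange(0, 25, 4), unit='h'),
                          labels=['Dawn', 'Early Morning', 'Morning',
                                  'Noon', 'Evening', 'Night'],
                          # left-closed bins [0h, 4h), [4h, 8h), ... so that 00:00:00
                          # falls in 'Dawn' and results match the floor division
                          right=False
                          )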
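As a hedged sketch of why the ordering can be useful, the example below builds the categorical on a small stand-alone Series (names are illustrative, and right=False is an assumption chosen so that bins are closed on the left) and then compares and counts the labels in their natural order:

```python
import numpy as np
import pandas as pd

times = pd.Series(["21:25:00", "10:36:00", "23:17:00",
                   "01:23:00", None, "13:30:00"])
labels = ["Dawn", "Early Morning", "Morning", "Noon", "Evening", "Night"]

periods = pd.cut(
    pd.to_timedelta(times, errors="coerce"),
    bins=pd.to_timedelta(np.arange(0, 25, 4), unit="h"),  # 0h, 4h, ..., 24h
    labels=labels,
    right=False,  # assumption: left-closed bins, e.g. [0h, 4h)
)

# Ordered categories support comparisons against a label.
after_morning = periods > "Morning"

# Counts are reported in category order rather than alphabetically.
counts = periods.value_counts(sort=False)

print(after_morning)
print(counts)
```

Missing times stay NaN in the categorical and compare as False.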