Converting Pandas Dataframe Types
I have a pandas DataFrame created through a MySQL call, which returns the data as object type. The data is mostly numeric, with some 'na' values. How can I cast the type of the data?
Solution 1:
Use the replace method on dataframes:
import numpy as np
from pandas import DataFrame
df = DataFrame({
'k1': ['na'] * 3 + ['two'] * 4,
'k2': [1, 'na', 2, 'na', 3, 4, 4]})
print(df)
# replace returns a new dataframe, so assign the result back
df = df.replace('na', np.nan)
print(df)
I think it's helpful to point out that calling df.replace('na', np.nan) by itself won't change df. replace returns a new dataframe, so you must assign the result back to the existing dataframe.
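A minimal sketch of that point, reusing the sample data from the answer above. The name `replaced` is only for illustration, and the explicit `astype(float)` on `k2` is an added assumption: depending on the pandas version, the column may or may not still be object dtype after the replacement, so the cast makes the result predictable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'k1': ['na'] * 3 + ['two'] * 4,
    'k2': [1, 'na', 2, 'na', 3, 4, 4]})

# replace returns a new dataframe; df itself is left untouched
replaced = df.replace('na', np.nan)
print(df['k2'].tolist())  # the original still contains the 'na' strings

# Cast explicitly rather than relying on replace to infer a numeric dtype
replaced['k2'] = replaced['k2'].astype(float)
print(replaced.dtypes)
```

Only `k2` is cast here because `k1` holds genuine strings ('two') that cannot be converted to float.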
Solution 2:
df = df.convert_objects(convert_numeric=True)
will work in most cases.
I should note that this copies the data. It would be preferable to get it to a numeric type on the initial read. If you post your code and a small example, someone might be able to help you with that.
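Note that convert_objects was deprecated in later pandas releases and removed in pandas 1.0. On those versions, a rough equivalent is pd.to_numeric with errors='coerce', which turns anything unparseable (such as 'na') into NaN. The sketch below assumes every column is meant to be numeric; the column names `a` and `b` are made up for the example.

```python
import pandas as pd

# Hypothetical object-typed data, as it might come back from a database call
df = pd.DataFrame({'a': ['1', '2.', 'na'], 'b': ['4', 'na', '6']})

# Apply pd.to_numeric column by column; unparseable values become NaN
converted = df.apply(pd.to_numeric, errors='coerce')
print(converted)
print(converted.dtypes)
```

Like convert_objects, this returns a new dataframe rather than converting in place.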
Solution 3:
This is what Tom suggested, and it is correct:
In [134]: s = pd.Series(['1','2.','na'])
In [135]: s.convert_objects(convert_numeric=True)
Out[135]:
0     1
1     2
2   NaN
dtype: float64
As Andy points out, this doesn't work directly when the Series mixes strings with actual numbers (I think that's a bug), so convert all elements to strings first, then convert:
In [136]: s2 = pd.Series(['1','2.','na',5])
In [138]: s2.astype(str).convert_objects(convert_numeric=True)
Out[138]:
0     1
1     2
2   NaN
3     5
dtype: float64
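On pandas versions where convert_objects is no longer available (it was removed in pandas 1.0), the same mixed Series can be handled with pd.to_numeric. This is a sketch of a swapped-in technique, not the method used in the answer above; with errors='coerce' the astype(str) step should not be needed.

```python
import pandas as pd

# Same mixed data as s2 above: strings plus one real integer
s2 = pd.Series(['1', '2.', 'na', 5])

# Strings are parsed, the integer is kept, and 'na' becomes NaN
result = pd.to_numeric(s2, errors='coerce')
print(result)
```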