
Pandas: Fill A Column With Some Numpy Arrays

I am using Python 2.7 and pandas 0.11.0. I am trying to fill a column of a DataFrame using DataFrame.apply(func). The func() function is supposed to return a NumPy array (1x3). import pan

Solution 1:

If the function passed to apply returns multiple values, and the DataFrame you call apply on has the same number of items along the axis (in this case, columns) as the number of values returned, pandas (at least in the 0.11.0 version used in the question) creates a DataFrame from the return values, with the same labels as the original DataFrame. You can see this if you just do:

>>> import numpy as np
>>> import pandas as pd
>>> def test(row):
...     return [1, 2, 3]
...
>>> df = pd.DataFrame(np.random.randn(4, 3), columns=list('ABC'))
>>> df.apply(test, axis=1)
   A  B  C
0  1  2  3
1  1  2  3
2  1  2  3
3  1  2  3

And that is why you get the error: apply returned a DataFrame, and you cannot assign a DataFrame to a single DataFrame column.

If you return any other number of values, apply returns just a Series object, which can be assigned:

>>> def test(row):
...     return [1, 2]
...
>>> df = pd.DataFrame(np.random.randn(4, 3), columns=list('ABC'))
>>> df.apply(test, axis=1)
0    [1, 2]
1    [1, 2]
2    [1, 2]
3    [1, 2]
>>> df['D'] = df.apply(test, axis=1)
>>> df
          A         B         C       D
0  0.333535  0.209745 -0.972413  [1, 2]
1  0.469590  0.107491 -1.248670  [1, 2]
2  0.234444  0.093290 -0.853348  [1, 2]
3  1.021356  0.092704 -0.406727  [1, 2]

I'm not sure why pandas does this, or why it does it only when the return value is a list or an ndarray; it won't do it if you return a tuple:

>>> def test(row):
...     return (1, 2, 3)
...
>>> df = pd.DataFrame(np.random.randn(4, 3), columns=list('ABC'))
>>> df['D'] = df.apply(test, axis=1)
>>> df
          A         B         C          D
0  0.121136  0.541198 -0.281972  (1, 2, 3)
1  0.569091  0.944344  0.861057  (1, 2, 3)
2 -1.742484 -0.077317  0.181656  (1, 2, 3)
3 -1.541244  0.174428  0.660123  (1, 2, 3)
