
Bizarre Issue With Pandas' .groupby Function, When Function Applied To Rows

I have a set of CSV data that is 4203x37 which I reshape to 50436x4 in order to find the Euclidean distance between 12 sets of 3D points, recorded at each time-step. This does not

Solution 1:

Count how many rows share each Time value:

df_f_2_norm.Time.value_counts()

The counts show that not every Time value has exactly 12 rows, so grouping by Time does not give one 12-point group per time-step.

Here is the output:

1.333    492
1.383    492
1.317    492
1.400    492
1.467    492
1.450    492
1.483    492
1.417    492
1.500    492
1.367    492
1.350    492
1.433    492
1.533    480
1.517    480
1.550    468
...
4.800    12
4.600    12
4.750    12
4.833    12
4.667    12
4.700    12
4.650    12
4.683    12
4.633    12
4.617    12
4.817    12
4.583    12
4.733    12
4.767    12
4.783    12
Length: 272, dtype: int64
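
To see exactly which time-steps deviate, the counts can be filtered instead of read by eye. This is only a minimal sketch on a small made-up frame standing in for astrid_data.csv; the names `demo`, `expected` and `bad_times` are illustrative and not from the original post.

```python
import pandas as pd

# Made-up stand-in for the real data: Time 0.1 has 3 rows, Time 0.2 has 2 rows.
demo = pd.DataFrame({
    "Time": [0.1, 0.1, 0.1, 0.2, 0.2],
    "X": [0.0, 1.0, 2.0, 3.0, 4.0],
})

expected = 3  # rows expected per time-step (12 in the real data)

counts = demo["Time"].value_counts()
# Keep only the Time values whose row count differs from the expected one.
bad_times = counts[counts != expected]
print(bad_times)
```

On the real frame, the same filter with `expected = 12` would list every Time value that appears more or less than 12 times.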

If you want to group the DataFrame into consecutive blocks of 12 rows instead, you can build a positional group key:

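import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist

N = 12                      # points per time-step
N_lim = N * (N - 1) // 2    # number of pairwise distances (66)

df_f_2_norm = pd.read_csv("astrid_data.csv")
# Positional group key: 0 repeated 12 times, then 1 repeated 12 times, ...
g = np.repeat(np.arange(df_f_2_norm.shape[0] // N), N)

result_index = ['D{}'.format(tag) for tag in range(1, N_lim + 1)]  # Column labels
# One row of 66 pairwise Euclidean distances per block of 12 points
two_norm = df_f_2_norm.groupby(g)[["X", "Y", "Z"]].apply(
    lambda grp: pd.Series(pdist(grp), index=result_index))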
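
The labels D1 to D66 follow the order in which pdist returns its condensed distance vector: every index pair (i, j) with i < j, taken row by row. The pure-Python sketch below reproduces that ordering for a tiny made-up set of points, so each label can be traced back to a pair. The names `points`, `pairs`, `labels` and `distances` are illustrative, and `math.dist` assumes Python 3.8 or later.

```python
from itertools import combinations
from math import dist

# Four made-up 3D points (the real data has N = 12 per time-step).
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
N = len(points)

# All index pairs (i, j) with i < j, in the same order as pdist's condensed output.
pairs = list(combinations(range(N), 2))
labels = ['D{}'.format(k) for k in range(1, len(pairs) + 1)]

# Euclidean distance for each pair, keyed by its column label.
distances = {lab: dist(points[i], points[j]) for lab, (i, j) in zip(labels, pairs)}

for lab, (i, j) in zip(labels, pairs):
    print(lab, (i, j), round(distances[lab], 3))
```

With N = 12 the same enumeration yields 66 pairs, so D1 is the distance between points 0 and 1 of a block and D66 is the distance between points 10 and 11.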
