Bizarre Issue With Pandas' .groupby Function, When Function Applied To Rows
I have a set of CSV data that is 4203x37 which I reshape to 50436x4 in order to find the Euclidean distance between 12 sets of 3D points, recorded at each time-step. This does not
Solution 1:
If you run the following code:
df_f_2_norm.Time.value_counts()
you will find that not every Time value has exactly 12 rows, so grouping by Time does not give you one group of 12 points per time-step.
Here is the output:
1.333 492
1.383 492
1.317 492
1.400 492
1.467 492
1.450 492
1.483 492
1.417 492
1.500 492
1.367 492
1.350 492
1.433 492
1.533 480
1.517 480
1.550 468
...
4.800 12
4.600 12
4.750 12
4.833 12
4.667 12
4.700 12
4.650 12
4.683 12
4.633 12
4.617 12
4.817 12
4.583 12
4.733 12
4.767 12
4.783 12
Length: 272, dtype: int64
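To list only the offending time values rather than scanning the full output, you can filter the counts. This is a minimal sketch on a made-up frame: `df` and its values are hypothetical stand-ins for `df_f_2_norm`, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for df_f_2_norm:
# Time 0.1 appears 12 times (as expected), Time 0.2 appears 24 times.
df = pd.DataFrame({"Time": [0.1] * 12 + [0.2] * 24})

counts = df["Time"].value_counts()

# Keep only the time values whose row count is not exactly 12.
bad_times = counts[counts != 12]
print(bad_times)
```

Any time value that shows up in `bad_times` would produce a group of the wrong size if you grouped by Time.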
If you want to group the dataframe every 12 rows, you can:
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist
df_f_2_norm = pd.read_csv("astrid_data.csv")
N = 12  # points per time-step
N_lim = N * (N - 1) // 2  # number of pairwise distances among N points (66)
# Group label for every row: 0 for the first 12 rows, 1 for the next 12, and so on
g = np.repeat(np.arange(df_f_2_norm.shape[0] // N), N)
result_index = ['D{}'.format(tag) for tag in range(1, N_lim + 1)]  # Column labels
two_norm = df_f_2_norm.groupby(g)[["X", "Y", "Z"]].apply(lambda block: pd.Series(pdist(block), index=result_index))
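To see the shape of the result without the original CSV, here is a small sketch of the same idea on random data. The frame `demo`, the number of time-steps, and the random seed are assumptions made for illustration only. It also assumes the rows are ordered so that each consecutive run of 12 rows belongs to one time-step.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist

N = 12       # points per time-step
n_steps = 3  # hypothetical number of time-steps

# Hypothetical data: n_steps blocks of N random 3D points.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.random((n_steps * N, 3)), columns=["X", "Y", "Z"])

# Integer division assigns rows 0-11 to block 0, rows 12-23 to block 1, etc.
block_id = np.arange(len(demo)) // N

# One label per pairwise distance: N*(N-1)/2 = 66 for N = 12.
labels = ["D{}".format(i) for i in range(1, N * (N - 1) // 2 + 1)]

# pdist returns the condensed vector of Euclidean distances for each block.
dist = demo.groupby(block_id)[["X", "Y", "Z"]].apply(
    lambda block: pd.Series(pdist(block.to_numpy()), index=labels)
)
print(dist.shape)
```

Each row of `dist` corresponds to one block of 12 rows, and each column to one pair of points. Using `np.arange(len(demo)) // N` instead of `np.repeat` also works when the row count is not an exact multiple of 12, although the final block would then be short and `pdist` would return fewer than 66 values for it.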