Pandas Data Frame From Dictionary
Solution 1:
Try following code:
import pandas
sample={'user1': {'item1': 2.5, 'item2': 3.5, 'item3': 3.0, 'item4': 3.5, 'item5': 2.5, 'item6': 3.0},
'user2': {'item1': 2.5, 'item2': 3.0, 'item3': 3.5, 'item4': 4.0},
'user3': {'item2':4.5,'item5':1.0,'item6':4.0}}
df = pandas.DataFrame([
[col1,col2,col3] for col1, d in sample.items() for col2, col3 in d.items()
])
Solution 2:
I think the operation you're after -- to unpivot a table -- is called "melting". In this case, the hard part can be done by pd.melt
, and everything else is basically renaming and reordering:
df = pd.DataFrame(sample).reset_index().rename(columns={"index": "item"})
df = pd.melt(df, "item", var_name="user").dropna()
df = df[["user", "item", "value"]].reset_index(drop=True)
Simply calling DataFrame
produces something which has the information we want but has the wrong shape:
>>> df = pd.DataFrame(sample)
>>> df
user1 user2 user3
item1 2.5 2.5 NaN
item2 3.5 3.0 4.5
item3 3.0 3.5 NaN
item4 3.5 4.0 NaN
item5 2.5 NaN 1.0
item6 3.0 NaN 4.0
So let's promote the index to a real column and improve the name:
>>> df = pd.DataFrame(sample).reset_index().rename(columns={"index": "item"})
>>> df
item user1 user2 user3
0 item1 2.5 2.5 NaN
1 item2 3.5 3.0 4.5
2 item3 3.0 3.5 NaN
3 item4 3.5 4.0 NaN
4 item5 2.5 NaN 1.0
5 item6 3.0 NaN 4.0
Then we can call pd.melt
to turn the columns. If we don't specify the variable name we want, "user", it'll give it the boring name of "variable" (just like it gives the data itself the boring name "value").
>>> df = pd.melt(df, "item", var_name="user").dropna()
>>> df
item user value
0 item1 user1 2.5
1 item2 user1 3.5
2 item3 user1 3.0
3 item4 user1 3.5
4 item5 user1 2.5
5 item6 user1 3.0
6 item1 user2 2.5
7 item2 user2 3.0
8 item3 user2 3.5
9 item4 user2 4.0
13 item2 user3 4.5
16 item5 user3 1.0
17 item6 user3 4.0
Finally, we can reorder and renumber the indices:
>>> df = df[["user", "item", "value"]].reset_index(drop=True)
>>> df
user item value
0 user1 item1 2.5
1 user1 item2 3.5
2 user1 item3 3.0
3 user1 item4 3.5
4 user1 item5 2.5
5 user1 item6 3.0
6 user2 item1 2.5
7 user2 item2 3.0
8 user2 item3 3.5
9 user2 item4 4.0
10 user3 item2 4.5
11 user3 item5 1.0
12 user3 item6 4.0
melt
is pretty useful once you get used to it. Usually, as here, you do some renaming/reordering before and after.
Solution 3:
I provide another possibility here using pd.stack
:
df = pd.DataFrame(sample)
df = df.T.stack().reset_index()
Detailed explanations
In [24]: df = pd.DataFrame(sample)
In [25]: df
Out[25]:
user1 user2 user3
item1 2.5 2.5 NaN
item2 3.5 3.0 4.5
item3 3.0 3.5 NaN
item4 3.5 4.0 NaN
item5 2.5 NaN 1.0
item6 3.0 NaN 4.0
Applying stack
will pivot the column axis on a sublevel of the row axis already indexed by item
. As you want user
first, let's do the operation on the transposed DataFrame by using .T
:
In [34]: df = df.T.stack()
In [35]: df
Out[35]:
user1 item1 2.5
item2 3.5
item3 3.0
item4 3.5
item5 2.5
item6 3.0
user2 item1 2.5
item2 3.0
item3 3.5
item4 4.0
user3 item2 4.5
item5 1.0
item6 4.0
dtype: float64
You expect basic columns and not index, so just reset the index:
In [36]: df = df.reset_index()
In [37]: df
Out[37]:
level_0 level_1 0
0 user1 item1 2.5
1 user1 item2 3.5
2 user1 item3 3.0
3 user1 item4 3.5
4 user1 item5 2.5
5 user1 item6 3.0
6 user2 item1 2.5
7 user2 item2 3.0
8 user2 item3 3.5
9 user2 item4 4.0
10 user3 item2 4.5
11 user3 item5 1.0
12 user3 item6 4.0
Solution 4:
This one is very similar to the melt
solution provided by DSM:
df = DataFrame(sample)
df = df.unstack().dropna().reset_index()
df = df.rename(columns={'level_0':'col1', 'level_1':'col2', 0:'col3'})
Solution 5:
You could try doing it like this perhaps.
temp=[]
for item in sample:
temp.append(pandas.DataFrame(item))
self.results = pandas.concat(temp)
Post a Comment for "Pandas Data Frame From Dictionary"