Remove Last Two Characters From Column Names Of All The Columns In Dataframe - Pandas
I am joining two dataframes (a, b) that have identical column names, using the user ID as the key. For the join to work, I had to supply suffix characters for the overlapping columns.
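For reference, the situation in the question can be reproduced with a small sketch. The frames, column names, and suffixes below are hypothetical, since the original post does not show them. One thing worth noticing is that, with merge, the key column keeps its name, so stripping two characters from every column would also shorten the key.

```python
import pandas as pd

# Hypothetical inputs: two frames with identical column names.
a = pd.DataFrame({"user_id": [1, 2], "score": [10, 20]})
b = pd.DataFrame({"user_id": [1, 2], "score": [30, 40]})

# Overlapping non-key columns receive the suffixes; the key does not.
merged = a.merge(b, on="user_id", suffixes=("_a", "_b"))

print(list(merged.columns))  # ['user_id', 'score_a', 'score_b']
```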
Solution 1:
This snippet should get the job done:
df.columns = pd.Index(map(lambda x : str(x)[:-2], df.columns))
Edit: this is a better way to do it. Note that rename returns a new DataFrame, so assign the result back:
df = df.rename(columns=lambda x: str(x)[:-2])
In both cases, all we're doing is iterating through the columns and applying a function to each name. Here, the function converts the name to a string and keeps everything except the last two characters.
I'm sure there are a few other ways you could do this.
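As a minimal sketch of the slicing approach, assuming a hypothetical frame whose column names all end in a two-character suffix, the rename version and a vectorized variant using the .str accessor give the same names:

```python
import pandas as pd

# Hypothetical frame: every column name ends in a two-character suffix.
df = pd.DataFrame({"age_x": [31, 45], "city_x": ["Oslo", "Lima"]})

# rename returns a new DataFrame, so the result is kept in a new variable.
renamed = df.rename(columns=lambda name: str(name)[:-2])

# Vectorized alternative: slice every name through the .str accessor.
sliced = df.columns.str[:-2]

print(list(renamed.columns))  # ['age', 'city']
print(list(sliced))           # ['age', 'city']
```

Both variants cut two characters from every name regardless of what they are, so they only fit when all columns carry a suffix of that length.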
Solution 2:
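You could use str.rstrip, like so. Keep in mind that rstrip removes any trailing characters drawn from the set you pass ('_' and '1' here) rather than an exact suffix, so it is only safe when the base names do not themselves end in those characters.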
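In [212]: import numpy as np
In [213]: import pandas as pd
In [214]: import functools as ft
In [215]: f = ft.partial(np.random.choice, *[5, 3])  # each call draws 3 random integers from 0..4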
In [225]: df = pd.DataFrame({'a': f(), 'b': f(), 'c': f(), 'a_1': f(), 'b_1': f(), 'c_1': f()})
In [226]: df
Out[226]:
   a  b  c  a_1  b_1  c_1
0  4  2  0    2    3    2
1  0  0  3    2    1    1
2  4  0  4    4    4    3
In [227]: df.columns = df.columns.str.rstrip('_1')
In [228]: df
Out[228]:
   a  b  c  a  b  c
0  4  2  0  2  3  2
1  0  0  3  2  1  1
2  4  0  4  4  4  3
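To illustrate why rstrip needs some care, here is a small sketch with hypothetical column names picked to expose the behaviour. It compares rstrip with an anchored regex replacement, which is one possible way to remove only an exact trailing '_1':

```python
import pandas as pd

# Hypothetical names chosen to expose the difference.
cols = pd.Index(["a_1", "b1_1", "c_11"])

# rstrip treats '_1' as a set of characters and strips every trailing '_' or '1'.
stripped = cols.str.rstrip("_1")

# An anchored regex removes only an exact trailing '_1'.
exact = cols.str.replace(r"_1$", "", regex=True)

print(list(stripped))  # ['a', 'b', 'c']
print(list(exact))     # ['a', 'b1', 'c_11']
```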
However, if you need something more flexible (albeit probably a bit slower), you can use str.extract which, with the power of regexes, will allow you to select which part of the column name you would like to keep.
In [216]: df = pd.DataFrame({f'{c}_{i}': f() for i in range(3) for c in 'abc'})
In [217]: df
Out[217]:
   a_0  b_0  c_0  a_1  b_1  c_1  a_2  b_2  c_2
0    0    1    0    2    2    4    0    0    3
1    0    0    3    1    4    2    4    3    2
2    2    0    1    0    0    2    2    2    1
In [223]: df.columns = df.columns.str.extract(r'(.*)_\d+')[0]
In [224]: df
Out[224]:
0  a  b  c  a  b  c  a  b  c
0  1  1  0  0  0  2  1  1  2
1  1  0  1  0  1  2  0  4  1
2  1  3  1  3  4  2  0  1  1
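One thing to watch with str.extract is that a name that does not match the pattern comes back as a missing value. The sketch below, using hypothetical column names, shows this next to a regex replacement that leaves non-matching names untouched. It is an alternative to consider, not part of the original answer.

```python
import pandas as pd

# Hypothetical names: two with a numeric suffix, one without.
cols = pd.Index(["a_0", "b_12", "user_id"])

# expand=False returns an Index for a single capture group;
# 'user_id' has no '_<digits>' part, so its entry is missing.
extracted = cols.str.extract(r"(.*)_\d+", expand=False)

# Replacing the suffix instead keeps names that do not match.
replaced = cols.str.replace(r"_\d+$", "", regex=True)

print(pd.isna(extracted[2]))  # True
print(list(replaced))         # ['a', 'b', 'user_id']
```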
The idea to use df.columns.str came from this answer.