
Remove Last Two Characters From Column Names Of All The Columns In Dataframe - Pandas

I am joining two dataframes (a, b) that have identical column names on the user ID key, and for the join to work I had to give the overlapping columns suffix characters

Solution 1:

This snippet should get the job done:

df.columns = [str(x)[:-2] for x in df.columns]

Edit: this is a better way to do it. Note that rename returns a new dataframe, so assign the result back:

df = df.rename(columns=lambda x: str(x)[:-2])

In both cases, all we're doing is iterating through the columns and applying a function to each name. Here, the function converts the name to a string and keeps everything except the last two characters.
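One thing to watch for in the join scenario from the question: pandas only adds suffixes to overlapping columns, so the join key keeps its original name and blind slicing would shorten it too. Below is a minimal sketch of a guarded version. The frames, the `score` column, the `("_x", "_y")` suffixes and the `drop_last_two` helper are all made up for illustration and are not from the original post.

```python
import pandas as pd

# Hypothetical frames standing in for the question's `a` and `b`
a = pd.DataFrame({"user_id": [1, 2], "score": [10, 20]})
b = pd.DataFrame({"user_id": [1, 2], "score": [30, 40]})

# Overlapping non-key columns receive the suffixes; the key does not
merged = a.merge(b, on="user_id", suffixes=("_x", "_y"))

def drop_last_two(name, suffixes=("_x", "_y")):
    """Trim the last two characters only if the name ends in a known suffix."""
    name = str(name)
    return name[:-2] if name.endswith(suffixes) else name

renamed = merged.rename(columns=drop_last_two)
print(list(renamed.columns))  # ['user_id', 'score', 'score']
```

Without the `endswith` check, `user_id` would become `user_`. Keep in mind that the result has duplicate column names, which is what the question asks for but can make later column selection ambiguous.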

I'm sure there are a few other ways you could do this.

Solution 2:

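You could use str.rstrip like so (bear in mind that rstrip removes any trailing characters from the given set, not a literal suffix, so it only suits names where that distinction doesn't matter):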

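In [212]: import numpy as np

In [213]: import pandas as pd

In [214]: import functools as ft

In [215]: f = ft.partial(np.random.choice, 5, 3)  # each call draws 3 random integers from range(5)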

In [225]: df = pd.DataFrame({'a': f(), 'b': f(), 'c': f(), 'a_1': f(), 'b_1': f(), 'c_1': f()})

In [226]: df
Out[226]:
   a  b  c  a_1  b_1  c_1
0  4  2  0    2    3    2
1  0  0  3    2    1    1
2  4  0  4    4    4    3

In [227]: df.columns = df.columns.str.rstrip('_1')

In [228]: df
Out[228]:
   a  b  c  a  b  c
0  4  2  0  2  3  2
1  0  0  3  2  1  1
2  4  0  4  4  4  3
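The rstrip call works on this frame, but since its argument is treated as a set of characters, it can remove more than intended on other names. The sketch below contrasts it with an anchored regex replace; the column names are invented for illustration.

```python
import pandas as pd

cols = pd.Index(["a_1", "x1", "b_11", "total_"])  # made-up names

# rstrip("_1") strips every trailing '_' or '1', whatever the order
stripped = cols.str.rstrip("_1")
print(list(stripped))  # ['a', 'x', 'b', 'total']

# An anchored pattern removes only a literal "_1" at the end of the name
exact = cols.str.replace(r"_1$", "", regex=True)
print(list(exact))  # ['a', 'x1', 'b_11', 'total_']
```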

However, if you need something more flexible (albeit probably a bit slower), you can use str.extract, which takes a regex and lets you select the part of each column name you would like to keep

In [216]: df = pd.DataFrame({f'{c}_{i}': f() for i in range(3) for c in 'abc'})

In [217]: df
Out[217]:
   a_0  b_0  c_0  a_1  b_1  c_1  a_2  b_2  c_2
0    0    1    0    2    2    4    0    0    3
1    0    0    3    1    4    2    4    3    2
2    2    0    1    0    0    2    2    2    1

In [223]: df.columns = df.columns.str.extract(r'(.*)_\d+')[0]

In [224]: df
Out[224]:
0  a  b  c  a  b  c  a  b  c
0  1  1  0  0  0  2  1  1  2
1  1  0  1  0  1  2  0  4  1
2  1  3  1  3  4  2  0  1  1
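One caveat with the extract approach: a name that does not match the pattern comes back as NaN rather than unchanged. If some columns lack the numeric suffix (a join key, for instance), an anchored str.replace may be a safer fit. A small sketch with made-up column names:

```python
import pandas as pd

cols = pd.Index(["user_id", "a_0", "b_12"])  # made-up names

# str.extract yields NaN wherever the pattern finds no match
extracted = cols.str.extract(r"(.*)_\d+")[0]
print(extracted.isna().tolist())  # [True, False, False]

# str.replace leaves names without a numeric suffix untouched
kept = cols.str.replace(r"_\d+$", "", regex=True)
print(list(kept))  # ['user_id', 'a', 'b']
```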

Idea to use df.columns.str came from this answer
