
How To Extract Substring From Variable Length Column In Pandas Dataframe?

Hi there, I am trying to do something similar to Excel's MID function on a column of a pandas DataFrame in Python. I have a column with medication names + strengths,

Solution 1:

You can use str.partition [pandas-doc] here:

df['GENERIC_NAME'] = df['MEDICATION_NAME'].str.partition(' ')[0]

For the given column this gives:

>>> g.str.partition(' ')[0]
0    acetaminophen
1      a-hydrocort
Name: 0, dtype: object

str.partition itself turns a Series into a DataFrame with three columns: the text before the separator, the separator itself, and the text after it:

>>> df['MEDICATION_NAME'].str.partition(' ')
               0  1            2
0  acetaminophen          325 mg
1    a-hydrocort     100 mg/2 ml
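
As a minimal sketch of the partition approach, the following builds a small DataFrame from the two rows shown above and keeps both the name and the remainder. The sample data and the STRENGTH column name are assumptions made for illustration, not part of the original question.

```python
import pandas as pd

# Hypothetical sample data modeled on the two rows shown above.
df = pd.DataFrame(
    {"MEDICATION_NAME": ["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml"]}
)

# Split each value once, at the first space, into before / separator / after.
parts = df["MEDICATION_NAME"].str.partition(" ")

df["GENERIC_NAME"] = parts[0]  # text before the first space
df["STRENGTH"] = parts[2]      # text after the first space (hypothetical column)

print(df)
```

Because partition only splits at the first separator, the strength keeps its internal spaces ("100 mg/2 ml" stays in one piece).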

Solution 2:

Do it with str.split:

df['MEDICATION_NAME'].str.split(n=1).str[0]
Out[345]: 
0    acetaminophen
1      a-hydrocort
Name: MEDICATION_NAME, dtype: object
#df['GENERIC_NAME']=df['MEDICATION_NAME'].str.split(n=1).str[0]
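
The .str[0] at the end matters. As a small sketch on assumed sample data, the following contrasts .str[0], which takes the first element of every row's list, with a plain [0], which selects the row labeled 0.

```python
import pandas as pd

# Hypothetical sample data modeled on the question's column.
s = pd.Series(
    ["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml"],
    name="MEDICATION_NAME",
)

# Split on whitespace at most once: each row becomes a list of up to two items.
pieces = s.str.split(n=1)

first_words = pieces.str[0]  # element 0 of every row's list (a Series)
first_row = pieces[0]        # the row with label 0 (a single Python list)

print(first_words)
print(first_row)
```

Here first_words is a Series with one name per row, while first_row is just the list for the first medication, so only the former is suitable for assigning to a new column.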

Solution 3:

Use str.extract to use full regex features:

df["GENERIC_NAME"] = df["MEDICATION_NAME"].str.extract(r'([^\s]+)')

This captures the first run of non-whitespace characters, so it also handles values that begin with a space.
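
To illustrate the leading-space case, here is a hedged sketch comparing str.partition and str.extract on an assumed value that starts with a space. The sample data is hypothetical, \S is used as the shorthand for [^\s], and expand=False is added so that a Series is returned rather than a one-column DataFrame.

```python
import pandas as pd

# Hypothetical data: the first value has a leading space.
names = pd.Series([" acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml"])

# partition splits at the very first space, so the leading space
# leaves an empty string in the "before" column.
by_partition = names.str.partition(" ")[0]

# extract searches for the first run of non-whitespace characters,
# so it skips over the leading space.
by_extract = names.str.extract(r"(\S+)", expand=False)

print(by_partition.tolist())
print(by_extract.tolist())
```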

Solution 4:

Try this:

df['GENERIC_NAME'] = df['MEDICATION_NAME'].str.split(" ").str[0]
