How To Extract Substring From Variable Length Column In Pandas Dataframe?
Hi there, I am trying to accomplish something similar to the MID function in Excel with a column in a pandas DataFrame in Python. I have a column with medication names + strengths,
Solution 1:
You can use str.partition (see the pandas docs) here:
df['GENERIC_NAME'] = df['MEDICATION_NAME'].str.partition(' ')[0]
For the given column this gives:
>>> g.str.partition(' ')[0]
0    acetaminophen
1      a-hydrocort
Name: 0, dtype: object
partition itself turns a Series into a DataFrame with three columns: before, match, and after:
>>> df['MEDICATION_NAME'].str.partition(' ')
               0  1            2
0  acetaminophen          325 mg
1    a-hydrocort     100 mg/2 ml
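To make the three columns easier to work with, you can give them names. This is only a sketch: the sample rows are modeled on the output above, the "aspirin" row is made up to show what happens when there is no space at all, and the column names "before", "match" and "after" are just labels chosen here.

```python
import pandas as pd

# Sample data modeled on the rows above; "aspirin" is a made-up row with no space.
df = pd.DataFrame({
    "MEDICATION_NAME": ["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml", "aspirin"]
})

# partition splits at the FIRST space only and always returns three columns.
parts = df["MEDICATION_NAME"].str.partition(" ")
parts.columns = ["before", "match", "after"]  # labels chosen for illustration

df["GENERIC_NAME"] = parts["before"]

# When the separator is missing, "before" holds the whole string
# and the other two columns are empty strings.
print(parts)
```

A row without a space is kept whole in the first column, so GENERIC_NAME is still filled for it.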
Solution 2:
Do it with str.split:
df['MEDICATION_NAME'].str.split(n=1).str[0]
Out[345]:
0 acetaminophen
1 a-hydrocort
Name: MEDICATION_NAME, dtype: object
# df['GENERIC_NAME'] = df['MEDICATION_NAME'].str.split(n=1).str[0]
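If you also want the strength part, the same split can hand back both pieces at once with expand=True. A minimal sketch, assuming sample data like the rows above; the STRENGTH column name is made up for illustration.

```python
import pandas as pd

# Sample data modeled on the rows shown above.
df = pd.DataFrame({
    "MEDICATION_NAME": ["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml"]
})

# n=1 splits only at the first run of whitespace;
# expand=True returns a DataFrame with columns 0 and 1 instead of lists.
pieces = df["MEDICATION_NAME"].str.split(n=1, expand=True)

df["GENERIC_NAME"] = pieces[0]
df["STRENGTH"] = pieces[1]  # hypothetical column name
print(df)
```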
Solution 3:
Use str.extract to get full regex features:
df["GENERIC_NAME"] = df["MEDICATION_NAME"].str.extract(r'([^\s]+)')
This captures the first run of non-whitespace characters, i.e. the first word. It also protects against values that start with a space.
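Here is a small comparison of how a leading space affects the result. The sample values are made up (the first one deliberately starts with a space), and expand=False is passed so that str.extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Made-up sample; note the leading space in the first value.
s = pd.Series([" acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml"])

# partition cuts at the very first space, so the first row comes back empty.
by_partition = s.str.partition(" ")[0]

# The regex skips leading whitespace and grabs the first non-whitespace run.
by_extract = s.str.extract(r"([^\s]+)", expand=False)

print(by_partition.tolist())
print(by_extract.tolist())
```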
Solution 4:
Try this:
df['GENERIC_NAME'] = df['MEDICATION_NAME'].str.split(" ").str[0]
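One thing to watch when taking the first token per row: you need the .str[0] accessor. Plain [0] on the result of str.split picks the row labelled 0 (a single list), not the first element of every row. A quick sketch with sample data modeled on the question:

```python
import pandas as pd

# Sample data modeled on the rows in the question.
s = pd.Series(["acetaminophen 325 mg", "a-hydrocort 100 mg/2 ml"])

tokens = s.str.split(" ")   # each row is now a list of tokens

first_row = tokens[0]       # label lookup: the list for row 0 only
per_row = tokens.str[0]     # element-wise: first token of every row

print(first_row)
print(per_row.tolist())
```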