Pandas Keeps Converting Strings To Int

I have the following code from this question Df groupby set comparison: import pandas as pd wordlist = pd.read_csv('data/example.txt', sep='\r', header=None, index_col=None, na

Solution 1:

First, to force a column to be read as strings, you can pass the parameter dtype=str to read_csv. That is only necessary when a column looks numeric and would otherwise be converted. Here the column contains string values, so all values in it are already read as str implicitly.
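To illustrate the dtype=str point, here is a minimal sketch. The sample data and the names numeric_raw, inferred and as_text are made up for the example and are not from the question's file:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data: every value consists of digits only.
numeric_raw = "007\n42\n100\n"

# Without dtype, pandas infers an integer column and drops the leading zeros.
inferred = pd.read_csv(StringIO(numeric_raw), header=None, names=["word"])

# With dtype=str, the same values are kept as text exactly as written.
as_text = pd.read_csv(StringIO(numeric_raw), header=None, names=["word"], dtype=str)

print(inferred["word"].tolist())
print(as_text["word"].tolist())
```

If only some columns need protecting, dtype also accepts a dict such as dtype={'word': str}.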

I changed your code a little:

Setup:

from io import StringIO

import pandas as pd

temp=u'''"acb"
"acb"
"bca"
"foo"
"oof"
"spaniel"'''
# pd.compat.StringIO was removed in newer pandas versions, so io.StringIO is used here
# after testing, replace 'StringIO(temp)' with 'example.txt'
wordlist = pd.read_csv(StringIO(temp), sep="\r", index_col=None, names=['word'])
print (wordlist)
      word
0      acb
1      acb
2      bca
3      foo
4      oof
5  spaniel

# first remove duplicate words
wordlist = wordlist.drop_duplicates()
# sort the letters of each word and join them into an anagram key
wordlist['anagrams'] = wordlist['word'].apply(lambda x: ''.join(sorted(x)))

print (wordlist)
      word anagrams
0      acb      abc
2      bca      abc
3      foo      foo
4      oof      foo
5  spaniel  aeilnps

# sort the DataFrame by the anagrams column
wordlist = wordlist.sort_values('anagrams')

# get the duplicated rows, excluding the first occurrence of each anagram key
wordlist1 = wordlist[wordlist['anagrams'].duplicated()]
print (wordlist1)
  word anagrams
2  bca      abc
4  oof      foo

# get all duplicated rows, including the first occurrence
wordlist2 = wordlist[wordlist['anagrams'].duplicated(keep=False)]
print (wordlist2)
  word anagrams
0  acb      abc
2  bca      abc
3  foo      foo
4  oof      foo
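
As an alternative to duplicated(), the words can be collected into one list per anagram key with groupby. This is only a sketch of a different technique, not the approach used above. The names words, groups and anagram_groups are made up for the example:

```python
import pandas as pd

# Same sample words as above, already de-duplicated.
words = pd.DataFrame({"word": ["acb", "bca", "foo", "oof", "spaniel"]})

# Anagram key: the letters of each word in sorted order.
words["anagrams"] = words["word"].apply(lambda w: "".join(sorted(w)))

# Collect the words that share a key into one list per key.
groups = words.groupby("anagrams")["word"].apply(list)

# Keep only keys shared by more than one word, i.e. real anagram groups.
anagram_groups = groups[groups.map(len) > 1]

print(anagram_groups.to_dict())
```

This returns one row per anagram key instead of one row per word, which may be easier to work with when the groups themselves are the result you want.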