Pandas Keeps Converting Strings To Int
I have the following code from this question Df groupby set comparison: import pandas as pd wordlist = pd.read_csv('data/example.txt', sep='\r', header=None, index_col=None, na
Solution 1:
First, you can force a column to be read as strings by passing dtype=str to read_csv, but this is only necessary when a column of numeric-looking values would otherwise be converted to numbers. Here every value in the column is a string, so the whole column is read as str implicitly.
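To illustrate the difference, here is a minimal sketch with made-up sample data (the values in raw are hypothetical, not taken from the question's file). It reads the same numeric-looking column twice, once with type inference and once with dtype=str:

```python
import pandas as pd
from io import StringIO

# Hypothetical sample data: every line looks like a number.
raw = "007\n123\n42\n"

# Default behaviour: pandas infers an integer column, so leading zeros are lost.
inferred = pd.read_csv(StringIO(raw), header=None, names=["word"])

# dtype=str keeps every value exactly as it was written in the file.
as_text = pd.read_csv(StringIO(raw), header=None, names=["word"], dtype=str)

print(inferred["word"].tolist())  # [7, 123, 42]
print(as_text["word"].tolist())   # ['007', '123', '42']
```

If the file only ever contains non-numeric words, both calls should give the same strings, which is why dtype=str is optional in that case.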
I changed your code a bit:
Setup:
import pandas as pd
from io import StringIO

temp = u'''"acb"
"acb"
"bca"
"foo"
"oof"
"spaniel"'''
# after testing, replace 'StringIO(temp)' with 'example.txt'
wordlist = pd.read_csv(StringIO(temp), sep="\r", index_col=None, names=['word'])
print (wordlist)
      word
0      acb
1      acb
2      bca
3      foo
4      oof
5  spaniel
#first remove duplicates
wordlist = wordlist.drop_duplicates()
#sort the letters of each word and join them back into a string
wordlist['anagrams'] = wordlist['word'].apply(lambda x: ''.join(sorted(x)))
print (wordlist)
      word  anagrams
0      acb       abc
2      bca       abc
3      foo       foo
4      oof       foo
5  spaniel   aeilnps
#sort DataFrame by column anagrams
wordlist = wordlist.sort_values('anagrams')
#get duplicated rows, skipping the first occurrence of each anagram key
wordlist1 = wordlist[wordlist['anagrams'].duplicated()]
print (wordlist1)
   word  anagrams
2   bca       abc
4   oof       foo
#get all duplicated rows
wordlist2 = wordlist[wordlist['anagrams'].duplicated(keep=False)]
print (wordlist2)
   word  anagrams
0   acb       abc
2   bca       abc
3   foo       foo
4   oof       foo
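The steps above can also be sketched with groupby, which collects the words that share a sorted-letter key into one list per key. This is an alternative sketch and not part of the original answer; the names words and groups are chosen for illustration, and the sample words are typed in directly instead of being read from example.txt:

```python
import pandas as pd

# Same sample words as above, built in memory instead of read from a file.
words = pd.DataFrame({"word": ["acb", "acb", "bca", "foo", "oof", "spaniel"]})

# Drop repeated words, then build the sorted-letter key for each word.
words = words.drop_duplicates()
words["anagrams"] = words["word"].apply(lambda w: "".join(sorted(w)))

# Collect the words sharing a key, then keep only keys with more than one word.
groups = words.groupby("anagrams")["word"].apply(list)
groups = groups[groups.apply(len) > 1]

print(groups.to_dict())  # {'abc': ['acb', 'bca'], 'foo': ['foo', 'oof']}
```

The result holds the same rows as wordlist2, only arranged as one list of words per anagram key.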