Numpy Genfromtxt Issues In Python3
Solution 1:
The answer to my problem is to use a dtype for unicode strings (U2, for example).
Thanks to the answer of E.Kehler, I found the solution.
If I use str in place of S8 in the dtype definition, then the output for the 2nd column is empty:
numpy.genfromtxt("test.csv", delimiter=",", dtype='f8,str')
the output is:
array([(1.0, ''), (2.0, ''), (3.0, '')], dtype=[('f0', '<f8'), ('f1', '<U0')])
This suggested to me that the correct dtype to solve my problem is a unicode string:
numpy.genfromtxt("test.csv", delimiter=",", dtype='f8,U2')
that gives the expected output:
array([(1.0, 'a'), (2.0, 'b'), (3.0, 'c')], dtype=[('f0', '<f8'), ('f1', '<U2')])
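For anyone who wants to try this without creating a file, here is a minimal sketch that compares the two dtypes side by side. The CSV_TEXT string is a hypothetical stand-in for test.csv (I am assuming one number and one short string per row), and io.StringIO is used only so the snippet is self-contained.

```python
import io

import numpy as np

# Hypothetical stand-in for test.csv: a number and a short string per row.
CSV_TEXT = "1,a\n2,b\n3,c\n"

# 'S8' asks for byte strings, which Python 3 displays with a b'' prefix.
as_bytes = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", dtype="f8,S8")

# 'U2' asks for unicode strings of up to two characters.
as_text = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", dtype="f8,U2")

# 'f0' and 'f1' are the default field names NumPy assigns to the two columns.
print(as_bytes["f1"].tolist())
print(as_text["f1"].tolist())
```

Note that the number after U is a maximum length, so longer strings would be truncated; pick a width that fits the longest value in the column.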
Useful information can also be found on the numpy datatype doc page.
Solution 2:
In Python 3, writing dtype="S8" (or any variation of "S#") in NumPy's genfromtxt yields a byte string. To avoid this and get a plain, old-fashioned string (str), write dtype=str instead.
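A small sketch of what this looks like in practice, assuming the same kind of two-column data as in the question (CSV_TEXT below is a hypothetical stand-in for the file). One thing to be aware of: when dtype=str is the only dtype given, it applies to every column, so the numeric column comes back as strings too and has to be converted separately.

```python
import io

import numpy as np

# Hypothetical stand-in for the CSV file from the question.
CSV_TEXT = "1,a\n2,b\n3,c\n"

# A single dtype applies to all columns, so the result is a 2-D array of strings.
data = np.genfromtxt(io.StringIO(CSV_TEXT), delimiter=",", dtype=str)

print(data.dtype.kind)      # kind 'U' means unicode strings, not bytes
print(data[:, 1].tolist())  # second column as plain str values

# Convert the first column back to numbers when they are needed.
numbers = data[:, 0].astype(float)
print(numbers.tolist())
```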
Solution 3:
training = np.genfromtxt('twitter_train.csv', delimiter=',', usecols=(0,1), dtype='U')
In my case, the first column contains a sentiment value of either 0 or 1, and the second column is a string of many characters representing a tweet. Using dtype='U' stopped the b' prefix from appearing in the output.
So in your case it would be: data=numpy.genfromtxt("test.csv", delimiter=",", dtype='U')