Python Write A New Csv File By Filtering Selected Rows From An Existing Csv File
Solution 1:
Since you are using the csv module anyway, why not write the file as you are reading it in:
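import csv

# newline='' lets the csv module handle line endings itself (Python 3)
with open('in.csv', 'r', newline='') as i, open('out.csv', 'w', newline='') as o:
    r = csv.reader(i, delimiter='\t')
    w = csv.writer(o, delimiter='\t')
    for row in r:
        # keep rows whose first field is a date in 2014, e.g. 2014-01-31
        if row and row[0].split('-')[0] == '2014':
            w.writerow(row)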
Solution 2:
The error could be "fixed" by changing has_key to startswith, but more importantly, the way the program is currently written, you'll skip the first line which starts with 2014, and include the first line of subsequent groups that starts with 2013. Is that really what you want?
If instead you simply want to keep all lines that begin with 2014, then:
with open('year.csv') as rad_file, open("try.csv", "w") as out_file:
    # copy the header line through unchanged
    header = next(rad_file)
    out_file.write(header)
    for rad_line in rad_file:
        if rad_line.startswith('2014'):
            out_file.write(rad_line)
By processing each line as it is read, you avoid accumulating lines in the list string_storage, thus saving memory. That can be important when processing large files.
Also, if you use a with-statement to open your files, then the files will be closed for you automatically when the flow of execution leaves the with-statement.
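As a small check of that behaviour, the sketch below writes to a scratch file and inspects the file object's closed attribute inside and after the with-statement. The file name demo.csv and the temporary directory are assumptions made only for this illustration.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

with open(path, "w") as out_file:
    out_file.write("2014-01-01\t1.5\n")
    closed_inside = out_file.closed   # the file is still open here

closed_after = out_file.closed        # the file was closed on leaving the block

print(closed_inside, closed_after)    # False True
```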
Note that in Python2, dicts have a has_key method to check if the dict has a certain key.
The code raised an error because rad_line is a string, not a dict.
The has_key method was removed in Python3. In modern versions of Python2 such as Python2.7, you never need to use has_key since key in dict is preferred over dict.has_key(key).
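To make the distinction concrete, here is a minimal sketch of the in operator on a dict and on a string. The dict counts and the sample rad_line are made-up values for illustration, not data from the question.

```python
# Hypothetical dict mapping a year to a row count.
counts = {"2014": 3, "2013": 5}

# Key membership on a dict: the replacement for dict.has_key(key).
has_2014_key = "2014" in counts

# A line read from a file is a str, so there is no has_key method to call.
rad_line = "2014-01-01\t1.5\n"
str_has_has_key = hasattr(rad_line, "has_key")

# On a str, `in` tests for a substring anywhere in the line,
# while startswith only looks at the beginning.
contains_2014 = "2014" in rad_line
starts_with_2014 = rad_line.startswith("2014")
```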
Solution 3:
Use str.find or regular expressions to find a substring in a string.
So instead of
if (rad_line.has_key('2014')):
you can do:
if (rad_line.find('2014') != -1):
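One thing to watch for: find matches '2014' anywhere in the line, not only in the date column. The sketch below contrasts it with a regular expression anchored at the start of the line. It assumes each line begins with a date of the form YYYY-MM-DD, and the sample lines are invented for illustration.

```python
import re

# Assumption: every data line starts with a date such as 2014-03-01.
YEAR_2014 = re.compile(r"2014-")

lines = [
    "2014-03-01\t7.2\n",
    "2013-12-31\t2014.0\n",  # contains "2014", but not at the start
]

# str.find returns -1 when the substring is absent, so this keeps any
# line that contains "2014" somewhere.
found_anywhere = [line for line in lines if line.find("2014") != -1]

# re.match only matches at the beginning of the string, so this keeps
# just the lines whose date is in 2014.
found_at_start = [line for line in lines if YEAR_2014.match(line)]
```

With these sample lines, found_anywhere keeps both entries while found_at_start keeps only the first, which is the usual intent when filtering by year.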