Skip Specific Line That Contains Certain Value When You Read Pandas Data Frame
When you read a CSV with pd.read_csv, how do you skip the lines that contain a specific value in a row? If in the 50th and 55th rows the 1st column has the value 100, so I want t
Solution 1:
What is the difference between dropping them later, and not reading them at all? You might simply do:
pd.read_csv('file.csv').query('col1 != 100')
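For illustration, here is a minimal sketch of that approach on an in-memory CSV. The sample data and the column names col1 and col2 are assumptions made up for this example, not taken from the question. It uses a boolean mask, which should give the same rows as the query call above.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data standing in for 'file.csv'.
csv_text = "col1,col2\n1,a\n100,b\n2,c\n100,d\n"

df = pd.read_csv(StringIO(csv_text))

# Keep only the rows whose first column is not 100,
# then renumber the index so it has no gaps.
filtered = df[df["col1"] != 100].reset_index(drop=True)
print(filtered)
```

The rows are still parsed before being dropped, so this only matters for memory or speed when the file is very large.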
Solution 2:
To keep those rows from being parsed at all, pre-parse the file. Use a generator to read the file
and yield only the lines that you want. Write those lines into a StringIO object, then pass that
object in place of the file path to read_csv.
from io import StringIO
import pandas as pd

def read_file(file_name):
    # Yield only the lines whose first column is not '100'.
    with open(file_name, 'r') as fh:
        for line in fh:
            parts = line.split(',')
            if parts[0] != '100':
                yield line

# Collect the kept lines in an in-memory buffer and hand it to read_csv.
stream = StringIO()
stream.writelines(read_file('foo.txt'))
stream.seek(0)
df = pd.read_csv(stream)
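As a possible middle ground between the two solutions, the file can be read in chunks and each chunk filtered before the pieces are combined, so the unwanted rows never all sit in memory at once. This is only a sketch: the helper name read_csv_skipping, its parameters, and the sample data are hypothetical. It relies on the chunksize parameter of read_csv.

```python
from io import StringIO

import pandas as pd

def read_csv_skipping(source, column, value, chunksize=1000):
    """Read `source` in chunks, dropping rows where `column` equals `value`.

    Hypothetical helper for illustration; assumes at least one chunk is read.
    """
    kept = [
        chunk[chunk[column] != value]
        for chunk in pd.read_csv(source, chunksize=chunksize)
    ]
    return pd.concat(kept, ignore_index=True)

# Made-up sample data; a file path would work in place of the StringIO object.
sample = StringIO("col1,col2\n1,a\n100,b\n2,c\n100,d\n3,e\n")
df_chunked = read_csv_skipping(sample, "col1", 100, chunksize=2)
print(df_chunked)
```

Unlike the generator version, this compares parsed values rather than raw text, so it does not depend on how the value is written in the file.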