
Skip Lines That Contain a Certain Value When Reading a Pandas DataFrame

When reading a CSV file with pd.read_csv, how do I skip the lines that contain a specific value? For example, if the first column of the 50th and 55th rows has the value 100, I want to skip those rows.

Solution 1:

What is the difference between dropping them later and not reading them at all? You can simply do:

import pandas as pd
pd.read_csv('file.csv').query('col1 != 100')
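A small self-contained sketch of the "read everything, then drop" approach, assuming the column is named col1 as in the one-liner above. The sample data and the helper name read_without_value are hypothetical; in practice you would pass a path such as 'file.csv' instead of the in-memory buffer. It uses a boolean mask, which does the same filtering as .query('col1 != 100').

```python
import io

import pandas as pd

# Hypothetical sample data standing in for 'file.csv'.
CSV_TEXT = """col1,col2
1,a
100,b
2,c
100,d
"""


def read_without_value(source, column="col1", value=100):
    """Read the whole CSV, then drop the rows where `column` equals `value`."""
    df = pd.read_csv(source)
    # Boolean mask: keep only rows whose value differs from `value`.
    # reset_index(drop=True) renumbers the surviving rows 0, 1, 2, ...
    return df[df[column] != value].reset_index(drop=True)


df = read_without_value(io.StringIO(CSV_TEXT))
print(df)
```

Note that .query() and a plain mask both keep the original index labels of the surviving rows; the reset_index call is only there if you want a fresh 0-based index.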

Solution 2:

read_csv cannot skip rows based on their contents, so if you do not want those lines parsed at all, you have to pre-parse the file. Use a generator to read the file and yield only the lines you want, write those lines into a StringIO object, and pass that object to read_csv in place of the file path.

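from io import StringIO  # Python 3: StringIO lives in the io module
import pandas as pd

def read_file(file_name):
    # Yield every line except those whose first comma-separated field is '100'
    with open(file_name, 'r') as fh:
        for line in fh:  # iterate lazily instead of loading fh.readlines()
            parts = line.split(',')
            if parts[0].strip() != '100':  # strip() also handles a trailing newline
                yield line

stream = StringIO()
stream.writelines(read_file('foo.txt'))
stream.seek(0)  # rewind so read_csv starts reading from the first line

df = pd.read_csv(stream)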
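One caveat with line.split(',') is that it does not understand quoted fields, so a value like "x, y" is split in the wrong place. A variant sketch of the same pre-parse idea, using the standard csv module to split and re-write each row, might look like this. The helper name filtered_csv, its parameters, and the sample data are hypothetical; the comparison is done on the raw text of the field, so skip_value is a string.

```python
import csv
import io

import pandas as pd


def filtered_csv(source, skip_value="100", column_index=0):
    """Copy CSV rows from `source` into a StringIO, dropping rows whose
    field at `column_index` equals `skip_value` (compared as text)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(source):
        if not row:
            continue  # skip blank lines
        if len(row) > column_index and row[column_index] == skip_value:
            continue  # a row we do not want pandas to see
        writer.writerow(row)  # re-quotes fields that contain commas
    out.seek(0)  # rewind so read_csv starts from the first line
    return out


# Hypothetical sample data; the quoted field contains a comma.
CSV_TEXT = 'col1,col2\n1,"x, y"\n100,b\n2,c\n'

df = pd.read_csv(filtered_csv(io.StringIO(CSV_TEXT)))
print(df)
```

With a real file, open it with newline='' as the csv module documentation recommends, e.g. `with open('foo.txt', newline='') as fh: df = pd.read_csv(filtered_csv(fh))`. Like the generator version, this still holds the filtered text in memory before pandas parses it.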
