Skip to content Skip to sidebar Skip to footer

Search Through Directories For Specific Excel Files And Compare Data From These Files With Inputvalues

The task: The firm I have gotten a summer-job for has an expanding test-database that consists of an increasing number of subfolders for each project, that includes everything from

Solution 1:

In answer to the first question "How come I only get one result when I am printing excelfiles()" this is because your return statement is within the nested loop, so the function will stop on the first iteration. I would try building up a list instead and then return this list, you could also combine this with the issue of checking the name e.g. :

import os, fnmatch

#globals
start_dir = os.getenv('md')

def excelfiles(pattern):
    file_list = []
    for root, dirs, files inos.walk(start_dir):
        for filename in files:
            if fnmatch.fnmatch(filename.lower(), pattern):
                if filename.endswith(".xls") or filename.endswith(".xlsx") or filename.endswith(".xlsm"):
                    file_list.append(os.path.join(root, filename))
    return file_list

file_list = excelfiles('*cd*')
for i in file_list: print i

Obviously, you'll need to replace the cd with your own search text, but keep the * either side and replace the start_dir with your own. I have done the match on filename.lower() and entered the search text in lower case to make the matching case in-sensitive, just remove the .lower() if you don't want this. I have also allowed for other types of Excel files.

Regarding reading data from Excel files I have done this before to create an automated way of converting basic Excel files into csv format. You are welcome to have a look at the code below and see if there is anything you can use from this. The xl_to_csv function is where the data is read from the Excel file:

import os, csv, sys, Tkinter, tkFileDialog as fd, xlrd

# stop tinker shell from opening as only needed for file dialog
root = Tkinter.Tk()
root.withdraw()

defformat_date(dt):
    yyyy, mm, dd = str(dt[0]), str(dt[1]), str(dt[2])
    hh, mi, ss = str(dt[3]), str(dt[4]), str(dt[5])

    iflen(mm) == 1:
        mm = '0'+mm
    iflen(dd) == 1:
        dd = '0'+dd

    if hh == '0'and mi == '0'and ss == '0':
        datetime_str = dd+'/'+mm+'/'+yyyy
    else:
        iflen(hh) == 1:
            hh = '0'+hh
        iflen(mi) == 1:
            mi = '0'+mi
        iflen(ss) == 1:
            ss = '0'+ss
        datetime_str = dd+'/'+mm+'/'+yyyy+' '+hh+':'+mi+':'+ss

    return datetime_str

defxl_to_csv(in_path, out_path):
    # set up vars to read file
    wb = xlrd.open_workbook(in_path)
    sh1 = wb.sheet_by_index(0)
    row_cnt, col_cnt = sh1.nrows, sh1.ncols

    # set up vars to write file
    fileout = open(out_path, 'wb')
    writer = csv.writer(fileout)

    # iterate through rows and colsfor r inrange(row_cnt):

        # make list from row data
        row = []
        for c inrange(col_cnt):
            #print "...debug - sh1.cell(",r,c,").value set to:", sh1.cell(r,c).value#print "...debug - sh1.cell(",r,c,").ctype set to:", sh1.cell(r,c).ctype# check data type and make conversions
            val = sh1.cell(r,c).value
            if sh1.cell(r,c).ctype == 2: # number data typeif val == int(val):
                    val = int(val) # convert to int if only no decimal other than .0#print "...debug - res 1 (float to str), val set to:", valelif sh1.cell(r,c).ctype == 3: # date fields
                dt = xlrd.xldate_as_tuple(val, 0) # date no from excel to dat obj
                val = format_date(dt)
                #print "...debug - res 2 (date to str), val set to:", valelif sh1.cell(r,c).ctype == 4: # boolean data types
                val = str(bool(val)) # convert 1 or 0 to bool true / false, then string#print "...debug - res 3 (bool to str), val set to:", valelse:
                val = str(val)
                #print "...debug - else, val set to:", val

            row.append(val)
            #print ""# write row to csv filetry:
            writer.writerow(row)
        except:
            print'...row failed in write to file:', row
            exc_type, exc_value, exc_traceback = sys.exc_info()
            lines = traceback.format_exception(exc_type, exc_value, exc_traceback)
            for line in lines:
                print'!!', line

    print'Data written to:', out_path, '\n'defmain():
    in_path, out_path = None, None# set current working directory to user's my documents folder
    os.chdir(os.path.join(os.getenv('userprofile'),'documents'))

    # ask user for path to Excel file...whilenot in_path:
        print"Please select the excel file to read data from ..."try:
            in_path = fd.askopenfilename()
        except:
            print'Error selecting file, please try again.\n'# get dir for output...
    same = raw_input("Do you want to write the output to the same directory? (Y/N): ")
    if same.upper() == 'Y':
        out_path = os.path.dirname(in_path)
    else:
        whilenot out_path:
            print"Please select a directory to write the csv file to ..."try:
                out_path = fd.askdirectory()
            except:
                print'Error selecting file, please try again.\n'# get file name and join to dir
    f_name = os.path.basename(in_path)
    f_name = f_name[:f_name.find('.')]+'.csv'
    out_path = os.path.join(out_path,f_name)

    # get data from file and write to csv...print'Attempting read data from', in_path
    print' and write csv data to', out_path, '...\n'
    xl_to_csv(in_path, out_path)

    v_open = raw_input("Open file (Y/N):").upper()
    if v_open == 'Y':
        os.startfile(out_path)
    sys.exit()

if __name__ == '__main__':
    main()

Let me know if you have any questions on this.

Finally, regarding the output I would consider writing this out to a html file in a table format. Let me know if you want any help with this, I will have some more sample code that you could use part of.

UPDATE

Here is some further advice on writing your output to a html file. Here is a function that I have written and used previously for this purpose. Let me know if you need any guidance on what you would need to change for your implementation (if anything). The function expects a nested object in the data argument e.g. a list of lists or list of tuples etc. but should work for any number of rows / columns:

defwrite_html_file(path, data, heads):
    html = []
    tab_attr = ' border="1" cellpadding="3" style="background-color:#FAFCFF; text-align:right"'
    head_attr = ' style="background-color:#C0CFE2"'# opening lines needed for html tabletry:
        html.append('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" ')
        html.append('"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> ')
        html.append('<html xmlns="http://www.w3.org/1999/xhtml">')
        html.append('<body>')
        html.append('  <table'+tab_attr+'>')
    except:
        print'Error setting up html heading data'# html table headings (if required)if headings_on:
        try:
            html.append('    <tr'+head_attr+'>')
            for item in heads:
                html.append(' '*6+'<th>'+str(item)+'</th>')
            html.append('    </tr>')
        except:
            exc_type, exc_value, exc_traceback = sys.exc_info()
            lines = traceback.format_exception(exc_type, exc_value, exc_traceback)
            print'Error writing html table headings:'print''.join('!! ' + line for line in lines)

    # html table contenttry:
        for row in data:
            html.append('    <tr>')
            for item in row:
                html.append(' '*6+'<td>'+str(item)+'</td>')
            html.append('    </tr>')
    except:
        print'Error writing body of html data'# closing lines neededtry:
        html.append('  </table>')
        html.append('</body>')
        html.append('</html>')
    except:
        print'Error closing html data'# write html data to file
    fileout = open(path, 'w')
    for line in html:
        fileout.write(line)

    print'Data written to:', path, '\n'if sql_path:
        os.startfile(path)
    else:
        v_open = raw_input("Open file (Y/N):").upper()
        if v_open == 'Y':
            os.startfile(path)

headings_on is a global that I have set to True in my script, you will also need to import traceback for the error handling to work as it is currently specified.

Post a Comment for "Search Through Directories For Specific Excel Files And Compare Data From These Files With Inputvalues"