Checking For Membership Inside Nested Dict

June 27, 2022

This is a follow-up question to this one: Python DictReader - Skipping rows with missing columns? Turns out I was being silly, and using the wrong ID field. I'm using Python 3.x he

Solution 1:

You will probably need to iterate over the employees to find the data. I assume you don't want an extra dict that can get out of date, so it isn't worth trying to store everything keyed on internal ids.

Try this on for size:

def lookup_supervisor(manager_internal_id, employees):
    # Return the directory id of the employee whose internal_id matches, or None.
    if manager_internal_id is not None and manager_internal_id != "":
        manager_dir_ids = [dir_id for dir_id, emp_data in employees.items()
                           if emp_data.get('internal_id') == manager_internal_id]
        assert len(manager_dir_ids) <= 1  # internal ids are expected to be unique
        if len(manager_dir_ids) == 1:
            return manager_dir_ids[0]
    return None

def tidy_data(employees):
    for emp_data in employees.values():
        manager_dir_id = lookup_supervisor(emp_data.get('manager_internal_id'), employees)
        for (field, sup_key) in [('Email', 'mail'), ('FirstName', 'givenName'), ('Surname', 'sn')]:
            emp_data['Supervisor'+field] = (employees[manager_dir_id][sup_key] if manager_dir_id is not None else 'Supervisor Not Found')

And you're definitely right that a class is the answer for passing employees around. In fact, I'd recommend against storing the 'Supervisor' keys in the employee dict, and suggest instead getting the supervisor dict fresh whenever you need it, perhaps with a get_supervisor_data method.
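
A minimal sketch of that idea, assuming a hypothetical Employees class that wraps the dict of employee records keyed by directory id. The class layout, the method body, and the sample records are illustrative assumptions, not your actual code:

```python
class Employees:
    """Hypothetical container around a dict of employee records keyed by directory id."""

    def __init__(self, employees):
        self.employees = employees

    def get_supervisor_data(self, dir_id):
        """Look up the supervisor's record fresh each time; return None if there is none."""
        manager_internal_id = self.employees[dir_id].get('manager_internal_id')
        if not manager_internal_id:  # covers both None and ''
            return None
        for candidate in self.employees.values():
            if candidate.get('internal_id') == manager_internal_id:
                return candidate
        return None


# Made-up sample data, for illustration only.
staff = Employees({
    'jsmith': {'internal_id': '100', 'manager_internal_id': '200', 'mail': 'jsmith@example.com'},
    'abrown': {'internal_id': '200', 'manager_internal_id': '', 'mail': 'abrown@example.com'},
})

supervisor = staff.get_supervisor_data('jsmith')
print(supervisor['mail'] if supervisor else 'Supervisor Not Found')
```

Because nothing is copied into the employee's own record, a change to the supervisor's details shows up on the next lookup without any extra bookkeeping.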

Your new OO version all looks reasonable except for the changes I already mentioned and some tweaks to clean_phone_number.

import re  # needed at module level

def clean_phone_number(self, original_telephone_number):
    # Expected format: +NN(N)NNNN-NNNN. An extra zero in the area code and a
    # missing hyphen are tolerated, but each is reported in the returned message.
    phone_re = re.compile(r'^\+(?P<intl_prefix>\d{2})\((?P<extra_zero>0?)(?P<area_code>\d)\)(?P<local_first_half>\d{4})(?P<hyph>-?)(?P<local_second_half>\d{4})')
    result = phone_re.search(original_telephone_number)
    if result is None:
        return '', "Number didn't match format. Original text is: " + original_telephone_number
    msg = ''
    if result.group('extra_zero'):
        msg += 'Extra zero in area code - ask user to remediate. '
    if not result.group('hyph'):    # Note: can have both errors at once
        msg += 'Missing hyphen in local component - ask user to remediate. '
    return '0' + result.group('area_code') + result.group('local_first_half') + result.group('local_second_half'), msg

You could definitely make an individual object for each employee, but seeing how you're using the data and what you need from it, I'm guessing it wouldn't have that much payoff.

Solution 2:

My Python skills are poor, so I can't write out what I have in mind in any reasonable amount of time. But I do know how to do OO decomposition.

Why does the Employees class have to do all the work? Your monolithic Employees class does several different kinds of things:

Read and write data from a file - aka serialization
Manage and access data from individual employees
Manage relationships between employees

I suggest that you create a class to handle each task group listed.

Define an Employee class to keep track of employee data and handle field processing/tidying tasks.

Use the Employees class as a container for employee objects. It can handle tasks like tracking down an Employee's supervisor.

Define a virtual base class EmployeeLoader to define an interface (load, store, ?? ). Then implement a subclass for CSV file serialization. (The virtual base class is optional--I'm not sure how Python handles virtual classes, so this may not even make sense.)
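
For what it's worth, the closest thing Python's standard library offers to a virtual base class is an abstract base class from the abc module. Here is a rough sketch of the loader interface under that assumption. The method names load and store come from the list above; InMemoryLoader is a made-up toy subclass that exists only to show the interface being satisfied:

```python
from abc import ABC, abstractmethod


class EmployeeLoader(ABC):
    """Interface only: subclasses decide where the records actually live."""

    @abstractmethod
    def load(self):
        """Return a dict of employee records."""

    @abstractmethod
    def store(self, employees):
        """Persist a dict of employee records."""


class InMemoryLoader(EmployeeLoader):
    """Toy subclass (hypothetical) that keeps the records in memory."""

    def __init__(self):
        self._saved = {}

    def load(self):
        return dict(self._saved)

    def store(self, employees):
        self._saved = dict(employees)


loader = InMemoryLoader()
loader.store({'jsmith': {'mail': 'jsmith@example.com'}})
print(loader.load())

try:
    EmployeeLoader()  # abstract classes cannot be instantiated directly
except TypeError as exc:
    print('TypeError:', exc)
```

The base class really is optional in Python: any object with load and store methods would work thanks to duck typing. The abstract base class mainly documents the interface and catches subclasses that forget a method.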

So:

Create an instance of EmployeeCSVLoader with a file name to work with.
The loader can then build an Employees object and parse the file.
As each record is read, a new Employee object will be created and stored in the Employees object.
Now ask the Employees object to populate supervisor links.
Iterate over the Employees object's collection of employees and ask each one to tidy itself.
Finally, let the serialization object handle updating the data file.
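
The steps above could be sketched roughly as follows. Everything here is an assumption made for illustration: the column names (dir_id, internal_id, manager_internal_id, mail), the whitespace-stripping tidy step, and the sample file contents are invented, and the real fields and tidying rules would come from your data.

```python
import csv
import os
import tempfile


class Employee:
    """One record plus a link to the supervisor's Employee object."""

    def __init__(self, record):
        self.record = dict(record)
        self.supervisor = None

    def tidy(self):
        # Placeholder tidying step: strip stray whitespace from every field.
        # Assumes every column is present in every row.
        self.record = {key: value.strip() for key, value in self.record.items()}


class Employees:
    """Container for Employee objects, keyed by directory id."""

    def __init__(self):
        self.by_dir_id = {}

    def add(self, employee):
        self.by_dir_id[employee.record['dir_id']] = employee

    def populate_supervisor_links(self):
        # Temporary index, rebuilt on every call so it cannot go stale.
        by_internal_id = {e.record.get('internal_id'): e for e in self.by_dir_id.values()}
        for e in self.by_dir_id.values():
            e.supervisor = by_internal_id.get(e.record.get('manager_internal_id'))


class EmployeeCSVLoader:
    FIELDS = ['dir_id', 'internal_id', 'manager_internal_id', 'mail']

    def __init__(self, filename):
        self.filename = filename

    def load(self):
        employees = Employees()
        with open(self.filename, newline='') as f:
            for row in csv.DictReader(f):
                employees.add(Employee(row))
        return employees

    def store(self, employees):
        with open(self.filename, 'w', newline='') as f:
            writer = csv.DictWriter(f, fieldnames=self.FIELDS)
            writer.writeheader()
            for e in employees.by_dir_id.values():
                writer.writerow(e.record)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'employees.csv')
    with open(path, 'w', newline='') as f:  # made-up sample file
        f.write('dir_id,internal_id,manager_internal_id,mail\n'
                'jsmith,100,200, jsmith@example.com \n'
                'abrown,200,,abrown@example.com\n')

    loader = EmployeeCSVLoader(path)             # create the loader
    employees = loader.load()                    # build Employees from the file
    employees.populate_supervisor_links()        # link supervisors
    for employee in employees.by_dir_id.values():
        employee.tidy()                          # each employee tidies itself
    loader.store(employees)                      # write the file back

    with open(path, newline='') as f:
        saved_rows = list(csv.DictReader(f))

print(employees.by_dir_id['jsmith'].supervisor.record['mail'])
print(saved_rows[0]['mail'])
```

Each class has one job here, so swapping the tidy rules or the file format touches only one of them.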

Why is this design worth the effort?

It makes things easier to understand. It is easier to create clean, consistent APIs for smaller, task-focused objects.

If you find that you need an XML serialization format, it becomes trivial to add the new format. Subclass your virtual loader class to handle the XML parsing/generation. Now you can seamlessly move between CSV and XML formats.

In summary, use objects to simplify and structure your data. Section off common data and behaviors into separate classes. Keep each class tightly focused on a single type of ability. If your class is a collection, accessor, factory, and kitchen sink all at once, the API can never be usable: it will be too big and loaded with dissimilar groups of methods. But if your classes stay on topic, they will be easy to test, maintain, use, reuse, and extend.
