
Python Function Slows Down With Presence Of Large List

I was testing the speeds of a few different ways to do complex iterations over some of my data, and I found something weird. It seems that having a large list local to some function slows that function down considerably.

Solution 1:

When you create that many new objects (3 million tuples), the garbage collector gets bogged down. If you turn off garbage collection with gc.disable(), the issue goes away (and the program runs 4x faster to boot).
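A minimal sketch of this approach, assuming the work consists of building a few million small tuples (the function name and tuple contents below are placeholders, not the asker's actual code):

```python
import gc
import time

def build_tuples(n=3_000_000):
    # Create many small objects; each tuple is tracked by the cyclic GC.
    return [(i, i * 2) for i in range(n)]

# Timing with the garbage collector enabled (the default).
start = time.perf_counter()
build_tuples()
print(f"GC enabled:  {time.perf_counter() - start:.2f}s")

# Timing with the garbage collector disabled for the duration of the work.
gc.disable()
try:
    start = time.perf_counter()
    build_tuples()
    print(f"GC disabled: {time.perf_counter() - start:.2f}s")
finally:
    gc.enable()  # re-enable so later allocations are still collected
```

Wrapping the `gc.disable()` call in `try`/`finally` keeps the collector from staying off for the rest of the program if the work raises an exception.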

Solution 2:

It's impossible to say without more detailed instrumentation.

As a very, very preliminary step, check your main memory usage. If your RAM is all filled up and your OS is paging to disk, your performance will be quite dreadful. In such a case, you may be best off taking your intermediate products and putting them somewhere other than in memory. If you only need sequential reads of your data, consider writing to a plain file; if your data follows a strict structure, consider persisting into a relational database.
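As a hedged illustration of the "persist intermediate products" idea, here is a sketch that streams hypothetical (key, value) rows into an on-disk SQLite database via the standard-library `sqlite3` module instead of holding them all in RAM; the table name and row contents are invented for the example:

```python
import sqlite3

conn = sqlite3.connect("intermediate.db")
conn.execute("CREATE TABLE IF NOT EXISTS results (key INTEGER, value REAL)")

def produce_rows(n=3_000_000):
    # Stand-in for the real computation; yields rows one at a time.
    for i in range(n):
        yield (i, i * 0.5)

# executemany consumes the generator lazily, so memory use stays flat.
conn.executemany("INSERT INTO results VALUES (?, ?)", produce_rows())
conn.commit()

# Later, read the data back sequentially rather than keeping it resident.
for key, value in conn.execute("SELECT key, value FROM results LIMIT 5"):
    print(key, value)
conn.close()
```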

Solution 3:

My guess is that when the first list is made, there is more memory available, meaning less chance that the list needs to be reallocated as it grows.

After you take up a decent chunk of memory with the first list, your second list has a higher chance of needing to be reallocated as it grows, since Python lists are dynamically sized.
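You can watch the reallocation behaviour directly with `sys.getsizeof`; this small sketch prints each point at which CPython resizes the underlying array (the exact growth pattern is an implementation detail and varies by version):

```python
import sys

lst = []
last = sys.getsizeof(lst)
for i in range(100):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last:
        # The allocated size jumped, so the list was reallocated here.
        print(f"len={len(lst):>3}  bytes={size}")
        last = size
```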

Solution 4:

The memory used by the data local to the function isn't going to be garbage-collected until the function returns. Unless you have a need to do slicing, using lists for large collections of data is not a great idea.

From your example it's not entirely clear what the purpose of creating these lists is. You might want to consider using generators instead of lists, especially if the lists are just going to be iterated. If you need to do slicing on the returned data, convert the generators to lists at that time.
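A short sketch of that pattern, using an invented `squares_gen` function in place of whatever the lists actually contain:

```python
from itertools import islice

def squares_gen(n):
    # Yields one value at a time instead of building a 3-million-item list.
    for i in range(n):
        yield i * i

# Plain iteration: only a single value is held in memory at any point.
total = sum(squares_gen(3_000_000))

# If slicing is needed, convert to a list then (or use islice for a prefix).
first_ten = list(islice(squares_gen(3_000_000), 10))
print(total, first_ten)
```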
