
Efficient Python Array With 100 Million Zeros?

What is an efficient way to initialize and access elements of a large array in Python? I want to create an array in Python with 100 million entries, unsigned 4-byte integers, initialized to zero.

Solution 1:

I have done some profiling, and the results are completely counterintuitive. For simple element-access operations, numpy and array.array are about 10x slower than a plain Python list.

Note that for array access, I am doing operations of the form:

a[i] += 1

Profiles:

  • [0] * 20000000

    • Access: 2.3M/sec
    • Initialization: 0.8s
  • numpy.zeros(shape=(20000000,), dtype=numpy.int32)

    • Access: 160K/sec
    • Initialization: 0.2s
  • array.array('L', [0] * 20000000)

    • Access: 175K/sec
    • Initialization: 2.0s
  • array.array('L', (0 for i in range(20000000)))

    • Access: 175K/sec, presumably, based upon the profile for the other array.array
    • Initialization: 6.7s
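The profile above can be reproduced in miniature with timeit. This is a sketch, not the poster's original harness: it uses a smaller array and fewer increments so it finishes quickly, but it exercises the same `a[i] += 1` access pattern against all three containers.

```python
import array
import timeit

import numpy as np

N = 1_000_000      # smaller than the post's 20M so the sketch runs quickly
STEPS = 100_000    # number of a[i] += 1 operations to time

def bump(a, steps=STEPS):
    # the access pattern from the post: a[i] += 1
    for i in range(steps):
        a[i % N] += 1

candidates = {
    "list": [0] * N,
    "numpy": np.zeros(N, dtype=np.int32),
    "array": array.array('L', [0] * N),
}

times = {}
for name, a in candidates.items():
    times[name] = timeit.timeit(lambda a=a: bump(a), number=1)
    print(f"{name}: {times[name]:.3f}s for {STEPS} increments")
```

Absolute numbers will differ by machine and interpreter version, but the ordering (list fastest for scalar access) should match the profile.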

Solution 2:

Just a reminder how Python's integers work: if you allocate a list by saying

a = [0] * K

you need the memory for the list (sizeof(PyListObject) + K * sizeof(PyObject*)) and the memory for the single integer object 0. As long as the numbers in the list stay below the magic number V that Python uses for caching, you are fine because those are shared, i.e. any name that points to a number n < V points to the exact same object. You can find this value by using the following snippet:

>>> i = 0
>>> j = 0
>>> while i is j:
...    i += 1
...    j += 1
>>> i # on my system!
257 

This means that as soon as the counts go above this number, the memory you need is sizeof(PyListObject) + K * sizeof(PyObject*) + d * sizeof(PyIntObject), where d < K is the number of integers above V (== 256). On a 64 bit system, sizeof(PyIntObject) == 24 and sizeof(PyObject*) == 8, i.e. the worst case memory consumption is 3,200,000,000 bytes.
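The worst-case arithmetic above can be checked against whatever interpreter you are running with sys.getsizeof. Note the sizes in the answer are from a Python 2 era 64-bit build (24-byte ints); on CPython 3, small int objects are typically 28 bytes, so the worst case comes out slightly larger:

```python
import sys

K = 100_000_000                  # number of list entries
ptr = 8                          # pointer size on a 64-bit build
int_obj = sys.getsizeof(1000)    # size of one non-cached int object

# list slots + one int object per distinct value above the cache range
worst_case = K * ptr + K * int_obj
print(f"int object: {int_obj} bytes; worst case ~ {worst_case / 1e9:.1f} GB")
```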

With numpy.ndarray or array.array, memory consumption is constant after initialization, but you pay for the wrapper objects that are created transparently, as Thomas Wouters said. Probably, you should think about converting the update code (which accesses and increases the positions in the array) to C code, either by using Cython or scipy.weave.
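Before dropping to Cython or weave, it is worth checking whether the update loop can be vectorized inside numpy itself. This is an alternative not mentioned in the answer: numpy's `ufunc.at` performs an unbuffered scatter-add at C speed, and unlike fancy-indexed `counts[hits] += 1` it handles repeated indices correctly.

```python
import numpy as np

counts = np.zeros(100, dtype=np.int32)
hits = np.array([3, 7, 3, 42, 7, 3])   # indices to increment, with repeats

# counts[hits] += 1 would count each repeated index only once (buffered);
# np.add.at performs counts[i] += 1 for every index, in a single C call
np.add.at(counts, hits, 1)
print(counts[3], counts[7], counts[42])   # 3 2 1
```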


Solution 3:

Try this:

x = [0] * 100000000

It takes just a few seconds to execute on my machine, and access is close to instant.


Solution 4:

If you are not able to vectorize your calculations, Python/Numpy will be slow. Numpy is fast because vectorized calculations occur at a lower level than Python: the core numpy functions are all written in C or Fortran. Hence numpy.sum(a) is not a Python loop with many accesses; it's a single low-level C call.
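To make the contrast concrete, here is a small sketch comparing an element-by-element Python loop with the equivalent single vectorized call. Both compute the same total; only the second stays at C level:

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.int64)

# Python-level loop: every element is boxed into a temporary Python object
total = 0
for x in a:
    total += x

# Vectorized: one C-level reduction over the whole buffer
vec_total = a.sum()
print(vec_total)   # 499999500000
```

Timing the two (e.g. with timeit) typically shows the loop to be orders of magnitude slower, which is the gap the answer is describing.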

Numpy's Performance Python demo page has a good example with different options. You can easily get a 100x speedup by using a lower-level compiled language, Cython, or vectorized functions if feasible. This blog post shows a 43-fold increase using Cython for a numpy use case.


Solution 5:

It's unlikely you'll find anything faster than numpy's array. The implementation of the array itself is as efficient as it would be in, say, C (and basically the same as array.array, just with more functionality).

If you want to speed up your code, you'll have to look beyond the array itself. Even though the array is implemented efficiently, accessing it from Python code has certain overhead; for example, indexing the array produces integer objects, which have to be created on the fly. numpy offers a number of operations implemented efficiently in C, but without seeing the actual code that isn't performing as well as you want, it's hard to make any specific suggestions.
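The per-access overhead is easy to observe directly. Each scalar read from a numpy array allocates a fresh wrapper object rather than handing back a raw C integer:

```python
import numpy as np

a = np.zeros(5, dtype=np.int32)

x = a[0]
# each scalar read builds a new numpy scalar object, not a raw C int
print(type(x))            # <class 'numpy.int32'>
print(a[0] is a[0])       # False: two accesses, two distinct wrapper objects
```

This allocation cost, multiplied over 100 million accesses, is where a tight Python loop loses its time.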

