
How To Use Mask Indexing On Numpy Arrays Of Classes?

When working with a numpy array of instances of a custom class, like:

class TestClass:
    active = False

how can I use inline masking (boolean index arrays) as described here: http://docs.sci

Solution 1:

So your array is dtype=object (print it) and each element points to an instance of your class:

items = np.array([TestClass() for _ in range(10)])

Now try:

items.active

items is an array; active is an attribute of your class, not an attribute of the array of your objects. Your definition does not add any functionality to the class ndarray. The error isn't in the masking; it's in trying to get the instance attribute.
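As a minimal sketch of that point, the snippet below reproduces the failure and contrasts it with per-element access. It reuses the class from the question; the names error_message and first_active are only illustrative.

```python
import numpy as np

class TestClass:
    active = False

items = np.array([TestClass() for _ in range(3)])

# The array object itself has no `active` attribute, so this raises.
try:
    items.active
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

# Each element does have the attribute; it has to be reached per element.
first_active = items[0].active
print(first_active)
```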

Many operations on arrays like this have to be done iteratively. This kind of array is similar to a plain Python list.

[obj.active for obj in items]

or to turn it back into an array

np.array([obj...])

items[[True,False,True,...]] should work, but that's because the mask is a boolean list or array already.
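Putting these pieces together, here is a minimal sketch of building the mask from the attribute and then indexing with it. The constructor argument is an assumption added for illustration, so that the instances differ from each other; it is not part of the class in the question.

```python
import numpy as np

class TestClass:
    # Hypothetical constructor so each instance gets its own `active` value.
    def __init__(self, active=False):
        self.active = active

items = np.array([TestClass(i % 2 == 1) for i in range(6)])

# Build the boolean mask by iterating, exactly as with a plain list.
mask = np.array([obj.active for obj in items], dtype=bool)

# Ordinary boolean indexing then works on the object array.
selected = items[mask]
print(mask)
print(len(selected))
```

The mask is an ordinary boolean array, so it can be combined with other masks using & and | before indexing.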

====================

Let's modify your class so it shows something interesting. Note that I am assigning active to instances, not, as you did, to the class:

In [1671]: class TestClass:
      ...:     def __init__(self,val):
      ...:        self.active = bool(val%2)

In [1672]: items = np.array([TestClass(i) for i in range(10)])

In [1674]: items
Out[1674]: 
array([<__main__.TestClass object at 0xb106758c>,
       <__main__.TestClass object at 0xb117764c>,
       ...
       <__main__.TestClass object at 0xb269850c>], dtype=object)
# print of the array isn't interesting.  The class needs a `__repr__` method.

The simple iterative way to access the attribute:

In [1675]: [i.active for i in items]
Out[1675]: [False, True, False, True, False, True, False, True, False, True]

np.frompyfunc provides a more powerful way of accessing each element of an array. operator.attrgetter('active')(i) is a functional way of doing i.active (this requires import operator first).

In [1676]: f=np.frompyfunc(operator.attrgetter('active'),1,1)
In [1677]: f(items)
Out[1677]: array([False, True, False, True, False, True, False, True, False, True], dtype=object)

but the main advantage of this function appears when I change the shape of the array:

In [1678]: f(items.reshape(2,5))
Out[1678]: 
array([[False, True, False, True, False],
       [True, False, True, False, True]], dtype=object)

Note this array is dtype object. That's what frompyfunc does. To get an array of booleans we need to change type:

In [1679]: f(items.reshape(2,5)).astype(bool)
Out[1679]: 
array([[False,  True, False,  True, False],
       [ True, False,  True, False,  True]], dtype=bool)
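For reference, the same steps can be sketched as a standalone script that goes on to use the boolean result as a mask. The extra val attribute is an assumption added here so the selected objects can be identified; the original class does not store it.

```python
import operator
import numpy as np

class TestClass:
    def __init__(self, val):
        self.val = val                  # hypothetical, kept only for inspection
        self.active = bool(val % 2)

# Element-wise attribute getter: one input, one output.
get_active = np.frompyfunc(operator.attrgetter('active'), 1, 1)

grid = np.array([TestClass(i) for i in range(10)]).reshape(2, 5)

# frompyfunc returns dtype=object, so cast before using it as a mask.
mask = get_active(grid).astype(bool)

# Boolean indexing on a 2-D array returns a flat 1-D selection.
active_items = grid[mask]
vals = [obj.val for obj in active_items]
print(vals)
```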

np.vectorize uses frompyfunc, and makes the dtype a little more user friendly. But in timings it's a bit slower.
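A minimal sketch of the np.vectorize variant, assuming a small helper function is_active (a name invented here) and passing otypes so the result comes back as a boolean array directly:

```python
import numpy as np

class TestClass:
    def __init__(self, val):
        self.active = bool(val % 2)

def is_active(obj):
    # Hypothetical helper; equivalent to operator.attrgetter('active').
    return obj.active

items = np.array([TestClass(i) for i in range(10)])

# otypes fixes the output dtype, so no astype(bool) step is needed.
get_active = np.vectorize(is_active, otypes=[bool])
mask = get_active(items)
selected = items[mask]
print(mask.dtype, len(selected))
```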

===============

Expanding on Jon's comment

In [1702]: class TestClass:
      ...:     def __init__(self,val):
      ...:        self.active = bool(val%2)
      ...:     def __bool__(self):
      ...:         return self.active
      ...:     def __str__(self):
      ...:         return 'TestClass(%s)'%self.active
      ...:     def __repr__(self):
      ...:         return str(self)

In [1707]: items = np.array([TestClass(i) for i in range(5)])

items now display in an informative manner, and convert to strings:

In [1708]: items
Out[1708]: 
array([TestClass(False), TestClass(True), TestClass(False),
       TestClass(True), TestClass(False)], dtype=object)
In [1709]: items.astype('S20')
Out[1709]: 
array([b'TestClass(False)', b'TestClass(True)', b'TestClass(False)',
       b'TestClass(True)', b'TestClass(False)'], 
      dtype='|S20')

and convert to bool:

In [1710]: items.astype(bool)
Out[1710]: array([False,  True, False,  True, False], dtype=bool)

In effect astype is applying the conversion method to each element of the array. We could also define __int__, __add__, and so on. This shows that it is easier to add functionality to the custom class than to the array class itself. I wouldn't expect to get the same speed as with native types.
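With __bool__ defined, the astype(bool) result can serve directly as the mask the question asks for. The following is a sketch under that assumption, using a trimmed version of the class above:

```python
import numpy as np

class TestClass:
    def __init__(self, val):
        self.active = bool(val % 2)
    def __bool__(self):
        # Truthiness of an instance follows its `active` flag.
        return self.active
    def __repr__(self):
        return 'TestClass(%s)' % self.active

items = np.array([TestClass(i) for i in range(5)])

# astype(bool) evaluates the truth value of each element.
mask = items.astype(bool)
active_items = items[mask]
print(active_items)
```

This keeps the indexing expression short, at the cost of tying the truth value of every instance to its active flag.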

