MXNet: nn.Activation vs nd.relu?
Solution 1:
It appears that mx.gluon.nn.Activation(activation=<act>) is a wrapper for calling a host of the underlying activation functions from the NDArray module.
Thus, in principle, it does not matter whether the forward definition uses
x = self.ramp(x)
or
x = mx.nd.relu(x)
or
x = mx.nd.relu(self.ramp(x))
as relu simply takes the max of 0 and the passed value, so applying it multiple times does not change the result compared to a single call, apart from a slight increase in runtime.
So in this case it does not really matter. Of course, with other activation functions, stacking multiple calls might have an impact.
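A minimal sketch of this point, assuming a standard MXNet install: relu is idempotent, so calling nd.relu on top of nn.Activation('relu') returns the same values as a single call. The array values here are arbitrary examples.

import mxnet as mx

x = mx.nd.array([-2.0, -0.5, 0.0, 0.5, 2.0])

ramp = mx.gluon.nn.Activation(activation='relu')
ramp.initialize()  # harmless here, the block has no parameters

once = mx.nd.relu(x)          # single application
twice = mx.nd.relu(ramp(x))   # nn.Activation followed by nd.relu

print(once)   # [0. 0. 0. 0.5 2.]
print(twice)  # same values, just one extra pass over the data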
In MXNet's documentation, nd.relu is used in the forward definition when defining gluon.Blocks. This might carry slightly less overhead than using mx.gluon.nn.Activation(activation='relu').
Flavor-wise, the gluon module is meant to be the high-level abstraction. Therefore I am of the opinion that when defining a block one should use
ramp = mx.gluon.nn.Activation(activation=<act>)
instead of nd.<act>(x), and then call self.ramp(x) in the forward definition.
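A sketch of that style, assuming a simple custom Block; the layer size and input shape are arbitrary placeholders:

import mxnet as mx
from mxnet.gluon import nn

class SimpleBlock(nn.Block):
    def __init__(self, **kwargs):
        super(SimpleBlock, self).__init__(**kwargs)
        self.dense = nn.Dense(64)
        # activation declared as a child block instead of calling mx.nd.relu in forward
        self.ramp = nn.Activation(activation='relu')

    def forward(self, x):
        # equivalent in value to mx.nd.relu(self.dense(x)), but stays within the Gluon API
        return self.ramp(self.dense(x))

net = SimpleBlock()
net.initialize()
out = net(mx.nd.random.uniform(shape=(4, 10)))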
However, given that at this point the custom Block tutorials and documentation stick to the relu activation, whether or not this will have lasting consequences remains to be seen.
Altogether, the use of mx.gluon.nn.Activation seems to be a way to call activation functions from the NDArray module from within the Gluon module.
Solution 2:
mx.gluon.nn.Activation wraps around mx.ndarray.Activation; see the Gluon source code.
However, when using Gluon to build a neural net, it is recommended that you stick to the Gluon API rather than branch off into the lower-level MXNet API arbitrarily, since such code may break as Gluon evolves and potentially changes (e.g. stops using mx.nd under the hood).
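One illustrative way to stay within the Gluon API, assuming a HybridBlock with arbitrary layer sizes: by composing layer objects (and using the F namespace passed to hybrid_forward rather than mx.nd directly), the same code runs imperatively before hybridize() and symbolically after, so nothing is tied to the NDArray backend.

import mxnet as mx
from mxnet.gluon import nn

class HybridMLP(nn.HybridBlock):
    def __init__(self, **kwargs):
        super(HybridMLP, self).__init__(**kwargs)
        self.dense = nn.Dense(32)
        self.act = nn.Activation('relu')

    def hybrid_forward(self, F, x):
        # F is mx.nd in imperative mode and mx.sym after hybridize(),
        # so no direct dependency on the lower-level NDArray API is baked in
        return self.act(self.dense(x))

net = HybridMLP()
net.initialize()
net.hybridize()
out = net(mx.nd.random.uniform(shape=(2, 8)))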