Sparse Symbol API
Overview
This document lists the routines of the sparse symbolic expression package:
mxnet.symbol.sparse |
Sparse Symbol API of MXNet. |
The Sparse Symbol API, defined in the symbol.sparse package, provides
sparse neural network graphs and auto-differentiation on CPU.
The storage type of a variable is specified by the stype attribute of the variable.
The storage type of a symbolic expression is inferred based on the storage types of the variables and the operators.
>>> a = mx.sym.Variable('a', stype='csr')
>>> b = mx.sym.Variable('b')
>>> c = mx.sym.dot(a, b, transpose_a=True)
>>> type(c)
>>> e = c.bind(mx.cpu(), {'a': mx.nd.array([[1,0,0]]).tostype('csr'), 'b':mx.nd.ones((1,2))})
>>> y = e.forward()
# the result storage type of dot(csr.T, dense) is inferred to be `row_sparse`
>>> y
[<RowSparseNDArray 3x2 @cpu(0)>]
>>> y[0].asnumpy()
array([[ 1.,  1.],
       [ 0.,  0.],
       [ 0.,  0.]], dtype=float32)
Note
Most operators provided in mxnet.symbol.sparse are similar to those in
mxnet.symbol, although there are a few differences:
- Only a subset of operators in mxnet.symbol have specialized implementations in mxnet.symbol.sparse. Operators such as reduction and broadcasting do not have sparse implementations yet.
- The storage types (stype) of sparse operators’ outputs depend on the storage types of the inputs. By default, operators not available in mxnet.symbol.sparse infer the “default” (dense) storage type for their outputs. Please refer to the API reference section for further details on specific operators.
- GPU support for mxnet.symbol.sparse is experimental.
In the rest of this document, we list sparse related routines provided by the
symbol.sparse package.
Symbol creation routines
zeros_like |
Return an array of zeros with the same shape and type as the input array. |
mxnet.symbol.var |
Creates a symbolic variable with specified name. |
Symbol manipulation routines
Changing symbol storage type
cast_storage |
Casts tensor storage type to the new type. |
Mathematical functions
Arithmetic operations
elemwise_add |
Adds arguments element-wise. |
elemwise_sub |
Subtracts arguments element-wise. |
elemwise_mul |
Multiplies arguments element-wise. |
negative |
Numerical negative of the argument, element-wise. |
dot |
Dot product of two arrays. |
add_n |
Adds all input arguments element-wise. |
Trigonometric functions
sin |
Computes the element-wise sine of the input array. |
tan |
Computes the element-wise tangent of the input array. |
arcsin |
Returns element-wise inverse sine of the input array. |
arctan |
Returns element-wise inverse tangent of the input array. |
degrees |
Converts each element of the input array from radians to degrees. |
radians |
Converts each element of the input array from degrees to radians. |
Hyperbolic functions
sinh |
Returns the hyperbolic sine of the input array, computed element-wise. |
tanh |
Returns the hyperbolic tangent of the input array, computed element-wise. |
arcsinh |
Returns the inverse hyperbolic sine of the input array, computed element-wise. |
arctanh |
Returns the inverse hyperbolic tangent of the input array, computed element-wise. |
Reduce functions
sum |
Computes the sum of array elements over given axes. |
mean |
Computes the mean of array elements over given axes. |
Rounding
round |
Returns element-wise rounded value to the nearest integer of the input. |
rint |
Returns element-wise rounded value to the nearest integer of the input. |
fix |
Returns element-wise rounded value to the nearest integer towards zero of the input. |
floor |
Returns element-wise floor of the input. |
ceil |
Returns element-wise ceiling of the input. |
trunc |
Return the element-wise truncated value of the input. |
Exponents and logarithms
expm1 |
Returns exp(x) - 1 computed element-wise on the input. |
log1p |
Returns element-wise log(1 + x) value of the input. |
Neural network
More
make_loss |
Make your own loss function in network construction. |
stop_gradient |
Stops gradient computation. |
mxnet.symbol.contrib.SparseEmbedding |
Maps integer indices to vector representations (embeddings). |
API Reference
Sparse Symbol API of MXNet.
mxnet.symbol.sparse.ElementWiseSum(*args, **kwargs)
Adds all input arguments element-wise.
\[add\_n(a_1, a_2, ..., a_n) = a_1 + a_2 + ... + a_n\]
add_n is potentially more efficient than calling add n times.
The storage type of add_n output depends on the storage types of the inputs:
- add_n(row_sparse, row_sparse, ..) = row_sparse
- otherwise, add_n generates output with default storage
Defined in src/operator/tensor/elemwise_sum.cc:L123
This function supports a variable number of positional inputs.
Parameters: - args (Symbol[]) – Positional input arguments
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.abs(data=None, name=None, attr=None, out=None, **kwargs)
Returns element-wise absolute value of the input.
Example:
abs([-2, 0, 3]) = [2, 0, 3]
The storage type of abs output depends upon the input storage type:
- abs(default) = default
- abs(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L385
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.adam_update(weight=None, grad=None, mean=None, var=None, lr=_Null, beta1=_Null, beta2=_Null, epsilon=_Null, wd=_Null, rescale_grad=_Null, clip_gradient=_Null, name=None, attr=None, out=None, **kwargs)
Update function for the Adam optimizer. Adam is seen as a generalization of AdaGrad.
The Adam update consists of the following steps, where g represents the gradient and m, v are the 1st and 2nd order moment estimates (mean and variance).
\[\begin{split}g_t = \nabla J(W_{t-1})\\ m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t\\ v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2\\ W_t = W_{t-1} - \alpha \frac{ m_t }{ \sqrt{ v_t } + \epsilon }\end{split}\]
It updates the weights using:
m = beta1*m + (1-beta1)*grad
v = beta2*v + (1-beta2)*(grad**2)
w += - learning_rate * m / (sqrt(v) + epsilon)
If w, m and v are all of row_sparse storage type, only the row slices whose indices appear in grad.indices are updated (for w, m and v):
for row in grad.indices:
    m[row] = beta1*m[row] + (1-beta1)*grad[row]
    v[row] = beta2*v[row] + (1-beta2)*(grad[row]**2)
    w[row] += - learning_rate * m[row] / (sqrt(v[row]) + epsilon)
Defined in src/operator/optimizer_op.cc:L383
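The row_sparse branch above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MXNet's kernel: the helper name adam_rowsparse_update and the list-of-rows layout are hypothetical, and wd, rescale_grad and clip_gradient are omitted for brevity.

```python
import math


def adam_rowsparse_update(weight, mean, var, grad_indices, grad_values,
                          lr, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Hypothetical sketch of the lazy (row_sparse) Adam step, in place.

    weight, mean and var are lists of rows; the gradient is given as the
    row indices it stores plus the matching row values. Rows that do not
    appear in grad_indices are left untouched.
    """
    for row, g_row in zip(grad_indices, grad_values):
        for j, g in enumerate(g_row):
            mean[row][j] = beta1 * mean[row][j] + (1 - beta1) * g
            var[row][j] = beta2 * var[row][j] + (1 - beta2) * g * g
            weight[row][j] -= lr * mean[row][j] / (math.sqrt(var[row][j]) + epsilon)


weight = [[1.0, 1.0], [1.0, 1.0]]
mean = [[0.0, 0.0], [0.0, 0.0]]
var = [[0.0, 0.0], [0.0, 0.0]]
# Only row 1 has a stored gradient, so row 0 keeps its old values.
adam_rowsparse_update(weight, mean, var, [1], [[0.5, -0.5]], lr=0.1)
print(weight)
```

Skipping rows absent from grad.indices is what keeps the cost proportional to the number of stored gradient rows rather than to the full weight matrix.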
Parameters: - weight (Symbol) – Weight
- grad (Symbol) – Gradient
- mean (Symbol) – Moving mean
- var (Symbol) – Moving variance
- lr (float, required) – Learning rate
- beta1 (float, optional, default=0.9) – The decay rate for the 1st moment estimates.
- beta2 (float, optional, default=0.999) – The decay rate for the 2nd moment estimates.
- epsilon (float, optional, default=1e-08) – A small constant for numerical stability.
- wd (float, optional, default=0) – Weight decay augments the objective function with a regularization term that penalizes large weights. The penalty scales with the square of the magnitude of each weight.
- rescale_grad (float, optional, default=1) – Rescale gradient to grad = rescale_grad*grad.
- clip_gradient (float, optional, default=-1) – Clip gradient to the range of [-clip_gradient, clip_gradient]. If clip_gradient <= 0, gradient clipping is turned off. grad = max(min(grad, clip_gradient), -clip_gradient).
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.add_n(*args, **kwargs)
Adds all input arguments element-wise.
\[add\_n(a_1, a_2, ..., a_n) = a_1 + a_2 + ... + a_n\]
add_n is potentially more efficient than calling add n times.
The storage type of add_n output depends on the storage types of the inputs:
- add_n(row_sparse, row_sparse, ..) = row_sparse
- otherwise, add_n generates output with default storage
Defined in src/operator/tensor/elemwise_sum.cc:L123
This function supports a variable number of positional inputs.
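To illustrate why add_n over row_sparse inputs can stay row_sparse, here is a plain-Python sketch. It is a simplified model, not MXNet code: each row_sparse array is represented as a dict from stored row index to that row's values (absent rows are implicit zeros), and the helper name add_n_row_sparse is hypothetical. The stored rows of the result are the union of the inputs' stored rows.

```python
def add_n_row_sparse(*arrays):
    """Sum row_sparse-like dicts element-wise; the result is also row_sparse."""
    out = {}
    for arr in arrays:
        for row, values in arr.items():
            if row in out:
                out[row] = [a + b for a, b in zip(out[row], values)]
            else:
                out[row] = list(values)  # copy so the inputs are not mutated
    # Keep row indices sorted, as a row_sparse array stores them.
    return dict(sorted(out.items()))


a = {0: [1.0, 2.0]}
b = {0: [3.0, 4.0], 2: [5.0, 6.0]}
c = {1: [1.0, 1.0]}
result = add_n_row_sparse(a, b, c)
print(result)  # {0: [4.0, 6.0], 1: [1.0, 1.0], 2: [5.0, 6.0]}
```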
Parameters: - args (Symbol[]) – Positional input arguments
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.arccos(data=None, name=None, attr=None, out=None, **kwargs)
Returns element-wise inverse cosine of the input array.
The input should be in the range [-1, 1]. The output is in the closed interval \([0, \pi]\).
\[arccos([-1, -.707, 0, .707, 1]) = [\pi, 3\pi/4, \pi/2, \pi/4, 0]\]
The storage type of arccos output is always dense.
Defined in src/operator/tensor/elemwise_unary_op_trig.cc:L123
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.arccosh(data=None, name=None, attr=None, out=None, **kwargs)
Returns the inverse hyperbolic cosine of the input array, computed element-wise.
The storage type of arccosh output is always dense.
Defined in src/operator/tensor/elemwise_unary_op_trig.cc:L264
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.arcsin(data=None, name=None, attr=None, out=None, **kwargs)
Returns element-wise inverse sine of the input array.
The input should be in the range [-1, 1]. The output is in the closed interval \([-\pi/2, \pi/2]\).
\[arcsin([-1, -.707, 0, .707, 1]) = [-\pi/2, -\pi/4, 0, \pi/4, \pi/2]\]
The storage type of arcsin output depends upon the input storage type:
- arcsin(default) = default
- arcsin(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_trig.cc:L104
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.arcsinh(data=None, name=None, attr=None, out=None, **kwargs)
Returns the inverse hyperbolic sine of the input array, computed element-wise.
The storage type of arcsinh output depends upon the input storage type:
- arcsinh(default) = default
- arcsinh(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_trig.cc:L250
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.arctan(data=None, name=None, attr=None, out=None, **kwargs)
Returns element-wise inverse tangent of the input array.
The output is in the closed interval \([-\pi/2, \pi/2]\).
\[arctan([-1, 0, 1]) = [-\pi/4, 0, \pi/4]\]
The storage type of arctan output depends upon the input storage type:
- arctan(default) = default
- arctan(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_trig.cc:L144
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.arctanh(data=None, name=None, attr=None, out=None, **kwargs)
Returns the inverse hyperbolic tangent of the input array, computed element-wise.
The storage type of arctanh output depends upon the input storage type:
- arctanh(default) = default
- arctanh(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_trig.cc:L281
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.cast_storage(data=None, stype=_Null, name=None, attr=None, out=None, **kwargs)
Casts tensor storage type to the new type.
When an NDArray with default storage type is cast to csr or row_sparse storage, the result is compact, which means:
- for csr, zero values will not be retained
- for row_sparse, row slices of all zeros will not be retained
The storage type of cast_storage output depends on the stype parameter:
- cast_storage(csr, ‘default’) = default
- cast_storage(row_sparse, ‘default’) = default
- cast_storage(default, ‘csr’) = csr
- cast_storage(default, ‘row_sparse’) = row_sparse
Example:
dense = [[ 0.,  1.,  0.],
         [ 2.,  0.,  3.],
         [ 0.,  0.,  0.],
         [ 0.,  0.,  0.]]
# cast to row_sparse storage type
rsp = cast_storage(dense, 'row_sparse')
rsp.indices = [0, 1]
rsp.values = [[ 0.,  1.,  0.],
              [ 2.,  0.,  3.]]
# cast to csr storage type
csr = cast_storage(dense, 'csr')
csr.indices = [1, 0, 2]
csr.values = [ 1.,  2.,  3.]
csr.indptr = [0, 1, 3, 3, 3]
Defined in src/operator/tensor/cast_storage.cc:L69
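The compaction rules can be checked with a small sketch that converts the dense example by hand. The helpers to_csr and to_row_sparse are hypothetical; they model only the index bookkeeping of the two sparse formats, not MXNet's NDArray types.

```python
def to_csr(dense):
    """Return (values, indices, indptr) for a dense 2-D list, dropping zeros."""
    values, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                values.append(value)
                indices.append(col)
        # indptr[i + 1] marks where row i's entries end in values/indices.
        indptr.append(len(values))
    return values, indices, indptr


def to_row_sparse(dense):
    """Return (indices, values), keeping only rows that contain a non-zero."""
    indices = [i for i, row in enumerate(dense) if any(v != 0 for v in row)]
    values = [list(dense[i]) for i in indices]
    return indices, values


dense = [[0., 1., 0.],
         [2., 0., 3.],
         [0., 0., 0.],
         [0., 0., 0.]]
csr_values, csr_indices, csr_indptr = to_csr(dense)
rsp_indices, rsp_values = to_row_sparse(dense)
print(csr_values, csr_indices, csr_indptr)  # [1.0, 2.0, 3.0] [1, 0, 2] [0, 1, 3, 3, 3]
print(rsp_indices, rsp_values)  # [0, 1] [[0.0, 1.0, 0.0], [2.0, 0.0, 3.0]]
```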
Parameters: - data (Symbol) – The input.
- stype ({'csr', 'default', 'row_sparse'}, required) – Output storage type.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.ceil(data=None, name=None, attr=None, out=None, **kwargs)
Returns element-wise ceiling of the input.
The ceil of the scalar x is the smallest integer i such that i >= x.
Example:
ceil([-2.1, -1.9, 1.5, 1.9, 2.1]) = [-2., -1., 2., 2., 3.]
The storage type of ceil output depends upon the input storage type:
- ceil(default) = default
- ceil(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L463
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.clip(data=None, a_min=_Null, a_max=_Null, name=None, attr=None, out=None, **kwargs)
Clips (limits) the values in an array.
Given an interval, values outside the interval are clipped to the interval edges. Clipping x between a_min and a_max would be:
clip(x, a_min, a_max) = max(min(x, a_max), a_min)
Example:
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
clip(x, 1, 8) = [ 1.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  8.]
The storage type of clip output depends on the storage type of the input and the a_min, a_max parameter values:
- clip(default) = default
- clip(row_sparse, a_min <= 0, a_max >= 0) = row_sparse
- clip(csr, a_min <= 0, a_max >= 0) = csr
- clip(row_sparse, a_min < 0, a_max < 0) = default
- clip(row_sparse, a_min > 0, a_max > 0) = default
- clip(csr, a_min < 0, a_max < 0) = csr
- clip(csr, a_min > 0, a_max > 0) = csr
Defined in src/operator/tensor/matrix_op.cc:L486
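One way to read the row_sparse rules above: a row_sparse result is only kept when clipping maps zero to zero, that is when a_min <= 0 <= a_max, because otherwise the implicit all-zero rows would change value. The sketch below encodes that reading together with the clip formula; both function names are hypothetical, and the csr branch simply mirrors the listed cases rather than explaining them.

```python
def clip(x, a_min, a_max):
    """Element-wise clip of a flat list: max(min(v, a_max), a_min)."""
    return [max(min(v, a_max), a_min) for v in x]


def clip_output_stype(in_stype, a_min, a_max):
    """Output storage type of clip for the cases listed above."""
    if in_stype == 'row_sparse':
        # Implicit zero rows stay zero only if 0 lies inside [a_min, a_max].
        return 'row_sparse' if a_min <= 0 <= a_max else 'default'
    # default -> default and csr -> csr in every listed case.
    return in_stype


print(clip(list(range(10)), 1, 8))  # [1, 1, 2, 3, 4, 5, 6, 7, 8, 8]
print(clip_output_stype('row_sparse', -1.0, 1.0))  # row_sparse
print(clip_output_stype('row_sparse', 0.5, 2.0))  # default
```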
Parameters: - data (Symbol) – Input array.
- a_min (float, required) – Minimum value
- a_max (float, required) – Maximum value
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.cos(data=None, name=None, attr=None, out=None, **kwargs)
Computes the element-wise cosine of the input array.
The input should be in radians (\(2\pi\) rad equals 360 degrees).
\[cos([0, \pi/4, \pi/2]) = [1, 0.707, 0]\]
The storage type of cos output is always dense.
Defined in src/operator/tensor/elemwise_unary_op_trig.cc:L63
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.cosh(data=None, name=None, attr=None, out=None, **kwargs)
Returns the hyperbolic cosine of the input array, computed element-wise.
\[cosh(x) = 0.5\times(exp(x) + exp(-x))\]
The storage type of cosh output is always dense.
Defined in src/operator/tensor/elemwise_unary_op_trig.cc:L216
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.degrees(data=None, name=None, attr=None, out=None, **kwargs)
Converts each element of the input array from radians to degrees.
\[degrees([0, \pi/2, \pi, 3\pi/2, 2\pi]) = [0, 90, 180, 270, 360]\]
The storage type of degrees output depends upon the input storage type:
- degrees(default) = default
- degrees(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_trig.cc:L163
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.dot(lhs=None, rhs=None, transpose_a=_Null, transpose_b=_Null, name=None, attr=None, out=None, **kwargs)
Dot product of two arrays.
dot’s behavior depends on the input array dimensions:
- 1-D arrays: inner product of vectors
- 2-D arrays: matrix multiplication
- N-D arrays: a sum product over the last axis of the first input and the first axis of the second input
For example, given 3-D x with shape (n,m,k) and y with shape (k,r,s), the result array will have shape (n,m,r,s). It is computed by:
dot(x,y)[i,j,a,b] = sum(x[i,j,:]*y[:,a,b])
Example:
x = reshape([0,1,2,3,4,5,6,7], shape=(2,2,2))
y = reshape([7,6,5,4,3,2,1,0], shape=(2,2,2))
dot(x,y)[0,0,1,1] = 0
sum(x[0,0,:]*y[:,1,1]) = 0
The storage type of dot output depends on the storage types of the inputs and the transpose options:
- dot(csr, default) = default
- dot(csr.T, default) = row_sparse
- dot(csr, row_sparse) = default
- dot(default, csr) = csr
- otherwise, dot generates output with default storage
Defined in src/operator/tensor/dot.cc:L62
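The dot(csr.T, default) = row_sparse rule can be motivated with a plain-Python sketch: row r of csr.T times the dense input can only be non-zero if column r of the csr input holds a stored value, so the result rows are naturally indexed by the csr column indices. The helper name csr_t_dot_dense and the (values, indices, indptr) layout are illustrative assumptions; MXNet's kernel is not reproduced here.

```python
def csr_t_dot_dense(values, indices, indptr, dense):
    """Compute dot(csr.T, dense), returning (row_indices, row_values).

    The csr input has len(indptr) - 1 rows; dense must have the same number
    of rows. Only output rows that receive a contribution are stored.
    """
    k = len(dense[0])
    rows = {}
    for i in range(len(indptr) - 1):
        for p in range(indptr[i], indptr[i + 1]):
            col, value = indices[p], values[p]
            acc = rows.setdefault(col, [0.0] * k)
            for j in range(k):
                acc[j] += value * dense[i][j]
    row_indices = sorted(rows)
    return row_indices, [rows[r] for r in row_indices]


# Same inputs as the overview example: a = [[1, 0, 0]] as csr, b = ones((1, 2)).
out_indices, out_values = csr_t_dot_dense([1.0], [0], [0, 1], [[1.0, 1.0]])
print(out_indices, out_values)  # [0] [[1.0, 1.0]]
```

Only row 0 of the 3x2 result is stored, which corresponds to the dense value [[1, 1], [0, 0], [0, 0]] shown in the overview.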
Parameters: - lhs (Symbol) – The first input
- rhs (Symbol) – The second input
- transpose_a (boolean, optional, default=0) – If true then transpose the first input before dot.
- transpose_b (boolean, optional, default=0) – If true then transpose the second input before dot.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.elemwise_add(lhs=None, rhs=None, name=None, attr=None, out=None, **kwargs)
Adds arguments element-wise.
The storage type of elemwise_add output depends on the storage types of the inputs:
- elemwise_add(row_sparse, row_sparse) = row_sparse
- elemwise_add(csr, csr) = csr
- otherwise, elemwise_add generates output with default storage
Parameters: - lhs (Symbol) – The first input.
- rhs (Symbol) – The second input.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
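Return type: Symbol
mxnet.symbol.sparse.elemwise_div(lhs=None, rhs=None, name=None, attr=None, out=None, **kwargs)
Divides arguments element-wise.
The storage type of elemwise_div output is always dense.
Parameters: - lhs (Symbol) – The first input.
- rhs (Symbol) – The second input.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.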
Return type: Symbol
mxnet.symbol.sparse.elemwise_mul(lhs=None, rhs=None, name=None, attr=None, out=None, **kwargs)
Multiplies arguments element-wise.
The storage type of elemwise_mul output depends on the storage types of the inputs:
- elemwise_mul(default, default) = default
- elemwise_mul(row_sparse, row_sparse) = row_sparse
- elemwise_mul(default, row_sparse) = default
- elemwise_mul(row_sparse, default) = default
- elemwise_mul(csr, csr) = csr
- otherwise, elemwise_mul generates output with default storage
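Element-wise multiplication of two row_sparse inputs can only produce non-zero rows where both inputs store a row, which helps explain why row_sparse * row_sparse can stay row_sparse. The sketch below models each row_sparse array as a dict from row index to row values; the helper name is hypothetical, and whether MXNet stores exactly the intersection of rows is an assumption not stated in this reference.

```python
def elemwise_mul_row_sparse(lhs, rhs):
    """Multiply two row_sparse-like dicts element-wise.

    Rows missing from either input are implicitly zero, so only rows stored
    in both inputs can be non-zero in the product.
    """
    common = sorted(lhs.keys() & rhs.keys())
    return {row: [a * b for a, b in zip(lhs[row], rhs[row])] for row in common}


lhs = {0: [1.0, 2.0], 2: [3.0, 4.0]}
rhs = {2: [10.0, 0.5], 3: [7.0, 7.0]}
print(elemwise_mul_row_sparse(lhs, rhs))  # {2: [30.0, 2.0]}
```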
Parameters: - lhs (Symbol) – The first input.
- rhs (Symbol) – The second input.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.elemwise_sub(lhs=None, rhs=None, name=None, attr=None, out=None, **kwargs)
Subtracts arguments element-wise.
The storage type of elemwise_sub output depends on the storage types of the inputs:
- elemwise_sub(row_sparse, row_sparse) = row_sparse
- elemwise_sub(csr, csr) = csr
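- otherwise, elemwise_sub generates output with default storage
Parameters: - lhs (Symbol) – The first input.
- rhs (Symbol) – The second input.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.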
Return type: Symbol
mxnet.symbol.sparse.exp(data=None, name=None, attr=None, out=None, **kwargs)
Returns element-wise exponential value of the input.
\[exp(x) = e^x \approx 2.718^x\]
Example:
exp([0, 1, 2]) = [1., 2.71828175, 7.38905621]
The storage type of exp output is always dense.
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L641
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.expm1(data=None, name=None, attr=None, out=None, **kwargs)
Returns exp(x) - 1 computed element-wise on the input.
This function provides greater precision than exp(x) - 1 for small values of x.
The storage type of expm1 output depends upon the input storage type:
- expm1(default) = default
- expm1(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L720
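The precision claim can be observed with Python's standard math module, whose expm1 plays the same role. This compares the two formulas in double precision and is only an illustration; it does not exercise MXNet's float32 kernel.

```python
import math

x = 1e-10
naive = math.exp(x) - 1.0  # exp(x) rounds to a double near 1, losing low digits
accurate = math.expm1(x)   # computes exp(x) - 1 without that cancellation

# Relative error against x + x**2/2 (higher-order terms are negligible here).
true_value = x + x * x / 2
print(abs(naive - true_value) / true_value)
print(abs(accurate - true_value) / true_value)
```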
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.fix(data=None, name=None, attr=None, out=None, **kwargs)
Returns element-wise rounded value to the nearest integer towards zero of the input.
Example:
fix([-2.1, -1.9, 1.9, 2.1]) = [-2., -1., 1., 2.]
The storage type of fix output depends upon the input storage type:
- fix(default) = default
- fix(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L520
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.floor(data=None, name=None, attr=None, out=None, **kwargs)
Returns element-wise floor of the input.
The floor of the scalar x is the largest integer i such that i <= x.
Example:
floor([-2.1, -1.9, 1.5, 1.9, 2.1]) = [-3., -2., 1., 1., 2.]
The storage type of floor output depends upon the input storage type:
- floor(default) = default
- floor(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L482
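For scalars, ceil, fix and floor line up with Python's math.ceil, math.trunc and math.floor respectively. The sketch below reproduces the three examples in this reference on plain lists as a quick check of the rounding directions: ceil rounds toward positive infinity, floor toward negative infinity, and fix toward zero.

```python
import math

values = [-2.1, -1.9, 1.5, 1.9, 2.1]
ceil_out = [float(math.ceil(v)) for v in values]
floor_out = [float(math.floor(v)) for v in values]
# fix truncates toward zero, which is what math.trunc does.
fix_out = [float(math.trunc(v)) for v in [-2.1, -1.9, 1.9, 2.1]]

print(ceil_out)   # [-2.0, -1.0, 2.0, 2.0, 3.0]
print(floor_out)  # [-3.0, -2.0, 1.0, 1.0, 2.0]
print(fix_out)    # [-2.0, -1.0, 1.0, 2.0]
```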
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.ftrl_update(weight=None, grad=None, z=None, n=None, lr=_Null, lamda1=_Null, beta=_Null, wd=_Null, rescale_grad=_Null, clip_gradient=_Null, name=None, attr=None, out=None, **kwargs)
Update function for the Ftrl optimizer. Referenced from Ad Click Prediction: a View from the Trenches, available at http://dl.acm.org/citation.cfm?id=2488200.
It updates the weights using:
rescaled_grad = clip(grad * rescale_grad, clip_gradient)
z += rescaled_grad - (sqrt(n + rescaled_grad**2) - sqrt(n)) * weight / learning_rate
n += rescaled_grad**2
w = (sign(z) * lamda1 - z) / ((beta + sqrt(n)) / learning_rate + wd) * (abs(z) > lamda1)
If w, z and n are all of row_sparse storage type, only the row slices whose indices appear in grad.indices are updated (for w, z and n):
for row in grad.indices:
    rescaled_grad[row] = clip(grad[row] * rescale_grad, clip_gradient)
    z[row] += rescaled_grad[row] - (sqrt(n[row] + rescaled_grad[row]**2) - sqrt(n[row])) * weight[row] / learning_rate
    n[row] += rescaled_grad[row]**2
    w[row] = (sign(z[row]) * lamda1 - z[row]) / ((beta + sqrt(n[row])) / learning_rate + wd) * (abs(z[row]) > lamda1)
Defined in src/operator/optimizer_op.cc:L520
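The dense update rule above translates almost line for line into plain Python. The sketch below applies it to flat lists of floats; the function name ftrl_step is hypothetical, and clip(grad, clip_gradient) is interpreted as symmetric clipping to [-clip_gradient, clip_gradient], enabled only when clip_gradient > 0, as the parameter description states.

```python
import math


def ftrl_step(weight, grad, z, n, lr, lamda1=0.01, beta=1.0, wd=0.0,
              rescale_grad=1.0, clip_gradient=-1.0):
    """Hypothetical element-wise FTRL step that updates weight, z and n in place."""
    for i, g in enumerate(grad):
        g = g * rescale_grad
        if clip_gradient > 0:
            g = max(min(g, clip_gradient), -clip_gradient)
        # z uses the old n and the old weight, as in the formula above.
        z[i] += g - (math.sqrt(n[i] + g * g) - math.sqrt(n[i])) * weight[i] / lr
        n[i] += g * g
        if abs(z[i]) > lamda1:
            sign = 1.0 if z[i] > 0 else -1.0
            weight[i] = (sign * lamda1 - z[i]) / ((beta + math.sqrt(n[i])) / lr + wd)
        else:
            # The L1 term drives small accumulated gradients to exactly zero.
            weight[i] = 0.0


weight, z, n = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
ftrl_step(weight, [1.0, 0.005], z, n, lr=0.1)
print(weight)  # first entry is about -0.0495; second is exactly 0.0
```

The second coordinate shows the sparsity-inducing behavior of the (abs(z) > lamda1) factor: its accumulated z stays below lamda1, so its weight is set to zero.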
Parameters: - weight (Symbol) – Weight
- grad (Symbol) – Gradient
- z (Symbol) – z
- n (Symbol) – Accumulated sum of squared (rescaled) gradients
- lr (float, required) – Learning rate
- lamda1 (float, optional, default=0.01) – The L1 regularization coefficient.
- beta (float, optional, default=1) – Per-Coordinate Learning Rate beta.
- wd (float, optional, default=0) – Weight decay augments the objective function with a regularization term that penalizes large weights. The penalty scales with the square of the magnitude of each weight.
- rescale_grad (float, optional, default=1) – Rescale gradient to grad = rescale_grad*grad.
- clip_gradient (float, optional, default=-1) – Clip gradient to the range of [-clip_gradient, clip_gradient]. If clip_gradient <= 0, gradient clipping is turned off. grad = max(min(grad, clip_gradient), -clip_gradient).
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.gamma(data=None, name=None, attr=None, out=None, **kwargs)
Returns the gamma function (extension of the factorial function to the reals), computed element-wise on the input array.
The storage type of gamma output is always dense.
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.gammaln(data=None, name=None, attr=None, out=None, **kwargs)
Returns element-wise log of the absolute value of the gamma function of the input.
The storage type of gammaln output is always dense.
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.log(data=None, name=None, attr=None, out=None, **kwargs)
Returns element-wise natural logarithmic value of the input.
The natural logarithm is the logarithm in base e, so that log(exp(x)) = x.
The storage type of log output is always dense.
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L653
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.log10(data=None, name=None, attr=None, out=None, **kwargs)
Returns element-wise base-10 logarithmic value of the input.
10**log10(x) = x
The storage type of log10 output is always dense.
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L665
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.log1p(data=None, name=None, attr=None, out=None, **kwargs)
Returns element-wise log(1 + x) value of the input.
This function is more accurate than log(1 + x) for small x such that \(1+x\approx 1\).
The storage type of log1p output depends upon the input storage type:
- log1p(default) = default
- log1p(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L702
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.log2(data=None, name=None, attr=None, out=None, **kwargs)
Returns element-wise base-2 logarithmic value of the input.
2**log2(x) = x
The storage type of log2 output is always dense.
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L677
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.make_loss(data=None, name=None, attr=None, out=None, **kwargs)
Make your own loss function in network construction.
This operator accepts a customized loss function symbol as a terminal loss and the symbol should be an operator with no backward dependency. The output of this function is the gradient of loss with respect to the input data.
For example, suppose you are making a cross entropy loss function. Assume out is the predicted output and label is the true label; then the cross entropy can be defined as:
cross_entropy = -(label * log(out) + (1 - label) * log(1 - out))
loss = make_loss(cross_entropy)
We will need to use make_loss when we are creating our own loss function or we want to combine multiple loss functions. We may also want to stop some variables’ gradients from backpropagating; see BlockGrad or stop_gradient for more details.
The storage type of make_loss output depends upon the input storage type:
- make_loss(default) = default
- make_loss(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L199
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.mean(data=None, axis=_Null, keepdims=_Null, exclude=_Null, name=None, attr=None, out=None, **kwargs)
Computes the mean of array elements over given axes.
Defined in src/operator/tensor/broadcast_reduce_op_value.cc:L101
Parameters: - data (Symbol) – The input
- axis (Shape(tuple), optional, default=[]) –
The axis or axes along which to perform the reduction.
The default, axis=(), will compute over all elements into a scalar array with shape (1,).
If axis is int, a reduction is performed on a particular axis.
If axis is a tuple of ints, a reduction is performed on all the axes specified in the tuple.
If exclude is true, reduction will be performed on the axes that are NOT in axis instead.
Negative values mean indexing from right to left.
- keepdims (boolean, optional, default=0) – If this is set to True, the reduced axes are left in the result as dimension with size one.
- exclude (boolean, optional, default=0) – Whether to perform reduction on axis that are NOT in axis instead.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.negative(data=None, name=None, attr=None, out=None, **kwargs)
Numerical negative of the argument, element-wise.
The storage type of negative output depends upon the input storage type:
- negative(default) = default
- negative(row_sparse) = row_sparse
- negative(csr) = csr
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.norm(data=None, name=None, attr=None, out=None, **kwargs)
Flattens the input array and then computes the L2 norm.
Examples:
x = [[1, 2],
     [3, 4]]
norm(x) = [5.47722578]
rsp = x.cast_storage('row_sparse')
norm(rsp) = [5.47722578]
csr = x.cast_storage('csr')
norm(csr) = [5.47722578]
Defined in src/operator/tensor/broadcast_reduce_op_value.cc:L266
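Because zeros contribute nothing to a sum of squares, the L2 norm of a sparse array can be computed from its stored values alone, which is consistent with all three storage types giving the same result above. A minimal sketch with a hypothetical l2_norm helper:

```python
import math


def l2_norm(values):
    """L2 norm of a flat iterable of numbers."""
    return math.sqrt(sum(v * v for v in values))


dense = [[0.0, 3.0], [4.0, 0.0]]
flat = [v for row in dense for v in row]  # flatten, as norm does
csr_values = [3.0, 4.0]                    # only the stored non-zeros of dense
print(l2_norm(flat), l2_norm(csr_values))  # 5.0 5.0
```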
Parameters: - data (Symbol) – Source input
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.radians(data=None, name=None, attr=None, out=None, **kwargs)
Converts each element of the input array from degrees to radians.
\[radians([0, 90, 180, 270, 360]) = [0, \pi/2, \pi, 3\pi/2, 2\pi]\]
The storage type of radians output depends upon the input storage type:
- radians(default) = default
- radians(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_trig.cc:L182
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.relu(data=None, name=None, attr=None, out=None, **kwargs)
Computes the rectified linear function, element-wise.
\[max(features, 0)\]
The storage type of relu output depends upon the input storage type:
- relu(default) = default
- relu(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L83
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.retain(data=None, indices=None, name=None, attr=None, out=None, **kwargs)
Picks the rows specified by a user-provided index array from a row_sparse matrix and saves them in the output sparse matrix.
Example:
data = [[1, 2], [3, 4], [5, 6]]
indices = [0, 1, 3]
shape = (4, 2)
rsp_in = row_sparse(data, indices)
to_retain = [0, 3]
rsp_out = retain(rsp_in, to_retain)
rsp_out.values = [[1, 2], [5, 6]]
rsp_out.indices = [0, 3]
The storage type of retain output depends on the storage types of the inputs:
- retain(row_sparse, default) = row_sparse
- otherwise, retain is not supported
Defined in src/operator/tensor/sparse_retain.cc:L53
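The retain example above can be modelled with (indices, values) pairs. In this sketch, which uses a hypothetical retain_rows helper, requested rows that are not stored in the input (implicit zeros) are simply absent from the output; that treatment is an assumption consistent with row_sparse semantics rather than something this reference states.

```python
def retain_rows(indices, values, to_retain):
    """Keep only the stored rows whose index appears in to_retain."""
    keep = set(to_retain)
    out_indices, out_values = [], []
    for row, row_values in zip(indices, values):
        if row in keep:
            out_indices.append(row)
            out_values.append(row_values)
    return out_indices, out_values


rsp_indices, rsp_values = [0, 1, 3], [[1, 2], [3, 4], [5, 6]]
print(retain_rows(rsp_indices, rsp_values, [0, 3]))  # ([0, 3], [[1, 2], [5, 6]])
print(retain_rows(rsp_indices, rsp_values, [2]))     # ([], [])
```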
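Parameters: - data (Symbol) – The input row_sparse array.
- indices (Symbol) – The indices of the rows to retain.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.rint(data=None, name=None, attr=None, out=None, **kwargs)
Returns element-wise rounded value to the nearest integer of the input.
Note
- For input n.5, rint returns n while round returns n+1.
- For input -n.5, both rint and round return -n-1.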
Example:
rint([-1.5, 1.5, -1.9, 1.9, 2.1]) = [-2., 1., -2., 2., 2.]
The storage type of rint output depends upon the input storage type:
- rint(default) = default
- rint(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L444
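The tie-breaking described in the note can be reproduced in plain Python: rint sends an exact .5 to the lower neighbouring integer, while round sends it away from zero. The helpers rint_like and round_like are hypothetical sketches that match the examples in this reference, not MXNet's kernels. Python's built-in round uses round-half-to-even, which matches neither.

```python
import math


def rint_like(x):
    """Round to nearest; exact ties go to the lower integer."""
    lower, upper = math.floor(x), math.ceil(x)
    return float(lower) if (x - lower) <= (upper - x) else float(upper)


def round_like(x):
    """Round to nearest; exact ties go away from zero."""
    magnitude = math.floor(abs(x))
    if abs(x) - magnitude >= 0.5:
        magnitude += 1
    return math.copysign(float(magnitude), x)


data = [-1.5, 1.5, -1.9, 1.9, 2.1]
print([rint_like(v) for v in data])   # [-2.0, 1.0, -2.0, 2.0, 2.0]
print([round_like(v) for v in data])  # [-2.0, 2.0, -2.0, 2.0, 2.0]
```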
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.round(data=None, name=None, attr=None, out=None, **kwargs)
Returns element-wise rounded value to the nearest integer of the input. Halfway cases are rounded away from zero.
Example:
round([-1.5, 1.5, -1.9, 1.9, 2.1]) = [-2., 2., -2., 2., 2.]
The storage type of round output depends upon the input storage type:
- round(default) = default
- round(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L423
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type:
-
mxnet.symbol.sparse.rsqrt(data=None, name=None, attr=None, out=None, **kwargs)¶ Returns element-wise inverse square-root value of the input.
\[rsqrt(x) = 1/\sqrt{x}\]
Example:
rsqrt([4,9,16]) = [0.5, 0.33333334, 0.25]
The storage type of rsqrt output is always dense.
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L584
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.sgd_mom_update(weight=None, grad=None, mom=None, lr=_Null, momentum=_Null, wd=_Null, rescale_grad=_Null, clip_gradient=_Null, name=None, attr=None, out=None, **kwargs)
Momentum update function for the Stochastic Gradient Descent (SGD) optimizer.
Momentum update often converges faster than plain SGD on neural networks. Mathematically, it looks as follows:
\[\begin{split}v_1 = -\alpha * \nabla J(W_0)\\ v_t = \gamma v_{t-1} - \alpha * \nabla J(W_{t-1})\\ W_t = W_{t-1} + v_t\end{split}\]
It updates the weights using:
v = momentum * v - learning_rate * gradient
weight += v
Here the parameter momentum (\(\gamma\) in the formula above) is the decay rate of momentum estimates at each epoch, and learning_rate (\(\alpha\)) is the lr parameter.
If weight and grad are both of row_sparse storage type and mom is of default storage type, the standard update is applied.
If weight, grad and mom are all of row_sparse storage type, only the row slices whose indices appear in grad.indices are updated (for both weight and mom):
for row in gradient.indices:
    v[row] = momentum * v[row] - learning_rate * gradient[row]
    weight[row] += v[row]
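To make the difference between the two row_sparse cases concrete, the sketch below applies both to plain Python lists. It is an illustration under stated assumptions, not MXNet's kernel: weight decay is omitted (wd=0), the gradient is given as parallel lists of row ids and rows, and the "standard update" is read as the dense update with missing gradient rows treated as zeros. All function names are hypothetical.

```python
import copy


def _clip(g, bound):
    # clip_gradient <= 0 turns clipping off, as described for clip_gradient.
    return g if bound <= 0 else max(min(g, bound), -bound)


def _update_row(weight, mom, row, g_row, lr, momentum, rescale_grad, clip_gradient):
    # v = momentum * v - lr * grad; weight += v   (weight decay omitted)
    for j, g in enumerate(g_row):
        g = _clip(rescale_grad * g, clip_gradient)
        mom[row][j] = momentum * mom[row][j] - lr * g
        weight[row][j] += mom[row][j]


def sgd_mom_standard(weight, mom, grad_idx, grad_rows, lr, momentum,
                     rescale_grad=1.0, clip_gradient=-1.0):
    """Dense update: every row is updated; absent gradient rows count as zeros."""
    stored = dict(zip(grad_idx, grad_rows))
    for row in range(len(weight)):
        g_row = stored.get(row, [0.0] * len(weight[row]))
        _update_row(weight, mom, row, g_row, lr, momentum, rescale_grad, clip_gradient)


def sgd_mom_lazy(weight, mom, grad_idx, grad_rows, lr, momentum,
                 rescale_grad=1.0, clip_gradient=-1.0):
    """Row-sparse update: only rows listed in grad_idx are touched."""
    for row, g_row in zip(grad_idx, grad_rows):
        _update_row(weight, mom, row, g_row, lr, momentum, rescale_grad, clip_gradient)


weight0 = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
mom0 = [[0.0, 0.0], [0.5, 0.5], [0.0, 0.0]]  # row 1 carries momentum from earlier steps
grad_idx, grad_rows = [0], [[1.0, 1.0]]      # gradient is stored for row 0 only

w_std, m_std = copy.deepcopy(weight0), copy.deepcopy(mom0)
sgd_mom_standard(w_std, m_std, grad_idx, grad_rows, lr=0.1, momentum=0.9)

w_lazy, m_lazy = copy.deepcopy(weight0), copy.deepcopy(mom0)
sgd_mom_lazy(w_lazy, m_lazy, grad_idx, grad_rows, lr=0.1, momentum=0.9)

print(w_std[1], w_lazy[1])  # row 1 keeps moving only in the standard update
```

Row 0 ends up identical in both runs. The difference shows up in row 1: it has no gradient but non-zero momentum, so the dense reading keeps moving it while the row_sparse update leaves it untouched.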
Defined in src/operator/optimizer_op.cc:L265
Parameters: - weight (Symbol) – Weight
- grad (Symbol) – Gradient
- mom (Symbol) – Momentum
- lr (float, required) – Learning rate
- momentum (float, optional, default=0) – The decay rate of momentum estimates at each epoch.
- wd (float, optional, default=0) – Weight decay augments the objective function with a regularization term that penalizes large weights. The penalty scales with the square of the magnitude of each weight.
- rescale_grad (float, optional, default=1) – Rescale gradient to grad = rescale_grad*grad.
- clip_gradient (float, optional, default=-1) – Clip gradient to the range of [-clip_gradient, clip_gradient]. If clip_gradient <= 0, gradient clipping is turned off. grad = max(min(grad, clip_gradient), -clip_gradient).
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.sgd_update(weight=None, grad=None, lr=_Null, wd=_Null, rescale_grad=_Null, clip_gradient=_Null, name=None, attr=None, out=None, **kwargs)
Update function for the Stochastic Gradient Descent (SGD) optimizer.
It updates the weights using:
weight = weight - learning_rate * gradient
If weight is of row_sparse storage type, only the row slices whose indices appear in grad.indices are updated:
for row in gradient.indices:
    weight[row] = weight[row] - learning_rate * gradient[row]
Defined in src/operator/optimizer_op.cc:L222
Parameters: - weight (Symbol) – Weight
- grad (Symbol) – Gradient
- lr (float, required) – Learning rate
- wd (float, optional, default=0) – Weight decay augments the objective function with a regularization term that penalizes large weights. The penalty scales with the square of the magnitude of each weight.
- rescale_grad (float, optional, default=1) – Rescale gradient to grad = rescale_grad*grad.
- clip_gradient (float, optional, default=-1) – Clip gradient to the range of [-clip_gradient, clip_gradient]. If clip_gradient <= 0, gradient clipping is turned off. grad = max(min(grad, clip_gradient), -clip_gradient).
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.sigmoid(data=None, name=None, attr=None, out=None, **kwargs)
Computes sigmoid of x element-wise.
\[y = 1 / (1 + exp(-x))\]
The storage type of sigmoid output is always dense.
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L102
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
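A rough way to see why sigmoid and rsqrt always produce dense output while operators such as sin, sqrt and square can keep row_sparse storage: an all-zero row stays all-zero only if the operator maps 0 to 0. The sketch below checks f(0) == 0 for several operators in this reference, using Python's math module as a stand-in. This is a heuristic consistent with the storage rules listed here, not MXNet's actual storage-type inference; the helper names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rsqrt(x):
    # 1/sqrt(0) is unbounded; return +inf instead of raising ZeroDivisionError.
    return math.inf if x == 0 else 1.0 / math.sqrt(x)


def sign(x):
    return 0.0 if x == 0 else math.copysign(1.0, x)


OPS = {
    "sign": sign, "sin": math.sin, "sinh": math.sinh, "sqrt": math.sqrt,
    "square": lambda x: x * x, "tan": math.tan, "tanh": math.tanh,
    "trunc": math.trunc, "sigmoid": sigmoid, "rsqrt": rsqrt,
}


def preserves_zero(f):
    """True if f(0) == 0, so zero rows of a sparse input stay zero."""
    return f(0.0) == 0


for name, f in OPS.items():
    print(f"{name}: f(0) = {f(0.0)}, keeps zeros: {preserves_zero(f)}")
```

Only sigmoid (f(0) = 0.5) and rsqrt (unbounded at 0) fail the check, matching the two operators documented here as always returning dense output.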
mxnet.symbol.sparse.sign(data=None, name=None, attr=None, out=None, **kwargs)
Returns element-wise sign of the input.
Example:
sign([-2, 0, 3]) = [-1, 0, 1]
The storage type of sign output depends upon the input storage type:
- sign(default) = default
- sign(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L404
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.sin(data=None, name=None, attr=None, out=None, **kwargs)
Computes the element-wise sine of the input array.
The input should be in radians (\(2\pi\) rad equals 360 degrees).
\[sin([0, \pi/4, \pi/2]) = [0, 0.707, 1]\]
The storage type of sin output depends upon the input storage type:
- sin(default) = default
- sin(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_trig.cc:L46
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.sinh(data=None, name=None, attr=None, out=None, **kwargs)
Returns the hyperbolic sine of the input array, computed element-wise.
\[sinh(x) = 0.5\times(exp(x) - exp(-x))\]
The storage type of sinh output depends upon the input storage type:
- sinh(default) = default
- sinh(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_trig.cc:L201
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.slice(data=None, begin=_Null, end=_Null, step=_Null, name=None, attr=None, out=None, **kwargs)
Slices a region of the array.
Note
crop is deprecated. Use slice instead.
This function returns a sliced array between the indices given by begin and end with the corresponding step.
For an input array of shape=(d_0, d_1, ..., d_n-1), a slice operation with begin=(b_0, b_1, ..., b_m-1), end=(e_0, e_1, ..., e_m-1), and step=(s_0, s_1, ..., s_m-1), where m <= n, results in an array with the shape (ceil(|e_0-b_0|/|s_0|), ..., ceil(|e_m-1-b_m-1|/|s_m-1|), d_m, ..., d_n-1).
The resulting array’s k-th dimension contains elements from the k-th dimension of the input array starting from index b_k (inclusive) with step s_k until reaching e_k (exclusive).
If the k-th elements of begin, end, and step are None, the following rules set the default values: if s_k is None, set s_k=1. If s_k > 0, set b_k=0, e_k=d_k; otherwise, set b_k=d_k-1, e_k=-1. Here e_k=-1 means the slice runs through index 0; it is not a negative index counted from the end.
The storage type of slice output depends on the storage types of the inputs:
- slice(csr) = csr
- otherwise, slice generates output with default storage
Note
When the input data storage type is csr, a csr output is generated only for step=(), step=(None,), or step=(1,). For other step values, slicing falls back to a dense tensor.
Example:
x = [[  1.,   2.,   3.,   4.],
     [  5.,   6.,   7.,   8.],
     [  9.,  10.,  11.,  12.]]

slice(x, begin=(0,1), end=(2,4)) = [[ 2.,  3.,  4.],
                                    [ 6.,  7.,  8.]]

slice(x, begin=(None, 0), end=(None, 3), step=(-1, 2)) = [[9., 11.],
                                                          [5.,  7.],
                                                          [1.,  3.]]
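The begin/end/step default rules can be sketched for nested Python lists. The helper names resolve_axis, axis_length and slice_nested are hypothetical. Explicit negative indices are normalized the Python way (an assumption), while the None default e_k=-1 for a negative step is kept as a stop-after-index-0 sentinel. The sketch reproduces both example results and checks that a sliced axis has ceil(|e_k-b_k|/|s_k|) elements.

```python
import math


def resolve_axis(b, e, s, dim):
    """Fill in None entries for one axis using the documented default rules."""
    s = 1 if s is None else s
    if s == 0:
        raise ValueError("step must be non-zero")
    # Assumption: explicit negative indices count from the end, as in Python.
    if b is not None and b < 0:
        b += dim
    if e is not None and e < 0:
        e += dim
    if s > 0:
        b = 0 if b is None else b
        e = dim if e is None else e
    else:
        b = dim - 1 if b is None else b
        e = -1 if e is None else e  # default -1 is a sentinel: stop after index 0
    return b, e, s


def axis_length(b, e, s):
    """Length of one sliced axis, ceil(|e - b| / |s|), when s points from b to e."""
    return math.ceil(abs(e - b) / abs(s))


def slice_nested(x, begin, end, step=(), axis=0):
    """Slice a nested list along its first len(begin) axes."""
    if axis == len(begin):
        return x  # axes m .. n-1 are kept whole
    s = step[axis] if axis < len(step) else None
    b, e, s = resolve_axis(begin[axis], end[axis], s, len(x))
    return [slice_nested(x[i], begin, end, step, axis + 1) for i in range(b, e, s)]


x = [[1., 2., 3., 4.],
     [5., 6., 7., 8.],
     [9., 10., 11., 12.]]
print(slice_nested(x, begin=(0, 1), end=(2, 4)))
# [[2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
print(slice_nested(x, begin=(None, 0), end=(None, 3), step=(-1, 2)))
# [[9.0, 11.0], [5.0, 7.0], [1.0, 3.0]]
print(axis_length(0, 3, 2), len(range(0, 3, 2)))  # 2 2
```

The second example shows why the length needs a ceiling: axis 1 spans 3 positions with step 2, which yields 2 elements rather than 1.5.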
Defined in src/operator/tensor/matrix_op.cc:L355
Parameters: - data (Symbol) – Source input
- begin (Shape(tuple), required) – starting indices for the slice operation, supports negative indices.
- end (Shape(tuple), required) – ending indices for the slice operation, supports negative indices.
- step (Shape(tuple), optional, default=[]) – step for the slice operation, supports negative values.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.sqrt(data=None, name=None, attr=None, out=None, **kwargs)
Returns element-wise square-root value of the input.
\[\textrm{sqrt}(x) = \sqrt{x}\]
Example:
sqrt([4, 9, 16]) = [2, 3, 4]
The storage type of sqrt output depends upon the input storage type:
- sqrt(default) = default
- sqrt(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L564
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.square(data=None, name=None, attr=None, out=None, **kwargs)
Returns element-wise squared value of the input.
\[square(x) = x^2\]
Example:
square([2, 3, 4]) = [4, 9, 16]
The storage type of square output depends upon the input storage type:
- square(default) = default
- square(row_sparse) = row_sparse
- square(csr) = csr
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L541
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.stop_gradient(data=None, name=None, attr=None, out=None, **kwargs)
Stops gradient computation.
Stops the accumulated gradient of the inputs from flowing through this operator in the backward direction. In other words, this operator prevents the contribution of its inputs from being taken into account when computing gradients.
Example:
v1 = [1, 2]
v2 = [0, 1]
a = Variable('a')
b = Variable('b')
b_stop_grad = stop_gradient(3 * b)
loss = MakeLoss(b_stop_grad + a)

executor = loss.simple_bind(ctx=cpu(), a=(1,2), b=(1,2))
executor.forward(is_train=True, a=v1, b=v2)
executor.outputs
[ 1.  5.]

executor.backward()
executor.grad_arrays
[ 0.  0.]
[ 1.  1.]
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L166
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.sum(data=None, axis=_Null, keepdims=_Null, exclude=_Null, name=None, attr=None, out=None, **kwargs)
Computes the sum of array elements over given axes.
Note
sum and sum_axis are equivalent. For an ndarray of csr storage type, summation along axis 0 and axis 1 is supported. Setting keepdims or exclude to True causes a fallback to the dense operator.
Example:
data = [[[1,2],[2,3],[1,3]],
        [[1,4],[4,3],[5,2]],
        [[7,1],[7,2],[7,3]]]

sum(data, axis=1)
[[  4.   8.]
 [ 10.   9.]
 [ 21.   6.]]

sum(data, axis=[1,2])
[ 12.  19.  27.]

data = [[1,2,0],
        [3,0,1],
        [4,1,0]]

csr = cast_storage(data, 'csr')

sum(csr, axis=0)
[ 8.  3.  1.]

sum(csr, axis=1)
[ 3.  4.  5.]
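For the csr case, both supported reductions can be computed directly from the compressed arrays without densifying. The sketch below uses the standard CSR layout (data, indices, indptr); the function name csr_sum is hypothetical, and this illustrates the arithmetic rather than MXNet's kernel.

```python
def csr_sum(data, indices, indptr, num_cols, axis):
    """Sum a CSR matrix along axis 0 (per column) or axis 1 (per row)."""
    if axis == 1:
        # Row r's stored values live in data[indptr[r]:indptr[r + 1]].
        return [sum(data[indptr[r]:indptr[r + 1]]) for r in range(len(indptr) - 1)]
    if axis == 0:
        # Scatter-add each stored value into its column.
        out = [0] * num_cols
        for value, col in zip(data, indices):
            out[col] += value
        return out
    raise ValueError("only axis 0 and axis 1 are handled in this sketch")


# CSR form of [[1, 2, 0], [3, 0, 1], [4, 1, 0]] from the example above.
data = [1, 2, 3, 1, 4, 1]
indices = [0, 1, 0, 2, 0, 1]
indptr = [0, 2, 4, 6]
print(csr_sum(data, indices, indptr, 3, axis=0))  # [8, 3, 1]
print(csr_sum(data, indices, indptr, 3, axis=1))  # [3, 4, 5]
```

Both reductions touch only the stored values, so their cost grows with the number of non-zeros rather than with the full matrix size.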
Defined in src/operator/tensor/broadcast_reduce_op_value.cc:L85
Parameters: - data (Symbol) – The input
- axis (Shape(tuple), optional, default=[]) –
The axis or axes along which to perform the reduction.
The default, axis=(), will compute over all elements into a scalar array with shape (1,).
If axis is int, a reduction is performed on a particular axis.
If axis is a tuple of ints, a reduction is performed on all the axes specified in the tuple.
If exclude is true, reduction will be performed on the axes that are NOT in axis instead.
Negative values mean indexing from right to left.
- keepdims (boolean, optional, default=0) – If this is set to True, the reduced axes are left in the result as dimension with size one.
- exclude (boolean, optional, default=0) – Whether to perform reduction on axis that are NOT in axis instead.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.tan(data=None, name=None, attr=None, out=None, **kwargs)
Computes the element-wise tangent of the input array.
The input should be in radians (\(2\pi\) rad equals 360 degrees).
\[tan([0, \pi/4, \pi/2]) = [0, 1, -inf]\]
The storage type of tan output depends upon the input storage type:
- tan(default) = default
- tan(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_trig.cc:L83
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.tanh(data=None, name=None, attr=None, out=None, **kwargs)
Returns the hyperbolic tangent of the input array, computed element-wise.
\[tanh(x) = sinh(x) / cosh(x)\]
The storage type of tanh output depends upon the input storage type:
- tanh(default) = default
- tanh(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_trig.cc:L234
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.trunc(data=None, name=None, attr=None, out=None, **kwargs)
Return the element-wise truncated value of the input.
The truncated value of the scalar x is the nearest integer i which is closer to zero than x is. In short, the fractional part of the signed number x is discarded.
Example:
trunc([-2.1, -1.9, 1.5, 1.9, 2.1]) = [-2., -1., 1., 1., 2.]
The storage type of trunc output depends upon the input storage type:
- trunc(default) = default
- trunc(row_sparse) = row_sparse
Defined in src/operator/tensor/elemwise_unary_op_basic.cc:L502
Parameters: - data (Symbol) – The input array.
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
mxnet.symbol.sparse.zeros_like(data=None, name=None, attr=None, out=None, **kwargs)
Return an array of zeros with the same shape and type as the input array.
The storage type of zeros_like output depends on the storage type of the input:
- zeros_like(row_sparse) = row_sparse
- zeros_like(csr) = csr
- zeros_like(default) = default
Examples:
x = [[ 1.,  1.,  1.],
     [ 1.,  1.,  1.]]

zeros_like(x) = [[ 0.,  0.,  0.],
                 [ 0.,  0.,  0.]]
Parameters: - data (Symbol) – The input
- name (string, optional.) – Name of the resulting symbol.
Returns: The result symbol.
Return type: Symbol
