Tutorial
The basic data structure of mshadow is the Tensor. The following is a simplified version of the declaration in mshadow/tensor.h:
```c++
typedef float real_t;
typedef unsigned index_t;

template<int dimension>
struct Shape {
  index_t shape_[dimension];
  index_t stride_;
};

template<typename Device, int dimension>
struct Tensor {
  real_t *dptr;
  Shape<dimension> shape;
};

// this is how a shape object is declared
Shape<2> shape2;
// this is how tensor objects are declared
Tensor<cpu, 2> ts2;
Tensor<gpu, 3> ts3;
```
`Tensor<cpu,2>` means a two-dimensional tensor on CPU, while `Tensor<gpu,3>` means a three-dimensional tensor on GPU. `Shape<k>` gives the shape information of a k-dimensional tensor. The declarations are templates, and can be specialized into tensors of a specific device and dimension. This is what a two-dimensional tensor looks like:
```c++
struct Shape<2> {
  index_t shape_[2];
  index_t stride_;
};
struct Tensor<cpu, 2> {
  real_t *dptr;
  Shape<2> shape;
};
```
`Tensor<cpu,2>` contains `dptr`, which points to the space that backs the tensor. `Shape<2>` is a structure that stores the shape information:
- `shape_[0]` gives the lowest dimension, `shape_[1]` gives the second dimension, and so on. This is different from numpy.
- `stride_` gives the number of cells allocated in the lowest dimension. It is introduced so that padding cells can be added to the lowest dimension to keep memory aligned. `stride_` is set automatically when mshadow allocates a tensor's memory.
To understand the data structure, consider the following code:
```c++
real_t data[9] = {0, 1, 2, 3, 4, 5, 6, 7, 8};
Tensor<cpu, 2> ts;
ts.dptr = data;
ts.shape[0] = 2; ts.shape[1] = 3; ts.shape.stride_ = 3;
// now: ts[0][0] == 0, ts[0][1] == 1, ts[1][0] == 3, ts[1][1] == 4
for (index_t i = 0; i < ts.shape[1]; ++i) {
  for (index_t j = 0; j < ts.shape[0]; ++j) {
    printf("ts[%u][%u]=%f\n", i, j, ts[i][j]);
  }
}
```
The resulting `ts` is a 3 * 2 matrix, where `data[2]`, `data[5]`, `data[8]` are padding cells that are ignored. If you want contiguous memory, set `stride_ = shape_[0]`.
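To make the layout concrete, here is a minimal sketch of the addressing rule implied above; the helper name `At` is hypothetical and not part of mshadow:

```c++
// hypothetical helper illustrating the layout: element (y, x) of a
// 2-D tensor lives at dptr[y * stride_ + x]; the padding cells are
// the entries with x in [shape_[0], stride_)
inline real_t At(const Tensor<cpu, 2> &t, index_t y, index_t x) {
  return t.dptr[y * t.shape.stride_ + x];
}
```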
An important design choice in mshadow is that the data structure is a white box: it works as long as we set the space pointer `dptr` and the corresponding `shape`:
- For `Tensor<cpu,k>`, the space can be created by `new real_t[]`, or point to some existing space such as the float array in the last example.
- For `Tensor<gpu,k>`, the space needs to lie on the GPU, created by `cudaMallocPitch` (see the sketch below).
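As a rough sketch of the GPU case, here is a manually-backed tensor, assuming `stride_` is counted in elements while `cudaMallocPitch` reports the pitch in bytes:

```c++
// a minimal sketch: backing a Tensor<gpu,2> by hand with cudaMallocPitch;
// the returned pitch is in bytes, while stride_ counts elements
float *dptr; size_t pitch;
cudaMallocPitch(reinterpret_cast<void**>(&dptr), &pitch,
                3 * sizeof(float), 5);        // 5 rows of 3 floats
Tensor<gpu, 2> tg;
tg.dptr = dptr;
tg.shape = Shape2(5, 3);                      // 5 x 3 tensor
tg.shape.stride_ = pitch / sizeof(float);     // padded row length
```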
mshadow also provides explicit memory allocation routines, demonstrated by the following code:
```c++
// create a 5 x 3 tensor on GPU, and allocate space
Tensor<gpu, 2> ts2(Shape2(5, 3));
AllocSpace(ts2);
// allocate a 5 x 3 x 2 tensor on CPU, initialized to 0
Tensor<cpu, 3> ts3 = NewCTensor(Shape3(5, 3, 2), 0.0f);
// free the space
FreeSpace(ts2); FreeSpace(ts3);
```
All memory allocation in mshadow is explicit: no implicit allocation or de-allocation happens during any operation. This means a `Tensor<cpu,k>` variable behaves more like a reference handle (pointer) than an object. If we assign one tensor to another variable, the two share the same content space.
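For instance, here is a minimal sketch of the handle semantics (assuming, as in example/basic.cpp below, that the `(dptr, shape)` constructor yields a contiguous tensor):

```c++
// assignment copies the handle, not the data
float buf[6] = {0, 1, 2, 3, 4, 5};
Tensor<cpu, 2> a(buf, Shape2(2, 3));
Tensor<cpu, 2> b = a;   // b shares the same dptr as a
b[0][0] = 42.0f;        // the change is visible through a as well
```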
All the operators (`+`, `-`, `*`, `/`, `+=`, etc.) in mshadow are element-wise. Consider the following SGD update code:
```c++
void UpdateSGD(Tensor<cpu, 2> weight, Tensor<cpu, 2> grad, real_t eta, real_t lambda) {
  weight -= eta * (grad + lambda * weight);
}
```
During compilation, this code will be translated to the following form:
```c++
void UpdateSGD(Tensor<cpu, 2> weight, Tensor<cpu, 2> grad, real_t eta, real_t lambda) {
  for (index_t y = 0; y < weight.shape[1]; ++y) {
    for (index_t x = 0; x < weight.shape[0]; ++x) {
      weight[y][x] -= eta * (grad[y][x] + lambda * weight[y][x]);
    }
  }
}
```
As we can see, no memory allocation happens in the translated code. For `Tensor<gpu,k>`, the corresponding function is translated into a CUDA kernel of the same spirit. Using expression templates, the translation happens at compile time: we write a few simple lines of code and get the full performance of the translated version.
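To give a flavor of how such a translation can work, here is a minimal standalone sketch of the expression-template idea for 1-D vectors; it is illustrative only, not mshadow's actual implementation:

```c++
#include <cstdio>

// base class: identifies its subclass via the curiously recurring template pattern
template<typename SubType>
struct Exp {
  inline const SubType& self() const { return *static_cast<const SubType*>(this); }
};

// a vector that can be assigned from any expression
struct Vec : public Exp<Vec> {
  float *dptr; int len;
  Vec(float *dptr, int len) : dptr(dptr), len(len) {}
  inline float Eval(int i) const { return dptr[i]; }
  // assignment triggers a single fused loop: no temporary vectors are created
  template<typename EType>
  inline Vec& operator=(const Exp<EType> &src) {
    for (int i = 0; i < len; ++i) dptr[i] = src.self().Eval(i);
    return *this;
  }
};

// node representing lhs + rhs; evaluation is deferred until assignment
template<typename TLhs, typename TRhs>
struct AddExp : public Exp< AddExp<TLhs, TRhs> > {
  const TLhs &lhs; const TRhs &rhs;
  AddExp(const TLhs &lhs, const TRhs &rhs) : lhs(lhs), rhs(rhs) {}
  inline float Eval(int i) const { return lhs.Eval(i) + rhs.Eval(i); }
};

template<typename TLhs, typename TRhs>
inline AddExp<TLhs, TRhs> operator+(const Exp<TLhs> &lhs, const Exp<TRhs> &rhs) {
  return AddExp<TLhs, TRhs>(lhs.self(), rhs.self());
}

int main(void) {
  float a_[3] = {1, 2, 3}, b_[3] = {4, 5, 6}, c_[3];
  Vec a(a_, 3), b(b_, 3), c(c_, 3);
  c = a + b;  // compiles into one loop over the 3 elements
  for (int i = 0; i < 3; ++i) printf("%.1f ", c.dptr[i]);
  printf("\n");
  return 0;
}
```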
Since mshadow has an identical interface for `Tensor<cpu,k>` and `Tensor<gpu,k>`, we can easily write one piece of code that works on both CPU and GPU. For example, the following code compiles for both GPU and CPU tensors.
```c++
template<typename xpu>
void UpdateSGD(Tensor<xpu, 2> weight, const Tensor<xpu, 2> &grad, real_t eta, real_t lambda) {
  weight -= eta * (grad + lambda * weight);
}
```
There is also shorthand for dot product, as follows. The code will be translated into calls to standard packages such as MKL and cuBLAS.
```c++
template<typename xpu>
void Backprop(Tensor<xpu, 2> gradin, const Tensor<xpu, 2> &gradout, const Tensor<xpu, 2> &netweight) {
  gradin = dot(gradout, netweight.T());
}
```
There are common cases when we want to define our own functions. For example, suppose mshadow did not have the element-wise sigmoid transformation, which is very commonly used in machine learning algorithms. We can simply use the following code to add sigmoid to mshadow:
```c++
struct sigmoid {
  MSHADOW_XINLINE static real_t Map(real_t a) {
    return 1.0f / (1.0f + expf(-a));
  }
};
template<typename xpu>
void ExampleSigmoid(Tensor<xpu, 2> out, const Tensor<xpu, 2> &in) {
  out = F<sigmoid>(in * 2.0f) + 1.0f;
}
```
The equivalent translated code for CPU is given by:
```c++
template<typename xpu>
void ExampleSigmoid(Tensor<xpu, 2> out, const Tensor<xpu, 2> &in) {
  for (index_t y = 0; y < out.shape[1]; ++y) {
    for (index_t x = 0; x < out.shape[0]; ++x) {
      out[y][x] = sigmoid::Map(in[y][x] * 2.0f) + 1.0f;
    }
  }
}
```
There will also be a translated CUDA kernel version that runs on the GPU.
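Roughly speaking (this is a hand-written sketch, not the code mshadow actually generates), the GPU version amounts to one thread per element evaluating the fused expression:

```c++
// sketch of the generated pattern: each thread computes one output element
__global__ void ExampleSigmoidKernel(real_t *out, const real_t *in,
                                     index_t stride, index_t xmax, index_t ymax) {
  const index_t x = blockIdx.x * blockDim.x + threadIdx.x;
  const index_t y = blockIdx.y * blockDim.y + threadIdx.y;
  if (x < xmax && y < ymax) {
    out[y * stride + x] = 1.0f / (1.0f + expf(-(in[y * stride + x] * 2.0f))) + 1.0f;
  }
}
```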
The following code is from example/basic.cpp and illustrates the basic usage of mshadow.
```c++
// header file to use mshadow
#include "mshadow/tensor.h"
// this namespace contains all data structures and functions
using namespace mshadow;
// this namespace contains all the operator overloads
using namespace mshadow::expr;

int main(void) {
  // assume we have a float space
  float data[20];
  // create a 2 x 5 x 2 tensor, from existing space
  Tensor<cpu, 3> ts(data, Shape3(2, 5, 2));
  // take the first subscript of the tensor
  Tensor<cpu, 2> mat = ts[0];
  // a Tensor object is only a handle; assignment means they share the same data content
  Tensor<cpu, 2> mat2 = mat;
  // shape of the matrix; note the shape order is different from numpy:
  // shape[i] gives the size of the i-th dimension
  printf("%u X %u matrix\n", mat.shape[1], mat.shape[0]);
  // initialize all elements to zero
  mat = 0.0f;
  // assign some values
  mat[0][1] = 1.0f; mat[1][0] = 2.0f;
  // element-wise operations
  mat += (mat + 10.0f) / 10.0f + 2.0f;
  // print out the matrix; note: mat and mat2 are handles (pointers) to the same data
  for (index_t i = 0; i < mat.shape[1]; ++i) {
    for (index_t j = 0; j < mat.shape[0]; ++j) {
      printf("%.2f ", mat2[i][j]);
    }
    printf("\n");
  }
  return 0;
}
```
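Given the element-wise semantics described above, each element v becomes v + (v + 10) / 10 + 2 after the update, so the program should print a 5 x 2 matrix along these lines:

```
5 X 2 matrix
3.00 4.10
5.20 3.00
3.00 3.00
3.00 3.00
3.00 3.00
```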