Ristretto supports the approximation of different layer types of convolutional neural networks. The next two tables explain how different layers can be quantized, and how this quantization affects different parts of a layer.
Quantization Support by Layer
Layer Type | Dynamic Fixed Point | Minifloat | Integer-Power-of-Two Weights
---|---|---|---
Convolution | ✓ | ✓ | ✓
Fully Connected | ✓ | ✓ | ✓
LRN* | | ✓ |
Deconvolution | ✓ | |
*Supported in GPU mode, not supported in CPU mode
Local Response Normalization (LRN) layers only support quantization to minifloat. This layer type uses “strict arithmetic”, i.e., all intermediate results are quantized.
Quantization of Parameters and Layer Outputs
Quantization | Parameters | Layer activations (in+out)
---|---|---
Dynamic fixed point | ✓ | ✓
Minifloat | ✓ | ✓
Integer-power-of-two parameters* | ✓ | ✓
*Multiplier-free arithmetic: In this mode, network weights are quantized to integer power of two numbers. Layer activations are quantized to dynamic fixed point. This simulates a hardware accelerator where data between layers is in 8-bit format. We simulate convolutional and fully connected layers which use bit-shifts instead of multiplications.
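To illustrate the multiplier-free arithmetic described above, here is a minimal Python sketch (the function name is ours, not Ristretto's): when each weight is constrained to a power of two, 2^e, a multiply-accumulate turns into a shift-accumulate.

```python
def shift_mac(activations, weight_exps):
    """Accumulate a[i] * 2**e[i] using shifts instead of multiplications.

    activations: fixed-point integer activations (e.g. 8-bit values)
    weight_exps: weight exponents, i.e. each weight is 2**e
    """
    acc = 0
    for a, e in zip(activations, weight_exps):
        # a * 2**e computed as an arithmetic shift
        acc += (a >> -e) if e < 0 else (a << e)
    return acc

print(shift_mac([64, 32, 16], [-1, -2, 0]))  # 64/2 + 32/4 + 16 = 56
```

This is exactly why the mode is "multiplier-free": a hardware accelerator replaces each multiplier with a barrel shifter.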
Google Protocol Buffer Fields
Just as with Caffe, you need to define Ristretto models using protocol buffer definition files (*.prototxt). All Ristretto layer parameters are defined in caffe.proto.
Common fields
type
: Ristretto supports the following layers: `ConvolutionRistretto`, `FcRistretto` (fully connected layer), `LRNRistretto`, `DeconvolutionRistretto`.

Parameters:

* `precision` [default `DYNAMIC_FIXED_POINT`]: the quantization strategy; one of `DYNAMIC_FIXED_POINT`, `MINIFLOAT` or `INTEGER_POWER_OF_2_WEIGHTS`
* `rounding_scheme` [default `NEAREST`]: the rounding scheme used for quantization; either round-nearest-even (`NEAREST`) or round-stochastic (`STOCHASTIC`)
*Before commit fc109ba: in earlier Ristretto versions, the precision values were `FIXED_POINT`, `MINI_FLOATING_POINT` and `POWER_2_WEIGHTS`.
Dynamic Fixed Point
- Precision type: `DYNAMIC_FIXED_POINT`
- Parameters:
  * `bw_layer_in` [default 32]: the number of bits used for representing layer inputs
  * `bw_layer_out` [default 32]: the number of bits used for representing layer outputs
  * `bw_params` [default 32]: the number of bits used for representing layer parameters
  * `fl_layer_in` [default 16]: the number of fractional bits used for representing layer inputs
  * `fl_layer_out` [default 16]: the number of fractional bits used for representing layer outputs
  * `fl_params` [default 16]: the number of fractional bits used for representing layer parameters
- The default values correspond to 32-bit (static) fixed point numbers with 16 integer bits and 16 fractional bits.
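To make the bit-width/fractional-bit split concrete, here is a hypothetical Python sketch of dynamic fixed point quantization (round-nearest is simplified to round-half-up here, and the function name is ours, not Ristretto's):

```python
import math
import random

def quantize_dfp(x, bw, fl, rounding="NEAREST"):
    """Quantize x to a signed bw-bit fixed-point value with fl fractional bits."""
    scale = 2.0 ** fl
    v = x * scale
    if rounding == "NEAREST":
        v = math.floor(v + 0.5)       # round-half-up (simplified)
    else:
        v = math.floor(v + random.random())  # stochastic rounding
    # saturate to the range of a signed bw-bit integer
    max_q = 2 ** (bw - 1) - 1
    min_q = -(2 ** (bw - 1))
    return max(min_q, min(max_q, v)) / scale

print(quantize_dfp(0.7, 8, 4))    # 0.6875, the nearest multiple of 2**-4
print(quantize_dfp(100.0, 8, 4))  # 7.9375, saturated to 127 / 16
```

With `bw = 8` and `fl = 4`, values are multiples of 2^-4 in [-8, 7.9375]; the defaults above (`bw = 32`, `fl = 16`) give 16 integer and 16 fractional bits.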
Minifloat
- Precision type: `MINIFLOAT`
- Parameters:
  * `mant_bits` [default: 23]: the number of bits used for representing the mantissa
  * `exp_bits` [default: 8]: the number of bits used for representing the exponent
- The default values correspond to single-precision format.
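The effect of `mant_bits` and `exp_bits` can be illustrated with a small Python sketch (a simplified rounding model with an IEEE-754-style exponent bias; subnormals and exact tie-breaking are not modeled, and the function name is ours):

```python
import math

def quantize_minifloat(x, mant_bits, exp_bits):
    """Round x to a nearby value representable with the given mantissa/exponent widths."""
    if x == 0.0:
        return 0.0
    bias = 2 ** (exp_bits - 1) - 1
    # exponent of the binade containing x, clamped to the representable range
    e = math.floor(math.log2(abs(x)))
    e = max(1 - bias, min(bias, e))
    # keep mant_bits fractional bits of the mantissa
    scale = 2.0 ** (mant_bits - e)
    return math.floor(x * scale + 0.5) / scale

print(quantize_minifloat(0.1, 10, 5))  # 0.0999755859375, as in IEEE half precision
print(quantize_minifloat(1.0, 23, 8))  # 1.0, exactly representable in single precision
```

With `mant_bits: 10` and `exp_bits: 5` this mimics IEEE half precision, the format used in the example layer below.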
Integer-Power-of-Two Parameters
- Precision type: `INTEGER_POWER_OF_2_WEIGHTS`
- Parameters:
  * `exp_min` [default: -8]: the minimum exponent used
  * `exp_max` [default: -1]: the maximum exponent used
- With the default values, network parameters can be represented with 4 bits in hardware (1 sign bit and 3 bits for the exponent value).
Example Ristretto Layer
```
layer {
  name: "norm1"
  type: "LRNRistretto"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
  quantization_param {
    precision: MINIFLOAT
    mant_bits: 10
    exp_bits: 5
  }
}
```