Ristretto: Layer Catalogue

Ristretto supports the approximation of different layer types of convolutional neural networks. The next two tables explain how different layers can be quantized, and how this quantization affects different parts of a layer.

Quantization Support by Layer

Layer Type      | Dynamic Fixed Point | Minifloat | Integer-Power-of-Two Weights
Fully Connected | Yes                 | Yes       | Yes

*Supported in GPU mode, not supported in CPU mode

Local Response Normalization (LRN) layers only support quantization to minifloat. This layer type uses “strict arithmetic”, i.e., all intermediate results are quantized.

Quantization of Parameters and Layer Outputs

Quantization                     | Parameters | Layer activations (in+out)
Dynamic fixed point              | Yes        | Yes
Integer-power-of-two parameters* | Yes        | Yes

*Multiplier-free arithmetic: in this mode, network weights are quantized to integer powers of two, and layer activations are quantized to dynamic fixed point. This simulates a hardware accelerator in which data passed between layers is in 8-bit format. Convolutional and fully connected layers are simulated with bit-shifts in place of multiplications.
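
The bit-shift idea can be sketched in a few lines of Python. This is an illustration only, not Ristretto's code: the function name shift_multiply, the integer activation codes and the accumulator layout (8 extra fractional bits) are assumptions made for the example.

```python
def shift_multiply(act_code, weight_sign, weight_exp, exp_min=-8):
    """Multiply a fixed-point activation by a weight of the form sign * 2**weight_exp.

    act_code:    integer code of the activation (value = act_code * 2**-fl_in)
    weight_sign: +1 or -1
    weight_exp:  integer exponent with exp_min <= weight_exp <= 0
    The returned integer carries -exp_min more fractional bits than act_code,
    so a left shift is always sufficient and no bits are dropped.
    """
    if not exp_min <= weight_exp <= 0:
        raise ValueError("weight_exp outside the assumed range")
    return weight_sign * (act_code << (weight_exp - exp_min))


# Hypothetical example: activation 2.3125 stored with 4 fractional bits.
fl_in = 4
act_code = 37                                   # 37 * 2**-4 == 2.3125
prod_code = shift_multiply(act_code, -1, -3)    # weight == -2**-3 == -0.125
prod_value = prod_code * 2.0 ** -(fl_in + 8)    # accumulator has 8 extra fractional bits
```

Here prod_value equals the floating point product 2.3125 * -0.125, but it was obtained with one shift and one sign flip.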

Google Protocol Buffer Fields

Just as with Caffe, you need to define Ristretto models using protocol buffer definition files (*.prototxt). All Ristretto layer parameters are defined in caffe.proto.

Common fields

  • type: Ristretto supports the following layers: ConvolutionRistretto, FcRistretto (fully connected layer), LRNRistretto, DeconvolutionRistretto.
  • Parameters:
    • precision [default DYNAMIC_FIXED_POINT]: the quantization strategy should be DYNAMIC_FIXED_POINT, MINIFLOAT or INTEGER_POWER_OF_2_WEIGHTS*
    • rounding_scheme [default NEAREST]: the rounding scheme used for quantization should be either round-nearest-even (NEAREST) or round-stochastic (STOCHASTIC)

*Before commit fc109ba: in earlier Ristretto versions, the precision values were FIXED_POINT, MINI_FLOATING_POINT or POWER_2_WEIGHTS.
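
The difference between the two rounding schemes can be illustrated with a short Python sketch. It operates on a value that has already been scaled to the quantization grid, and the function names are hypothetical; it is not taken from the Ristretto sources.

```python
import math
import random


def round_nearest_even(x):
    """Round to the nearest integer, with ties going to the even neighbour."""
    return round(x)  # Python 3's built-in round() rounds halves to even


def round_stochastic(x, rng=random):
    """Round down or up at random.

    The probability of rounding up equals the fractional part of x,
    so the result is unbiased on average.
    """
    lower = math.floor(x)
    return lower + (1 if rng.random() < x - lower else 0)
```

With round_stochastic, a value such as 2.25 becomes 3 about a quarter of the time and 2 otherwise, whereas round_nearest_even always returns 2.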

Dynamic Fixed Point

  • Precision type: DYNAMIC_FIXED_POINT
  • Parameters:
    • bw_layer_in [default 32]: the number of bits used for representing layer inputs
    • bw_layer_out [default 32]: the number of bits used for representing layer outputs
    • bw_params [default 32]: the number of bits used for representing layer parameters
    • fl_layer_in [default 16]: the number of fractional bits used for representing layer inputs
    • fl_layer_out [default 16]: the number of fractional bits used for representing layer outputs
    • fl_params [default 16]: the number of fractional bits used for representing layer parameters
  • The default values correspond to 32-bit (static) fixed point numbers with 16 integer bits and 16 fractional bits.
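
As a rough illustration of what a bit width and a fractional length mean for a single value, consider the following Python sketch. It assumes a signed two's-complement format in which the sign bit counts towards the bit width, saturation at the range limits and round-nearest-even; the function name quantize_fixed_point is hypothetical and the code is not Ristretto's implementation.

```python
def quantize_fixed_point(x, bw, fl):
    """Quantize x to a signed fixed-point number with bw bits in total,
    fl of which are fractional bits (assumed two's-complement layout)."""
    step = 2.0 ** -fl                        # value of one least significant bit
    max_val = (2 ** (bw - 1) - 1) * step     # largest representable value
    min_val = -(2 ** (bw - 1)) * step        # smallest representable value
    clipped = min(max(x, min_val), max_val)  # saturate instead of wrapping around
    return round(clipped / step) * step      # round-nearest-even on the grid


print(quantize_fixed_point(0.3, 8, 4))     # 0.3125
print(quantize_fixed_point(100.0, 8, 4))   # 7.9375 (saturated)
print(quantize_fixed_point(-100.0, 8, 4))  # -8.0 (saturated)
```

Because the fractional length is chosen separately for layer inputs, layer outputs and parameters, each of the three can trade range for resolution independently.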


Minifloat

  • Precision type: MINIFLOAT
  • Parameters:
    • mant_bits [default 23]: the number of bits used for representing the mantissa
    • exp_bits [default 8]: the number of bits used for representing the exponent
  • The default values correspond to the single-precision floating point format.
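
The effect of the two minifloat parameters on a single value can be sketched in Python as follows. The sketch assumes an IEEE-754-like layout (implicit leading one, exponent bias of 2**(exp_bits - 1) - 1, the highest exponent code reserved), saturation on overflow and flush-to-zero instead of subnormals. These choices and the function name quantize_minifloat are assumptions for illustration and may differ from what Ristretto does internally.

```python
import math


def quantize_minifloat(x, mant_bits, exp_bits):
    """Round x to a float with mant_bits mantissa bits and exp_bits exponent bits."""
    if x == 0.0:
        return 0.0
    bias = 2 ** (exp_bits - 1) - 1
    sign = -1.0 if x < 0 else 1.0
    frac, e = math.frexp(abs(x))      # abs(x) == frac * 2**e with 0.5 <= frac < 1
    significand = 2.0 * frac          # now 1 <= significand < 2
    exponent = e - 1
    scale = 2 ** mant_bits
    significand = round(significand * scale) / scale  # keep mant_bits fractional bits
    if significand >= 2.0:            # rounding carried into the next binade
        significand /= 2.0
        exponent += 1
    if exponent > bias:               # overflow: saturate at the largest finite value
        significand = 2.0 - 1.0 / scale
        exponent = bias
    elif exponent < 1 - bias:         # underflow: flush to zero (no subnormals)
        return 0.0
    return sign * significand * 2.0 ** exponent


print(quantize_minifloat(0.1, 10, 5))  # 0.0999755859375
print(quantize_minifloat(1e6, 10, 5))  # 65504.0 (saturated)
```

With mant_bits=10 and exp_bits=5 the results match the values of the IEEE half-precision format for numbers in its normal range.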

Integer-Power-of-Two Parameters

  • Precision type: INTEGER_POWER_OF_2_WEIGHTS
  • Parameters:
    • exp_min [default -8]: the minimum exponent used
    • exp_max [default -1]: the maximum exponent used
  • With the default values, network parameters can be represented with 4 bits in hardware (1 sign bit and 3 bits for the exponent value).
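
The following Python sketch shows how a weight might be mapped to a signed power of two and where the 4-bit figure comes from. Rounding the exponent in the log domain, mapping zero to the smallest magnitude and the function names are assumptions made for this example, not a description of Ristretto's implementation.

```python
import math


def quantize_power_of_two(w, exp_min=-8, exp_max=-1):
    """Return (quantized weight, exponent) with the weight equal to +/- 2**exponent."""
    sign = -1.0 if w < 0 else 1.0
    if w == 0.0:
        exponent = exp_min            # assumption: no code for zero, use smallest magnitude
    else:
        exponent = round(math.log2(abs(w)))
        exponent = max(exp_min, min(exp_max, exponent))
    return sign * 2.0 ** exponent, exponent


def storage_bits(exp_min=-8, exp_max=-1):
    """One sign bit plus enough bits to index every allowed exponent."""
    levels = exp_max - exp_min + 1
    return 1 + (levels - 1).bit_length()


print(quantize_power_of_two(0.3))  # (0.25, -2)
print(quantize_power_of_two(0.9))  # (0.5, -1)
print(storage_bits())              # 4
```

The default range from -8 to -1 contains 8 exponents, which fit in 3 bits; together with the sign bit this gives the 4 bits mentioned above.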

Example Ristretto Layer

layer {
  name: "norm1"
  type: "LRNRistretto"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
  quantization_param {
    precision: MINIFLOAT
    mant_bits: 10
    exp_bits: 5
  }
}