Ristretto Tool

This page describes the Ristretto tool, which automatically quantizes a 32-bit floating point network into one that uses reduced word width arithmetic. The ristretto command line interface finds the smallest bit-width representation that stays within a user-defined maximum accuracy drop. The tool also generates the protocol buffer definition file of the quantized net.

The tool is compiled to ./build/tools/ristretto. Run ristretto without any arguments for help.


The following command quantizes LeNet to Dynamic Fixed Point:

./build/tools/ristretto quantize --model=examples/mnist/lenet_train_test.prototxt \
  --weights=examples/mnist/lenet_iter_10000.caffemodel \
  --model_quantized=examples/mnist/quantized.prototxt \
  --iterations=100 --gpu=0 --trimming_mode=dynamic_fixed_point --error_margin=1

Given the error margin of 1%, LeNet can be quantized to 2-bit convolution kernels, 4-bit parameters in fully connected layers and 8-bit layer outputs:

I0626 17:37:14.029498 15899 quantization.cpp:260] Network accuracy analysis for
I0626 17:37:14.029506 15899 quantization.cpp:261] Convolutional (CONV) and fully
I0626 17:37:14.029515 15899 quantization.cpp:262] connected (FC) layers.
I0626 17:37:14.029521 15899 quantization.cpp:263] Baseline 32bit float: 0.9915
I0626 17:37:14.029531 15899 quantization.cpp:264] Dynamic fixed point CONV
I0626 17:37:14.029539 15899 quantization.cpp:265] weights: 
I0626 17:37:14.029546 15899 quantization.cpp:267] 16bit: 0.9915
I0626 17:37:14.029556 15899 quantization.cpp:267] 8bit:  0.9915
I0626 17:37:14.029567 15899 quantization.cpp:267] 4bit:  0.9909
I0626 17:37:14.029577 15899 quantization.cpp:267] 2bit:  0.9853
I0626 17:37:14.029587 15899 quantization.cpp:267] 1bit:  0.1135
I0626 17:37:14.029598 15899 quantization.cpp:270] Dynamic fixed point FC
I0626 17:37:14.029605 15899 quantization.cpp:271] weights: 
I0626 17:37:14.029613 15899 quantization.cpp:273] 16bit: 0.9915
I0626 17:37:14.029623 15899 quantization.cpp:273] 8bit:  0.9916
I0626 17:37:14.029644 15899 quantization.cpp:273] 4bit:  0.9914
I0626 17:37:14.029654 15899 quantization.cpp:273] 2bit:  0.9484
I0626 17:37:14.029664 15899 quantization.cpp:275] Dynamic fixed point layer
I0626 17:37:14.029670 15899 quantization.cpp:276] activations:
I0626 17:37:14.029677 15899 quantization.cpp:278] 16bit: 0.9904
I0626 17:37:14.029687 15899 quantization.cpp:278] 8bit:  0.9904
I0626 17:37:14.029700 15899 quantization.cpp:278] 4bit:  0.981
I0626 17:37:14.029708 15899 quantization.cpp:281] Dynamic fixed point net:
I0626 17:37:14.029716 15899 quantization.cpp:282] 2bit CONV weights,
I0626 17:37:14.029722 15899 quantization.cpp:283] 4bit FC weights,
I0626 17:37:14.029731 15899 quantization.cpp:284] 8bit layer activations:
I0626 17:37:14.029737 15899 quantization.cpp:285] Accuracy: 0.9826
I0626 17:37:14.029744 15899 quantization.cpp:286] Please fine-tune.
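
The bit-widths reported in the log follow from comparing each score against the baseline minus the error margin. The Python sketch below reproduces that selection from the accuracies printed above. It is an illustration only: the function name smallest_bit_width and the "pick the smallest passing width" rule are assumptions, not Ristretto's actual search code.

```python
def smallest_bit_width(accuracy_by_bits, baseline, error_margin_percent):
    """Return the smallest bit-width whose accuracy stays within the margin.

    Hypothetical helper: accuracy_by_bits maps bit-width -> accuracy (0..1),
    error_margin_percent is the absolute accuracy drop allowed, in percent.
    """
    threshold = baseline - error_margin_percent / 100.0
    passing = [bits for bits, acc in accuracy_by_bits.items() if acc >= threshold]
    return min(passing) if passing else None


# Accuracies copied from the log output above.
baseline = 0.9915
conv_weights = {16: 0.9915, 8: 0.9915, 4: 0.9909, 2: 0.9853, 1: 0.1135}
fc_weights = {16: 0.9915, 8: 0.9916, 4: 0.9914, 2: 0.9484}
activations = {16: 0.9904, 8: 0.9904, 4: 0.981}

for name, scores in [("CONV weights", conv_weights),
                     ("FC weights", fc_weights),
                     ("activations", activations)]:
    print(name, smallest_bit_width(scores, baseline, error_margin_percent=1.0))
# prints:
# CONV weights 2
# FC weights 4
# activations 8
```

With a baseline of 0.9915 and a 1% margin, the threshold is about 0.9815, so the 4-bit activations (0.981) narrowly fail and 8 bits are kept. The combined net reaches 0.9826, which is still above that threshold.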


The quantize command takes the following parameters:

  • model: The network definition of the 32-bit floating point net.
  • weights: The trained network parameters of the 32-bit floating point net.
  • trimming_mode: The quantization strategy can be dynamic_fixed_point, minifloat or integer_power_of_2_weights.*
  • model_quantized: The resulting quantized network definition.
  • error_margin: The maximum tolerated absolute accuracy drop, in percent, relative to the 32-bit floating point net.
  • gpu: The GPU ID. Ristretto supports both CPU and GPU mode.
  • iterations: The number of batch iterations used for scoring the net.

*In earlier versions of Ristretto (before commit fc109ba), the trimming_mode values were fixed_point, mini_floating_point and power_of_2_weights.

Trimming Modes

  • Dynamic Fixed Point: Ristretto first analyzes the layer parameters and outputs. The tool allocates enough bits to the integer part to avoid saturation of the largest value. Ristretto then searches for the lowest possible bit-width for
    • parameters of convolutional layers
    • parameters of fully connected layers
    • layer activations of convolutional and fully connected layers
  • Minifloat: Ristretto first analyzes the layer activations. The tool allocates enough exponent bits to avoid saturation of the largest value. Ristretto then searches for the lowest possible bit-width for
    • parameters and activations of convolutional and fully connected layers
  • Integer-Power-of-Two Parameters: Ristretto benchmarks the network with 4-bit parameters. Ristretto uses -8 and -1 as the lowest and highest exponents of 2, respectively. Activations are in 8-bit dynamic fixed point.
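
To make two of the trimming modes above concrete, here is a minimal Python sketch of dynamic fixed point and integer-power-of-two rounding. The function names, the sign-bit convention, the rounding mode and the handling of zero are assumptions made for illustration. This is not Ristretto's implementation.

```python
import math


def dynamic_fixed_point(values, bit_width):
    """Quantize floats to one shared fixed point format (illustrative sketch).

    Assumption: one sign bit plus enough integer bits to cover the largest
    magnitude; the remaining bits form the fractional part.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0.0 for _ in values], bit_width - 1
    int_bits = math.floor(math.log2(max_abs)) + 1 + 1  # magnitude bits + sign
    frac_bits = bit_width - int_bits
    scale = 2.0 ** frac_bits
    lo, hi = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    quantized = []
    for v in values:
        code = round(v * scale)          # Python rounds ties to even
        code = max(lo, min(hi, code))    # clamp to the representable range
        quantized.append(code / scale)
    return quantized, frac_bits


def power_of_two(value, min_exp=-8, max_exp=-1):
    """Round a weight to sign * 2**e with e in [min_exp, max_exp].

    Assumption: zero has no exact representation here and is mapped to the
    smallest positive magnitude.
    """
    if value == 0:
        return 2.0 ** min_exp
    exp = round(math.log2(abs(value)))
    exp = max(min_exp, min(max_exp, exp))
    return math.copysign(2.0 ** exp, value)


print(dynamic_fixed_point([0.75, -0.5, 0.1], bit_width=4))
# prints: ([0.75, -0.5, 0.125], 3)
print(power_of_two(0.3), power_of_two(-0.9))
# prints: 0.25 -0.5
```

The exponent range -8 to -1 covers eight values, so a sign bit plus three exponent bits gives the 4-bit parameters mentioned above.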