Ristretto is an automated CNN-approximation tool that condenses 32-bit floating-point networks. It is an extension of Caffe and lets you test, train, and fine-tune networks with limited numerical precision.
Ristretto In a Minute
- Ristretto Tool: The Ristretto tool performs automatic network quantization and scoring, using different bit-widths for number representation, to find a good balance between compression rate and network accuracy.
- Ristretto Layers: Ristretto reimplements Caffe layers and simulates reduced word-width arithmetic.
- Testing and Training: Thanks to Ristretto’s smooth integration into Caffe, network description files can be changed to quantize different layers. The bit-width used for each layer, as well as other parameters, can be set in the network’s prototxt file. This makes it possible to test and train condensed networks directly, without recompiling.
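The trade-off the Ristretto tool looks for can be pictured with a small search loop. This is a minimal sketch, not Ristretto's actual algorithm: the function `smallest_bit_width`, its `error_margin` parameter, and the accuracy values in `toy_accuracy` are hypothetical placeholders. In practice the score would come from running the quantized network on a validation set.

```python
def smallest_bit_width(score, baseline, candidates=(16, 8, 4, 2), error_margin=0.01):
    """Return the smallest bit-width whose score stays within
    error_margin of the 32-bit baseline, or None if none qualifies.

    `score` is any callable mapping a bit-width to an accuracy.
    """
    best = None
    # Walk from wide to narrow and stop at the first bit-width that
    # loses too much accuracy.
    for bits in sorted(candidates, reverse=True):
        if baseline - score(bits) <= error_margin:
            best = bits
        else:
            break
    return best


# Made-up accuracies, for illustration only (not measured results).
toy_accuracy = {16: 0.900, 8: 0.895, 4: 0.800, 2: 0.300}
chosen = smallest_bit_width(toy_accuracy.get, baseline=0.900)
print(chosen)
```

With these placeholder numbers the loop settles on 8 bits, since 4 bits would exceed the assumed one-point error margin.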
Approximation Schemes
Ristretto allows for three different quantization strategies to approximate Convolutional Neural Networks:
- Dynamic Fixed Point: A modified fixed-point format.
- Minifloat: Floating-point numbers with a reduced bit-width.
- Power-of-two parameters: Layers with power-of-two parameters need no multipliers when implemented in hardware, because multiplying by a power of two reduces to a bit shift.
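The three schemes can be illustrated on plain Python floats. The sketch below is an approximation under stated assumptions, not Ristretto's implementation: the function names, the default bit-widths, the rule for picking the fractional length, the exponent ranges, and the handling of zero and overflow are all choices made here for illustration.

```python
import math


def dynamic_fixed_point(values, bit_width=8):
    """Quantize a group of values to one shared fixed-point format.

    The fractional length is derived from the largest magnitude in the
    group (assumption), so different groups can use different formats.
    """
    max_abs = max(abs(v) for v in values)
    # Integer bits needed for the largest value, plus one sign bit.
    int_len = math.floor(math.log2(max_abs)) + 2 if max_abs > 0 else 1
    frac_len = bit_width - int_len
    scale = 2.0 ** frac_len
    lo, hi = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    # Round to the nearest representable step and saturate at the range ends.
    return [min(max(round(v * scale), lo), hi) / scale for v in values]


def minifloat(value, exp_bits=4, mant_bits=3):
    """Round a value to a float with few exponent and mantissa bits.

    Assumptions: no denormals (small values flush to zero) and
    saturation instead of infinity on overflow.
    """
    if value == 0:
        return 0.0
    bias = 2 ** (exp_bits - 1) - 1
    min_exp, max_exp = 1 - bias, bias
    mant, exp = math.frexp(abs(value))   # abs(value) == mant * 2**exp
    mant, exp = mant * 2, exp - 1        # renormalize so 1 <= mant < 2
    steps = 2 ** mant_bits
    mant = round(mant * steps) / steps   # keep mant_bits fractional bits
    if mant == 2.0:                      # rounding carried into the exponent
        mant, exp = 1.0, exp + 1
    if exp < min_exp:
        return 0.0
    if exp > max_exp:
        mant, exp = 2.0 - 1.0 / steps, max_exp
    return math.copysign(mant * 2.0 ** exp, value)


def power_of_two(value, min_exp=-8, max_exp=-1):
    """Round a parameter to the nearest signed power of two (in log2 space)."""
    if value == 0:
        return 0.0  # zero has no power-of-two form; kept as zero here
    exp = min(max(round(math.log2(abs(value))), min_exp), max_exp)
    return math.copysign(2.0 ** exp, value)


print(dynamic_fixed_point([0.5, -0.25, 0.1]))  # [0.5, -0.25, 0.1015625]
print(minifloat(0.3))                          # 0.3125
print(power_of_two(0.3))                       # 0.25
```

In each case the quantized value is stored back as an ordinary float, which mirrors the idea of simulating reduced word-width arithmetic on standard hardware.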
Documentation
- SqueezeNet Example: Replace 32-bit floating-point multiplications with 8-bit fixed-point ones, with an absolute accuracy drop below 1%.
- Ristretto Layers, Benchmarking and Fine-tuning: Implementation details of Ristretto.
- Approximation Schemes
- Ristretto Layer Catalogue: List of layers that can be approximated by Ristretto.
- Ristretto Tool: The command line tool and its parameters.
- ristretto-users group: Join our Google group to ask questions about Ristretto’s features or to report issues. Please try this forum before sending us an email.
- Tips and changelog
- Related work which could improve Ristretto; alternative quantization frameworks.
Cite us
Update March 2018: We published an IEEE journal paper with additional approximation results.
Our approximation framework was originally presented in an extended abstract at ICLR’16. Check out our poster for more information. All results can be reproduced with our code on GitHub. If Ristretto helps your research project, please cite our IEEE journal paper:
@article{gysel2018ristretto,
  title={Ristretto: A Framework for Empirical Study of Resource-Efficient Inference in Convolutional Neural Networks},
  author={Gysel, Philipp and Pimentel, Jon and Motamedi, Mohammad and Ghiasi, Soheil},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  year={2018},
  publisher={IEEE},
  doi={10.1109/TNNLS.2018.2808319}
}