Summary
To convert a network to CMSIS-NN, follow these key steps:
- Check which layers are supported.
  - If some layers are not supported, try to replace them with an equivalent combination of CMSIS-NN and CMSIS-DSP functions.
- Check the data layout convention.
  - If the conventions differ, reorder the weights and test the reordering with a float version of CMSIS-NN.
- Compute activation statistics.
  - Choose which statistics to compute, and use enough input patterns to generate them.
- Choose a quantization scheme.
  - Choose a word size.
  - Define a method to select a fixed-point format from the computed statistics and the word size.
- Compute the layer Q-formats.
  - Compute the input and output Q-formats for each layer from the quantization scheme and the layer constraints. After this step, the bias and output shifts for the fully connected and convolutional layers are known.
- Generate the CMSIS-NN implementation:
  - Dump reordered and quantized coefficients for weights.
  - Dump quantized biases.
  - Dump function calls to CMSIS-NN functions.
  - Allocate needed buffers.
- Test the final fixed-point version.
  - Ideally, test on the same set of test patterns as the original version.
  - If the final fixed-point version is not good enough, you may need to go back to a previous step, for example to change the quantization method or the network.
  - It also helps to experiment with the network to get a feel for how it behaves. This complements the quantitative data about its performance.
- Optimize the final CMSIS-NN code.
  - Use the most efficient version of each layer function.
  - Minimize memory usage by reusing buffers as much as possible.
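The quantization steps above (selecting a fixed-point format from the statistics and the word size, then deriving the layer shifts) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in this summary: an 8-bit signed word, the maximum absolute value as the only statistic, and an accumulator whose fractional bits are the sum of the input and weight fractional bits. The helper names `frac_bits`, `quantize`, and `layer_shifts` are hypothetical and are not part of CMSIS-NN.

```python
import math


def frac_bits(max_abs, word_size=8):
    """Fractional bits of a signed fixed-point word covering about [-max_abs, max_abs].

    Assumption: the statistic is the maximum absolute value seen for the tensor.
    """
    if max_abs <= 0:
        # Degenerate tensor (all zeros): use every non-sign bit for the fraction.
        return word_size - 1
    int_bits = math.ceil(math.log2(max_abs))
    return (word_size - 1) - int_bits


def quantize(x, fbits, word_size=8):
    """Scale x by 2**fbits, round, and saturate to the signed word range."""
    lo = -(1 << (word_size - 1))
    hi = (1 << (word_size - 1)) - 1
    q = round(x * 2 ** fbits)  # Python rounds exact halves to the nearest even integer
    return max(lo, min(hi, q))


def layer_shifts(f_in, f_w, f_b, f_out):
    """Shifts for a fully connected or convolutional layer.

    Assumption: the accumulator holds f_in + f_w fractional bits, so the bias
    is shifted left to match it and the result is shifted right to f_out.
    """
    bias_shift = f_in + f_w - f_b
    out_shift = f_in + f_w - f_out
    return bias_shift, out_shift


if __name__ == "__main__":
    # Hypothetical statistics for one layer: maximum absolute values.
    f_in = frac_bits(3.2)   # layer input
    f_w = frac_bits(0.9)    # weights
    f_b = frac_bits(0.9)    # biases
    f_out = frac_bits(7.5)  # layer output
    print("Q-format fractional bits:", f_in, f_w, f_b, f_out)
    print("bias_shift, out_shift:", layer_shifts(f_in, f_w, f_b, f_out))
```

Both shifts should come out non-negative for the result to be usable as shift amounts. If one is negative, the chosen formats probably need adjusting, which is one example of the layer constraints mentioned above.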