Specifications

Highly scalable and efficient second-generation ML inference processor

Build premium, low-cost AI solutions for multiple market segments with this second-generation, highly scalable and efficient NPU. The Ethos-N78 enables new immersive applications with a 2.5x uplift in single-core performance, scalable from 1 to 10 TOP/s, and offers the flexibility to optimize ML capability across more than ninety configurations.

 
Figure: Arm Ethos-N block diagram (Arm Ethos-N78 premium ML inference processor)

Key features
  • Performance: 10, 5, 2, 1 TOP/s
  • MACs (8x8): 4096, 2048, 1024, 512
  • Data types: Int-8 and Int-16
  • Network support: CNN and RNN
  • Efficient convolution: Winograd support delivers 2.25x peak performance over baseline
  • Sparsity: Yes
  • Secure mode: TEE or SEE
  • Multicore capability: 8 NPUs in a cluster; 64 NPUs in a mesh

Memory system
  • Embedded SRAM: 384 KB - 4 MB
  • Bandwidth reduction: Enhanced compression
  • Main interface: 1x AXI4 (128-bit), ACE5-Lite

Development platform
  • Neural frameworks: TensorFlow, TensorFlow Lite, Caffe2, PyTorch, MXNet, ONNX
  • Inference deployment: Ahead-of-time compiled with TVM; online interpreted with Arm NN; Android Neural Networks API (NNAPI)
  • Software components: Arm NN, neural compiler, driver and support library
  • Debug and profile: Heterogeneous layer-by-layer visibility in Arm Development Studio Streamline
  • Evaluation and early prototyping: Ethos-N Static Performance Analyzer (SPA), Arm Juno FPGA systems, Cycle Models
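The 2.25x Winograd figure follows from multiply counts: a direct 3x3 convolution spends 9 multiplies per output, while Winograd F(2x2, 3x3) produces a 2x2 output tile with 16 multiplies instead of 36, i.e. 36/16 = 2.25x. A minimal sketch of the 1D building block F(2, 3), for illustration only (not the NPU's implementation):

```python
# Winograd F(2,3): two outputs of a 3-tap filter using 4 multiplies
# instead of 6. Nested in both dimensions, F(2x2,3x3) needs 16
# multiplies for a 2x2 output tile versus 36 direct: 36/16 = 2.25x.

def direct_conv(d, g):
    # d: 4 inputs, g: 3 filter taps -> 2 outputs, 6 multiplies
    return [d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
            d[1]*g[0] + d[2]*g[1] + d[3]*g[2]]

def winograd_f23(d, g):
    # Transformed filter (precomputable once per trained filter)
    G = [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]
    # Transformed input
    D = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # The only 4 multiplies
    m = [D[i] * G[i] for i in range(4)]
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]

d = [1.0, 2.0, 3.0, 4.0]
g = [0.5, -1.0, 2.0]
assert direct_conv(d, g) == winograd_f23(d, g)
```

Applying the same transform along both dimensions gives the 2D F(2x2, 3x3) algorithm behind the peak-throughput claim, with the filter transform amortized across the whole feature map.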


Key features

Increased performance
Improves user experience with a 2.5x uplift in single-core performance, scalable from 1 to 10 TOP/s and beyond through many-core technologies.

Higher efficiency
Up to 40% lower DRAM bandwidth (MB/inference) and up to 25% higher area efficiency (inferences/s/mm²) enable demanding neural networks to run in diverse solutions.

Extended configurability
Target multiple markets with the flexibility to optimize ML capability across 90+ configurations, guided by the Ethos-N Static Performance Analyzer.

Unified software and tools
Develop, deploy, and debug with the Arm AI platform, using online or offline compilation and Arm Development Studio Streamline.


Key benefits

  • Deploys efficient, low-cost AI in multiple markets through performance scalability and extensive configurability
  • Extends battery life for AI workloads with up to 40% lower DRAM traffic (MB/inference) through compression, clustering, and cascading
  • Enables early performance feedback with the Ethos-N Static Performance Analyzer (SPA) and Arm Development Studio Streamline
  • Supports a comprehensive security solution alongside Arm SMMU and CryptoCell IP
  • Enables pre-silicon network performance tuning with interactive speed, bandwidth, and utilization reports
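To illustrate how clustering can cut DRAM traffic: replacing each weight with the nearest entry of a small shared palette means only short indices plus the palette cross the memory interface. The sketch below is a toy illustration of the idea, not the Ethos-N78's actual compression format.

```python
# Toy weight clustering: map each weight to the nearest of 16 palette
# values so it can be stored as a 4-bit index plus a small shared
# palette, instead of a full 8-bit value per weight.
# (Illustrative only -- not the Ethos-N78's actual compression scheme.)

def cluster_weights(weights, palette):
    # Replace each weight with the index of its nearest palette entry.
    return [min(range(len(palette)), key=lambda i: abs(w - palette[i]))
            for w in weights]

def decompress(indices, palette):
    return [palette[i] for i in indices]

palette = [i / 8.0 - 1.0 for i in range(16)]      # 16 levels in [-1, 0.875]
weights = [0.10, -0.52, 0.73, -0.98, 0.11, 0.74]  # toy weight tensor
idx = cluster_weights(weights, palette)
approx = decompress(idx, palette)

# Reconstruction error is bounded by half the palette step (0.125 / 2)
assert max(abs(w - a) for w, a in zip(weights, approx)) <= 0.0625
```

Over a large weight tensor the palette overhead is negligible, so 4-bit indices roughly halve the weight stream relative to 8-bit storage; real schemes also exploit sparsity and value statistics.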

Ethos-N comparison

| Key features (at 1 GHz) | Ethos-N78 | Ethos-N77 | Ethos-N57 | Ethos-N37 |
|---|---|---|---|---|
| Performance | 10, 5, 2, 1 TOP/s | 4 TOP/s | 2 TOP/s | 1 TOP/s |
| MACs/cycle (8x8) | 4096, 2048, 1024, 512 | 2048 | 1024 | 512 |
| Configurability | 90+ design options | Single product offering | Single product offering | Single product offering |
| Embedded SRAM | 384 KB - 4 MB | 1-4 MB | 512 KB | 512 KB |
| Bandwidth reduction | Enhanced compression | Extended compression technology, layer/operator fusion, clustering, and workload tiling | As Ethos-N77 | As Ethos-N77 |

All four processors share the remaining features: CNN and RNN network support; Int-8 and Int-16 data types; sparsity; Winograd convolution delivering 2.25x peak performance over baseline; TEE or SEE secure mode; multicore operation with 8 NPUs in a cluster or 64 NPUs in a mesh; and a 1x AXI4 (128-bit), ACE5-Lite main interface.

The development platform is likewise common: neural framework support for TensorFlow, TensorFlow Lite, Caffe2, PyTorch, MXNet, and ONNX; inference deployment ahead-of-time compiled with TVM, online interpreted with Arm NN, or via the Android Neural Networks API (NNAPI); software components comprising Arm NN, compiler and support library, and driver; heterogeneous layer-by-layer debug and profiling in Arm Development Studio Streamline; and evaluation and early prototyping with the Ethos-N Static Performance Analyzer (SPA), Arm Juno FPGA systems, and Cycle Models.
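All of these processors compute on Int-8 and Int-16 data, so floating-point models are quantized before deployment. A minimal sketch of symmetric per-tensor int8 quantization, for illustration only (deployment toolchains such as TensorFlow Lite select scales during model conversion):

```python
# Symmetric per-tensor int8 quantization: q = round(x / scale), clamped
# to [-127, 127], with the scale derived from the tensor's largest
# magnitude. Illustrative sketch only -- real toolchains (e.g.
# TensorFlow Lite) choose scales and zero points during conversion.

def quantize(xs):
    scale = max(abs(x) for x in xs) / 127.0
    q = [max(-127, min(127, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

xs = [0.3, -1.27, 0.8, 0.05]
q, scale = quantize(xs)
approx = dequantize(q, scale)

# Reconstruction error is bounded by half a quantization step
assert max(abs(a - b) for a, b in zip(xs, approx)) <= scale / 2
```

The same idea extends to Int-16 with a 32767 clamp, trading bandwidth for precision on layers that need it.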