New features in SVE2

  • To achieve scalable performance, SVE2 builds on the SVE foundations, allowing implementations to use vector lengths of up to 2048 bits.

  • Adds translations from Neon into SVE2. SVE2 adds instructions that replicate most Neon instructions, including:

    • Transformed Neon integer operations, such as SABA (Signed absolute difference and accumulate) and SHADD (Signed halving addition).

    • Transformed Neon widening, narrowing, and pairwise operations, such as UADDLB (Unsigned add long - bottom) and UADDLT (Unsigned add long - top).

      Note that the element processing order differs. For widening and narrowing operations, SVE2 processes interleaved even (bottom) and odd (top) elements, whereas Neon processes the low and high halves of the vector.

    • Complex integer arithmetic, for example CMLA (Complex integer multiply-add with rotate).

    • Multi-precision arithmetic for large-integer computation and cryptography, for example ADCLB (Add with carry long - bottom), ADCLT (Add with carry long - top), and SM4E (SM4 encryption and decryption).

  • For backwards compatibility, Neon and VFP are still mandatory in the latest architectures. Although SVE2 covers the functionality of both SVE and Neon, it does not preclude Neon from being present on the same chip.
  • Optimizes for emerging applications beyond HPC. Optimizations are provided for machine learning (for example UDOT), computer vision (for example TBL and TBX), baseband networking (for example CADD and CMLA), genomics (for example BDEP and BEXT), and server workloads (for example MATCH and NMATCH).

  • SVE2 delivers these performance improvements on a general-purpose processor, without requiring additional accelerators.
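Because SVE2 allows vector lengths of up to 2048 bits, the same loop must give the same answer whatever width the hardware provides. The following is a minimal scalar Python model, not real SVE code. It assumes a WHILELT-style predicate (lane j is active while i + j < n) and a loop counter that advances by the number of lanes. The helpers `whilelt` and `vla_add` are hypothetical names used only for illustration.

```python
def whilelt(i, n, lanes):
    """Model of a WHILELT-style predicate: lane j is active while i + j < n."""
    return [i + j < n for j in range(lanes)]


def vla_add(a, b, vl_bits, elem_bits=32):
    """Element-wise add written as a vector-length-agnostic loop.

    vl_bits is a hypothetical hardware vector length (128 to 2048 bits).
    """
    lanes = vl_bits // elem_bits          # elements processed per iteration
    out = [0] * len(a)
    i = 0
    while i < len(a):
        pred = whilelt(i, len(a), lanes)  # inactive lanes cover the loop tail
        for j, active in enumerate(pred):
            if active:
                out[i + j] = a[i + j] + b[i + j]
        i += lanes                        # advance by the lane count, like INCW
    return out


print(vla_add([1, 2, 3], [10, 20, 30], 128))  # [11, 22, 33]
```

Running `vla_add` with any vector length from 128 to 2048 bits produces the same result; only the number of loop iterations changes.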
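For reference, the per-element behavior of SABA and SHADD can be sketched in scalar Python, assuming signed 8-bit elements. Python lists stand in for vector registers. The hardware keeps only the low 8 bits of each SABA result, which the hypothetical helper `to_int8` reproduces. SHADD computes the sum at wider precision, so the halved result cannot overflow.

```python
def to_int8(x):
    """Keep the low 8 bits of x and reinterpret them as a signed byte."""
    x &= 0xFF
    return x - 0x100 if x & 0x80 else x


def saba(acc, a, b):
    """SABA model: acc[i] + |a[i] - b[i]|, wrapped to 8 bits."""
    return [to_int8(d + abs(x - y)) for d, x, y in zip(acc, a, b)]


def shadd(a, b):
    """SHADD model: (a[i] + b[i]) >> 1, computed without intermediate overflow."""
    # Python's >> on negative ints is an arithmetic shift (rounds toward -inf).
    return [(x + y) >> 1 for x, y in zip(a, b)]


print(saba([10, 100], [3, -128], [5, 127]))   # [12, 99]
print(shadd([127, -3, 5], [127, 0, -6]))      # [127, -2, -1]
```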
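The difference in element ordering between SVE2's bottom/top widening instructions and Neon's low/high-half instructions is easiest to see side by side. This is a scalar Python sketch that assumes unsigned 8-bit inputs widened to 16 bits. The function names mirror the mnemonics UADDLB and UADDLT, and the Neon low-half and high-half forms UADDL and UADDL2, but they are illustrative models only.

```python
def uaddlb(a, b):
    """SVE2 UADDLB model: widen and add the even-numbered (bottom) elements."""
    return [a[i] + b[i] for i in range(0, len(a), 2)]


def uaddlt(a, b):
    """SVE2 UADDLT model: widen and add the odd-numbered (top) elements."""
    return [a[i] + b[i] for i in range(1, len(a), 2)]


def neon_uaddl(a, b):
    """Neon UADDL model: widen and add the low half of the vector."""
    return [a[i] + b[i] for i in range(len(a) // 2)]


def neon_uaddl2(a, b):
    """Neon UADDL2 model: widen and add the high half of the vector."""
    return [a[i] + b[i] for i in range(len(a) // 2, len(a))]


a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [10] * 8
print(uaddlb(a, b), uaddlt(a, b))            # [10, 12, 14, 16] [11, 13, 15, 17]
print(neon_uaddl(a, b), neon_uaddl2(a, b))   # [10, 11, 12, 13] [14, 15, 16, 17]
```

To restore the original element order, interleave the bottom and top results one element at a time. The Neon half results only need to be concatenated.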
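CMLA operates on vectors of interleaved (real, imaginary) pairs. Its rotation selects which partial products are added to or subtracted from the accumulator. The scalar Python sketch below models this behavior, assuming signed 16-bit elements and the same rotation scheme as FCMLA. `wrap16` and `cmla` are hypothetical helpers, not an existing API.

```python
def wrap16(x):
    """Keep the low 16 bits of x as a signed value, as the hardware does."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def cmla(acc, n, m, rot):
    """CMLA model on interleaved (real, imag) pairs; rot is 0, 90, 180 or 270."""
    assert rot in (0, 90, 180, 270)
    r = rot // 90
    sel_a = r & 1                    # 0: use the real part of n, 1: the imaginary part
    sel_b = sel_a ^ 1
    sub_r = (r & 1) != (r >> 1)      # subtract from the real part for 90 and 180
    sub_i = (r >> 1) == 1            # subtract from the imaginary part for 180 and 270
    out = list(acc)
    for e in range(0, len(acc), 2):
        p_r = n[e + sel_a] * m[e + sel_a]
        p_i = n[e + sel_a] * m[e + sel_b]
        out[e] = wrap16(out[e] - p_r if sub_r else out[e] + p_r)
        out[e + 1] = wrap16(out[e + 1] - p_i if sub_i else out[e + 1] + p_i)
    return out


# Two CMLAs (rotations 0 and 90) give a full complex multiply-accumulate:
# (1 + 2i) * (3 + 4i) = -5 + 10i
acc = cmla([0, 0], [1, 2], [3, 4], 0)
acc = cmla(acc, [1, 2], [3, 4], 90)
print(acc)  # [-5, 10]
```

A single rotation yields only half of the complex product. Pairing rotations 0 and 90 accumulates the full product without any separate shuffle of real and imaginary parts.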
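ADCLB and ADCLT take their carry-in from bit 0 of the odd-numbered elements of the second source. They write the carry-out into the odd-numbered elements of the destination, so the result of one instruction can serve as the carry source for the next. The scalar Python sketch below models this behavior for 32-bit elements, based on the published instruction descriptions. It chains two steps to add a pair of 64-bit numbers split into 32-bit limbs. The function names and the `MASK32` constant are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def adclb(acc, n, m):
    """ADCLB model: acc[2e] + n[2e] + carry-in, where carry-in = bit 0 of m[2e+1]."""
    out = list(acc)
    for e in range(0, len(acc), 2):
        total = acc[e] + n[e] + (m[e + 1] & 1)
        out[e] = total & MASK32       # low 32 bits of the sum
        out[e + 1] = total >> 32      # carry-out (0 or 1) in the odd element
    return out


def adclt(acc, n, m):
    """ADCLT model: like ADCLB, but adds the odd-numbered elements of n."""
    out = list(acc)
    for e in range(0, len(acc), 2):
        total = acc[e] + n[e + 1] + (m[e + 1] & 1)
        out[e] = total & MASK32
        out[e + 1] = total >> 32
    return out


# 0x1_FFFFFFFF + 0x2_00000001, split into (low, high) 32-bit limbs.
low = adclb([0xFFFFFFFF, 0], [0x00000001, 0], [0, 0])   # no carry-in
high = adclb([0x00000001, 0], [0x00000002, 0], low)     # carry-in from the low limb
print(hex(high[0]), hex(low[0]))  # 0x4 0x0
```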
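BEXT and BDEP, listed above among the genomics-oriented instructions, gather and scatter bits under a mask within each element, analogously to the x86 PEXT and PDEP instructions. The scalar Python sketch below models a single element. On SVE2, the operation is applied independently to every element of the vector. `bext` and `bdep` are hypothetical helper names.

```python
def bext(value, mask, bits=64):
    """BEXT model: gather the bits of value selected by mask into the low bits."""
    result, k = 0, 0
    for i in range(bits):
        if (mask >> i) & 1:
            result |= ((value >> i) & 1) << k
            k += 1
    return result


def bdep(value, mask, bits=64):
    """BDEP model: scatter the low bits of value to the positions set in mask."""
    result, k = 0, 0
    for i in range(bits):
        if (mask >> i) & 1:
            result |= ((value >> k) & 1) << i
            k += 1
    return result


# Extract the upper nibble of a byte, then deposit it back.
print(bin(bext(0b10110010, 0b11110000, 8)))  # 0b1011
print(bin(bdep(0b1011, 0b11110000, 8)))      # 0b10110000
```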