This section introduces the new features that SVE2 adds to the Arm AArch64 architecture.
To achieve scalable performance, SVE2 builds on the foundations of SVE, allowing vector implementations of up to 2048 bits.
In SVE2, many instructions are added that replicate existing instructions in Neon, including:
- Transformed Neon fixed-point operations, for example, Signed absolute difference and accumulate (SABA) and Signed halving addition (SHADD).
- Transformed Neon widen, narrow, and pairwise operations, for example, Unsigned add long – bottom (UADDLB) and Unsigned add long – top (UADDLT).
The element processing order also changes. For widening and narrowing operations, SVE2 operates on interleaved even-numbered and odd-numbered elements, whereas Neon operates on the low and high halves of the vector.
The following diagram illustrates the difference between the Neon and SVE2 processes:
- Fixed-point complex arithmetic, for example, Complex integer multiply-add with rotate (CMLA).
- Multi-precision arithmetic for large integer arithmetic and cryptography, for example, Add with carry long – bottom (ADCLB), Add with carry long – top (ADCLT), and SM4 encryption and decryption (SM4E).
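The difference in element ordering between Neon and SVE2 widening operations can be sketched with a small scalar model. This is only an illustration, not how the hardware is implemented: vectors are modeled as Python lists with element 0 first, the function names are hypothetical, and Python's unbounded integers stand in for the widened result type.

```python
def neon_uaddl(a, b):
    """Neon-style UADDL: widen and add the LOW half of each vector."""
    half = len(a) // 2
    return [x + y for x, y in zip(a[:half], b[:half])]


def neon_uaddl2(a, b):
    """Neon-style UADDL2: widen and add the HIGH half of each vector."""
    half = len(a) // 2
    return [x + y for x, y in zip(a[half:], b[half:])]


def sve2_uaddlb(a, b):
    """SVE2-style UADDLB: widen and add the EVEN-numbered (bottom) elements."""
    return [x + y for x, y in zip(a[0::2], b[0::2])]


def sve2_uaddlt(a, b):
    """SVE2-style UADDLT: widen and add the ODD-numbered (top) elements."""
    return [x + y for x, y in zip(a[1::2], b[1::2])]


if __name__ == "__main__":
    # Eight unsigned 8-bit elements; 250 + 10 needs more than 8 bits,
    # which is why the result elements are twice as wide.
    a = [250, 1, 2, 3, 4, 5, 6, 7]
    b = [10] * 8
    print("UADDL  (low half): ", neon_uaddl(a, b))
    print("UADDL2 (high half):", neon_uaddl2(a, b))
    print("UADDLB (even):     ", sve2_uaddlb(a, b))
    print("UADDLT (odd):      ", sve2_uaddlt(a, b))
```

In the even/odd scheme, each widened result comes from a narrow element that sits inside the same double-width position of the source vector, so the selection does not depend on where the middle of the vector is. That property fits a vector length that is not known until run time.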
For backwards compatibility, Neon and VFP are required in the latest architectures. Although SVE2 includes some of the functionality of SVE and Neon, it does not preclude the presence of Neon on the chip.
SVE2 enables optimizations for emerging applications beyond the HPC market, for example, in Machine Learning (ML) (UDOT instruction), Computer Vision (TBL and TBX instructions), baseband networking (CADD and CMLA instructions), genomics (BDEP and BEXT instructions), and server (MATCH and NMATCH instructions).
SVE2 improves the performance of a general-purpose processor on workloads that operate on large volumes of data, without requiring off-chip accelerators.
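To make two of the instructions named above more concrete, the following scalar model sketches what UDOT and BEXT compute. It is a simplified illustration under stated assumptions: the function names and the list-based vector representation are hypothetical, predication is ignored, and `bext` models a single element rather than a whole vector.

```python
def udot(acc, a, b):
    """Model of UDOT: each 32-bit accumulator lane adds the dot product of
    the four unsigned 8-bit elements of a and b that share its position."""
    assert len(a) == len(b) == 4 * len(acc)
    out = []
    for i, lane in enumerate(acc):
        dot = sum(a[4 * i + j] * b[4 * i + j] for j in range(4))
        out.append((lane + dot) & 0xFFFFFFFF)  # wrap to 32 bits
    return out


def bext(value, mask, width=32):
    """Model of BEXT on one element: gather the bits of value selected by
    the set bits of mask into the contiguous low-order bits of the result."""
    result = 0
    out_pos = 0
    for bit in range(width):
        if (mask >> bit) & 1:
            result |= ((value >> bit) & 1) << out_pos
            out_pos += 1
    return result


if __name__ == "__main__":
    # Two 32-bit lanes, each fed by four 8-bit element pairs.
    print(udot([0, 100], [1, 2, 3, 4, 5, 6, 7, 8], [1] * 8))
    # Extract the upper four bits of an 8-bit value.
    print(bin(bext(0b10110100, 0b11110000, width=8)))
```

A single UDOT replaces four widening multiplies and the additions that combine them, which is the pattern at the core of 8-bit quantized matrix multiplication. BEXT similarly replaces a loop of shifts and masks when scattered bit fields need to be packed together.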