Program with SVE2

This section describes the software tools and libraries that support SVE2 application development. It also describes how to develop your application for an SVE2-enabled target, run it on SVE2-enabled hardware, and emulate it on any Armv8-A hardware.

Software and libraries support

To build an SVE or SVE2 application, you must choose a compiler that supports SVE and SVE2 features. GNU tools versions 8.0+ support SVE. Arm Compiler for Linux versions 18.0+ support SVE, and versions 20.0+ support both SVE and SVE2. Both compilers support optimizing C/C++/Fortran code.

Arm Performance Libraries provide highly optimized math routines that you can link into your application. Arm Performance Libraries versions 19.3+ include math libraries for SVE.

Arm Compiler for Linux, which is part of Arm Allinea Studio, consists of the Arm C/C++ Compiler, Arm Fortran Compiler, and Arm Performance Libraries.

How to program for SVE2

There are several ways to write or generate SVE and SVE2 code. You can write assembly with SVE and SVE2 instructions, use intrinsics in C/C++/Fortran applications, let compilers auto-vectorize your code, or use SVE-optimized libraries. Let’s look at each option.

  • Write assembly code: You can write assembly files using SVE instructions, or use inline assembly in GNU style. For example:
                .globl  subtract_arrays         // -- Begin function
                .p2align        2
                .type   subtract_arrays,@function
        subtract_arrays:               // @subtract_arrays
                .cfi_startproc
        // %bb.0:
                orr     w9, wzr, #0x400
                mov     x8, xzr
                whilelo p0.s, xzr, x9
        .LBB0_1:                       // =>This Inner Loop Header: Depth=1
                ld1w    { z0.s }, p0/z, [x1, x8, lsl #2]
                ld1w    { z1.s }, p0/z, [x2, x8, lsl #2]
                sub     z0.s, z0.s, z1.s
                st1w    { z0.s }, p0, [x0, x8, lsl #2]
                incw    x8
                whilelo p0.s, x8, x9
                b.mi    .LBB0_1
        // %bb.2:
                ret
        .Lfunc_end0:
                .size   subtract_arrays, .Lfunc_end0-subtract_arrays
                .cfi_endproc

    To program in assembly, you must know the Application Binary Interface (ABI) updates for SVE and SVE2. The Procedure Call Standard for the Arm Architecture (AAPCS) specifies the data types and register allocations, and is the part of the ABI most relevant to programming in assembly. The AAPCS requires that:

    • Z0-Z7, P0-P3 are used for parameter and result passing.
    • Z8-Z15, P4-P15 are callee-saved registers.
    • Z16-Z31 are the corruptible registers.
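As a small illustration of the GNU-style inline assembly option mentioned above, the following sketch reads the SVE vector length in bytes with the CNTB instruction. The function name sve_vector_bytes is hypothetical, and returning 0 when the file is not built for an SVE target is an assumption made so that the file compiles on any platform.

```c
#include <stdint.h>

// Hypothetical helper: returns the SVE vector length in bytes,
// or 0 when this file is not compiled for an SVE-enabled target.
uint64_t sve_vector_bytes(void)
{
#if defined(__aarch64__) && defined(__ARM_FEATURE_SVE)
    uint64_t bytes;
    // CNTB writes the number of 8-bit elements in one vector
    // to a general-purpose register.
    __asm__ ("cntb %0" : "=r"(bytes));
    return bytes;
#else
    // Assumed fallback so the sketch builds without SVE support.
    return 0;
#endif
}
```

The statement only writes a general-purpose register, so it needs no clobber list. Inline assembly that modifies Z or P registers should name them as clobbers so that the compiler can preserve live values.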

  • Use instruction functions: You can call instruction functions, which are sometimes referred to as intrinsics, directly from high-level languages like C, C++, or Fortran. Each instruction function matches a corresponding SVE instruction, and the compiler replaces the call with the corresponding instructions. The instruction functions are detailed in the ACLE (Arm C Language Extensions) for SVE document, which also includes the full list of instruction functions for SVE2.

    For example, use the following code:

    // intrinsic_example.c
        #include <arm_sve.h>

        svuint64_t uaddlb_array(svuint32_t Zs1, svuint32_t Zs2)
        {
            // widening add of even elements
            svuint64_t result = svaddlb(Zs1, Zs2);
            return result;
        }

    Compile the code using Arm C/C++ Compiler:

    armclang -O3 -S -march=armv8-a+sve2 -o intrinsic_example.s intrinsic_example.c

    This generates the following assembly code:

    // intrinsic_example.s
        uaddlb_array:                           // @uaddlb_array
                .cfi_startproc
        // %bb.0:
                uaddlb  z0.d, z0.s, z1.s
                ret

    This example uses Arm Compiler for Linux 20.0.
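Intrinsics are more commonly used inside a loop that works for any vector length. The following is a minimal sketch of such a loop, assuming an element-wise subtraction of 32-bit integers. The function name subtract_arrays_acle and the scalar fallback are assumptions added for illustration. The intrinsics used (svwhilelt_b32, svld1, svsub_x, svst1, svcntw) are the overloaded forms declared in arm_sve.h.

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical helper: dst[i] = a[i] - b[i] for i in [0, n).
void subtract_arrays_acle(int32_t *dst, const int32_t *a,
                          const int32_t *b, uint64_t n)
{
#if defined(__ARM_FEATURE_SVE)
    // Vector-length-agnostic loop: svcntw() is the number of
    // 32-bit lanes in one vector on the current hardware.
    for (uint64_t i = 0; i < n; i += svcntw()) {
        // Predicate with lanes active only while i + lane < n.
        svbool_t pg = svwhilelt_b32(i, n);
        svint32_t va = svld1(pg, &a[i]);
        svint32_t vb = svld1(pg, &b[i]);
        svst1(pg, &dst[i], svsub_x(pg, va, vb));
    }
#else
    // Scalar fallback, an assumption so the sketch builds without SVE.
    for (uint64_t i = 0; i < n; i++) {
        dst[i] = a[i] - b[i];
    }
#endif
}
```

The predicate produced by svwhilelt_b32 handles the final partial vector, so no separate scalar tail loop is needed in the SVE path.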

     
  • Auto-vectorization: C/C++/Fortran compilers, for example Arm Compiler for Linux and GNU compilers for Arm platforms, can generate SVE and SVE2 code from C/C++/Fortran loops. To generate SVE or SVE2 code, select the appropriate compiler options for the SVE or SVE2 features. For example, with armclang, one option that enables SVE2 optimizations is -march=armv8-a+sve2. Combine -march=armv8-a+sve2 with -armpl=sve if you want to use the SVE version of the libraries.
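A plain C loop of the following kind is a typical candidate for auto-vectorization. This is a sketch only: the assembly listing earlier in this section resembles what a compiler could emit for such a loop, but the original source is not shown in this guide, so the signature, the ARRAY_LEN macro, and the use of restrict are assumptions.

```c
#define ARRAY_LEN 1024  // assumed trip count, matching #0x400 in the listing

// restrict tells the compiler that the arrays do not overlap,
// which makes the loop easier to vectorize.
void subtract_arrays(int *restrict a, const int *restrict b,
                     const int *restrict c)
{
    for (int i = 0; i < ARRAY_LEN; i++) {
        a[i] = b[i] - c[i];
    }
}
```

Compiling this file with armclang at -O3 together with -march=armv8-a+sve2 lets the compiler consider SVE and SVE2 instructions for the loop. Whether it vectorizes, and which instructions it chooses, depends on the compiler version.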

     

  • Use libraries that are optimized for SVE and SVE2: Highly optimized libraries with SVE support are already available, for example Arm Performance Libraries (ArmPL) and Arm Compute Library. Arm Performance Libraries contain highly optimized implementations of BLAS, LAPACK, FFT, and sparse linear algebra routines, and the libamath optimized mathematical functions. To use any of the ArmPL functions, you must install Arm Allinea Studio and include armpl.h in your code. To build the application with ArmPL using Arm Compiler for Linux, you must specify -armpl=<arg> on the command line. If you use the GNU tools, you must include the ArmPL installation path on the command line, and specify the GNU equivalent to the Arm Compiler for Linux -armpl=<arg> option.
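The following sketch shows how a call into ArmPL might look, assuming that armpl.h exposes the standard CBLAS prototypes. The USE_ARMPL macro and the square_matmul function are hypothetical names introduced here. Without the macro, the code uses a naive loop so that it still builds where ArmPL is not installed.

```c
#ifdef USE_ARMPL   // hypothetical macro: define it when building against ArmPL
#include <armpl.h>
#endif

// Hypothetical helper: C = A * B for row-major n x n matrices.
void square_matmul(int n, const double *A, const double *B, double *C)
{
#ifdef USE_ARMPL
    // Standard CBLAS call: C = 1.0 * A * B + 0.0 * C.
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A, n, B, n, 0.0, C, n);
#else
    // Naive reference loop, used when ArmPL is not available.
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++) {
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = sum;
        }
    }
#endif
}
```

With Arm Compiler for Linux, the ArmPL path would be built by adding -DUSE_ARMPL and -armpl=sve to the command line, alongside -march=armv8-a+sve2.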

How to run an SVE and SVE2 application: Hardware and model

If you do not have access to SVE hardware, you can use models and emulators to develop your code. You can choose from the following models and emulators:

  • QEMU: Cross and native models, supporting modeling on Arm AArch64 platforms with SVE.
  • Fast Models: Cross-platform models, supporting modeling on Arm AArch64 platforms with SVE. Architecture Envelope Model (AEM) with SVE2 support is available for lead partners.
  • Arm Instruction Emulator (ArmIE): Runs directly on Arm platforms. Supports SVE, and supports SVE2 from version 19.2+.
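Before choosing between hardware and a model, it can help to check whether the machine you are on reports SVE or SVE2. The following is a minimal sketch for AArch64 Linux that reads the kernel hardware capability bits through getauxval. The function names cpu_has_sve and cpu_has_sve2 are hypothetical, the bit values are fallbacks for toolchains whose headers do not define them, and returning false on other platforms is an assumption.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)    // SVE bit in AT_HWCAP on AArch64 Linux
#endif
#ifndef HWCAP2_SVE2
#define HWCAP2_SVE2 (1UL << 1)   // SVE2 bit in AT_HWCAP2 on AArch64 Linux
#endif
#endif

// Hypothetical helper: true if the kernel reports SVE support.
bool cpu_has_sve(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
    return false;  // assumed result on platforms this sketch does not cover
#endif
}

// Hypothetical helper: true if the kernel reports SVE2 support.
bool cpu_has_sve2(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP2) & HWCAP2_SVE2) != 0;
#else
    return false;  // assumed result on platforms this sketch does not cover
#endif
}
```

If both functions return false, the application needs one of the models or emulators listed above to execute SVE or SVE2 instructions.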