A-profile overview

The Arm Application-profile (A-profile) architecture targets high-performance markets, such as PC, mobile, gaming, and enterprise. The latest versions of the A-profile architecture are Armv9-A and Armv8-A. The following table compares the features of Armv9-A and Armv8-A.

Each entry gives the feature name, the architecture versions it applies to (where stated), a description, and a link to more information.

AArch64
AArch64 is the 64-bit execution environment for the Arm architecture.

AArch64 provides:

  • Large physical and virtual address spaces
  • A large register file of 64-bit registers
  • Automatic signalling of events for power-efficient, high-performance spinlocks
  • Efficient cache management
  • Load-Acquire and Store-Release instructions designed for the C++11, C11, and Java memory models
Learn the architecture: Guides for A-profile
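The spinlock below shows how Load-Acquire and Store-Release semantics map onto the C11 memory model. It is a minimal sketch in portable C11. The type and function names (`simple_spinlock`, `spin_lock`, and so on) are hypothetical. The comments describe the instruction choices AArch64 compilers typically make, which are not guarantees.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock: false = free, true = held. */
typedef struct {
    atomic_bool locked;
} simple_spinlock;

static void spin_init(simple_spinlock *l) {
    atomic_init(&l->locked, false);
}

/* Try once to take the lock. Acquire ordering stops accesses in the
   critical section from moving before the lock is taken. On AArch64 this
   typically compiles to LDAXRB/STXRB or, with the Armv8.1-A atomics, to a
   single SWPA-family swap instruction. */
static bool spin_trylock(simple_spinlock *l) {
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static void spin_lock(simple_spinlock *l) {
    while (!spin_trylock(l)) {
        /* Spin with relaxed loads until the lock looks free, so the
           waiting core does not keep writing to the cache line. AArch64
           lock implementations commonly use an exclusive load plus WFE
           here, relying on the event raised automatically when another
           core's store clears this core's exclusive monitor. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
    }
}

/* Release ordering makes writes from the critical section visible before
   the lock is seen as free; on AArch64 this is typically STLRB. */
static void spin_unlock(simple_spinlock *l) {
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

The acquire on the exchange and the release on the unlocking store keep the critical section's memory accesses inside the lock. With relaxed ordering on either, those accesses could be reordered outside it.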
AArch32 (Armv9.0-A: EL0 only)
The 32-bit execution environment for the Arm architecture. Provides compatibility with Armv7-A and earlier.
Learn the architecture: Guides for A-profile
Virtualization
Support for hypervisors and virtualization.
Learn the architecture: AArch64 Virtualization
TrustZone
TrustZone offers an efficient, system-wide approach to security, with hardware-enforced isolation built into the CPU.
Learn the architecture: TrustZone for AArch64
Realm Management Extension (RME) (Armv9-A)
The Realm Management Extension builds on TrustZone with the following features:
  • Two additional security states
  • Two additional physical address spaces
  • The ability to dynamically move resources between security states
These features enable the Arm Confidential Compute Architecture (Arm CCA) and Dynamic TrustZone.
Arm Confidential Compute Architecture
Hardware-accelerated cryptography
Provides 3× to 10× better software encryption performance. This is useful for encrypting and decrypting small blocks of data, such as HTTPS traffic, that are too small to offload efficiently to a separate hardware accelerator.
Learn the architecture: AArch64 Instruction Set Architecture
Neon
Neon technology is a packed SIMD architecture. Neon registers are treated as vectors of elements of the same data type, and Neon instructions operate on multiple elements simultaneously. The technology supports multiple data types, including floating-point and integer types.
Neon programmer's guides for Armv8-A
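The sketch below illustrates the packed-SIMD idea with the GCC/Clang vector extension. That extension is a compiler feature, not ISO C and not the Neon intrinsics API. `f32x4` models a 128-bit register holding four 32-bit floats, and the function names are hypothetical. Whether the compiler actually emits Neon instructions depends on the target and the optimization level.

```c
#include <string.h>

/* GCC/Clang vector extension (not ISO C): four 32-bit floats in 128 bits,
   the same shape as a Neon register viewed as 4 x float32. */
typedef float f32x4 __attribute__((vector_size(16)));

/* One '+' adds all four lanes; for AArch64, compilers typically emit a
   single Neon FADD (vector) instruction for it. */
static f32x4 add_lanes(f32x4 a, f32x4 b) {
    return a + b;
}

/* y[i] += a * x[i], four elements per step, plus a scalar tail. */
static void saxpy4(float *y, const float *x, float a, int n) {
    const f32x4 va = {a, a, a, a};
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        f32x4 vx, vy;
        memcpy(&vx, x + i, sizeof vx);   /* alignment-safe vector load */
        memcpy(&vy, y + i, sizeof vy);
        vy += va * vx;                   /* four multiply-adds at once */
        memcpy(y + i, &vy, sizeof vy);   /* vector store */
    }
    for (; i < n; i++)
        y[i] += a * x[i];
}
```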
Virtualization Host Extension (VHE)
These enhancements improve the performance of Type 2 hypervisors by reducing the software overhead of transitions between the Host and Guest operating systems. The extensions allow the Host OS to execute at EL2, rather than EL1, without substantial modification.
Learn the architecture: AArch64 Virtualization
Privilege Access Never (PAN)
PAN allows a kernel to prevent its own privileged accesses to unprivileged (user) memory locations unless they are explicitly permitted, which increases robustness.
Learn the architecture: AArch64 memory model
Statistical Profiling Extension (SPE)
SPE selects instructions or micro-operations for sampling at a regular, programmable interval. For each sampled operation, it gathers the associated context into a profiling record, and only one record is compiled at any given time. Sampling can run continuously on systems running large workloads over extended periods. Analyzing the resulting large sets of samples can provide considerable insight into software execution and its performance.
Statistical Profiling Extension for Armv8-A
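The sampling idea can be modelled in a few lines of C. This is a conceptual sketch only. The real SPE is programmed through system registers and writes records in an architected format to a memory buffer, none of which is modelled here. Real implementations can also perturb the interval randomly to avoid aliasing with loop structure. `sample_record` and `sample_stream` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical record: real SPE records carry much more context
   (for example PC, latencies, data addresses, events). */
typedef struct {
    size_t op_index;    /* position of the sampled operation */
    unsigned latency;   /* example of context captured for it */
} sample_record;

/* Select one operation every `interval` operations from a stream of
   per-operation latencies. Returns the number of records written. */
static size_t sample_stream(const unsigned *latency, size_t n_ops,
                            unsigned interval,
                            sample_record *out, size_t max_out) {
    if (interval == 0)
        return 0;
    size_t n_out = 0;
    unsigned counter = interval;   /* countdown, reloaded after each sample */
    for (size_t i = 0; i < n_ops && n_out < max_out; i++) {
        if (--counter == 0) {
            /* Operation selected: gather its context into the single
               record being built, then restart the countdown. */
            out[n_out].op_index = i;
            out[n_out].latency = latency[i];
            n_out++;
            counter = interval;
        }
    }
    return n_out;
}
```

With an interval of 3, operations 2, 5, 8, and so on are sampled. Over a long run, the records form a statistically representative picture of where time is spent.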
Scalable Vector Extension (SVE) (Armv8.2-A)
SVE provides support for SIMD with variable vector lengths. It enables a vector-length agnostic coding style, in which code does not need to be rewritten or recompiled, because it adapts dynamically to the implemented vector length. The SVE architecture allows implementations with a vector length of up to 2048 bits, where the vector length must be a multiple of 128 bits. SVE also supports code written for a fixed vector length.
SVE programming examples
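Below is a portable C model of a vector-length agnostic loop. `vl_bits` stands in for the implemented vector length. Real SVE code reads that length at run time (for example with the ACLE intrinsic `svcntw()`, which returns the number of 32-bit lanes) rather than receiving it as a parameter. The inner lane check plays the role of a governing predicate built by WHILELT, so the same code handles any vector length and any array size without a separate scalar tail. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static void vla_add(float *dst, const float *a, const float *b,
                    size_t n, unsigned vl_bits) {
    /* Constraint from the text: a multiple of 128 bits, at most 2048. */
    assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
    size_t lanes = vl_bits / 32;   /* 32-bit float lanes per vector */
    for (size_t i = 0; i < n; i += lanes) {
        /* One "predicated" vector step: lane k is active only while
           i + k < n, mirroring a WHILELT-generated predicate, so the
           final partial vector needs no separate scalar loop. */
        for (size_t k = 0; k < lanes; k++) {
            if (i + k < n)
                dst[i + k] = a[i + k] + b[i + k];
        }
    }
}
```

Calling `vla_add` with `vl_bits` of 128 or 2048 produces identical results. This is the property that lets one SVE binary run unchanged on implementations with different vector lengths.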
Pointer authentication
Computer attacks are becoming more sophisticated. Examples include exploit mechanisms such as the use of gadgets in Return-Oriented Programming (ROP) and Jump-Oriented Programming (JOP). To mitigate such exploits, Armv8.3-A introduces a feature that authenticates the contents of a register before it is used as the address for an indirect branch or data reference. For address authentication, the feature uses the upper bits of a 64-bit address value, which normally hold only the sign extension of the address. This allows a Pointer Authentication Code (PAC) to be inserted as a new field within those upper bits.
Code reuse attacks: the compiler story
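The following portable C sketch models where the PAC lives and how signing, authentication, and stripping fit together. It assumes 48-bit virtual addresses with top-byte-ignore disabled. Under that assumption the PAC occupies bits 63:56 and 54:48, and bit 55, which selects the upper or lower address range, is preserved. The keyed function `toy_mac` is a hypothetical, non-cryptographic stand-in for the real algorithm (QARMA or an implementation-defined cipher) and for the keys held in system registers. All function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 63:56 and 54:48 (bit 55 excluded), assuming 48-bit VAs, TBI off. */
#define PAC_MASK 0xFF7F000000000000ULL

/* Toy keyed mix standing in for the real PAC algorithm.
   NOT cryptographically secure. */
static uint64_t toy_mac(uint64_t ptr, uint64_t modifier, uint64_t key) {
    uint64_t x = ptr ^ (modifier * 0x9E3779B97F4A7C15ULL) ^ key;
    x ^= x >> 30; x *= 0xBF58476D1CE4E5B9ULL;
    x ^= x >> 27; x *= 0x94D049BB133111EBULL;
    x ^= x >> 31;
    return x;
}

/* Strip: refill the PAC field with copies of bit 55 (sign extension). */
static uint64_t strip_pac(uint64_t p) {
    return (p & (1ULL << 55)) ? (p | PAC_MASK) : (p & ~PAC_MASK);
}

/* Sign: place MAC bits of (pointer, modifier, key) into the PAC field. */
static uint64_t add_pac(uint64_t p, uint64_t modifier, uint64_t key) {
    uint64_t pac = toy_mac(strip_pac(p), modifier, key) & PAC_MASK;
    return (p & ~PAC_MASK) | pac;
}

/* Authenticate: recompute the PAC and compare. Real hardware instead
   yields a non-canonical (faulting) pointer on failure, or with FEAT_FPAC
   raises an exception directly. */
static bool auth_pac(uint64_t p, uint64_t modifier, uint64_t key,
                     uint64_t *out) {
    uint64_t plain = strip_pac(p);
    if (add_pac(plain, modifier, key) != p)
        return false;
    *out = plain;
    return true;
}
```

For return-address signing, compilers typically use the stack pointer as the modifier. A return address signed in one stack frame therefore fails authentication if an attacker replays it in another.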
Nested Virtualization
There is growing interest in cloud computing, and in particular in an increasingly common use case where a user rents a virtual machine from an infrastructure as a service (IaaS) provider. Nested virtualization is attractive when the workload intended to run on this virtual machine itself uses a hypervisor.
Learn the architecture: AArch64 Virtualization
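Memory Tagging Extension (MTE)
Memory tagging enables developers to identify memory safety violations in their programs. MTE associates a 4-bit allocation tag with each 16-byte granule of memory and stores a matching logical tag in the top bits of each pointer. An access whose pointer tag does not match the memory tag can be reported as a tag check fault.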
Memory Tagging Extension: Enhancing memory safety through architecture
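The tag check can be sketched in portable C. The model assumes a tiny heap of 64 granules starting at a hypothetical base address. It stores one 4-bit allocation tag per 16-byte granule and places the pointer's logical tag in bits 59:56. In real systems, instructions such as IRG and STG set the tags and hardware checks them on every access. The names here are illustrative, and the sketch performs no bounds checking.

```c
#include <stdbool.h>
#include <stdint.h>

#define GRANULE   16u   /* MTE tags memory in 16-byte granules */
#define TAG_SHIFT 56    /* logical tag lives in pointer bits 59:56 */

/* Hypothetical model: one 4-bit allocation tag per granule. */
typedef struct {
    uint64_t base;      /* address of the first granule */
    uint8_t tags[64];   /* allocation tags for 64 granules */
} tagged_heap;

/* Put `tag` into the pointer's logical tag field. */
static uint64_t with_tag(uint64_t addr, unsigned tag) {
    return (addr & ~(0xFULL << TAG_SHIFT)) |
           ((uint64_t)(tag & 0xF) << TAG_SHIFT);
}

/* Colour [addr, addr + size) with `tag`, as an allocator might do with
   STG when handing out an object. */
static void set_tags(tagged_heap *h, uint64_t addr, uint64_t size,
                     unsigned tag) {
    for (uint64_t a = addr; a < addr + size; a += GRANULE)
        h->tags[(a - h->base) / GRANULE] = (uint8_t)(tag & 0xF);
}

/* The check made on each load or store: pointer tag == memory tag. */
static bool access_ok(const tagged_heap *h, uint64_t tagged_ptr) {
    unsigned ptr_tag = (unsigned)(tagged_ptr >> TAG_SHIFT) & 0xF;
    uint64_t addr = tagged_ptr & ((1ULL << TAG_SHIFT) - 1);
    return h->tags[(addr - h->base) / GRANULE] == ptr_tag;
}
```

A buffer overflow is caught when it crosses into a neighbouring granule that carries a different tag. Here, reading one byte past a 32-byte object tagged 3 lands in an object tagged 5.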
Branch Target Identification (BTI)
BTI allows software to identify valid targets for indirect branches. BTI complements Pointer Authentication, providing a defence against JOP techniques.
Code reuse attacks: the compiler story
General Matrix Multiply (GEMM)
Adds new Advanced SIMD (Neon) and SVE instructions to accelerate matrix operations, greatly reducing the number of memory accesses required.
Developments in the Arm A-Profile Architecture: Armv8.6-A
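To show where the saving in memory accesses comes from, below is a scalar C model of one 128-bit segment of the Armv8.6-A SMMLA instruction, written from our reading of its description. It multiplies a 2x8 matrix of signed bytes by an 8x2 matrix of signed bytes and adds the 2x2 product of 32-bit integers to an accumulator. We assume the second operand holds the transpose of B (two rows of eight bytes), and that both the operands and the accumulator are row-major. One instruction therefore performs 2 × 2 × 8 = 32 multiply-accumulates from two 16-byte operands. The function name is hypothetical; the ACLE intrinsic `vmmlaq_s32` exposes the Neon form.

```c
#include <stdint.h>

/* acc (2x2 int32, row-major) += A (2x8 int8, row-major) x B (8x2 int8),
   with B supplied as its transpose `bt` (2 rows of 8 bytes). */
static void smmla_ref(int32_t acc[4], const int8_t a[16],
                      const int8_t bt[16]) {
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            int32_t sum = 0;
            for (int k = 0; k < 8; k++)   /* 8-element dot product */
                sum += (int32_t)a[8 * i + k] * (int32_t)bt[8 * j + k];
            acc[2 * i + j] += sum;
        }
    }
}
```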
BFloat16
Support in Advanced SIMD (Neon) and SVE for the BFloat16 (BF16) data type. BF16 has recently emerged as a format tailored specifically to high-performance processing of neural networks.
BFloat16 processing for Neural Networks on Armv8-A
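Below is a minimal portable C sketch of conversion between float32 and BF16. BF16 keeps the float32 sign bit and 8-bit exponent but only 7 fraction bits, so converting means rounding away the low 16 bits. The sketch rounds to nearest with ties to even and keeps NaNs as quiet NaNs. This is a common software approach; it is not a claim about how the hardware conversion instructions behave in every edge case. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

static uint16_t f32_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    /* NaN: truncate and force the quiet bit so it cannot become Inf. */
    if ((bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0)
        return (uint16_t)((bits >> 16) | 0x0040u);
    /* Round to nearest, ties to even, on the 16 discarded bits. */
    uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
    return (uint16_t)((bits + rounding) >> 16);
}

static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;   /* widening is exact */
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because widening back to float32 is exact, only the narrowing step loses precision. For example, pi becomes 3.140625, which is adequate for many neural-network weights and activations.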
High precision timers
The Generic Timer counter frequency is increased to a new standard of 1GHz, so each counter tick corresponds to 1ns.
Arm A-Profile architecture developments 2018: Armv8.5-A
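Converting counter ticks to time is simple arithmetic: elapsed time = ticks / frequency. The sketch below, in portable C, takes the counter delta and the frequency as parameters. On AArch64 these values would come from CNTVCT_EL0 and CNTFRQ_EL0. The function name is hypothetical, and the 24MHz figure in the note after the code is an arbitrary value chosen for comparison.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz) {
    assert(freq_hz > 0);
    /* Split into whole seconds and a remainder so that the multiply by
       1e9 does not overflow for large tick counts. */
    uint64_t secs = ticks / freq_hz;
    uint64_t rem = ticks % freq_hz;
    return secs * 1000000000ULL + rem * 1000000000ULL / freq_hz;
}
```

At 1GHz, `ticks_to_ns(25, 1000000000)` returns exactly 25. With a 24MHz counter, `ticks_to_ns(25, 24000000)` returns 1041, because each tick lasts about 41.7ns.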
64-byte loads and stores
A growing trend in enterprise systems is the introduction of accelerators that are accessed using 64-byte atomic loads or stores. These accesses are used to add items to queues and can, in some cases, return the success or failure of the enqueue operation.
Arm A-Profile architecture developments 2020
Scalable Vector Extension v2 (SVE2)
SVE2 is a superset of Armv8-A SVE, with expanded functionality. The SVE2 instruction set adds comprehensive fixed-point arithmetic support.
Arm A-Profile architecture developments 2020
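One fixed-point operation in SVE2 (also present in Neon) is SQRDMULH, the signed saturating rounding doubling multiply returning the high half. The sketch below is a scalar C model of one 16-bit lane in Q15 format, where an int16 value v represents v / 32768. It illustrates the arithmetic and is not an intrinsic. The function name is hypothetical.

```c
#include <stdint.h>

static int16_t sqrdmulh_q15(int16_t a, int16_t b) {
    int64_t prod = 2 * (int64_t)a * (int64_t)b;   /* doubling multiply */
    /* Add half an LSB, then keep the high 16 bits. Relies on arithmetic
       right shift of negative values, as GCC and Clang provide. */
    int64_t r = (prod + (1 << 15)) >> 16;
    /* Only -1.0 * -1.0 (-32768 * -32768) can exceed the Q15 range. */
    if (r > INT16_MAX)
        r = INT16_MAX;
    return (int16_t)r;
}
```

In Q15 terms, 0.5 × 0.5 gives 0.25 (8192), and -1.0 × -1.0 saturates to the largest representable value, just under 1.0.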
Transactional Memory Extension (TME)
The Transactional Memory Extension brings Hardware Transactional Memory (HTM) support to the Arm architecture. Transactional memory addresses the difficulty of writing highly concurrent, multithreaded programs. By reducing serialization caused by lock contention, it allows coarse-grained, thread-level parallelism to scale better with the number of CPUs.
New technologies for the Arm A-profile architecture
Branch Record Buffer Extension (BRBE)
The Branch Record Buffer Extension captures a recent sequence of branches in an easily consumable format. This information can be used for debugging, or fed into profiling tools for hot-spot analysis and AutoFDO (automatic feedback-directed optimization).
Available Q3 - Q4 2021

For information on Arm IP implementation of the architecture features, view Arm Cortex-A processors.

Armv9-A architecture

The Armv9-A architecture builds on, and is backwards compatible with, the Armv8-A architecture. It forms the foundation for the Arm Base System Architecture, a specification that ensures hardware and firmware compatibility across a wide range of applications at the system level.

The Armv9-A architecture introduces some major new features:

  • SVE2: extending the benefit of scalable vectors to many more use cases
  • Realm Management Extension (RME): extending Confidential Compute on Arm platforms to all developers. Read more about Confidential Compute and Arm architecture security features
  • BRBE: providing branch information for profiling, for example for AutoFDO
  • Embedded Trace Extension (ETE) and Trace Buffer Extension (TRBE): enhanced trace capabilities for Armv9
  • TME: hardware transactional memory support for the Arm architecture

Armv8-A architecture

The Armv8-A architecture introduces the ability to use 64-bit and 32-bit Execution states, known as AArch64 and AArch32 respectively. The AArch64 Execution state supports the A64 instruction set. It holds addresses in 64-bit registers and allows instructions in the base instruction set to use 64-bit registers for their processing. The AArch32 Execution state is a 32-bit Execution state that preserves backwards compatibility with the Armv7-A architecture, enhancing that profile so that it can support some features included in the AArch64 state. It supports the T32 and A32 instruction sets.

Armv8-A architecture allows different levels of AArch64 and AArch32 support, for example:

  • AArch64 only designs
  • AArch64 designs that also support AArch32 operating systems and virtual machines
  • AArch64 support with AArch32 at (unprivileged) application level only

Armv7-A architecture

The Armv7-A architecture introduces the concept of architecture profiles, a concept that continues in Armv8-A and Armv9-A. The Armv7-A architecture:

  • Implements a traditional Arm architecture with multiple modes
  • Supports a Virtual Memory System Architecture (VMSA) based on a Memory Management Unit (MMU)
  • Supports the Arm (A32) and Thumb (T32) instruction sets

This architecture also supports multiple extensions:

  • Security Extensions
  • Multiprocessing Extensions
  • Large Physical Address Extension
  • Virtualization Extensions
  • Generic Timer Extension
  • Performance Monitors Extension

All of these extensions are optional and most of the functionality they provide is included in the Armv8-A architecture.



