Arm has introduced the next generation of its Ethos micro-NPU, the Ethos-U85, designed to support transformer operations and bring generative AI models to IoT devices. According to Paul Williamson, senior VP and general manager for Arm’s IoT line of business, there is growing demand for transformer workloads at the edge, albeit in smaller forms than large language models (LLMs). The vision transformer ViT-Tiny and the generative language model TinyLlama-1.1B have already been successfully ported to the Ethos-U85.
“Most machine learning inferencing is already being done on Arm-powered devices today,” Williamson said. “The AI explosion may seem sudden, but Arm has been preparing for this moment for a long time. The benefits of edge AI span multiple IoT segments, requiring tight hardware and software integration, and Arm has heavily invested in this over the past decade.”
The Ethos-U85 features a third-generation microarchitecture that delivers a 4× performance increase and 20% better power efficiency than the previous-generation Ethos-U65. It can be driven by either Cortex-A application processor cores or Cortex-M microcontroller cores. The U85 NPU IP is configurable from 128 to 2,048 MACs, delivering 256 GOPS to 4 TOPS of performance at 1 GHz, with support for INT8 and INT16 activations.
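The quoted throughput range lines up with the usual rule of thumb that each MAC unit contributes two operations per cycle (a multiply and an accumulate). A minimal sketch of that arithmetic, assuming the 1 GHz reference clock mentioned above; the formula is the generic MAC-array estimate, not an Arm-published one:

```cpp
#include <cstdio>

// Peak throughput estimate for a MAC-array NPU: each MAC performs one multiply
// and one accumulate per cycle, i.e. 2 ops/cycle. The configuration range is
// taken from the article; the 2-ops-per-MAC convention is an assumption.
int main() {
    const double clock_hz = 1.0e9;  // 1 GHz reference clock
    const int mac_configs[] = {128, 256, 512, 1024, 2048};

    for (int macs : mac_configs) {
        double ops_per_sec = static_cast<double>(macs) * 2.0 * clock_hz;
        std::printf("%4d MACs -> %6.0f GOPS (%.2f TOPS)\n",
                    macs, ops_per_sec / 1e9, ops_per_sec / 1e12);
    }
    return 0;
}
```

At the extremes this gives 256 GOPS for the 128-MAC configuration and roughly 4 TOPS for the 2,048-MAC configuration, matching the figures Arm quotes.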
Applications like audio processing require higher precision, with some customers requesting 32-bit support, while imaging applications often seek lower precision such as INT4 or even 2-bit weights. Williamson noted that embedded customers prioritize power efficiency, often compromising on datatypes to achieve it.
The Ethos-U85 now supports MATMUL and other operators common in transformer networks, allowing these networks to run entirely on the NPU without CPU fallback. This update, along with improved operator chaining and a more efficient weight decoder, contributes to a 20% improvement in energy efficiency.
Arm’s existing Ethos toolchain, including the Vela compiler, supports the U85. The NPU uses the TensorFlow Lite for Microcontrollers runtime today, with support for ExecuTorch (the PyTorch edge runtime) planned for the future. Arm also continues to invest in its CMSIS-NN library for ML on Cortex-M microcontrollers.
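To illustrate how a Vela-compiled model is typically dispatched to the NPU from application code, here is a minimal sketch using the TensorFlow Lite for Microcontrollers C++ API. The model array name, the tensor-arena size, and the AddEthosU() registration call are assumptions based on the public tflite-micro conventions and should be checked against Arm’s Ethos-U platform examples:

```cpp
// Sketch: running a Vela-compiled .tflite model on an Ethos-U NPU through the
// TensorFlow Lite for Microcontrollers runtime. g_model_data (the Vela-optimized
// flatbuffer linked into the firmware) and the arena size are illustrative
// assumptions, not values from the article.
#include <cstdint>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

extern const unsigned char g_model_data[];   // Vela compiler output

namespace {
constexpr int kTensorArenaSize = 256 * 1024; // sized per model; assumption
alignas(16) uint8_t tensor_arena[kTensorArenaSize];
}

int RunInference() {
  const tflite::Model* model = tflite::GetModel(g_model_data);

  // Vela fuses the NPU-supported portion of the graph into a single custom
  // "ethos-u" operator, so only that kernel needs to be registered when the
  // whole network runs on the NPU without CPU fallback.
  tflite::MicroMutableOpResolver<1> resolver;
  resolver.AddEthosU();

  tflite::MicroInterpreter interpreter(model, resolver,
                                       tensor_arena, kTensorArenaSize);
  if (interpreter.AllocateTensors() != kTfLiteOk) return -1;

  // Fill interpreter.input(0) with quantized input data here.
  if (interpreter.Invoke() != kTfLiteOk) return -1;

  // Results are read back from interpreter.output(0).
  return 0;
}
```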
With the Ethos-U85 expected to be available in silicon by 2025, Arm’s customers, including Renesas, Infineon, Himax, and Alif Semiconductor, can experiment with generative AI models using Arm’s virtual hardware simulations today.
Source: EE Times