Abstract
This survey investigates recent developments in versatile embedded machine learning (ML) hardware acceleration. It analyzes various architectural approaches for the efficient implementation of ML algorithms on resource-constrained devices, focusing on three key aspects: performance optimization, embedded system considerations (throughput, latency, and energy efficiency), and multi-application support. Attacks on and defenses of the ML architectures themselves are outside the scope of this survey. The survey then explores different hardware acceleration strategies, ranging from custom RISC-V instructions to specialized processing elements (PEs), processing-in-memory (PiM) architectures, and co-design approaches. Notable innovations include flexible bit-precision support, reconfigurable PEs, and memory management techniques that reduce the data-movement overhead of weights and (hyper)parameters. Subsequently, these architectures are evaluated with respect to the aforementioned key aspects. Our analysis shows that effective and robust embedded ML acceleration requires careful consideration of the trade-offs among computational capability, power consumption, and architectural flexibility, depending on the target application.
Type
Publication
In Journal of Systems Architecture (Elsevier)