ROMA: Hardware Acceleration for On-Device LLMs

A ROM-based accelerator enabling efficient edge deployment of large language models

ROMA introduces a hardware architecture designed specifically for deploying QLoRA-fine-tuned LLMs on-device, enabling private, real-time interaction on edge devices.

  • Utilizes Read-Only Memory (ROM) to store quantized base models, reducing power consumption while maintaining model performance
  • Implements a hybrid storage architecture optimized for the unique characteristics of QLoRA-based inference
  • Delivers privacy advantages by keeping user data on-device rather than sending it to cloud services
  • Enables real-time interaction capabilities previously challenging on resource-constrained edge devices
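The storage split above follows from the structure of QLoRA inference: the quantized base weights are frozen (and so can sit in ROM), while only the small low-rank adapter needs writable memory. The sketch below illustrates that split in NumPy; the int4 symmetric quantization scheme and the matrix sizes are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 8, 2  # toy sizes for illustration

# "ROM" side: a frozen base weight matrix, quantized to 4-bit integers.
# (Symmetric int4 is an assumed scheme, chosen only to make the idea concrete.)
W = rng.standard_normal((d_out, d_in)).astype(np.float32)
scale = np.abs(W).max() / 7.0                               # int4 range [-7, 7]
W_q = np.clip(np.round(W / scale), -7, 7).astype(np.int8)   # stored read-only

# "RAM" side: the LoRA adapter (A, B), tiny compared to the base weights
# and the only part that changes after fine-tuning.
A = rng.standard_normal((rank, d_in)).astype(np.float32) * 0.01
B = rng.standard_normal((d_out, rank)).astype(np.float32) * 0.01

def forward(x):
    base = (W_q.astype(np.float32) * scale) @ x  # dequantize frozen base, matmul
    lora = B @ (A @ x)                           # low-rank correction from RAM
    return base + lora

x = rng.standard_normal(d_in).astype(np.float32)
y = forward(x)
```

Because `W_q` is never written after fabrication, a ROM array can hold it far more densely and at lower power than SRAM, while the adapter term stays in conventional writable memory.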

This work demonstrates hardware-specific LLM acceleration that addresses a key obstacle to running capable models directly on user devices: doing so without sacrificing inference performance or data security.

ROMA: a Read-Only-Memory-based Accelerator for QLoRA-based On-Device LLM
