Hardware-Optimized LLM Adaptation

Making LoRA-adapted models resilient to hardware noise in compute-in-memory architectures

HaLoRA integrates hardware awareness into LLM fine-tuning, optimizing models for hybrid compute-in-memory deployment while maintaining performance.

  • Proposes a hybrid architecture that keeps the frozen pretrained weights on RRAM and the trainable LoRA parameters on SRAM (a minimal sketch follows this list)
  • Introduces noise-aware training to make models robust against hardware imperfections such as RRAM read noise (see the training-loop sketch below)
  • Improves energy efficiency without compromising model accuracy
  • Demonstrates a practical hardware-software co-design workflow for LLMs
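To make the split concrete, here is a minimal PyTorch sketch of such a hybrid LoRA layer. The class name `HybridLoRALinear`, the multiplicative Gaussian model of RRAM read noise, and the `noise_std` value are illustrative assumptions for this sketch, not the paper's implementation.

```python
import torch
import torch.nn as nn

class HybridLoRALinear(nn.Module):
    """Sketch of a LoRA linear layer split across memory types:
    the frozen pretrained weight W is assumed to live on noisy analog
    RRAM, while the small low-rank factors A and B stay on precise SRAM."""

    def __init__(self, in_features, out_features, rank=8, noise_std=0.02):
        super().__init__()
        # Frozen pretrained weight; RRAM read variation is modeled here
        # (an assumption) as multiplicative Gaussian noise per forward pass.
        self.weight = nn.Parameter(torch.empty(out_features, in_features),
                                   requires_grad=False)
        nn.init.normal_(self.weight, std=0.02)
        self.noise_std = noise_std  # assumed relative noise level

        # Trainable LoRA factors (digital SRAM path, noise-free).
        # Standard LoRA init: B starts at zero so the update begins at 0.
        self.lora_A = nn.Parameter(torch.zeros(rank, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        nn.init.normal_(self.lora_A, std=0.02)

    def forward(self, x):
        # Analog path: simulate one noisy read of W per forward pass.
        noise = torch.randn_like(self.weight) * self.noise_std
        w_noisy = self.weight * (1.0 + noise)
        base = x @ w_noisy.t()

        # Digital path: exact low-rank update B @ A, free of noise.
        update = (x @ self.lora_A.t()) @ self.lora_B.t()
        return base + update
```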
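Noise-aware training then amounts to drawing a fresh noise sample on every forward pass while only the SRAM-resident LoRA factors receive gradients, so the adapters learn to compensate for the analog variation. A toy loop under the same assumptions (stand-in data and a plain regression loss, not the paper's objective):

```python
# Noise-aware fine-tuning sketch: gradients flow only into the LoRA
# factors A and B; the noisy RRAM base path stays frozen.
layer = HybridLoRALinear(in_features=512, out_features=512)
optimizer = torch.optim.AdamW([layer.lora_A, layer.lora_B], lr=1e-4)

x = torch.randn(4, 512)       # stand-in activations
target = torch.randn(4, 512)  # stand-in regression target

for _ in range(100):
    optimizer.zero_grad()
    # Each call to layer(x) resamples the RRAM noise, so the adapters
    # are optimized against the noise distribution, not one sample.
    loss = torch.nn.functional.mse_loss(layer(x), target)
    loss.backward()
    optimizer.step()
```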

This research bridges the gap between algorithmic innovation and hardware constraints, enabling more efficient deployment of fine-tuned language models in resource-constrained environments.

HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture
