Hardware-Optimized LLM Adaptation

Making LoRA-adapted models resilient to hardware noise in compute-in-memory architectures

HaLoRA integrates hardware awareness into LLM fine-tuning, optimizing models for hybrid compute-in-memory deployment while maintaining performance.

  • Proposes a hybrid architecture that keeps the frozen pretrained weights on RRAM and the trainable LoRA parameters on SRAM (a minimal sketch follows this list)
  • Introduces noise-aware training to make models robust against hardware imperfections such as RRAM read noise (see the training-loop sketch below)
  • Improves energy efficiency without compromising model accuracy
  • Demonstrates a practical hardware-software co-design workflow for LLMs
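To make the split concrete, here is a minimal PyTorch sketch of such a hybrid LoRA layer. The class name `HybridLoRALinear`, the multiplicative Gaussian model of RRAM read noise, and the `noise_std` value are illustrative assumptions for this sketch, not the paper's implementation.

```python
import torch
import torch.nn as nn

class HybridLoRALinear(nn.Module):
    """Sketch of a LoRA linear layer split across memory types:
    the frozen pretrained weight W is assumed to live on noisy analog
    RRAM, while the small low-rank factors A and B stay on precise SRAM."""

    def __init__(self, in_features, out_features, rank=8, noise_std=0.02):
        super().__init__()
        # Frozen pretrained weight; RRAM read variation is modeled here
        # (an assumption) as multiplicative Gaussian noise per forward pass.
        self.weight = nn.Parameter(torch.empty(out_features, in_features),
                                   requires_grad=False)
        nn.init.normal_(self.weight, std=0.02)
        self.noise_std = noise_std  # assumed relative noise level

        # Trainable LoRA factors (digital SRAM path, noise-free).
        # Standard LoRA init: B starts at zero so the update begins at 0.
        self.lora_A = nn.Parameter(torch.zeros(rank, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        nn.init.normal_(self.lora_A, std=0.02)

    def forward(self, x):
        # Analog path: simulate one noisy read of W per forward pass.
        noise = torch.randn_like(self.weight) * self.noise_std
        w_noisy = self.weight * (1.0 + noise)
        base = x @ w_noisy.t()

        # Digital path: exact low-rank update B @ A, free of noise.
        update = (x @ self.lora_A.t()) @ self.lora_B.t()
        return base + update
```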
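Noise-aware training then amounts to drawing a fresh noise sample on every forward pass while only the SRAM-resident LoRA factors receive gradients, so the adapters learn to compensate for the analog variation. A toy loop under the same assumptions (stand-in data and a plain regression loss, not the paper's objective):

```python
# Noise-aware fine-tuning sketch: gradients flow only into the LoRA
# factors A and B; the noisy RRAM base path stays frozen.
layer = HybridLoRALinear(in_features=512, out_features=512)
optimizer = torch.optim.AdamW([layer.lora_A, layer.lora_B], lr=1e-4)

x = torch.randn(4, 512)       # stand-in activations
target = torch.randn(4, 512)  # stand-in regression target

for _ in range(100):
    optimizer.zero_grad()
    # Each call to layer(x) resamples the RRAM noise, so the adapters
    # are optimized against the noise distribution, not one sample.
    loss = torch.nn.functional.mse_loss(layer(x), target)
    loss.backward()
    optimizer.step()
```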

This research bridges the gap between algorithmic innovation and hardware constraints, enabling more efficient deployment of fine-tuned language models in resource-constrained environments.

HaLoRA: Hardware-aware Low-Rank Adaptation for Large Language Models Based on Hybrid Compute-in-Memory Architecture
