
Hardware-Optimized LLM Adaptation
Making LoRA models resilient to hardware noise in compute-in-memory architectures
HaLoRA integrates hardware awareness into LLM fine-tuning, optimizing models for hybrid compute-in-memory deployment while maintaining performance.
- Proposes a novel hybrid architecture that maps the frozen pretrained weights onto dense but noise-prone RRAM while keeping the trainable LoRA parameters on precise SRAM
- Introduces noise-aware training so the fine-tuned model stays robust to RRAM device noise and other hardware imperfections (see the sketch after this list)
- Achieves improved energy efficiency without compromising model accuracy
- Demonstrates practical implementation of hardware-software co-design for LLMs
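How the RRAM/SRAM split and noise-aware training fit together can be made concrete with a small sketch. The PyTorch module below is a hypothetical LoRA linear layer: the frozen base weight stands in for the RRAM-mapped pretrained weights and is perturbed with a simple Gaussian noise model during training, while the trainable low-rank factors stand in for the SRAM-resident LoRA parameters and stay noise-free. The class name, noise model, and hyperparameters are illustrative assumptions, not HaLoRA's exact formulation.

```python
import torch
import torch.nn as nn


class NoiseAwareLoRALinear(nn.Module):
    """Hypothetical sketch: frozen base weights emulate noisy RRAM,
    trainable LoRA factors emulate precise SRAM."""

    def __init__(self, in_features, out_features, rank=8, alpha=16.0, noise_std=0.02):
        super().__init__()
        # Pretrained weight, frozen; imagined as mapped to analog RRAM.
        self.weight = nn.Parameter(
            torch.empty(out_features, in_features), requires_grad=False
        )
        nn.init.normal_(self.weight, std=0.02)
        # LoRA factors, trainable; imagined as held in digital SRAM (noise-free).
        self.lora_A = nn.Parameter(torch.zeros(rank, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        nn.init.kaiming_uniform_(self.lora_A, a=5 ** 0.5)
        self.scaling = alpha / rank
        # Assumed relative Gaussian noise level for RRAM device variation.
        self.noise_std = noise_std

    def forward(self, x):
        w = self.weight
        if self.training and self.noise_std > 0:
            # Noise-aware training: perturb the frozen weights each step so the
            # learned LoRA update compensates for the simulated non-idealities.
            w = w + torch.randn_like(w) * self.noise_std * w.abs()
        base = nn.functional.linear(x, w)
        update = nn.functional.linear(nn.functional.linear(x, self.lora_A), self.lora_B)
        return base + self.scaling * update


# Usage: only the SRAM-side LoRA parameters receive gradients.
layer = NoiseAwareLoRALinear(512, 512, rank=8)
optimizer = torch.optim.AdamW(
    [p for p in layer.parameters() if p.requires_grad], lr=1e-4
)
output = layer(torch.randn(4, 512))
```

Freezing the base weight and optimizing only the low-rank factors mirrors the hardware split: the RRAM array is written once, and only the SRAM-resident parameters ever need to be updated during fine-tuning.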
This research bridges the gap between algorithmic innovation and hardware constraints, enabling more efficient deployment of fine-tuned language models in resource-constrained environments.