Boosting Food Safety AI with Synthetic Data

This research demonstrates how data augmentation using ChatGPT-4o-mini significantly improves large language models' ability to detect food hazards and products.

Training RoBERTa-base and Flan-T5-base models with augmented data improved recall, F1 score, precision, and accuracy.
Synthetic data generation creates more diverse training examples without requiring additional manual labeling.
The approach enhances model robustness for critical food safety applications.
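
To make the idea concrete, here is a minimal sketch in Python of label-preserving augmentation and of the four reported metrics. It is an illustration under stated assumptions, not the paper's pipeline: `augment`, `binary_scores`, and `toy_paraphrase` are hypothetical names, and `toy_paraphrase` is a trivial stand-in for the language model that would generate variants in practice. The point is that each synthetic example inherits the label of the text it was derived from, so no new manual labeling is needed.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (text, label)


def augment(examples: List[Example],
            paraphrase: Callable[[str], List[str]]) -> List[Example]:
    """Return the originals plus paraphrased variants.

    Each variant reuses the label of its source text, which is why
    no additional manual labeling is required.
    """
    augmented = list(examples)
    seen = {text for text, _ in examples}
    for text, label in examples:
        for variant in paraphrase(text):
            if variant not in seen:  # skip exact duplicates
                seen.add(variant)
                augmented.append((variant, label))
    return augmented


def binary_scores(y_true: List[str], y_pred: List[str],
                  positive: str) -> Dict[str, float]:
    """Precision, recall, F1 for one class, plus overall accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


def toy_paraphrase(text: str) -> List[str]:
    """Hypothetical stand-in for an LLM paraphraser (assumption)."""
    return [text.replace("found in", "detected in")]


# Invented example data, for illustration only.
seed = [("Salmonella found in peanut butter", "biological")]
train = augment(seed, toy_paraphrase)
metrics = binary_scores(["a", "b", "a", "b"], ["a", "a", "a", "b"], "a")
```

In a real setup the paraphraser would call a generative model and the augmented set would be used to fine-tune the classifier; the evaluation should still run on held-out, non-synthetic examples so the metrics reflect real data.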

For gastronomy professionals, this advancement points to more reliable automated systems for identifying potential food hazards, supporting safer food production and consumption while reducing reliance on manual inspection.

Paper: Data Augmentation to Improve Large Language Models in Food Hazard and Product Detection