
Multilingual Fact-Checking at Scale
Using LLMs to Generate High-Quality Training Data for Multiple Languages
MultiSynFact is a dataset of 2.2M claim-source pairs that extends fact-checking beyond English to Spanish and German and is designed to support low-resource languages as well.
Key innovations:
- First large-scale multilingual fact-checking dataset with 2.2M claim-source pairs
- Novel LLM-based data generation pipeline integrating Wikipedia knowledge
- Supports fact-checking in Spanish, German, English, and low-resource languages
- Addresses critical security gap in multilingual misinformation detection
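To make the idea of an LLM-based generation pipeline concrete, here is a minimal sketch of how claim-source pairs might be produced from Wikipedia passages. It is not the authors' implementation: the `ClaimSourcePair` record, the `LABELS` set, the prompt wording, and the `generate` callable (a stand-in for whatever LLM client is used) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical label set; the summary above does not specify one.
LABELS = ("supports", "refutes", "not_enough_info")


@dataclass(frozen=True)
class ClaimSourcePair:
    """One training example: a claim, the passage it is checked against,
    the language of the pair, and the intended relation between them."""
    claim: str
    source: str
    language: str
    label: str


def build_prompt(source: str, language: str, label: str) -> str:
    # Illustrative prompt wording only; the real prompts are not given here.
    return (
        f"Write one claim in {language} that has the relation '{label}' "
        f"to the following passage.\nPassage: {source}\nClaim:"
    )


def generate_pairs(
    sources: Iterable[str],
    language: str,
    generate: Callable[[str], str],
) -> List[ClaimSourcePair]:
    """Ask the generator for one claim per (passage, label) combination.

    `generate` is any function mapping a prompt string to generated text,
    so the sketch stays independent of a specific LLM API.
    """
    pairs: List[ClaimSourcePair] = []
    for source in sources:
        for label in LABELS:
            claim = generate(build_prompt(source, language, label)).strip()
            if claim:  # drop empty generations
                pairs.append(ClaimSourcePair(claim, source, language, label))
    return pairs
```

Passing a stub such as `lambda prompt: "Berlin ist die Hauptstadt Deutschlands."` as `generate` is enough to exercise the function without a model. A real pipeline would presumably also filter generated claims for quality before adding them to the dataset.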
Security impact: By enabling robust multilingual fact-checking systems, this research provides tools for combating misinformation at global scale, which matters for information security across language barriers.
Beyond Translation: LLM-Based Data Generation for Multilingual Fact-Checking