
M³-20M: Revolutionizing AI-Driven Drug Discovery
A massive multi-modal molecule dataset 71× larger than the largest existing dataset
M³-20M is a dataset of over 20 million molecules in multiple representation formats, designed to accelerate AI-driven pharmaceutical development.
- Integrates data mainly from existing databases, with part of the data generated by LLMs
- Offers 71 times more molecules than the largest previous dataset
- Provides rich multi-modal representations to support diverse ML approaches
- Creates a foundation for more effective training of drug discovery models
For the medical community, this dataset enables more robust AI systems for drug design, which could reduce development timelines and costs while increasing discovery rates for novel therapeutics.
M³-20M: A Large-Scale Multi-Modal Molecule Dataset for AI-driven Drug Design and Discovery