Datasets

Open-source datasets from AMD GenAI on Hugging Face

Utility-based benchmark with 33K+ questions across 4 domains for evaluating caption quality

High-quality synthetic reasoning dataset with 27K math and science problems for fine-tuning LLMs

Long-context training data for Instella models that support a 128K context length

Synthetic math dataset for training reasoning models on AMD GPUs

Synthetic GSM8K-style math problems for Instella model training

Benchmark of 412 novel Tic-Tac-Toe-style game questions across 4 game types for evaluating LLM reasoning