🐉 DRAGON. Dynamic RAG Benchmark On News


This leaderboard compares RAG systems on generation and retrieval metrics across different question types (simple, comparison, multi-hop, conditional, etc.).

Questions are automatically generated from news sources.

The question dataset is updated regularly, and metrics for open models are recalculated.

User submissions are shown with the metrics most recently calculated for them.

To recalculate a previously submitted configuration with the latest data version, use the submit_id received during the initial submission via the client (see instructions below).

Version 1.34.1 → 600 questions, generated from news sources → July 3, 2025

Generation Metrics

[Plot: generation metrics]

Retrieval Metrics

[Plot: retrieval metrics]

Citation

@misc{chernogorskii2025dragondynamicragbenchmark,
      title={DRAGON: Dynamic RAG Benchmark On News}, 
      author={Fedor Chernogorskii and Sergei Averkiev and Liliya Kudraleeva and Zaven Martirosian and Maria Tikhonova and Valentin Malykh and Alena Fenogenova},
      year={2025},
      eprint={2507.05713},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.05713}, 
}

Version Selection

Only current versions

Start counting from the current dataset version

Take the last n versions

Number of versions to calculate metrics for (1 to 5)
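As a rough illustration of the version selection options above, the following sketch picks the last n dataset versions, counting back from the current one, and averages a metric over them. It is not the leaderboard's actual code. The function names (`last_n_versions`, `average_over_versions`), the sample scores, and the choice of a plain mean as the aggregation are all assumptions made for the example.

```python
from statistics import mean


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.34.1' into (1, 34, 1) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def last_n_versions(versions, n, current=None):
    """Return the n most recent versions in ascending order.

    If `current` is given, versions newer than it are ignored, so the
    count starts from the current dataset version.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    ordered = sorted(versions, key=parse_version)
    if current is not None:
        limit = parse_version(current)
        ordered = [v for v in ordered if parse_version(v) <= limit]
    return ordered[-n:]


def average_over_versions(scores, versions):
    """Average a metric over the selected versions (assumed aggregation).

    Versions without a score are skipped; returns None if nothing is left.
    """
    values = [scores[v] for v in versions if v in scores]
    return mean(values) if values else None


# Hypothetical per-version scores for one model (made-up numbers).
scores = {"1.32.0": 0.60, "1.33.0": 0.70, "1.34.1": 0.80}

selected = last_n_versions(scores.keys(), n=2, current="1.34.1")
print(selected)
print(average_over_versions(scores, selected))
```

Comparing versions as integer tuples rather than strings keeps the ordering correct when a component reaches two digits, for example 1.9.0 versus 1.34.1.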

Click on models in the table to add them to the charts